RFC: API changes for the upcoming v1.0 #384
Cc @4pao, @kruus, and @tensor-tang.
Great to know.
With the current framework, pretty much everything is ugly wrt. memory, and I welcome the efforts to fix things. The current memory descriptor zoo certainly could use some nice, logically organized cages. On the other hand, in the current disorganized situation, I could (with much work) define a new named format with whatever bizarre behaviors I wanted. So let me ask about a concrete example I've been thinking of recently, and at least one extension that I'd like to see supportable in the new framework. Ex. (VE chip) default alignment for float data is 4, but there's a nice packed vector load that reads 512 contiguous floats into a vector register... but requires the load address to be 8-byte aligned. (Otherwise I need a slow'n'ugly vec-read-upper, vec-read-lower, vec-merge.)
For my example, if the base tensor alignment were 8, and the innermost dimension were even, then I could JIT for only the nice reads, because all internal pointers would maintain the extra alignment of 8.
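To make the dispatch I have in mind concrete, here is a rough sketch in plain C++ (nothing MKL-DNN specific; the 8-byte alignment and the even-innermost-dim check are just the numbers from my example, and the kernel bodies are placeholders):

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the real kernels: the "fast" one would use the packed vector
// load that requires an 8-byte-aligned address; the "slow" one works anywhere.
static void copy_fast_packed(const float *src, float *dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i]; // imagine vector loads here
}
static void copy_unaligned(const float *src, float *dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

void copy_dispatch(const float *src, float *dst, std::size_t n,
                   std::size_t innermost_dim) {
    const bool base_aligned = reinterpret_cast<std::uintptr_t>(src) % 8 == 0;
    // If the base pointer is 8-byte aligned and the innermost dim is even,
    // every row start stays 8-byte aligned, so only the "nice" reads are needed.
    if (base_aligned && innermost_dim % 2 == 0)
        copy_fast_packed(src, dst, n);
    else
        copy_unaligned(src, dst, n);
}
```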
Apart from the above minor questions, I really like what I see 👍
Firstly, sorry for the late feedback. I saw some API changes have been applied. I like the explicit way! 👍
This should be more clear. 👍
Hi @kruus, Thanks a lot for the comments!
Let me think on that more... This indeed might be a good thing to have.
The padding field in Example. 1D tensor, with dim
Once you have defined logical dims and padded dims, you should also define a physical layout of the memory. The physical layout mostly relates to the padded dims (and hence logical dims, since they are folded into the physical dims).
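For instance (a sketch using the v1.0 C++ API; memory::desc, format_tag::nChw16c, the padded_dims field of the underlying C struct, and get_size() are the names I'm assuming from the preview headers, so please double-check):

```cpp
#include <cstdio>
#include "mkldnn.hpp"

int main() {
    using namespace mkldnn;
    // Logical 4D tensor with 3 channels...
    memory::desc md({2, 3, 5, 5}, memory::data_type::f32, memory::format_tag::nChw16c);
    // ...but the blocked layout works with channel blocks of 16, so the
    // channel dim is padded: dims = {2, 3, 5, 5}, padded_dims = {2, 16, 5, 5}.
    std::printf("padded channels: %ld\n", (long)md.data.padded_dims[1]);
    std::printf("buffer size in bytes: %zu\n", md.get_size()); // 2*16*5*5*4, not 2*3*5*5*4
    return 0;
}
```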
The most typical layout is blocked. All plain layouts like Hence, the logical tensor Coming back to your question.
Yes. Say, you work with logical 4D tensor
Yes. See above.
Well, the library doesn't expose any functions that allow specifying the
Another way of doing the same would be to introduce the format tag
No, views (which, by the way, are now replaced with regular memory descriptors) mostly rely on strides. Padding is something more physical, since the library expects zeros in the padded area. There is no such thing as logical padding for memory. Logical padding can only exist as a property of an operation, like convolution, where it means that even though the input has size (H, W), it should be treated as (H + 2, W + 2), with the border elements assumed to be zero.
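On the view replacement specifically, a rough sketch with the v1.0 C++ API (submemory_desc() and format_tag::nchw are the names I'm assuming from the preview headers; please double-check against them):

```cpp
#include "mkldnn.hpp"

int main() {
    using namespace mkldnn;
    // A plain 4D tensor...
    memory::desc parent({2, 16, 9, 9}, memory::data_type::f32, memory::format_tag::nchw);
    // ...and a "view" on its inner 7x7 spatial window: just another memory
    // descriptor with the same strides but smaller dims and an offset.
    memory::desc window = parent.submemory_desc({2, 16, 7, 7}, {0, 0, 1, 1});
    (void)window; // no separate view type is involved, only strides/offsets
    return 0;
}
```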
Padding is a property of the tensor with its logical dimensions. It is not a property of some particular blocking strategy. For instance:
Here both
Thanks for the comments! Appreciate your review!
Well, it turned out that enforcing explicit scratchpads might be a real pain for users who don't care about memory. So we decided to roll that back a little. The final decision is to have two modes: library-managed (implicit) scratchpads and user-provided (explicit) ones.
The reason is that users might face performance regressions if they provide scratchpads on their own (e.g. if the provided scratchpad is NUMA-unaware). If users are fine with that, they can go with explicit scratchpads. Those who don't want to invest effort into clever memory management can simply rely on our implicit scratchpads. I will update the RFC to highlight this change.
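In API terms the two modes would look roughly like this (a sketch; I'm assuming the preview names primitive_attr::set_scratchpad_mode(), scratchpad_mode::library/user, and the MKLDNN_ARG_SCRATCHPAD execution argument):

```cpp
#include "mkldnn.hpp"

void set_scratchpad_modes() {
    using namespace mkldnn;

    // Mode 1: the library manages the scratchpad internally (implicit).
    primitive_attr attr_implicit;
    attr_implicit.set_scratchpad_mode(scratchpad_mode::library);

    // Mode 2: the user owns the scratchpad (explicit). The primitive
    // descriptor then exposes the required scratchpad memory descriptor,
    // and the user passes a buffer for it at execution time under the
    // MKLDNN_ARG_SCRATCHPAD argument.
    primitive_attr attr_explicit;
    attr_explicit.set_scratchpad_mode(scratchpad_mode::user);
}
```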
Well, the example is indeed not shorter. But there was no intent to reduce the number of lines of code. The intent was to make it simpler to understand and to program in frameworks. Hope that holds :)
I understand this point. But have explicit layouts like

From the FWK side, I would expect it to know whether the data is in the FWK's native format or in MKL-DNN's one. Once this is known, there shouldn't be any need to know what exact format Intel MKL-DNN is using.
Yeah, this works perfectly fine with the new API. You have two ways to do that:
This would not work without additional information. Consider the following:
As you can see, without knowing the format you cannot distinguish between them. But if you know the format, it is pretty straightforward to recover the sizes. So I don't see why we would prefer having
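To illustrate the first point with the v1.0 API (a sketch; assuming memory::desc and its equality operator from the preview headers): two descriptors built from exactly the same dims but different format tags describe different physical layouts, so the dims alone carry no layout information.

```cpp
#include <cassert>
#include "mkldnn.hpp"

int main() {
    using namespace mkldnn;
    // Same logical dims, different physical layouts.
    memory::desc plain({2, 16, 5, 5}, memory::data_type::f32, memory::format_tag::nchw);
    memory::desc blocked({2, 16, 5, 5}, memory::data_type::f32, memory::format_tag::nChw16c);
    // The logical description matches, but the memory descriptors do not:
    // you need the format (or strides/blocking) to know where an element lives.
    assert(!(plain == blocked));
    return 0;
}
```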
RFC updated. The changes:
Thanks for the explanations --- you've convinced me that pretty much all useful padding/blocking cases can be dealt with. It helped clarify that the new formats cover pretty much everything and that padding is always physical. Your comment
was particularly useful, since I think I had been thinking of everything as client-visible when I wrote my questions. It's good that many things described by the RFC are actually "internal detail". Thanks. For alignment, I really need to download the RFC source tarball and look at the new headers. OTOH, maybe there are more clever ways to hide the alignment requirement by lying to the memory manager about the size. Then the client would not be able to assume the tensor data really starts at the pointer he gave to mkl-dnn. If the client already, in general, needs to coerce the memory format before making any assumptions about where any tensor element is stored, then maybe it's OK to provide a rounded-up size and internally use the next correctly aligned pointer, "alignment = bigger size + implicit initial padding"?
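I'm thinking of the standard over-allocate-and-align trick, something like this in plain C++ (nothing MKL-DNN specific):

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Ask the memory manager for a slightly bigger buffer and hand out the next
// correctly aligned pointer inside it; the caller must not assume the tensor
// data starts at the raw allocation (free() must be called on `raw`).
void *alloc_aligned_view(std::size_t bytes, std::size_t alignment, void *&raw) {
    std::size_t padded = bytes + alignment - 1;
    raw = std::malloc(padded);
    if (!raw) return nullptr;
    void *p = raw;
    return std::align(alignment, bytes, p, padded); // nullptr only if it cannot fit
}
```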
Hi @kruus,
Well, strictly speaking the padding stuff is user visible (since the memory descriptor is a transparent structure). But even so, I wouldn't expect many users to look directly into the memory descriptor structure itself. The assumption is that users would typically use things like
The biggest issue with the alignment is that sometimes the frameworks pass their memory to MKL-DNN just as is (of course, wrapping it into an MKL-DNN memory object). If the alignment becomes a requirement, that means the frameworks would sometimes have to copy the data even if the only thing that differs is the alignment. Also, the comparison function would become a little bit trickier:
If the function returns

So it seems that alignment as a property of the memory descriptor is too tough a requirement from the user POV. It probably just makes sense to handle that on the library side, by being ready to call a fallback if a pointer is bad (unaligned). In this case you have functional correctness, and you get performance if the data is aligned (which is typically the case most of the time).
Thanks @emfomenk,
I may not have made my point clear. I thought the saved dims should exactly match the format, because it's simple to understand. Take reorder as an example,
or
The reorder is triggered by the format difference, so r1 and r2 are not equal. And as for a conv example:
And maybe there's no need to compute the strides of the saved dims, because the dims are already straightforward; however, it's fine for developers' convenience. Then if we want to view or debug some

Just some thoughts, hope it would be helpful.
Sigh, yes, I suppose reorder cannot be converted into an opportunistic no-op without a lot of trickery. OK, so I looked at the new headers in the tarball a bit more today.
Hi @tensor-tang,
No, this is not how Intel MKL-DNN handles memory. Let's consider a 2D example:

Notice that no matter what the format is, the notation for accessing an element does not change. With MKL-DNN the situation is the same: the logical description doesn't depend on the physical one.

```
// RowMajor
memory_desc_init_by_tag({dim0, dim1}, format_tag = ab); // ab -- natural order, b is the innermost
// ColumnMajor
memory_desc_init_by_tag({dim0, dim1}, format_tag = ba); // ba -- a becomes the innermost in memory

//
// the same, but using strides instead of the format tags
//

// RowMajor
memory_desc_init_by_strides({dim0, dim1}, strides = {dim1, 1}); // dim1 has stride 1
// ColumnMajor
memory_desc_init_by_strides({dim0, dim1}, strides = {1, dim0}); // dim0 has stride 1
```

The format tags now imply only the strides and don't have any meaning beyond that. This approach allows simple checks for shape matching. It also allows deriving the physical layout pretty easily by looking at the strides and (if relevant) at the blocking structure.
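For instance, a descriptor created with the ab tag and one created with the equivalent explicit strides should come out identical (a sketch; I'm assuming the v1.0 C++ wrappers, i.e. the memory::desc constructors taking a format_tag or strides and the descriptor equality operator, so please check the actual headers):

```cpp
#include <cassert>
#include <cstdint>
#include "mkldnn.hpp"

int main() {
    using namespace mkldnn;
    const std::int64_t dim0 = 3, dim1 = 7;
    // RowMajor via a format tag and via explicit strides -- the tag is just shorthand.
    memory::desc by_tag({dim0, dim1}, memory::data_type::f32, memory::format_tag::ab);
    memory::desc by_strides({dim0, dim1}, memory::data_type::f32,
                            memory::dims{dim1, 1} /* strides */);
    assert(by_tag == by_strides);
    return 0;
}
```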
Yes, thanks for your kind explanations @emfomenk. I know it's not how MKL-DNN actually handles memory; that was just my suggestion. I think combining the format and dims should be clear, so maybe there would be no need to distinguish the logical and physical descriptions.
I am not getting this point.
@emfomenk Any thoughts about using labels while specifying
instead of implicitly assuming the order of dimensions? With that logic, the memory description above is the same as:
Seems over-complicated: it requires a map, and it is also not clear how to deal with aliases like
The initial message says that the intended release date for v1.0 is somewhere mid-2019. I can see that there is a recent release v1.0-rc 10 days ago. Is there a more precise estimate now of how close we are to the v1.0 release?
Hi @blueberry, I would expect the final release to appear somewhere next week.
That's great news indeed! Thanks!
Closing as v1.0 is released.
The Intel(R) MKL-DNN team is planning to release v1.0 of the library in mid-2019. With this major version change, we would like to clean up the API to make Intel MKL-DNN easier to use.
This RFC describes user-visible and some important internal changes to help developers understand what to expect in the future. The Intel MKL-DNN team would appreciate your feedback and suggestions on the given proposal! Feel free to post them in this PR.
UPDATE 03/08/19: v1.0 Preview Candidate is released.
UPDATE 03/12/19: Update the RFC. See changes here.