Plans for RNN #46
Are there any plans to add RNN layers (compatible with the cuDNN RNN layers)? This would be exceptionally useful, given the wide usage of RNNs.
Hi @sbodenstein, the work is in progress.
Fantastic, this will be super useful!
Hi @emfomenk, when will the RNN feature be released?
@emfomenk: is the planned RNN API going to be compatible with the cuDNN version?
Hi @fightbob and @sbodenstein, RNN support is slightly postponed -- some other urgent work came up. Yes, the API is going to be very close to the cuDNN one.
Great. If you can drop any details about how the API might differ before you actually ship, that would be helpful to us for planning purposes.
Hi @taliesinb, none of the following changes are finalized yet. The draft adds new RNN-related declarations to mkldnn_types.h and a corresponding primitive API to mkldnn.h.
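(For illustration only: a minimal sketch of what such declarations might look like, written in C as in the MKL-DNN headers. Every name below -- mkldnn_vanilla_lstm, mkldnn_rnn_direction_t, and so on -- is an assumption modeled on cuDNN-style RNN APIs, not the unpublished draft referenced above.)

```c
/* Hypothetical sketch of RNN-related additions to mkldnn_types.h.
 * Names are illustrative assumptions, not the actual draft. */

/* Kinds of RNN cells. */
typedef enum {
    mkldnn_vanilla_rnn,
    mkldnn_vanilla_lstm,
    mkldnn_vanilla_gru,
} mkldnn_rnn_cell_kind_t;

/* Direction of iteration over the sequence. */
typedef enum {
    mkldnn_unidirectional_left2right,
    mkldnn_unidirectional_right2left,
    mkldnn_bidirectional_concat,
    mkldnn_bidirectional_sum,
} mkldnn_rnn_direction_t;
```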
@emfomenk Thanks so much, that's very useful to know!
Any updates on this?
@emfomenk: will there be support for variable-length sequences (i.e. a batch of sequences with different lengths)? cuDNN has support for this, but I don't see it in the above design.
Specifically, the concern is that because NVIDIA's design outputs only the final cell state (not the sequence of cell states), you cannot add variable-length support after the fact: for every sequence that is shorter than the batch's maximum length, the final cell state will be invalid. So we simply can't use the optimization at all for variable-length problems unless variable-length support is baked into the design.
@taliesinb @sbodenstein The current design can output all the hidden states (h) from the last stacked layer and the cell states (c) at the last time step. But do you need the cell states in the middle of the sequences?
@ykim362 yes, but you have a choice. If the RNN layer wants to support variable-length operation†, it can either:
1. output the full sequence of states so that the framework can pick the correct ones itself, or
2. accept the per-sequence lengths as an input and do the pick internally.
† To be clear about what I mean by variable-length operation: I'm referring to the case where a batch contains multiple unequal sequence lengths -- and most sequence problems are like this. Older frameworks just pad the shorter sequences with zeros and expect the network to learn to deal with the zeros, but this fundamentally changes the problem. By far the better approach is to pad with junk and carefully make sure that you take the 'correct' outputs and states from just before the junk, using pick operations etc. (see the sketch after this comment). We want to make sure that the MKL implementation makes this possible. Option 1 does the pick externally; option 2 does the pick internally.
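(To make the external pick concrete, here is a minimal C sketch. The time-major [T][N][H] layout, the names pick_last_states and lens, and the assumption that every length is at least 1 are illustrative choices, not part of any proposed MKL-DNN or cuDNN API.)

```c
#include <stddef.h>

/* Option 1, done externally: the RNN has written hidden states for
 * every time step into y, laid out time-major as [T][N][H]. Entries
 * at t >= lens[n] are junk padding. Gather each sequence's last
 * valid hidden state into last_h[N][H]. Assumes lens[n] >= 1. */
void pick_last_states(const float *y, size_t T, size_t N, size_t H,
                      const size_t *lens, float *last_h)
{
    (void)T; /* T only bounds the buffer; lens drives the pick */
    for (size_t n = 0; n < N; ++n) {
        size_t t = lens[n] - 1;               /* last valid step */
        const float *src = y + (t * N + n) * H;
        for (size_t h = 0; h < H; ++h)
            last_h[n * H + h] = src[h];
    }
}
```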
@taliesinb @sbodenstein Thanks for the comments! I think I am missing something... I understand that the current cuDNN interface enables option 1 (the yDesc and y parameters in the cuDNN docs). For option 2, I did not find any related documentation. The only element I see in the cuDNN API that accommodates variable lengths is that the inputs for each time step can have a different minibatch size (in decreasing order). I guess this assumes the user has to sort the sequences in the minibatch first (e.g. feed the longest sequence first in each minibatch), but there is not much detail in their docs. Could you please elaborate on the use case?
This is correct for GRU and standard RNN, but untrue for LSTM, which has a second state (the cell state) that is not returned per time step in the y output.
The framework/user does indeed have to sort the sequences by length and pack them. This is annoying, and it would be good if the Intel version could avoid it.
Frameworks that support variable-length RNNs require this (e.g. PyTorch, pytorch/pytorch#873), and we wish to add this support to MXNet as well. Including @apaszke and @jekbradbury, as this discussion about the MKL RNN design seems very relevant for variable-length RNNs in PyTorch as well (I think PyTorch will also want to use this MKL RNN implementation).
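(To illustrate the sort-and-pack scheme under discussion: a cuDNN-style packed interface consumes a shrinking minibatch per time step, derived from lengths sorted in decreasing order. The helper below, batch_sizes_per_step, is a hypothetical sketch of that derivation, not a call in any library.)

```c
#include <stddef.h>

/* Given per-sequence lengths sorted in decreasing order, compute the
 * effective minibatch size at each time step t, i.e. how many
 * sequences are still active. This is the shape information a
 * packed/variable-minibatch RNN interface consumes. */
void batch_sizes_per_step(const size_t *sorted_lens, size_t N,
                          size_t T, size_t *batch_at)
{
    for (size_t t = 0; t < T; ++t) {
        size_t active = 0;
        /* Lengths are sorted descending, so the sequences still
         * active at step t form a prefix of the batch. */
        while (active < N && sorted_lens[active] > t)
            ++active;
        batch_at[t] = active;
    }
}
```

For lengths {5, 3, 3, 1} this yields batch sizes {4, 3, 3, 1, 1} for t = 0..4.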
Or better yet, provide that as an optional feature.
@mgouicem: for us, the cleanest approach to supporting variable-length sequences is a bit different from the cuDNN approach.
I think at this point we're pretty much stuck with packed-sequence/padded inputs in PyTorch, so it would be cool if you supported something similar. The cuDNN API is quite good, except for the weight-format management. Please, unless it is absolutely necessary, don't require frameworks to give you the weights as a single chunk of memory, and if that is needed, then at least define the format openly. Right now cuDNN's answer is "use our API to query where to put each weight", which is terribly inconvenient.
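(For readers unfamiliar with that complaint, the sketch below shows roughly what the query dance looks like with the cuDNN v5-v7 era calls cudnnGetRNNLinLayerMatrixParams and cudnnGetFilterNdDescriptor: for every layer and every gate matrix, the framework must ask the library where inside its opaque weight blob the parameters live. Descriptor creation and error handling are mostly omitted, and locate_rnn_matrix is a hypothetical wrapper, not a cuDNN function.)

```c
#include <cudnn.h>

/* Locate one gate's weight matrix inside cuDNN's opaque RNN weight
 * blob `w`. handle, rnnDesc, xDesc and wDesc are assumed to be
 * created and configured elsewhere. */
static void locate_rnn_matrix(cudnnHandle_t handle,
                              cudnnRNNDescriptor_t rnnDesc,
                              cudnnTensorDescriptor_t xDesc,
                              cudnnFilterDescriptor_t wDesc,
                              void *w, int layer, int linLayerID)
{
    cudnnFilterDescriptor_t matDesc;
    void *matPtr = NULL;
    cudnnCreateFilterDescriptor(&matDesc);

    /* Ask cuDNN where this particular matrix lives inside w. */
    cudnnGetRNNLinLayerMatrixParams(handle, rnnDesc, layer, xDesc,
                                    wDesc, w, linLayerID,
                                    matDesc, &matPtr);

    /* Only after querying its shape can the framework copy its own
     * weights into place at matPtr. */
    cudnnDataType_t dtype;
    cudnnTensorFormat_t fmt;
    int nbDims, dims[3];
    cudnnGetFilterNdDescriptor(matDesc, 3, &dtype, &fmt,
                               &nbDims, dims);

    /* ... memcpy framework weights to matPtr using dims ... */
    cudnnDestroyFilterDescriptor(matDesc);
}
```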
Yeah, it really sucks; it made compilation so much more complicated for us. And the practice of not publicly defining properties, sizes, etc., and putting them behind an API makes scratch memory much harder to share across buckets, because the workspace size cannot be determined without querying CUDA at compile time, which MXNet does not support. EDIT: clarified my complaint.
Thank you for the clarification and the input. We will take that into account when designing our API.
@apaszke @taliesinb Totally agree with you about the cuDNN weight format; I think a clearly defined weight format is very important for frameworks and users.
Is there any update on the timeline for the release of the RNN primitives at this point? Just curious, but very much looking forward to it.
@BenjaminJurke, unfortunately there is no precise timeline for the feature yet, but we are working on it.