
Variable-length sequence spaces support #909

Open · dwf opened this issue May 15, 2014 · 10 comments

dwf (Contributor) commented May 15, 2014

I think we need to centralize efforts on variable-length sequences in pylearn2 and really get our act together. I know a lot of people (@bartvm @vdumoulin @laurent-dinh @pascanur) have thought about this problem, and I think we need a centralized record of:

  • what is known about what is needed and how to implement it
  • any existing partial implementations
  • outstanding issues that are unsolved even in principle
  • current Theano limitations blocking us (@bartvm mentioned linear algebra issues with matrices/tensors that are effectively vectors, since all axes except one have size 1; I think this can easily be addressed by graph-substitution optimizations)
  • what parts of the code need to change (how much of SGD?)

Obviously mini-batches where examples are of variable length are an issue. You can really only address this with lists of Theano variables at present.

Since one can get a lot done in sequential modelling with batch size 1 (provided your sequences are long), I would suggest structuring a first pass to be general enough to support batch size > 1, but only implementing as much as is needed for batch size = 1.

I therefore propose using this ticket as that collaboration space.

vdumoulin (Member) commented

Here's my view on it:

I wrote two new Space subclasses called VectorSequenceSpace and IndexSequenceSpace (see here) which represent single sequences as matrices whose first dimension is time.

For now minibatches aren't supported because of a Theano limitation: lists of variable-length arrays have no corresponding data structure in Theano. @nouiz is aware of the issue, and in a discussion with him today I learned he even has an intern currently working on fixing it.

This space could probably be extended to support minibatches of variable-length sequences, as long as sequences in a minibatch have the same length.

There is also a toy-ish implementation of an RNN which I wrote for Yoshua's class and which Junyoung uses for speech synthesis. It uses XSequenceSpace for its input space, and the TIMIT dataset it is trained on (a class called TIMITSequences) also uses XSequenceSpace. With this setup, SGD didn't need any modification. The only things that might require changing SGD are gradient clipping and gradient-norm monitoring, and even those could be implemented as a wrapper Cost.
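
To make the convention concrete, here's a minimal standalone sketch (not the pylearn2 code; the weights and sizes are made up for illustration) of feeding a single time-first (n_steps, dim) matrix through a scan-based RNN step. The same compiled function handles sequences of any length:

```python
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
dim, n_hid = 4, 3  # made-up sizes

# A single sequence is an (n_steps, dim) matrix with time along the
# first axis, which is the convention these spaces use.
x = T.matrix('x')

W_in = theano.shared(np.random.randn(dim, n_hid).astype(floatX), name='W_in')
W_rec = theano.shared(np.random.randn(n_hid, n_hid).astype(floatX), name='W_rec')

def step(x_t, h_tm1):
    # scan hands us one time step (a length-dim vector) at a time
    return T.tanh(T.dot(x_t, W_in) + T.dot(h_tm1, W_rec))

h, _ = theano.scan(step, sequences=x, outputs_info=T.zeros((n_hid,)))
f = theano.function([x], h[-1])  # final hidden state

# Sequences of different lengths go through the same compiled function:
print(f(np.random.randn(7, dim).astype(floatX)).shape)   # (3,)
print(f(np.random.randn(12, dim).astype(floatX)).shape)  # (3,)
```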

dwf (Contributor, Author) commented May 15, 2014

> With this setup, SGD didn't need any modification.

That is so, so good to hear.

> The only things that might require changing SGD are gradient clipping and gradient-norm monitoring, and even those could be implemented as a wrapper Cost.

Yeah, I could see both Cost and LearningRule having a role in implementing RNN tricks, depending. The situation seems less dire than it sounded earlier today. It'd be good to gather thoughts from @vdumoulin and @laurent-dinh regarding current blockers, etc.

dwf (Contributor, Author) commented May 15, 2014

Er, by that I mean @bartvm and @laurent-dinh. I thought that comment was by Bart (I'm really tired).

bartvm (Contributor) commented May 16, 2014

I had another quick look, and it looks like the problem I had with batch_size = 1 has been fixed. VectorSpace and IndexSpace will return a row tensor rather than a matrix tensor in this case, which should lead Theano to use GEMV instead of GEMM. This formerly caused a bug in monitor.py, so the batch_size information isn't used there anymore (https://groups.google.com/forum/#!topic/pylearn-dev/M-f2YyGxS8c and ticket #634), but at training time it should work fine.

I haven't used @vdumoulin's spaces, but I guess they can be used quite flexibly. If you set the number of columns (dim) to 1, you could simply pass your sequence as a column vector, correct?

In the long term I would love to see a solution that allows a batch size greater than 1, though. In my case, something I would like to do is create a fixed-length continuous representation of variable-length sequences and then run that through a normal MLP, which isn't very efficient to do one sentence at a time.

Lists of Theano variables are one option, although I have no idea what the performance is like when you have e.g. 250 different Theano variables as an input and need to concatenate them in one of your MLP layers. Another option, although I have no idea if this is feasible at all, is to run the variable-length part of the network first, collect the outputs, and then run the fixed-length part of the network in batch (see the sketch below). I'd also be interested to hear what @nouiz and his intern have in mind, and on what timetable. It would be a shame if we started to implement something and then needed to rewrite everything once Theano support for variable-length batches arrives.
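
As a minimal numpy sketch of that two-phase idea (the encode and mlp stand-ins below are invented placeholders, not real pylearn2 components):

```python
import numpy as np

# Purely illustrative stand-ins for the two halves of the network: a
# variable-length -> fixed-size encoder and the batched fixed-size part.
def encode(seq):           # (n_steps, dim) -> (dim,)
    return seq.mean(axis=0)

def mlp(codes):            # (batch, dim) -> (batch, dim)
    return np.tanh(codes)

sequences = [np.random.randn(n, 4).astype('float32') for n in (5, 9, 3)]

# Phase 1: run the variable-length part one sequence at a time.
codes = np.array([encode(s) for s in sequences])   # (3, 4)

# Phase 2: run the fixed-length part as one ordinary mini-batch.
out = mlp(codes)                                   # (3, 4)
```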

vdumoulin (Member) commented

@bartvm right now XSequenceSpace always expects matrices for sequences, but this is a constraint that could very easily be lifted, as could the minibatch constraint (in that case, all sequences in a minibatch would need to have the same length, but this fixed length could vary across minibatches).

bartvm (Contributor) commented May 16, 2014

Well, I meant you could simply pass an n × 1 matrix.

I'm not very clear on what allowing mini-batches would entail, though. I guess you'd have to be careful, because you would be removing some of the stochasticity from the iterators, i.e. sequences with the same length would always be part of the same batch. Also, I guess you would have to fix the mini-batch size to min_ℓ |{s ∈ S : |s| = ℓ}| over your set of sequences S (the size of the smallest group of equal-length sequences), or otherwise scale your cost function depending on the mini-batch size?
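
To make that quantity concrete, a small sketch (the length_buckets helper is made up for illustration):

```python
from collections import defaultdict

import numpy as np

def length_buckets(sequences):
    """Group sequences by length; a batch is drawn within one bucket."""
    buckets = defaultdict(list)
    for s in sequences:
        buckets[len(s)].append(s)
    return buckets

S = [np.zeros((n, 4)) for n in (5, 5, 9, 9, 9, 3)]
buckets = length_buckets(S)

# min over lengths l of |{s in S : |s| = l}|: the smallest bucket
# bounds the largest batch size usable for every bucket (1 here).
fixed_batch = min(len(b) for b in buckets.values())
```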

nouiz (Member) commented May 23, 2014

I think we will merge the list type next week, without scan support or C code support.

Fred


nouiz (Member) commented May 23, 2014

I was wrong. It is now merged!
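
For reference, a minimal sketch of how the merged type can be used (assuming the theano.typed_list interface; I haven't checked this against the exact merged version):

```python
import numpy as np
import theano
import theano.tensor as T
import theano.typed_list

floatX = theano.config.floatX

# A symbolic list whose elements are float matrices, e.g. one
# (n_steps, dim) matrix per variable-length sequence.
seqs = theano.typed_list.TypedListType(
    T.TensorType(floatX, (False, False)))()

n = theano.typed_list.length(seqs)  # symbolic list length
first_mean = seqs[0].mean()         # elements index like ordinary tensors

f = theano.function([seqs], [n, first_mean])

data = [np.ones((5, 4), dtype=floatX), np.zeros((9, 4), dtype=floatX)]
print(f(data))  # -> [2, 1.0]
```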


bartvm (Contributor) commented Jul 24, 2014

Pull request #1021 implements batches of variable-length sequences using padding and a mask. I think this is a better solution than typed lists because it allows us to loop over the time axis of a whole batch.
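
For concreteness, a minimal numpy sketch of the padding-and-mask idea (an illustration of the technique, not the actual code from #1021; the pad_batch helper is made up):

```python
import numpy as np

def pad_batch(sequences, dtype='float32'):
    """Stack variable-length (n_steps, dim) sequences into a single
    (max_len, batch, dim) array plus a (max_len, batch) mask of 1s/0s."""
    lengths = [len(s) for s in sequences]
    max_len, batch = max(lengths), len(sequences)
    dim = sequences[0].shape[1]
    data = np.zeros((max_len, batch, dim), dtype=dtype)
    mask = np.zeros((max_len, batch), dtype=dtype)
    for i, (seq, n) in enumerate(zip(sequences, lengths)):
        data[:n, i] = seq
        mask[:n, i] = 1.
    return data, mask

sequences = [np.random.randn(n, 4).astype('float32') for n in (5, 9, 3)]
data, mask = pad_batch(sequences)   # data: (9, 3, 4), mask: (9, 3)

# Inside a scan step, the mask keeps padded steps from updating the state:
# h_t = m_t[:, None] * h_t + (1. - m_t)[:, None] * h_tm1
```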

nouiz (Member) commented Jul 24, 2014

Your way will be faster when the differences in length aren't too big. But when they are "big" (this needs to be defined and will depend on the overhead), the typed-list approach will be faster.

Since scan currently has a big overhead, the threshold for "big" is probably longer than typical sentence lengths. It also depends on the model.

