[WIP] Separate the MlpSoftmaxDecoder's components (re #289) #298
Conversation
That's a good idea to have …
xnmt/decoder.py (outdated):

```python
self.rnn_layer = rnn_layer

#=== TODO: what does this do and why is it needed? how to preserve it with
#=== the refactoring?
```
You can safely delete this. The DyNet RNN builders are hard to configure regarding parameter initialization, so this was a kind of workaround, but with the XNMT builders parameter initialization is fully supported and already implemented.
Thanks @msperber, utilizing …
Caveat: I could not get the residual LSTM to work as a decoder component even on the master branch. I don't see how it could currently work at all, as the decoder attempts to set state (.set_s) etc. which the residual's PseudoState doesn't support.
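The incompatibility is easy to see in a stripped-down sketch (the class and method names mirror the discussion, but the code below is illustrative, not XNMT's actual implementation): the residual builder's `PseudoState` forwards only `add_input()`/`output()`, so a decoder that calls `set_s()` on it fails.

```python
class PseudoState:
    """Stand-in for the residual builder's state wrapper: it forwards
    only add_input()/output(), and deliberately has no set_s()."""
    def __init__(self, states):
        self.states = states
    def add_input(self, x):
        return PseudoState([s.add_input(x) for s in self.states])
    def output(self):
        return self.states[-1].output()

class DecoderSketch:
    """Mimics how the decoder initializes its RNN from the encoder state."""
    def initial_state(self, rnn_state, enc_final_states):
        # this is the call that breaks with a residual LSTM:
        return rnn_state.set_s(enc_final_states)

try:
    DecoderSketch().initial_state(PseudoState([]), enc_final_states=[])
except AttributeError as err:
    print("fails as described:", err)
```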
This should be functional now! Some notes:
From March 18 to 31, I'll be on vacation and unable to do anything with the code, but feel free to continue with the code or leave comments for me to work on when I return -- whatever you prefer.
Thanks, looking great! I have 2 small comments inline, otherwise it looks fine unless @neubig has any more comments. I think we can leave your 2 remarks as-is for now.
Before merging, we should probably test this on a benchmark to make sure there is no regression. Maybe an example with dropout, a bridge, and shared softmax/target embeddings would be best. The only minor difference from the master branch is that the decoder LSTM is now initialized slightly differently: with UniLSTMSeqTransducer, the Glorot initializer treats the four weight matrices as separate matrices instead of one larger stacked matrix, as was the case with the DyNet builder. Would you be able to run something like this? I'm also planning on adding some one-button recipes that could be used for this, and could also help with a benchmark, but won't get to it immediately.
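To make that initialization difference concrete, here is the standard Glorot-uniform bound for the two layouts (illustrative arithmetic only; DyNet's exact formula and gain factor may differ slightly):

```python
import math

def glorot_bound(fan_in, fan_out):
    # standard Glorot/Xavier uniform bound: weights ~ U(-b, b)
    return math.sqrt(6.0 / (fan_in + fan_out))

h, d = 512, 512  # hidden and input dims, illustrative values

# DyNet builder: the four gate matrices stacked into one (4h x d) matrix
combined = glorot_bound(4 * h, d)
# UniLSTMSeqTransducer: four separate (h x d) matrices
separate = glorot_bound(h, d)

print(f"stacked: +/-{combined:.4f}, separate: +/-{separate:.4f}")
```

With separate matrices each gate's weights are drawn from a wider range, which is also why fixing the random seed alone cannot reproduce the old initialization.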
xnmt/lstm.py (outdated):

```python
def __init__(self, exp_global=Ref(Path("exp_global")), layers=1, input_dim=None, hidden_dim=None,
             dropout=None, weightnoise_std=None, param_init=None, bias_init=None,
             yaml_path=None, decoder_input_dim=None, decoder_input_feeding=True):
```
The new arguments should be documented.
Should I document `yaml_path` as well, even though it's kind of for internal use? Because the `decoder_...` arguments only get interpreted when `"decoder" in yaml_path` ...
Hm, maybe you could add `yaml_path` as an argument but leave the description empty? And then somehow the information should be conveyed that the `decoder_...` arguments are only interpreted when `yaml_path` is given.
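A minimal sketch of the gating being discussed (names follow the signature above, but the dimension arithmetic and behavior are my assumptions, not XNMT's actual code): the `decoder_*` arguments only take effect when the component's `yaml_path` places it under a decoder.

```python
class UniLSTMSeqTransducerSketch:
    def __init__(self, input_dim=512, yaml_path="",
                 decoder_input_dim=None, decoder_input_feeding=True):
        self.input_dim = input_dim
        if "decoder" in str(yaml_path):
            # hypothetical arithmetic: input feeding concatenates extra
            # context, growing the effective input dimension
            if decoder_input_dim is not None and decoder_input_feeding:
                self.input_dim = decoder_input_dim + input_dim
        # outside a decoder, the decoder_* arguments are silently ignored

enc = UniLSTMSeqTransducerSketch(yaml_path="model.encoder",
                                 decoder_input_dim=1024)
dec = UniLSTMSeqTransducerSketch(yaml_path="model.decoder.rnn_layer",
                                 decoder_input_dim=1024)
print(enc.input_dim, dec.input_dim)  # 512 1536
```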
xnmt/mlp.py (outdated):

```python
  vocab_size (int): vocab size or None; only relevant if MLP is used as a decoder component
  vocab (Vocab): vocab or None; only relevant if MLP is used as a decoder component
  trg_reader (InputReader): model's trg_reader, if it exists and is unambiguous; only relevant if MLP is used as a decoder component
  """
```
It should be mentioned that `vocab_size` etc. will override `output_dim` when given.
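A hedged sketch of the precedence this comment asks to have documented (the helper function is hypothetical; XNMT's actual resolution logic may differ): any of `vocab_size`, `vocab`, or the `trg_reader`'s vocab takes precedence over `output_dim`.

```python
def resolve_output_dim(output_dim, vocab_size=None, vocab=None,
                       trg_reader_vocab=None):
    """Return the MLP's output dimension; vocab information, if given,
    overrides the plain output_dim."""
    if vocab_size is not None:
        return vocab_size
    if vocab is not None:
        return len(vocab)
    if trg_reader_vocab is not None:
        return len(trg_reader_vocab)
    return output_dim

print(resolve_output_dim(512))                         # plain MLP: 512
print(resolve_output_dim(512, vocab=["a", "b", "c"]))  # decoder use: 3
```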
@msperber If you think it's OK, go ahead and merge. I trust your judgement and think we only need one reviewer per PR.
I think a benchmark is a good idea, but I don't have any prepared benchmarks or anything that I could easily run at the moment. Any immediate recommendations come to mind?
The easiest to use would probably be the Stanford benchmark, because it's already preprocessed: https://nlp.stanford.edu/projects/nmt/. If you can somehow guarantee that both models are initialized with the exact same random parameters, it would also be enough to run just one training epoch plus evaluation and make sure the scores are exactly the same.
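One way to implement that "exact same parameters" check (everything here, including dumping the parameters to `.npz` files, is an assumption about how one might do it, not existing XNMT functionality):

```python
import io
import numpy as np

def models_identical(file_a, file_b):
    """True iff two .npz parameter dumps contain identical arrays."""
    a, b = np.load(file_a), np.load(file_b)
    if set(a.files) != set(b.files):
        return False
    return all(np.array_equal(a[k], b[k]) for k in a.files)

# self-contained demo with in-memory "checkpoints"
buf_a, buf_b = io.BytesIO(), io.BytesIO()
np.savez(buf_a, W_x=np.ones((4, 4)), b=np.zeros(4))
np.savez(buf_b, W_x=np.ones((4, 4)), b=np.zeros(4))
buf_a.seek(0); buf_b.seek(0)
print(models_identical(buf_a, buf_b))  # True
```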
I think this might actually be a good thing, as it makes the initialization consistent with the encoder LSTM. However, I'm not sure what the easiest way would be to get the exact same random initialization for both the old and new architecture, as simply fixing the random seed won't suffice then. Either way, I don't think I'll get around to this (and the benchmarks in general) before my vacation. If you want to take this up, please go ahead, otherwise I can continue that in April.
Yes, it's definitely a good thing! Getting the parameters to be identical would probably require some custom parameter saving/loading code, so I'd probably start with the complete benchmark as it involves less human labor (although more computer labor) :) In any case, I might also give it a shot depending on when I get around to the recipes.
To follow up on this, I haven't had a chance to actually run it, but I would suggest the new stanford-iwslt recipe, which uses a small data set and requires minimal effort to run (basically run a download script and then start training).
Oh, that's cool, I actually downloaded that very dataset an hour ago and am looking for a machine to run it on now. ;)
Sorry also for the merge conflicts, let me know if you need help resolving them.
Results for a single run of the IWSLT recipe on master:
And on this feature branch:
Great, thanks! Would you mind adding these numbers (the new ones) to the recipe's README file? Otherwise, I think we can merge once the remaining failing test is resolved. |
Looks good! Could we set the following:

to be the default and remove it from the config files, unless it would be useful to have for illustrative purposes? Maybe similarly for the MLP layer? Also, this could maybe be for another commit, but we could potentially generalize …
@neubig, that is the default already, so it's actually safe to remove. But then again, there's a lot of redundancy in the example configs anyway with the explicit dimensionality definitions (even though they usually match). I noticed that there are a few new examples that I hadn't adapted yet, so I'm also taking care of those now.
Unless I missed some config file to adapt, or we want to remove more of the …
xnmt/decoder.py (outdated):

```python
  bias_init_context (ParamInitializer): how to initialize context bias vectors
  param_init_output (ParamInitializer): how to initialize output weight matrices
  bias_init_output (ParamInitializer): how to initialize output bias vectors
  rnn_layer (SeqTransducer): recurrent layer of the decoder; defaults to UniLSTMSeqTransducer
```
SeqTransducer is not the correct type here, as SeqTransducers can't consume inputs one by one. For lack of an appropriate abstract class, the requested type should probably be UniLSTMSeqTransducer.
Yeah, that's where @neubig's suggestion would probably come in handy. Also, it's not like you should really use the BiLSTMSeqTransducer either, so...
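For what it's worth, the missing abstract class could look roughly like this (purely a sketch of a class that does not exist in XNMT at this point; the name is made up):

```python
from abc import ABC, abstractmethod

class UniDiSeqTransducer(ABC):
    """Hypothetical base class for transducers that can be driven one
    step at a time, which is what a decoder needs (satisfied by
    UniLSTMSeqTransducer, but not by BiLSTMSeqTransducer)."""

    @abstractmethod
    def initial_state(self):
        """Return a state supporting add_input()/output()/set_s()."""

    @abstractmethod
    def add_input(self, state, x):
        """Consume a single input and return the updated state."""

# abstract classes cannot be instantiated directly
try:
    UniDiSeqTransducer()
except TypeError as err:
    print("abstract:", err)
```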
I took another look and had one mini comment but other than that we can merge this.
Cool, will merge!
This is WIP for separating the decoder components as discussed in #289.

Currently, this code:
- … `UniLSTMSeqTransducer`
- … `MlpSoftmaxDecoder` to take a `UniLSTMSeqTransducer` directly

What's missing:
As discussed in #289, default arguments don't work right now due to input feeding. That means a lot of tests fail simply because of that. I would strongly prefer finding a solution to this before continuing this WIP.
I was thinking to explicitly make `UniLSTMSeqTransducer` aware of input feeding during `__init__`, so that it can modify its default dims accordingly. However, the main roadblock I encountered is that I don't see a way to detect (within `UniLSTMSeqTransducer`) whether it's a child of a decoder or not.

Anyway, comments/suggestions welcome.