diff --git a/README.md b/README.md
index f24c885..c1ce7b2 100644
--- a/README.md
+++ b/README.md
@@ -32,7 +32,6 @@ See below for more details on how to use them.
 This project is maintained by [Yoon Kim](http://people.fas.harvard.edu/~yoonkim).
 Feel free to post any questions/issues on the issues page.
 
-
 ### Dependencies
 
 #### Python
@@ -181,7 +180,7 @@ For seq2seq I've found vanilla SGD to work well but feel free to experiment.
 * `learning_rate`: Starting learning rate. For 'adagrad', 'adadelta', and 'adam', this is the global
 learning rate. Recommended settings vary based on `optim`: sgd (`learning_rate = 1`), adagrad
 (`learning_rate = 0.1`), adadelta (`learning_rate = 1`), adam (`learning_rate = 0.1`).
-* `layer_lrs`: Comma-separated learning rates for encoder, decoder, and generator when using 'adagrad', 'adadelta', or 'adam' for 'optim' option. Layer-specific learning rates cannot currently be used with sgd. 
+* `layer_lrs`: Comma-separated learning rates for encoder, decoder, and generator when using 'adagrad', 'adadelta', or 'adam' for 'optim' option. Layer-specific learning rates cannot currently be used with sgd.
 * `max_grad_norm`: If the norm of the gradient vector exceeds this, renormalize to have its norm equal to `max_grad_norm`.
 * `dropout`: Dropout probability. Dropout is applied between vertical LSTM stacks.
 * `lr_decay`: Decay learning rate by this much if (i) perplexity does not decrease on the validation
diff --git a/train.lua b/train.lua
index 92b5da4..e87a20a 100644
--- a/train.lua
+++ b/train.lua
@@ -946,7 +946,7 @@ function main()
   -- parse input params
   opt = cmd:parse(arg)
 
-  torch.manualSeed(opt.seed);
+  torch.manualSeed(opt.seed)
 
   if opt.gpuid >= 0 then
     print('using CUDA on GPU ' .. opt.gpuid .. '...')
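
For context, the optimization options documented in the README hunk above are passed to `train.lua` as command-line flags. The sketch below shows one hypothetical invocation: the `-optim`, `-learning_rate`, `-layer_lrs`, `-max_grad_norm`, and `-dropout` flags and their suggested values come from the README excerpt, `-seed` and `-gpuid` correspond to `opt.seed`/`opt.gpuid` in the train.lua hunk, and the data/savefile arguments are placeholders not taken from this diff.

```sh
# Sketch only: assumes train.lua accepts the flags documented in the README excerpt above.
# The -data_file/-val_data_file/-savefile paths are placeholders, not from this diff.
th train.lua -data_file data/demo-train.hdf5 -val_data_file data/demo-val.hdf5 \
  -savefile demo-model \
  -optim adagrad -learning_rate 0.1 -layer_lrs 0.1,0.1,0.05 \
  -max_grad_norm 5 -dropout 0.3 -seed 3435 -gpuid 1
```

Note that `layer_lrs` takes three comma-separated values (encoder, decoder, generator) and, per the README, only applies when `optim` is 'adagrad', 'adadelta', or 'adam', not sgd.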