roeeaharoni/dynmt-py

dynmt-py

Neural machine translation implementation using dynet's python bindings.

Example Usage:

```bash
python dynmt.py --dynet-autobatch 0 --dynet-devices GPU:1 --dynet-mem 12000 \
--input-dim=500 --hidden-dim=1024 --epochs=100 --lstm-layers=1 --optimization=ADADELTA \
--batch-size=60 --beam-size=5 --vocab 30000 --plot --eval-after=10000 \
train_source.txt train_target.txt dev_source.txt dev_target.txt test_source.txt test_target.txt path/to/model/dir
```
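
When launching several runs from a script, it can help to assemble the same invocation as an argument list. The following is a minimal sketch, not part of dynmt-py: `build_command` is a hypothetical helper, and the flags are copied verbatim from the example above rather than checked against the script.

```python
import shlex


def build_command(data_paths, results_path, device="GPU:1", mem=12000):
    """Assemble the example invocation as an argument list (hypothetical helper)."""
    if len(data_paths) != 6:
        raise ValueError("expected 6 paths: train/dev/test inputs and outputs")
    cmd = [
        "python", "dynmt.py",
        # dynet options take their value as a separate token
        "--dynet-autobatch", "0",
        "--dynet-devices", device,
        "--dynet-mem", str(mem),
        # model and training options, copied from the example command
        "--input-dim=500", "--hidden-dim=1024", "--epochs=100",
        "--lstm-layers=1", "--optimization=ADADELTA",
        "--batch-size=60", "--beam-size=5",
        "--vocab", "30000", "--plot", "--eval-after=10000",
    ]
    # positional arguments come last, in the documented order
    return cmd + list(data_paths) + [results_path]


def as_shell_string(cmd):
    """Render the argument list as a copy-pastable shell command."""
    return shlex.join(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when paths contain spaces.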

Options:

| Name | Description |
| --- | --- |
| `-h`, `--help` | shows a help message and exits |
| `--dynet-mem MEM` | allocates MEM megabytes of memory for dynet (see dynet's documentation for more details) |
| `--dynet-gpus GPUS` | how many GPUs to use (see dynet's documentation for more details) |
| `--dynet-devices DEV` | CPU/GPU ids to use (see dynet's documentation for more details) |
| `--dynet-autobatch AUTO` | switches auto-batching on (1) or off (0) (see dynet's documentation for more details) |
| `--input-dim=INPUT` | input embeddings dimension [default: 300] |
| `--hidden-dim=HIDDEN` | LSTM hidden layer dimension [default: 100] |
| `--epochs=EPOCHS` | number of training epochs [default: 1] |
| `--layers=LAYERS` | number of LSTM layers [default: 1] |
| `--optimization=OPTIMIZATION` | optimization method (ADAM/SGD/ADAGRAD/MOMENTUM/ADADELTA) [default: ADADELTA] |
| `--reg=REGULARIZATION` | regularization parameter for optimization [default: 0] |
| `--learning=LEARNING` | learning rate parameter for optimization [default: 0.0001] |
| `--batch-size=BATCH` | batch size [default: 1] |
| `--beam-size=BEAM` | beam size in beam search [default: 5] |
| `--vocab-size=VOCAB` | max vocabulary size [default: 99999] |
| `--eval-after=EVALAFTER` | number of training batches between evaluations [default: 1000] |
| `--max-len=MAXLEN` | max train sequence length [default: 50] |
| `--max-pred=MAXPRED` | max predicted sequence length [default: 50] |
| `--grad-clip=GRADCLIP` | gradient clipping threshold [default: 5.0] |
| `--max-patience=MAXPATIENCE` | number of checkpoints without improvement on dev before early stopping [default: 100] |
| `--plot` | plots a learning curve while training each model |
| `--override` | overwrites an existing model with the same name, if one exists |
| `--ensemble=ENSEMBLE` | paths of the models to ensemble, separated by commas |
| `--last-state` | uses only the last encoder state |
| `--eval` | skips training and runs evaluation only |
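
To illustrate how `--eval-after` and `--max-patience` interact, here is a minimal sketch of patience-based early stopping. It is not taken from dynmt-py: `should_stop` is a hypothetical function, and it assumes one dev score is recorded per checkpoint (every `--eval-after` training batches) and that a higher score is better.

```python
def should_stop(dev_scores, max_patience):
    """Return True once `max_patience` consecutive checkpoints fail to improve.

    dev_scores: dev-set scores, one per checkpoint, in training order
                (assumption: higher is better).
    """
    best = float("-inf")
    checkpoints_without_improvement = 0
    for score in dev_scores:
        if score > best:
            # new best checkpoint: reset the patience counter
            best = score
            checkpoints_without_improvement = 0
        else:
            checkpoints_without_improvement += 1
        if checkpoints_without_improvement >= max_patience:
            return True
    return False
```

Under these assumptions, the defaults (`--eval-after=1000`, `--max-patience=100`) would allow 100,000 training batches without a dev improvement before training stops.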

Arguments (must be given in this order):

| Name | Description |
| --- | --- |
| `TRAIN_INPUTS_PATH` | train inputs path |
| `TRAIN_OUTPUTS_PATH` | train outputs path |
| `DEV_INPUTS_PATH` | development inputs path |
| `DEV_OUTPUTS_PATH` | development outputs path |
| `TEST_INPUTS_PATH` | test inputs path |
| `TEST_OUTPUTS_PATH` | test outputs path |
| `RESULTS_PATH` | results file path |
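
Before starting a long run, it may be worth checking that each inputs/outputs pair is aligned. The sketch below assumes the files hold one sentence per line with line `i` of the inputs file corresponding to line `i` of the outputs file; the README does not state the file format, and `count_lines` and `check_parallel` are hypothetical helpers, not part of dynmt-py.

```python
def count_lines(path):
    """Count the lines in a text file without loading it into memory."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def check_parallel(inputs_path, outputs_path):
    """Raise ValueError if the two files differ in length; return the line count."""
    n_inputs = count_lines(inputs_path)
    n_outputs = count_lines(outputs_path)
    if n_inputs != n_outputs:
        raise ValueError(
            f"{inputs_path} has {n_inputs} lines but "
            f"{outputs_path} has {n_outputs}"
        )
    return n_inputs
```

Calling `check_parallel` once each for the train, dev, and test pairs catches a misaligned corpus before any training time is spent.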