Preprocess step is completing but pth files are not generating as expected in XLM folder #4

@Prathameshwar

Hello,
After running the preprocessing step with the command below:

python -m codegen_sources.preprocessing.preprocess /path/data/mydata2 --langs java cpp --mode monolingual_functions --bpe_mode=fast --local=True --train_splits=1 --fastbpe_code_path=/path/data/bpe/cpp-java-python/ --fastbpe_vocab_path=/path/data/bpe/cpp-java-python/

the XLM-syml folder is generated with file names like:
test.cpp_cl.pth
test.cpp_sa.pth
test.java_cl.pth
....
train.cpp_cl.0.pth
train.cpp_sa.0.pth
...

But when I use this folder in the training step (MLM), I get errors like:

XLM-syml/train.java.pth not found
XLM-syml/valid.java.pth not found
XLM-syml/test.java.pth not found
XLM-syml/train.cpp.pth not found
XLM-syml/valid.cpp.pth not found
XLM-syml/test.cpp.pth not found

Train command:
python train.py --exp_name mlm --dump_path '/path/CodeGen/data/models' --data_path '/path/data/mydata2/XLM-syml' --split_data_accross_gpu local --mlm_steps 'java,cpp' --add_eof_to_stream true --word_mask_keep_rand '0.8,0.1,0.1' --word_pred '0.15' --encoder_only true --n_layers 12 --emb_dim 768 --n_heads 12 --lgs 'java-cpp' --max_vocab 64000 --gelu_activation true --roberta_mode false --amp 2 --fp16 true --batch_size 8 --bptt 512 --epoch_size 1000 --max_epoch 2000 --split_data_accross_gpu global --optimizer 'adam_inverse_sqrt,warmup_updates=10000,lr=0.0001,weight_decay=0.01' --save_periodic 0 --validation_metrics _valid_mlm_ppl --stopping_criterion '_valid_mlm_ppl,10'

I think the files are being generated with a suffix such as _cl or _sa (for example test.java_cl.pth), which the training step does not look for, since it expects names like train.java.pth. Is that the cause, or am I doing something wrong?
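
To make the mismatch easier to see, here is a minimal sketch that compares the folder contents with the names from the error messages. It assumes the training step looks for `<split>.<lang>.pth`, which is inferred only from the errors above, not from the CodeGen source. The helper names `missing_pth_files` and `lang_suffixes` are hypothetical and are not part of the repository.

```python
from pathlib import Path


def missing_pth_files(data_path, langs, splits=("train", "valid", "test")):
    """Return the '<split>.<lang>.pth' names that are absent from data_path.

    The naming pattern is an assumption taken from the 'not found' messages.
    """
    present = {p.name for p in Path(data_path).glob("*.pth")}
    expected = [f"{split}.{lang}.pth" for lang in langs for split in splits]
    return [name for name in expected if name not in present]


def lang_suffixes(data_path, lang):
    """Return the sorted suffixes attached to `lang` in existing file names.

    For 'test.cpp_cl.pth' and 'train.cpp_sa.0.pth' with lang='cpp',
    this yields ['_cl', '_sa'].
    """
    suffixes = set()
    for path in Path(data_path).glob("*.pth"):
        parts = path.name.split(".")
        # The second dot-separated component holds the language tag, e.g. 'cpp_cl'.
        if len(parts) >= 3 and parts[1].startswith(lang + "_"):
            suffixes.add(parts[1][len(lang):])
    return sorted(suffixes)
```

Calling `missing_pth_files('/path/data/mydata2/XLM-syml', ['java', 'cpp'])` should list the same six names as the errors, while `lang_suffixes(..., 'java')` shows which suffixed variants were actually written.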

Thanks
