Description
This is a follow-up to issue #5. I followed the directions there to use the provided BPE codes and vocab, but I am still having issues. Specifically, I am getting this error:
```
outputing hypotheses in ./model_1/transcoder/q3nn5plz4y/hypotheses/hyp0.cpp_sa-java_sa.valid_beam0.txt
compute_comp_acc
Traceback (most recent call last):
  File "codegen_sources/model/train.py", line 701, in <module>
    main(params)
  File "codegen_sources/model/train.py", line 665, in main
    scores = evaluator.run_all_evals(trainer)
  File "/content/CodeGen/codegen_sources/model/src/evaluation/evaluator.py", line 299, in run_all_evals
    spans,
  File "/content/CodeGen/codegen_sources/model/src/evaluation/evaluator.py", line 939, in evaluate_mt
    roberta_mode=params.roberta_mode,
  File "/content/CodeGen/codegen_sources/model/src/evaluation/evaluator.py", line 1051, in compute_comp_acc
    roberta_mode,
  File "/content/CodeGen/codegen_sources/model/src/utils.py", line 392, in eval_function_output
    ids = read_file_lines(id_path)
  File "/content/CodeGen/codegen_sources/model/src/utils.py", line 455, in read_file_lines
    with open(hyp_path, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: './model_1/transcoder/q3nn5plz4y/hypotheses/ids.cpp_sa-java_sa.valid.txt'
```
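The traceback shows evaluation writing a hypothesis file (`hyp0.cpp_sa-java_sa.valid_beam0.txt`) and then failing to open `ids.cpp_sa-java_sa.valid.txt` in the same `hypotheses` directory. A quick way to confirm which of those files were actually written is a small check like the one below. It is a hypothetical helper (`check_eval_files` is not part of CodeGen); the two filename patterns are copied from the log line and the error message, and the `hyp*` / `_beam*` wildcards are an assumption about which parts of the name vary.

```python
import glob
import os


def check_eval_files(hyp_dir: str, pair: str, split: str) -> dict:
    """Report which evaluation files from the traceback exist in hyp_dir (hypothetical helper)."""
    # Hypothesis files, e.g. hyp0.cpp_sa-java_sa.valid_beam0.txt (pattern taken from the log line).
    hyp_files = sorted(glob.glob(os.path.join(hyp_dir, f"hyp*.{pair}.{split}_beam*.txt")))
    # IDs file that read_file_lines() failed to open, e.g. ids.cpp_sa-java_sa.valid.txt.
    ids_path = os.path.join(hyp_dir, f"ids.{pair}.{split}.txt")
    return {
        "hyp_files": [os.path.basename(p) for p in hyp_files],
        "ids_file": os.path.basename(ids_path),
        "ids_exists": os.path.isfile(ids_path),
    }


if __name__ == "__main__":
    # Directory taken from the traceback; adjust to your own dump_path/exp_name/experiment id.
    hyp_dir = "./model_1/transcoder/q3nn5plz4y/hypotheses"
    print(check_eval_files(hyp_dir, "cpp_sa-java_sa", "valid"))
```

If the hypothesis files are listed but `ids_exists` is `False`, the failure is specifically about the IDs file that the computational-accuracy evaluation expects, rather than about generation itself.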
The error occurs after I run the following commands:
```shell
python -m codegen_sources.preprocessing.preprocess /content/CodeGen/data/test_dataset/ \
    --langs cpp java python \
    --mode=monolingual_functions \
    --local=True \
    --fastbpe_vocab_path=/content/CodeGen/data/bpe/cpp-java-python/vocab \
    --fastbpe_code_path=/content/CodeGen/data/bpe/cpp-java-python/codes \
    --bpe_mode=fast \
    --train_splits=1 \
    --percent_test_valid=20
```
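Before launching training, it can help to confirm that preprocessing actually produced data for each language tag used in training (`cpp_sa` and `java_sa`) in the directory later passed as `--data_path`. The sketch below is a hypothetical helper that only groups file names by substring match on the tag; it deliberately makes no assumption about the exact file naming the preprocessing script uses.

```python
import os
from typing import Dict, List


def files_per_language(data_path: str, langs: List[str]) -> Dict[str, List[str]]:
    """Group the files in data_path by which language tag appears in their name (hypothetical helper)."""
    names = sorted(os.listdir(data_path))
    # Plain substring match: a file counts for a tag if the tag appears anywhere in its name.
    return {lang: [n for n in names if lang in n] for lang in langs}


if __name__ == "__main__":
    data_path = "/content/CodeGen/data/test_dataset/XLM-syml"  # --data_path from the training command
    if not os.path.isdir(data_path):
        print(f"{data_path} does not exist; did preprocessing finish?")
    else:
        for lang, names in files_per_language(data_path, ["cpp_sa", "java_sa"]).items():
            print(f"{lang}: {len(names)} file(s)")
```

An empty list for either tag would point at the preprocessing step rather than at the training flags.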
```shell
python codegen_sources/model/train.py \
    --exp_name transcoder \
    --dump_path './model_1' \
    --data_path '/content/CodeGen/data/test_dataset/XLM-syml' \
    --split_data_accross_gpu local \
    --bt_steps 'cpp_sa-java_sa-cpp_sa,java_sa-cpp_sa-java_sa' \
    --ae_steps 'cpp_sa,java_sa' \
    --lambda_ae '0:1,30000:0.1,100000:0' \
    --word_shuffle 3 \
    --word_dropout '0.1' \
    --word_blank '0.3' \
    --encoder_only False \
    --n_layers 0 \
    --n_layers_encoder 12 \
    --n_layers_decoder 6 \
    --emb_dim 768 \
    --n_heads 12 \
    --lgs 'java_sa-cpp_sa' \
    --max_vocab 64000 \
    --gelu_activation true \
    --roberta_mode False \
    --lgs_mapping 'java_sa:java_obfuscated,cpp_sa:cpp_obfuscated' \
    --amp 2 \
    --fp16 true \
    --tokens_per_batch 3000 \
    --group_by_size true \
    --max_batch_size 128 \
    --epoch_size 50000 \
    --max_epoch 10000000 \
    --split_data_accross_gpu global \
    --optimizer 'adam_inverse_sqrt,warmup_updates=10000,lr=0.0001,weight_decay=0.01' \
    --eval_bleu true \
    --eval_computation true \
    --generate_hypothesis true \
    --save_periodic 1 \
    --validation_metrics 'valid_cpp_sa-java_sa_mt_comp_acc'
```
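The training command passes `--split_data_accross_gpu` twice (`local` near the start, `global` later). Assuming `train.py` reads its flags through Python's standard `argparse` with the default store action (an assumption, not checked against CodeGen's parser), only the last value takes effect, so the run would use `global`. The sketch below uses hypothetical helper names: `duplicated_flags` lists repeated flags in a command string, and `last_value_wins` demonstrates the last-value-wins behaviour on a stand-in parser that is not copied from CodeGen.

```python
import argparse
import shlex
from collections import Counter
from typing import List


def duplicated_flags(command: str) -> List[str]:
    """Return every --flag that appears more than once in a shell command string."""
    # shlex.split honours the quoting used in the command; "--flag=value" is reduced to "--flag".
    flags = [tok.split("=", 1)[0] for tok in shlex.split(command) if tok.startswith("--")]
    return sorted(flag for flag, count in Counter(flags).items() if count > 1)


def last_value_wins(argv: List[str]) -> str:
    """Show standard argparse behaviour: a repeated store-style option keeps its last value."""
    parser = argparse.ArgumentParser()
    # Stand-in option definition for illustration only.
    parser.add_argument("--split_data_accross_gpu", type=str, default="local")
    return parser.parse_args(argv).split_data_accross_gpu


if __name__ == "__main__":
    cmd = ("python codegen_sources/model/train.py --split_data_accross_gpu local "
           "--amp 2 --split_data_accross_gpu global")
    print(duplicated_flags(cmd))
    print(last_value_wins(["--split_data_accross_gpu", "local", "--split_data_accross_gpu", "global"]))
```

Pasting the full training command into `duplicated_flags` is a cheap way to catch repeated flags while experimenting with different flag values.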
At this point I am using the provided data everywhere and following the directions in the README, so I am not sure what the issue is. I have been experimenting with different flag values but have not been able to find the cause.