Shell commands for debugging

Debugging on cycles.princeton.edu

Masked language modeling

  1. MLM Monolingual - python examples/language-modeling/run_mlm.py --train_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --output_dir=../../data/model_outputs/wikitext/debug --model_type=roberta --config_name=roberta-base --tokenizer_name=roberta-base --learning_rate 1e-4 --num_train_epochs 2 --warmup_steps 10000 --do_train --save_steps 10000 --per_device_train_batch_size 2 --overwrite_output_dir
  2. MLM with a permuted vocabulary - python -m pdb examples/language-modeling/run_mlm_synthetic.py --train_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --validation_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --output_dir=../../data/model_outputs/wikitext/debug --model_type=roberta --config_name=roberta-base --tokenizer_name=roberta-base --learning_rate 1e-4 --num_train_epochs 2 --warmup_steps 10000 --do_train --do_eval --save_steps 10000 --per_device_train_batch_size 2 --overwrite_output_dir --permute_vocabulary --vocab_permutation_file /n/fs/nlp-asd/asd/asd/Projects/Multilingual/Multilingual/synthetic_language_files/word_based/configuration_files/permuted_vocab_seed_42_size_50265.json --word_modification add
  3. Random word modification - python -m pdb examples/language-modeling/run_mlm_synthetic.py --train_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --validation_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --output_dir=../../data/model_outputs/wikitext/debug --model_type=roberta --config_name=roberta-base --tokenizer_name=roberta-base --learning_rate 1e-4 --num_train_epochs 2 --warmup_steps 10000 --do_train --do_eval --save_steps 10000 --per_device_train_batch_size 2 --overwrite_output_dir --modify_words --word_modification add
  4. Inverted word order - python -m pdb examples/language-modeling/run_mlm_synthetic.py --train_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --validation_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --output_dir=../../data/model_outputs/wikitext/debug --model_type=roberta --config_name=roberta-base --tokenizer_name=roberta-base --learning_rate 1e-4 --num_train_epochs 2 --warmup_steps 10000 --do_train --do_eval --save_steps 10000 --per_device_train_batch_size 2 --overwrite_output_dir --invert_word_order --word_modification add
  5. Inverted word order with a model cache directory - python -m pdb examples/language-modeling/run_mlm_synthetic.py --train_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --validation_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --output_dir=../../data/model_outputs/wikitext/debug --model_type=roberta --config_name=roberta-base --tokenizer_name=roberta-base --learning_rate 1e-4 --num_train_epochs 2 --warmup_steps 10000 --do_train --do_eval --save_steps 10000 --per_device_train_batch_size 2 --overwrite_output_dir --invert_word_order --word_modification add
  6. One-to-one mapping for vocabulary - python -m pdb examples/language-modeling/run_mlm_synthetic.py --train_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --validation_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --output_dir=../../data/model_outputs/wikitext/debug --model_type=roberta --config_name=roberta-base --tokenizer_name=roberta-base --learning_rate 1e-4 --num_train_epochs 2 --warmup_steps 10000 --do_train --do_eval --save_steps 10000 --per_device_train_batch_size 2 --overwrite_output_dir --one_to_one_mapping --shift_special --word_modification add
  7. One-to-one mapping for vocabulary (with file) - python -m pdb examples/language-modeling/run_mlm_synthetic.py --train_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --validation_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --output_dir=../../data/model_outputs/wikitext/debug --model_type=roberta --config_name=roberta-base --tokenizer_name=roberta-base --learning_rate 1e-4 --num_train_epochs 2 --warmup_steps 10000 --do_train --do_eval --save_steps 10000 --per_device_train_batch_size 2 --overwrite_output_dir --one_to_one_mapping --shift_special --one_to_one_file ../synthetic_language_files/word_based/configuration_files/one_to_one_mapping_random_50265_fraction_70.npy --word_modification add
  8. Permutation language modeling - python -m pdb examples/language-modeling/run_mlm_synthetic.py --train_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --validation_file=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt --output_dir=../../data/model_outputs/wikitext/debug --model_type=roberta --config_name=roberta-base --tokenizer_name=roberta-base --learning_rate 1e-4 --num_train_epochs 2 --warmup_steps 10000 --do_train --do_eval --save_steps 10000 --per_device_train_batch_size 2 --overwrite_output_dir --permute_words --word_modification add
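Items 2 to 8 repeat the same set of flags and differ only in one or two synthetic-language flags. The POSIX shell sketch below factors the shared flags into a function so that each variant passes only what differs. It is a convenience wrapper, not part of the repository: the names `WIKI` and `mlm_synthetic` and the `RUN` switch are hypothetical. By default it only prints the assembled command; set `RUN=1` to execute it.

```shell
#!/bin/sh
# Hypothetical helper: names are illustrative, not part of the repository.
set -eu

# Training/validation file shared by items 2-8.
WIKI=../../../../BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/wikitext-103-raw/wiki.valid.txt

# Shared flags from items 2-8; "$@" appends the variant-specific flags.
# Prints the command (dry run) unless RUN=1, in which case it executes it.
mlm_synthetic() {
  set -- python -m pdb examples/language-modeling/run_mlm_synthetic.py \
    --train_file="$WIKI" --validation_file="$WIKI" \
    --output_dir=../../data/model_outputs/wikitext/debug \
    --model_type=roberta --config_name=roberta-base \
    --tokenizer_name=roberta-base --learning_rate 1e-4 \
    --num_train_epochs 2 --warmup_steps 10000 --do_train --do_eval \
    --save_steps 10000 --per_device_train_batch_size 2 \
    --overwrite_output_dir --word_modification add "$@"
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Item 4 (inverted word order) and item 8 (word permutation).
mlm_synthetic --invert_word_order
mlm_synthetic --permute_words
```

Item 2 would then be `mlm_synthetic --permute_vocabulary --vocab_permutation_file <file>` with the permutation file listed above. The variant flags now come after `--word_modification add`; this assumes the script parses flags with an argparse-style parser where order does not matter.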

NER

  1. Baseline - python -m pdb run_ner_synthetic.py --model_name_or_path bert-base-uncased --train_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/panx_dataset/en/train.json --validation_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/panx_dataset/en/dev.json --output_dir ../../../../data/model_outputs/ner/debug --do_train --do_eval --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --overwrite_output_dir
  2. Word modification with random sampling - python -m pdb run_ner_synthetic.py --model_name_or_path bert-base-uncased --train_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/panx_dataset/en/train.json --validation_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/panx_dataset/en/dev.json --output_dir ../../../../data/model_outputs/ner/debug --save_steps -1 --do_train --do_eval --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --overwrite_output_dir --modify_words --word_modification replace
  3. Inverted order (may require the --label_all_tokens flag) - python -m pdb run_ner_synthetic.py --model_name_or_path bert-base-uncased --train_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/panx_dataset/en/train.json --validation_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/panx_dataset/en/dev.json --output_dir ../../../../data/model_outputs/ner/debug --save_steps -1 --do_train --do_eval --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --overwrite_output_dir --invert_word_order --word_modification replace
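The NER commands all run under `python -m pdb`, which stops at the debugger prompt before training starts. A minimal sketch with a hypothetical `DEBUG` switch: `DEBUG=1` (the default) keeps pdb as in the commands above, while `DEBUG=0` runs the script directly, as the monolingual MLM command does. The function name `ner_synthetic` and the `DEBUG`/`RUN` switches are illustrative assumptions; nothing is executed unless `RUN=1`.

```shell
#!/bin/sh
# Hypothetical helper: names and switches are illustrative assumptions.
set -eu

DATA=/n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data
PANX="$DATA/multilingual_nlu/xtreme/panx_dataset/en"

ner_synthetic() {
  # Shared flags from the NER commands, followed by the caller's variant flags.
  set -- --model_name_or_path bert-base-uncased \
    --train_file "$PANX/train.json" --validation_file "$PANX/dev.json" \
    --output_dir ../../../../data/model_outputs/ner/debug \
    --do_train --do_eval --cache_dir "$DATA/transformer_models" \
    --overwrite_output_dir "$@"
  # DEBUG=1 (default): launch under pdb; DEBUG=0: plain python.
  if [ "${DEBUG:-1}" = 1 ]; then
    set -- python -m pdb run_ner_synthetic.py "$@"
  else
    set -- python run_ner_synthetic.py "$@"
  fi
  # Dry run (print only) unless RUN=1.
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Item 3, adding the --label_all_tokens flag that its note says may be needed.
ner_synthetic --save_steps -1 --invert_word_order --word_modification replace --label_all_tokens
```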

POS

  1. Baseline - python -m pdb run_ner_synthetic.py --task_name pos --model_name_or_path bert-base-uncased --train_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/udpos/train-en.json --validation_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/udpos/dev-en.json --output_dir ../../../../data/model_outputs/pos/debug --do_train --do_eval --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --overwrite_output_dir
  2. Inverted order - python -m pdb run_ner_synthetic.py --task_name pos --model_name_or_path bert-base-uncased --train_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/udpos/train-en.json --validation_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/udpos/dev-en.json --output_dir ../../../../data/model_outputs/pos/debug --do_train --do_eval --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --overwrite_output_dir --invert_word_order --word_modification replace
  3. Permutation language modeling - python -m pdb run_ner_synthetic.py --task_name pos --model_name_or_path bert-base-uncased --train_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/udpos/dev-en.json --validation_file /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/multilingual_nlu/xtreme/udpos/dev-en.json --output_dir ../../../../data/model_outputs/pos/debug --do_train --do_eval --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --overwrite_output_dir --permute_words --word_modification replace
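The POS commands reuse `run_ner_synthetic.py` with `--task_name pos`, and item 3 trains on `dev-en.json` rather than `train-en.json`. This hypothetical sketch takes the training file name as its first argument and regenerates all three commands from one template; `pos_synthetic`, `UDPOS`, and `RUN` are illustrative names, and the commands are only printed unless `RUN=1`.

```shell
#!/bin/sh
# Hypothetical helper: names are illustrative, not part of the repository.
set -eu

DATA=/n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data
UDPOS="$DATA/multilingual_nlu/xtreme/udpos"

# $1: training file name inside $UDPOS; remaining arguments: variant flags.
pos_synthetic() {
  train="$UDPOS/$1"
  shift
  set -- python -m pdb run_ner_synthetic.py --task_name pos \
    --model_name_or_path bert-base-uncased \
    --train_file "$train" --validation_file "$UDPOS/dev-en.json" \
    --output_dir ../../../../data/model_outputs/pos/debug \
    --do_train --do_eval --cache_dir "$DATA/transformer_models" \
    --overwrite_output_dir "$@"
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

pos_synthetic train-en.json                                                 # item 1
pos_synthetic train-en.json --invert_word_order --word_modification replace # item 2
pos_synthetic dev-en.json --permute_words --word_modification replace       # item 3
```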

Sentence retrieval

  1. Inverted-order run - python -m pdb examples/sentence_retrieval/run_sentence_retrieval_synthetic.py --model_name_or_path bert-base-multilingual-cased --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --train_file=../../data/tatoeba/en/en.json --output_dir=../../data/model_outputs/wikitext/debug --overwrite_output_dir --do_train --invert_word_order --word_modification replace --pool_type middle
  2. Word modification - python -m pdb examples/sentence_retrieval/run_sentence_retrieval_synthetic.py --model_name_or_path bert-base-multilingual-cased --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --train_file=../../data/tatoeba/en/en.json --output_dir=../../data/model_outputs/wikitext/debug --overwrite_output_dir --do_train --modify_words --modify_words_probability 0.9 --word_modification replace --pool_type cls
  3. Bilingual evaluation - python -m pdb examples/sentence_retrieval/run_sentence_retrieval_synthetic.py --model_name_or_path bert-base-multilingual-cased --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --train_file=../../data/tatoeba/en/en_hi.json --output_dir=../../data/model_outputs/wikitext/debug --overwrite_output_dir --do_train --modify_words --modify_words_probability 0.9 --word_modification replace --pool_type cls --bilingual
  4. Syntax modification - as with bilingual evaluation, pass the --bilingual flag, and set --train_file to a file such as /n/fs/nlp-asd/asd/asd/Projects/Multilingual/data/tatoeba/en/dep/synthetic_dep_flattened_en-en~hi@N~hi@V.json
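Item 4 gives no full command. The sketch below assembles one under explicit assumptions: it keeps item 3's model, cache, output, and `--pool_type cls` flags, adds `--bilingual`, drops the word-modification flags, and uses the example dependency file as `--train_file` (overridable through a hypothetical `TRAIN` variable). The function name `retrieval_syntax` is illustrative, and these flag choices should be checked against the script's argument parser before relying on them.

```shell
#!/bin/sh
# Hypothetical sketch of item 4. Assumption: flags other than --bilingual and
# --train_file are carried over from item 3, minus the word-modification flags.
set -eu

CACHE=/n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models
# Example file from item 4; override with TRAIN=... (double quotes keep "~" literal).
TRAIN="${TRAIN:-/n/fs/nlp-asd/asd/asd/Projects/Multilingual/data/tatoeba/en/dep/synthetic_dep_flattened_en-en~hi@N~hi@V.json}"

retrieval_syntax() {
  set -- python -m pdb examples/sentence_retrieval/run_sentence_retrieval_synthetic.py \
    --model_name_or_path bert-base-multilingual-cased --cache_dir "$CACHE" \
    --train_file="$TRAIN" --output_dir=../../data/model_outputs/wikitext/debug \
    --overwrite_output_dir --do_train --pool_type cls --bilingual "$@"
  # Dry run (print only) unless RUN=1.
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

retrieval_syntax
```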

QA

  1. Baseline - python -m pdb run_qa_synthetic.py --task_name qa --model_name_or_path roberta-base --train_file /n/fs/nlp-asd/asd/asd/Projects/Multilingual/data/xquad/en/dev_en.json --validation_file /n/fs/nlp-asd/asd/asd/Projects/Multilingual/data/xquad/en/dev_en.json --output_dir ../../../../data/model_outputs/pos/debug --do_train --do_eval --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --doc_stride 128 --overwrite_output_dir --num_train_epochs 2
  2. Word modification - python -m pdb run_qa_synthetic.py --task_name qa --model_name_or_path roberta-base --train_file /n/fs/nlp-asd/asd/asd/Projects/Multilingual/data/xquad/en/dev_en.json --validation_file /n/fs/nlp-asd/asd/asd/Projects/Multilingual/data/xquad/en/dev_en.json --output_dir ../../../../data/model_outputs/pos/debug --do_train --do_eval --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --doc_stride 128 --overwrite_output_dir --num_train_epochs 2 --modify_words --modify_words_probability 0.9 --word_modification replace
  3. Inverted order - python -m pdb run_qa_synthetic.py --task_name qa --model_name_or_path roberta-base --train_file /n/fs/nlp-asd/asd/asd/Projects/Multilingual/data/xquad/en/dev_en.json --validation_file /n/fs/nlp-asd/asd/asd/Projects/Multilingual/data/xquad/en/dev_en.json --output_dir ../../../../data/model_outputs/pos/debug --do_train --do_eval --cache_dir /n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models --doc_stride 128 --overwrite_output_dir --num_train_epochs 2 --invert_word_order --word_modification replace
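All three QA commands pass `--output_dir ../../../../data/model_outputs/pos/debug` together with `--overwrite_output_dir`. That is the same relative path the POS commands use, so if both scripts are launched from directories at the same depth, a QA debug run can overwrite files left by a POS run. This hypothetical wrapper keeps the original path as the default but lets a `QA_OUT` variable redirect it; `qa_synthetic`, `QA_OUT`, and `RUN` are illustrative names.

```shell
#!/bin/sh
# Hypothetical helper: names are illustrative, not part of the repository.
set -eu

XQUAD=/n/fs/nlp-asd/asd/asd/Projects/Multilingual/data/xquad/en/dev_en.json
CACHE=/n/fs/nlp-asd/asd/asd/BERT_Embeddings_Test/BERT_Embeddings_Test/global_data/transformer_models
# Defaults to the original (shared) POS path; override with QA_OUT=...
QA_OUT="${QA_OUT:-../../../../data/model_outputs/pos/debug}"

qa_synthetic() {
  set -- python -m pdb run_qa_synthetic.py --task_name qa \
    --model_name_or_path roberta-base \
    --train_file "$XQUAD" --validation_file "$XQUAD" \
    --output_dir "$QA_OUT" --do_train --do_eval --cache_dir "$CACHE" \
    --doc_stride 128 --overwrite_output_dir --num_train_epochs 2 "$@"
  # Dry run (print only) unless RUN=1.
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Item 2 (word modification).
qa_synthetic --modify_words --modify_words_probability 0.9 --word_modification replace
```

Saved as, say, `qa_debug.sh` (a hypothetical file name), `QA_OUT=../../../../data/model_outputs/qa/debug RUN=1 sh qa_debug.sh` would write to a separate `qa/debug` directory; that directory is an illustrative choice, not a path from the original setup.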