Enhancement: Please add options for incremental training. (Code2Text) #23

Closed
Manas-Embold opened this issue Dec 3, 2020 · 17 comments

Comments

@Manas-Embold

Hi,
Please add an option for incremental training, so that it's possible to train on Colab or similar platforms.

@guoday
Contributor

guoday commented Dec 3, 2020

Do you mean gradient_accumulation_steps? The code has already implemented it. You can add the option --gradient_accumulation_steps n for incremental training.
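
(Background, for readers new to the flag: gradient accumulation splits one effective batch across several smaller forward/backward passes, so a large effective batch fits on a small GPU. A self-contained PyTorch sketch of the pattern, using a toy model and random data rather than anything from this repository:)

import torch
from torch import nn

# Toy setup so the accumulation pattern below actually runs; not CodeBERT.
model = nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
dataloader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]

accumulation_steps = 4  # effective batch = 4 x the per-step batch size
optimizer.zero_grad()
for step, (x, y) in enumerate(dataloader):
    loss = nn.functional.mse_loss(model(x), y)  # forward pass on one small mini-batch
    (loss / accumulation_steps).backward()      # gradients accumulate in .grad buffers
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                        # one weight update per group of steps
        optimizer.zero_grad()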

@Manas-Embold
Author

Manas-Embold commented Dec 3, 2020

Alright, thanks much!
I come from a TensorFlow background, so I am unaware of how it's done in PyTorch.
I would be thankful if you could let me know exactly what I need to do.
Say I run training for 2 epochs, save a checkpoint, and want to start again from the saved checkpoint.

@guoday
Contributor

guoday commented Dec 3, 2020

Change "pretrained_model=microsoft/codebert-base" to "pretrained_model=saved_checkpoint_path"

@Manas-Embold
Author

Alright.
Thanks much!!

@Manas-Embold
Author

Manas-Embold commented Dec 3, 2020

One more question, just to be sure.
My calls should look like the following:

Do I need to use --gradient_accumulation_steps somewhere now, or is just --pretrained_model fine?

Call 1, for the first two epochs:
python run.py --do_train --do_eval --model_type roberta --model_name_or_path "microsoft/codebert-base" --train_filename "../dataset/java/valid.jsonl" --dev_filename "../dataset/java/valid.jsonl" --output_dir "model/java" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 8 --eval_batch_size 8 --learning_rate 5e-5 --num_train_epochs 2

Call 2, for training the next two epochs:

python run.py --do_train --do_eval --model_type roberta --model_name_or_path "saved_checkpoint_path" --train_filename "../dataset/java/valid.jsonl" --dev_filename "../dataset/java/valid.jsonl" --output_dir "model/java" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 8 --eval_batch_size 8 --learning_rate 5e-5 --num_train_epochs 2

@guoday
Contributor

guoday commented Dec 3, 2020

Just --pretrained_model is fine.

@Manas-Embold
Author

Thanks

@guoday
Contributor

guoday commented Dec 3, 2020

python run.py --do_train --do_eval --model_type roberta --model_name_or_path "saved_checkpoint_path" --train_filename "../dataset/java/valid.jsonl" --dev_filename "../dataset/java/valid.jsonl" --output_dir "model/java" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 8 --eval_batch_size 8 --learning_rate 5e-5 --num_train_epochs 2

@guoday
Contributor

guoday commented Dec 3, 2020

Sorry, the option should be --load_model_path.

python run.py --do_train --do_eval --model_type roberta --model_name_or_path microsoft/codebert-base --train_filename "../dataset/java/valid.jsonl" --dev_filename "../dataset/java/valid.jsonl" --output_dir "model/java" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 8 --eval_batch_size 8 --learning_rate 5e-5 --num_train_epochs 2 --load_model_path $output_dir/checkpoint-best-bleu/pytorch_model.bin
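
(Note: $output_dir in the command above is the shell variable from the repository's run script; with the output_dir used earlier in this thread it would expand to something like:)

--load_model_path model/java/checkpoint-best-bleu/pytorch_model.bin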

@Manas-Embold
Author

Alright,
Thanks once again.

@Manas-Embold
Author

Manas-Embold commented Dec 3, 2020

Hi,
Just to test the flow, I started training for 1 epoch, and the model was saved:
python run.py --do_train --do_eval --model_type roberta --model_name_or_path "microsoft/codebert-base" --train_filename "../dataset/javascript/valid.jsonl" --dev_filename "../dataset/javascript/valid.jsonl" --output_dir "model/javascript" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 16 --eval_batch_size 16 --learning_rate 5e-5 --num_train_epochs 1

Then I started training again from the trained model for the next 2 epochs:
python run.py --do_train --do_eval --model_type roberta --model_name_or_path "microsoft/codebert-base" --train_filename "../dataset/javascript/valid.jsonl" --dev_filename "../dataset/javascript/valid.jsonl" --output_dir "model/javascript" --max_source_length 256 --max_target_length 128 --beam_size 10 --train_batch_size 16 --eval_batch_size 16 --learning_rate 5e-5 --num_train_epochs 2 --load_model_path "/content/code/model/javascript/checkpoint-best-bleu/pytorch_model.bin"

Training has started again, but in the console it says "Epoch 0" again instead of "Epoch 1".
Is it normal for the script to say Epoch 0 again? Or is it actually Epoch 1, since essentially I am incrementally training from the last checkpoint model?

Log for first iteration (Epoch 1)
12/03/2020 08:34:08 - INFO - __main__ - Num examples = 3885
12/03/2020 08:34:08 - INFO - __main__ - Batch size = 16
12/03/2020 08:34:08 - INFO - __main__ - Num epoch = 1
epoch 0 loss 6.5622: 100% 243/243 [08:22<00:00, 2.07s/it]
12/03/2020 08:42:34 - INFO - __main__ -
***** Running evaluation *****
12/03/2020 08:42:34 - INFO - __main__ - Num examples = 3885
12/03/2020 08:42:34 - INFO - __main__ - Batch size = 16
12/03/2020 08:45:32 - INFO - __main__ - eval_ppl = 306.69674
12/03/2020 08:45:32 - INFO - __main__ - global_step = 244
12/03/2020 08:45:32 - INFO - __main__ - train_loss = 6.5622
12/03/2020 08:45:32 - INFO - __main__ - ********************
12/03/2020 08:45:34 - INFO - __main__ - Best ppl:306.69674
12/03/2020 08:45:34 - INFO - __main__ - ********************
Total: 1000
12/03/2020 08:53:21 - INFO - __main__ - bleu-4 = 7.58
12/03/2020 08:53:21 - INFO - __main__ - ********************
12/03/2020 08:53:21 - INFO - __main__ - Best bleu:7.58
12/03/2020 08:53:21 - INFO - __main__ - ********************


Log for second iteration (Epoch 2)

12/03/2020 08:58:29 - INFO - __main__ - ***** Running training *****
12/03/2020 08:58:29 - INFO - __main__ - Num examples = 3885
12/03/2020 08:58:29 - INFO - __main__ - Batch size = 16
12/03/2020 08:58:29 - INFO - __main__ - Num epoch = 2
epoch 0 loss 5.4316: 100% 243/243 [08:22<00:00, 2.07s/it]
12/03/2020 09:06:54 - INFO - __main__ -
***** Running evaluation *****
12/03/2020 09:06:54 - INFO - __main__ - Num examples = 3885
12/03/2020 09:06:54 - INFO - __main__ - Batch size = 16
12/03/2020 09:09:50 - INFO - __main__ - eval_ppl = 117.87884
12/03/2020 09:09:50 - INFO - __main__ - global_step = 244
12/03/2020 09:09:50 - INFO - __main__ - train_loss = 5.4316
12/03/2020 09:09:50 - INFO - __main__ - ********************
12/03/2020 09:09:52 - INFO - __main__ - Best ppl:117.87884
12/03/2020 09:09:52 - INFO - __main__ - ********************

@Manas-Embold
Author

Manas-Embold commented Dec 3, 2020

Since the loss has decreased in the subsequent epoch, shall I assume that it is actually Epoch 1 and not Epoch 0?
In simple terms, I want to be sure that it is not training from scratch again.

@Manas-Embold
Author

Note that I am training on valid.jsonl just to quickly test the flow.

@guoday
Contributor

guoday commented Dec 3, 2020

--load_model_path only re-loads the model from the checkpoint; the optimizer state and logs are reset. To implement proper incremental training, we may also need to save the optimizer state and logs.
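
(A minimal sketch of what saving that extra state could look like with standard PyTorch calls; the function name, file layout, and dictionary keys are illustrative, not what run.py currently writes:)

import torch

def save_training_state(model, optimizer, epoch, path):
    # Illustrative: bundle the weights, the optimizer state (Adam moments), and
    # the epoch counter into one file so a later run can resume where this one stopped.
    torch.save(
        {
            "epoch": epoch,
            "model_state_dict": model.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        },
        path,
    )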

@Manas-Embold
Author

Manas-Embold commented Dec 3, 2020

Alright.
Resetting the logger is fine,
but not the optimizer, right?

@guoday
Contributor

guoday commented Dec 3, 2020

Replace run.py with run.txt. You just need to re-run the following command and the program will restore the last checkpoint for incremental training.

lang=ruby #programming language
lr=5e-5
batch_size=32
beam_size=10
source_length=256
target_length=128
data_dir=../dataset
output_dir=model/$lang
train_file=$data_dir/$lang/train.jsonl
dev_file=$data_dir/$lang/valid.jsonl
epochs=10 
pretrained_model=microsoft/codebert-base #Roberta: roberta-base

python run.py --do_train --do_eval --model_type roberta --model_name_or_path $pretrained_model --train_filename $train_file --dev_filename $dev_file --output_dir $output_dir --max_source_length $source_length --max_target_length $target_length --beam_size $beam_size --train_batch_size $batch_size --eval_batch_size $batch_size --learning_rate $lr --num_train_epochs $epochs
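
(For reference, a sketch of the kind of auto-resume check such a modified run.py could perform at startup; the paths and dictionary keys are illustrative and may not match run.txt exactly:)

import os
import torch

def maybe_resume(model, optimizer, output_dir):
    # Illustrative auto-resume: if a previous training state exists under
    # output_dir, reload it and return the epoch to continue from; otherwise 0.
    last_ckpt = os.path.join(output_dir, "checkpoint-last", "training_state.bin")
    if not os.path.exists(last_ckpt):
        return 0
    state = torch.load(last_ckpt, map_location="cpu")
    model.load_state_dict(state["model_state_dict"])
    optimizer.load_state_dict(state["optimizer_state_dict"])
    return state["epoch"] + 1  # next epoch to run, so the counter keeps advancing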

@Manas-Embold
Author

Many thanks for the prompt response!
