lm_fine-tuning on small dataset of 3 documents #1907

vr25 · 2019-11-21T14:07:44Z

❓ Questions & Help

Hi,

I am trying to use run_lm_finetuning.py on a sample dataset here.

I am running the script with the following arguments, but I get an exactly identical pytorch_model.bin [440.5 MB] saved in output_dir=op:
python run_lm_finetuning.py --train_data_file=sample_text.txt --output_dir=op --mlm --do_train --overwrite_output_dir --do_lower_case --save_steps=50

I was wondering if this dataset of 3 documents is too small to fine-tune on or if I can modify some arguments to get a domain-fine-tuned model.

Thanks!

iedmrc · 2019-11-22T14:50:51Z

How do you know the pytorch_model.bin files are exactly identical? Did you just compare file sizes? If so, that is not a reliable check: the weights are just floating-point numbers, so they (almost) always occupy the same amount of space on disk, whatever their values. You can compare the hashes of the files to make sure.
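To illustrate the point about sizes, here is a minimal sketch using only the Python standard library. The two small float arrays are made-up stand-ins for model weights (an assumption for illustration, not real checkpoint data), and SHA-256 is used simply as one example of a hash:

```python
import hashlib
from array import array

# Hypothetical "weights": same number of float32 values, different contents.
before = array("f", [0.10, -0.25, 0.30, 0.05])
after = array("f", [0.11, -0.24, 0.29, 0.06])

before_bytes = before.tobytes()
after_bytes = after.tobytes()

# Every float32 takes the same number of bytes, so the sizes match
# even though the values differ.
same_size = len(before_bytes) == len(after_bytes)

# The hash depends on the actual bytes, so it exposes the difference.
same_hash = (
    hashlib.sha256(before_bytes).hexdigest()
    == hashlib.sha256(after_bytes).hexdigest()
)

print("same size:", same_size)
print("same hash:", same_hash)
```

The same reasoning applies to a checkpoint file: training changes the values stored in it, not how many bytes are needed to store them.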

vr25 · 2019-11-22T14:54:18Z

Yes, I naively compared the files by their sizes.

I see, yes, "hashes" sounds like a much better way of comparing files, thanks. I'll post here if that works.

Also, do you have any beginner suggestions on generating the hashes quickly and efficiently?

vr25 · 2019-11-22T15:06:24Z

I used md5sum pytorch_model.bin to generate the hash of each file, and the two hashes are different. Anyway, thanks again!
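For anyone who prefers to do the same check from Python, a rough equivalent of the md5sum comparison might look like the sketch below. The helper names `file_digest` and `files_identical` are hypothetical, chosen for illustration; the file is read in chunks so a 440 MB checkpoint does not have to fit in memory at once.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b, algorithm="md5"):
    """True if both files hash to the same digest under `algorithm`."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

It could be called as `files_identical("op/pytorch_model.bin", other_path)`, where `other_path` is a placeholder for wherever the checkpoint being compared against is stored. With the default `algorithm="md5"` the digest should match what md5sum prints for the same file.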

vr25 closed this as completed Nov 21, 2019

vr25 reopened this Nov 21, 2019

This was referenced Nov 22, 2019

utils.py [output_mode = regression] ThilinaRajapakse/pytorch-transformers-classification#34

Open

example data for task-fine-tuning and domain-fine-tuning xhan77/AdaptaBERT#4

Open

lm_fine_tuning ProsusAI/finBERT#6

Closed

vr25 closed this as completed Nov 22, 2019
