lm_fine-tuning on small dataset of 3 documents #1907
Comments
How do you know you have the exact identical file? Comparing hashes would tell you for sure.
Yes, I had naively compared the files by their sizes. I see, yes, hashes sound like a much better way of comparing files, thanks. I'll post here if that works. Also, do you have any beginner suggestions for generating the hashes quickly and efficiently?
I used `md5sum pytorch_model.bin` to generate the hashes of the files, and the two hashes are different. Anyway, thanks again!
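For anyone comparing checkpoints in Python rather than with `md5sum`, here is a minimal sketch using only the standard library; it hashes the files in chunks so a large model file never has to fit in memory (the paths in the comment are hypothetical):

```python
import hashlib

def file_digest(path, algo="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: compare the original and fine-tuned checkpoints.
# identical = file_digest("bert/pytorch_model.bin") == file_digest("op/pytorch_model.bin")
```

Equal digests mean the fine-tuning run wrote out the same weights; different digests (as seen above) mean the saved model really did change.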
❓ Questions & Help
Hi,
I am trying to use run_lm_finetuning.py on a sample dataset here.
I am running the script with the following arguments, but the pytorch_model.bin [440.5 MB] saved in output_dir=op appears to be identical to the original model:
python run_lm_finetuning.py --train_data_file=sample_text.txt --output_dir=op --mlm --do_train --overwrite_output_dir --do_lower_case --save_steps=50
I was wondering if this dataset of 3 documents is too small to fine-tune on or if I can modify some arguments to get a domain-fine-tuned model.
Thanks!