Slow validation during training #80
Comments
Hi @kobeyy thanks so much! That is indeed a weird phenomenon and I am not sure where it originates. One difference is that during validation greedy decoding is used instead of beam search. But that should only speed things up. I'll look into it!
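The greedy-vs-beam cost difference mentioned above can be sketched with a rough cost model (illustrative only, not JoeyNMT code): beam search with beam size k keeps k candidate hypotheses alive and scores all of them at every step, so the per-step decoder work grows roughly linearly with k, while greedy decoding (k = 1) does the least work per step.

```python
def decoder_steps(seq_len: int, beam_size: int) -> int:
    """Approximate number of decoder forward evaluations for one sentence:
    one evaluation per kept hypothesis per generated token."""
    return seq_len * beam_size

# Greedy decoding (beam size 1) vs. a typical beam of 5 for a 40-token output:
greedy = decoder_steps(seq_len=40, beam_size=1)  # 40 evaluations
beam5 = decoder_steps(seq_len=40, beam_size=5)   # 200 evaluations
assert beam5 == 5 * greedy
```

Under this model, greedy validation should indeed be the *cheaper* of the two, which is why the slowdown is surprising.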
I've only recently discovered your repository, so it was with the following code. Since then I've been doing more tests and discovered something else that could be related. When training on the same dataset for 50 epochs, the time to validate the dev set changes dramatically: in the beginning it takes more than 600 s to validate 926 inputs, but after some training it suddenly drops to 50 s for the same inputs. Does this have something to do with the initialization of the weights? Is this perhaps a specific property of the transformer?
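One hedged guess at the epoch-dependent timing (an assumption, not something confirmed in this thread): greedy decoding stops at the first end-of-sequence token, so an untrained model that rarely emits EOS decodes every sentence all the way to the configured maximum output length, while a partially trained model stops much earlier. A toy sketch of that stopping rule:

```python
from typing import Optional

def greedy_decode_length(eos_step: Optional[int],
                         max_output_length: int = 100) -> int:
    """Number of decoding steps actually taken: stop at EOS if the model
    emits one, otherwise run to the hard maximum, as greedy decoders
    typically do. `max_output_length` is a hypothetical config value."""
    if eos_step is None or eos_step > max_output_length:
        return max_output_length
    return eos_step

# Untrained model that never emits EOS: full-length decode every time.
assert greedy_decode_length(eos_step=None) == 100
# Trained model emitting EOS around step 15: far fewer decoder steps.
assert greedy_decode_length(eos_step=15) == 15
```

If this is the cause, the 600 s → 50 s drop would simply track the point in training where the model learns to terminate its outputs.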
Hi @kobeyy
Closing this due to inactivity.
I've been using the transformer_small.yaml configuration to train a model.
During training, the validate_on_data() method takes five times longer than in 'test' mode. I did adapt the test-mode code a bit to load all the lines from a file and batch them the same way as in training mode.
I can't find a good explanation for this, since I'm using the same config file and the same validation data.
Data set sizes:
train: 90,727
valid: 926
test: 926
Expected behavior
I would expect it to take more or less the same time, since the metric calculation only happens after that method.
System:
As my knowledge about transformers is rather limited, I was hoping someone had some insight into this.
Thank you for this really nice code base!