Train with M40 card but got OOM message #8
Hi, here are the hyperparameters used in the original paper, which I haven't tried due to memory issues. Please change those lines above in config.py.
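For reference, a minimal sketch of the kind of config.py override being discussed, in the TF flags style quoted later in this thread. Only char_dim = 200 is mentioned in the thread as the paper's value; the hidden size of 128 is an assumption taken from the QANet paper and is not confirmed for this repo, and the flag names are hypothetical:

```python
# Hypothetical config.py overrides approximating the paper's settings.
# char_dim = 200 is mentioned below in this thread; hidden = 128 is an
# assumed value from the QANet paper, not confirmed for this repo.
flags.DEFINE_integer("char_dim", 200, "Embedding dimension for char")
flags.DEFINE_integer("hidden", 128, "Hidden size of the model")
```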
OK, I'm trying a smaller batch size to avoid the OOM case. Also, I'm using the default char_dim in config.py instead of 200 here; I will update results after 60000 steps.
After training for 60000 steps, I ran the test case with `python config.py --mode test`; here are the results.
Here is my default config:

```python
flags.DEFINE_integer("char_dim", 64, "Embedding dimension for char")
flags.DEFINE_integer("para_limit", 400, "Limit length for paragraph")
flags.DEFINE_integer("capacity", 15000, "Batch size of dataset shuffle")
flags.DEFINE_integer("batch_size", 32, "Batch size")
```
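For context, a minimal sketch (assuming the TF 1.x flags API these `DEFINE_*` calls come from) of how such flags are typically defined and read back, so any of the values above can be overridden from the command line:

```python
import tensorflow as tf

flags = tf.flags  # TF 1.x alias for the absl flags module

flags.DEFINE_integer("batch_size", 32, "Batch size")
flags.DEFINE_integer("char_dim", 64, "Embedding dimension for char")

def main(_):
    config = flags.FLAGS
    # Values can be overridden on the command line, e.g.
    #   python config.py --batch_size=16
    print("batch_size:", config.batch_size, "char_dim:", config.char_dim)

if __name__ == "__main__":
    tf.app.run()
```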
@chesterkuo nice! Could you share the training curve from TensorBoard with us? It could keep improving until 150k steps, just like the paper's results.
It seems the model is overfitting. We need more dropout and better regularization.
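As an illustration, here is a minimal TF 1.x sketch (not this repo's actual code; the function and argument names are hypothetical) of the two levers mentioned here: train-time-only dropout plus L2 weight decay:

```python
import tensorflow as tf

def regularized_dense(inputs, hidden, dropout, is_train):
    # Apply dropout only during training; at eval time pass inputs through.
    keep_prob = 1.0 - dropout
    x = tf.cond(is_train,
                lambda: tf.nn.dropout(inputs, keep_prob=keep_prob),
                lambda: inputs)
    # L2 weight decay, collected into the graph's regularization losses.
    return tf.layers.dense(
        x, hidden,
        kernel_regularizer=tf.contrib.layers.l2_regularizer(scale=3e-7))

# The accumulated penalty would then be added to the training loss:
#   loss = base_loss + tf.losses.get_regularization_loss()
```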
Did you see a similar issue in your training environment?
Overfitting does occur even with hidden size = 96; however, it is not as bad, and the dev loss stays quite low (around 3.1). There are a few possible reasons why our model's performance is lower than the original paper's (by about 2~3%).
After 150000 steps, it seems to be overfitting as well. Exact Match: 69.0066225166, F1: 78.594759759
I tried changing dropout to 0.2 in config.py as follows; after running for 160000 steps, here are the eval results:

```python
flags.DEFINE_float("dropout", 0.2, "Dropout prob across the layers")
```

```json
{"f1": 79.11957637723766, "exact_match": 69.49858088930937}
```
Hi @chesterkuo, thanks for sharing your results. Would it be possible to share your TensorBoard plot? I'm trying different ways of applying dropout to the network to decrease the overfitting even further. Will push new commits soon. The goal is to achieve the best EM/F1 performance of all the open-source repositories on GitHub.
Made a new issue for this. #13 |
I'm checking this model on an M40 device, which has 24GB of memory on board.
What default batch size did you use on the 1080 card? TF shows OOM when I increase the batch size to 64.
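Not a fix for a genuinely too-large batch, but a common TF 1.x mitigation worth checking first is to let the allocator grow on demand rather than reserving all GPU memory up front; a minimal sketch, not taken from this repo:

```python
import tensorflow as tf

# Let TF allocate GPU memory incrementally instead of all at once.
# If batch_size = 64 still OOMs with this, the model is simply too large
# for the card, and batch_size or char_dim has to come down, as discussed
# earlier in this thread.
sess_config = tf.ConfigProto()
sess_config.gpu_options.allow_growth = True

with tf.Session(config=sess_config) as sess:
    pass  # build and run the training graph here
```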