
Train with M40 card but got OOM message #8

Closed
chesterkuo opened this issue Mar 28, 2018 · 14 comments

Comments

@chesterkuo

I'm testing this model on an M40 card, which has 24 GB of memory on board.

What default batch size did you use on the 1080 card? TF reports OOM when I increase the batch size to 64.

@chesterkuo chesterkuo changed the title from "try with M40" to "Train with M40 card but got OOM message" Mar 28, 2018
@ghost

ghost commented Mar 28, 2018

Hi, here are the hyperparameters used in the original paper, which I haven't tried due to memory issues.

flags.DEFINE_integer("char_dim", 200, "Embedding dimension for char")
flags.DEFINE_integer("batch_size", 32, "Batch size")
flags.DEFINE_integer("num_steps", 150000, "Number of steps")
flags.DEFINE_integer("hidden", 128, "Hidden size")
flags.DEFINE_integer("num_heads", 8, "Number of heads in self attention")
flags.DEFINE_boolean("q2c", True, "Whether to use query to context attention or not")

Please change those lines in config.py and let us know the results. Thanks for your contribution!
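Assuming config.py registers these settings through TensorFlow's flags (so they can be overridden at run time), the same values could probably also be passed on the command line instead of editing the file; the flag names below just mirror the definitions above and the `--mode test` invocation used later in this thread:

python config.py --mode train --char_dim 200 --batch_size 32 --num_steps 150000 --hidden 128 --num_heads 8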

@chesterkuo
Author

OK, I'm trying a smaller batch size to avoid the OOM.

Also, I'm using the default char_dim from config.py instead of 200 here; I will update with results after 60000 steps.

@chesterkuo
Author

After training for 60000 steps and running the test case, here are the results:

python config.py --mode test
Exact Match: 69.9716177862, F1: 79.4625328804

@chesterkuo
Author

Here is my default config:

flags.DEFINE_integer("char_dim", 64, "Embedding dimension for char")

flags.DEFINE_integer("para_limit", 400, "Limit length for paragraph")
flags.DEFINE_integer("ques_limit", 50, "Limit length for question")
flags.DEFINE_integer("ans_limit", 30, "Limit length for answers")
flags.DEFINE_integer("test_para_limit", 1000, "Limit length for paragraph in test file")
flags.DEFINE_integer("test_ques_limit", 100, "Limit length for question in test file")
flags.DEFINE_integer("char_limit", 16, "Limit length for character")
flags.DEFINE_integer("word_count_limit", -1, "Min count for word")
flags.DEFINE_integer("char_count_limit", -1, "Min count for char")

flags.DEFINE_integer("capacity", 15000, "Batch size of dataset shuffle")
flags.DEFINE_integer("num_threads", 4, "Number of threads in input pipeline")
flags.DEFINE_boolean("is_bucket", False, "build bucket batch iterator or not")
flags.DEFINE_list("bucket_range", [40, 401, 40], "the range of bucket")

flags.DEFINE_integer("batch_size", 32, "Batch size")
flags.DEFINE_integer("num_steps", 60000, "Number of steps")
flags.DEFINE_integer("checkpoint", 1000, "checkpoint to save and evaluate the model")
flags.DEFINE_integer("period", 100, "period to save batch loss")
flags.DEFINE_integer("val_num_batches", 150, "Number of batches to evaluate the model")
flags.DEFINE_float("dropout", 0.1, "Dropout prob across the layers")
flags.DEFINE_float("grad_clip", 5.0, "Global Norm gradient clipping rate")
flags.DEFINE_float("learning_rate", 0.001, "Learning rate")
flags.DEFINE_float("decay", 0.9999, "Exponential moving average decay")
flags.DEFINE_float("l2_norm", 3e-7, "L2 norm scale")
flags.DEFINE_integer("hidden", 128, "Hidden size")
flags.DEFINE_integer("num_heads", 8, "Number of heads in self attention")
flags.DEFINE_boolean("q2c", True, "Whether to use query to context attention or not")

@ghost

ghost commented Mar 29, 2018

@chesterkuo nice! Could you share with us the training curve from TensorBoard? It could possibly keep improving until 150k steps, just like the paper's results.

@chesterkuo
Author

[TensorBoard screenshot: training loss curve]

@ghost

ghost commented Mar 29, 2018

It seems the model is overfitting. We need more dropouts and better regularization.
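One common way to strengthen regularization in a TF 1.x setup like this is an explicit L2 penalty over the trainable weights, scaled by the l2_norm flag already present in config.py. A rough sketch, not this repo's actual code (add_l2_penalty is a hypothetical helper):

import tensorflow as tf

def add_l2_penalty(data_loss, l2_scale=3e-7):
    # Hypothetical helper: add an L2 penalty over all trainable variables,
    # scaled by the l2_norm flag value from config.py (3e-7 by default).
    l2_penalty = l2_scale * tf.add_n(
        [tf.nn.l2_loss(v) for v in tf.trainable_variables()])
    return data_loss + l2_penalty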

@chesterkuo
Author

Hi @minsangkim142,

Did you see a similar issue in your training environment?

@ghost

ghost commented Mar 30, 2018

Overfitting does occur even with hidden size = 96, but it is not as bad and the dev loss stays quite low (around 3.1). There are a few possible reasons why our model performs lower than the original paper (by about 2~3%):

  1. The dropouts are not placed in the right places. From the first author: "The dropout is applied between every two sub-layers, and also between every two blocks. I would say we applied dropout whenever there is a new layer."
  2. The dropout rate is too low: the original paper suggests 0.1, but I find this too low at times. Increasing the dropout to 0.15 or 0.2 and training longer might help.
  3. The model architecture is different and we are missing an important feature that helps regularization: this may be hard to pin down, but some architectural choices regularize better, such as depthwise-separable convolution vs. normal convolution (see the sketch after this list).
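To make points 1 and 3 concrete, here is a minimal TF 1.x sketch with hypothetical names (conv_block is not this repo's actual layer code): dropout applied right after each new sub-layer inside a residual connection, using a depthwise-separable convolution in place of a normal one.

import tensorflow as tf

def conv_block(x, filters, kernel_size, dropout, is_train, scope):
    # Rough sketch: layer-norm -> depthwise-separable conv -> dropout,
    # wrapped in a residual connection ("dropout whenever there is a new layer").
    # filters should match x's last dimension so the residual add works.
    with tf.variable_scope(scope):
        y = tf.contrib.layers.layer_norm(x)
        y = tf.layers.separable_conv1d(y, filters, kernel_size,
                                       padding="same", activation=tf.nn.relu)
        y = tf.layers.dropout(y, rate=dropout, training=is_train)
        return x + y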

@ghost ghost closed this as completed Mar 30, 2018
@ghost ghost reopened this Mar 30, 2018
@chesterkuo
Author

After 150000 steps it seems to overfit as well.

Exact Match: 69.0066225166, F1: 78.594759759

@chesterkuo
Author

chesterkuo commented Apr 2, 2018

Hi @minsangkim142

I changed the dropout to 0.2 in config.py as follows; after running for 160000 steps, here are the eval results.

flags.DEFINE_float("dropout", 0.2, "Dropout prob across the layers")

{"f1": 79.11957637723766, "exact_match": 69.49858088930937}

@ghost

ghost commented Apr 4, 2018

Hi @chesterkuo, thanks for sharing your results. Would it be possible to share your TensorBoard plot?

I'm trying different ways of applying dropout to the network to reduce the overfitting even further, and will push new commits soon. The goal is to achieve the best EM/F1 performance of all the open-source repositories on GitHub.

@chesterkuo
Author

[TensorBoard screenshots after 200000 steps]

@ghost ghost added help wanted and removed help wanted labels Apr 26, 2018
@ghost

ghost commented Apr 26, 2018

Made a new issue for this. #13

@ghost ghost closed this as completed Apr 26, 2018