Report the results #15

Open · jasonwbw opened this issue Apr 26, 2018 · 27 comments

@jasonwbw (Contributor)

| Model | Training Steps | Size | Attention Heads | Data Size (aug) | EM | F1 |
| --- | --- | --- | --- | --- | --- | --- |
| My Model | 60,000 | 128 | 1 | 87k (no aug) | 70.7 | 79.8 |

The results were obtained on a K80 machine. I modified the trilinear function for memory efficiency, but the results are the same as with the current version of this repository.

I'm not sure about overfitting; the reported model is the last checkpoint after training for 60,000 steps.
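
For reference, a memory-efficient trilinear similarity of this flavour avoids materialising the full `[batch, c_len, q_len, 3d]` tensor by splitting the weight vector into three parts (a sketch in TF 1.x, not necessarily the exact modification I made):

```python
import tensorflow as tf

def optimized_trilinear(context, query, scope="trilinear"):
    """Sketch of a memory-efficient trilinear similarity.

    Computes S[b, i, j] = w1.c_i + w2.q_j + w3.(c_i * q_j)
    without building the [batch, c_len, q_len, 3*d] concatenation.
    context: [batch, c_len, d], query: [batch, q_len, d]
    """
    with tf.variable_scope(scope):
        d = context.get_shape().as_list()[-1]
        w1 = tf.get_variable("w_context", [d, 1])
        w2 = tf.get_variable("w_query", [d, 1])
        w3 = tf.get_variable("w_dot", [1, 1, d])
        # [batch, c_len, 1], broadcast over query positions
        part1 = tf.tensordot(context, w1, axes=[[2], [0]])
        # [batch, 1, q_len], broadcast over context positions
        part2 = tf.transpose(tf.tensordot(query, w2, axes=[[2], [0]]), [0, 2, 1])
        # elementwise term folded into a single batched matmul: [batch, c_len, q_len]
        part3 = tf.matmul(context * w3, query, transpose_b=True)
        return part1 + part2 + part3  # [batch, c_len, q_len]
```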

@ghost commented Apr 26, 2018

Great, thanks! Updating README.md now. As for overfitting, you can observe the dev loss going up after about 60k steps in the TensorBoard summary.
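
For anyone checking their own runs, a minimal sketch of how a dev-loss point ends up as a TensorBoard scalar (the tag name, log directory, and values are illustrative, not the repo's exact ones):

```python
import tensorflow as tf

# Write one dev-loss data point as a scalar summary so it appears as a curve
# in TensorBoard. In a real run this happens at every evaluation step.
writer = tf.summary.FileWriter("log/event")
summ = tf.Summary(value=[tf.Summary.Value(tag="dev/loss", simple_value=2.31)])
writer.add_summary(summ, global_step=60000)
writer.flush()
```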

@jasonwbw (Contributor, Author)

OK, the F1 and EM curves on the dev set oscillate after 30k steps, but the loss curve keeps rising after that. Unfortunately, I have only saved the last five checkpoints.

@ghost mentioned this issue Apr 26, 2018
@raisudeen

@jasonwbw Can you please share the trained model if possible? It would be useful for everyone. Thanks in advance.
-Raisudeen.

@ghost commented Apr 29, 2018

Empirically, I found that early stopping is not beneficial for reaching maximum performance (possibly because it doesn't let the exponential moving average stabilize), so I am increasing the maximum patience from 3 to 10.
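
A sketch of the kind of patience loop this refers to; the three callbacks are hypothetical stand-ins for the repo's training, dev evaluation, and checkpointing code, not its actual functions:

```python
def train_with_patience(train_step, eval_dev_f1, save_ckpt,
                        num_steps=60000, eval_every=2000, max_patience=10):
    """Patience-based early stopping on dev F1 (illustrative only)."""
    best_f1, patience = 0.0, 0
    for step in range(1, num_steps + 1):
        train_step()
        if step % eval_every == 0:
            f1 = eval_dev_f1()  # dev F1, evaluated with the EMA-averaged weights
            if f1 > best_f1:
                best_f1, patience = f1, 0
                save_ckpt(step)
            else:
                patience += 1
                if patience > max_patience:
                    break  # with a small patience this fires before the EMA settles
    return best_f1
```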

@chesterkuo

Exact Match: 70.8136234626, F1: 80.0760742304

with the following parameters:
--hidden=128 --num_heads=8 --checkpoint=2000 --num_steps=60000 --learning_rate=0.001 --batch_size=32

@ajay-sreeram commented May 5, 2018

EM: 68.60, F1: 78.16

Results were obtained on a CPU (8 GB RAM); training ran for four and a half days, using all default parameters except GloVe 42B embeddings and 35k steps.

@jasonwbw (Contributor, Author) commented May 7, 2018

@raisudeen Sorry for the late reply; I was on vacation. Checkpoints were obtained with TF 1.4.

@localminimum (Owner) commented May 7, 2018

Hi @chesterkuo, is your result above obtained after this commit? f0c79cc
Thanks for reporting your results on CPU @ajay-sreeram! That is some impressive patience. I talked to the original paper's authors and they saw a performance increase from using GloVe 840B over 42B. Is there a reason why you used 42B?

@ajay-sreeram

Hi @localminimum, I am happy to see such good performance from a re-implementation of the QANet paper. To see how the model performs with a weaker embedding, I tried GloVe 42B. The results do show a decrease in performance, but not by much, which again demonstrates the strength of QANet.

| Word Embedding | Training Steps | EM | F1 |
| --- | --- | --- | --- |
| Glove840B | 35,000 | 69.0 | 78.6 |
| Glove42B | 35,000 | 68.6 | 78.1 |

jasonwbw changed the title from "Report the results with 128 hidden units" to "Report the results" on May 8, 2018
@chesterkuo

@localminimum, yes, based on the new change, with GloVe "glove.840B.300d".

@PhungVanDuy

@jasonwbw I tried to use your checkpoint but I got the following error:

[screenshot]

Can you tell me how to use your checkpoint? Thank you!
@ajay-sreeram Can you please share your checkpoint? Thanks.

@jasonwbw (Contributor, Author) commented May 9, 2018

@PhungVanDuy First, I'm sorry that I forgot to report some details: I obtained these results on TF 1.4 instead of 1.5, and the results were reported before commit f0c79cc.
Also, judging from your error, I suspect you forgot to set save_path in config.py.
The path should be train_dir/model_name/model by default.
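
To make the layout concrete, a small sketch of what the restore step expects, assuming the default train_dir/model_name values mentioned in this thread (train/FRC):

```python
import os
import tensorflow as tf

# Default location the test code restores from, assuming train_dir="train"
# and model_name="FRC" as discussed in this thread.
save_dir = os.path.join("train", "FRC", "model")
ckpt_path = tf.train.latest_checkpoint(save_dir)  # resolved via the `checkpoint` index file
print("Restoring from:", ckpt_path)

# After the model graph has been built (with hidden=128 to match the weights):
# saver = tf.train.Saver()
# saver.restore(sess, ckpt_path)
```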

@PhungVanDuy

@jasonwbw Thanks for your support. I used your checkpoint, but the result does not make sense; here is my screenshot:

[screenshot]

EM: 0.331122
F1: 7.059

Can you explain this to me? Thanks.

@localminimum (Owner) commented May 9, 2018

Hi @PhungVanDuy, it seems like you are using the latest commit. @jasonwbw mentioned in the previous comment that he used the model from before commit f0c79cc. You could revert to the previous commit and try again. The performance drop is most likely due to my latest commit, which removed the activation function from the highway layers. Without the ReLU activation the input distribution is very different, and the highway layers have no idea how to deal with the unseen distribution.
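
To illustrate the point about the removed activation, here is a generic highway layer where the ReLU on the transform branch is optional (a sketch, not the repo's exact code):

```python
import tensorflow as tf

def highway_layer(x, use_relu=True, scope="highway"):
    # One highway layer: gate * transform(x) + (1 - gate) * x.
    # With use_relu=False the transform branch is linear, so the distribution
    # fed to later layers changes and old checkpoints no longer match it well.
    with tf.variable_scope(scope):
        d = x.get_shape().as_list()[-1]
        h = tf.layers.dense(x, d, name="transform")
        if use_relu:
            h = tf.nn.relu(h)
        gate = tf.layers.dense(x, d, activation=tf.sigmoid, name="gate")
        return gate * h + (1.0 - gate) * x
```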

@PhungVanDuy commented May 9, 2018

@localminimum Thanks for your reply. I tried going back to commit bb5769d (git reset --hard bb5769d) and ran the test, but I got the following error:

[screenshot]

I saved the checkpoint in the default folder (train/FRC/model, with the checkpoint file containing model_checkpoint_path: "model_60000_ckpt") and I am using TensorFlow 1.4 as @jasonwbw mentioned.

@localminimum, can you share the latest checkpoint of this repo with me?

@jasonwbw (Contributor, Author) commented May 9, 2018

@PhungVanDuy maybe the hidden size should be set to 128 in the config.

@PhungVanDuy

@jasonwbw I already set that, but I still get an error. Can you share the version of the source code that works with the checkpoint? I would be very grateful. I am looking forward to your reply.

@localminimum (Owner)

@PhungVanDuy can you post the error you get here?

@PhungVanDuy commented May 9, 2018

@localminimum I am using the checkpoint from @jasonwbw with commit bb5769d (git reset --hard bb5769d).
I put the checkpoint in the default folder: train/FRC/model.
And then I got this error:

Loading model...
Total number of trainable parameters: 1295938
2018-05-09 00:32:32.496830: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2018-05-09 00:32:36.431273: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key Context_to_Query_Attention_Layer/trilinear/trilinear/linear_bias/ExponentialMovingAverage not found in checkpoint
2018-05-09 00:32:36.432023: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key Context_to_Query_Attention_Layer/trilinear/trilinear/linear_kernel/ExponentialMovingAverage not found in checkpoint
2018-05-09 00:32:36.432842: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key Context_to_Query_Attention_Layer/trilinear/trilinear/linear_kernel not found in checkpoint
2018-05-09 00:32:36.434867: W tensorflow/core/framework/op_kernel.cc:1192] Not found: Key Context_to_Query_Attention_Layer/trilinear/trilinear/linear_bias not found in checkpoint
Traceback (most recent call last):
  File "config.py", line 145, in <module>
    tf.app.run()
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "config.py", line 136, in main
    test(config)
  File "/mnt/hdd1/pvduy/QA/Try/QANet/main.py", line 159, in test
    saver.restore(sess, tf.train.latest_checkpoint(config.save_dir))
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1666, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key Context_to_Query_Attention_Layer/trilinear/trilinear/linear_bias/ExponentialMovingAverage not found in checkpoint
  [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]

Caused by op u'save/RestoreV2_9', defined at:
  File "config.py", line 145, in <module>
    tf.app.run()
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "config.py", line 136, in main
    test(config)
  File "/mnt/hdd1/pvduy/QA/Try/QANet/main.py", line 158, in test
    saver = tf.train.Saver()
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1218, in __init__
    self.build()
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1227, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1263, in _build
    build_save=build_save, build_restore=build_restore)
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 751, in _build_internal
    restore_sequentially, reshape)
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 427, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 267, in restore_op
    [spec.tensor.dtype])[0])
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1021, in restore_v2
    shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/mnt/hdd1/pvduy/vlen2new/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key Context_to_Query_Attention_Layer/trilinear/trilinear/linear_bias/ExponentialMovingAverage not found in checkpoint
  [[Node: save/RestoreV2_9 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2_9/tensor_names, save/RestoreV2_9/shape_and_slices)]]

My system:

  • Python: 2.7
  • Tensorflow: 1.4.0

@localminimum (Owner)

@jasonwbw looking at the error above, did you use the old trilinear function for the pretrained model above? It seems like the optimised trilinear function's exponential moving average is missing.

@localminimum (Owner) commented May 9, 2018

@PhungVanDuy comment out the old trilinear function, as is done in the latest commit. Only use the optimised trilinear function and try again. It seems like you are trying to load weights, belonging to the old trilinear function, that are not present in the checkpoint.
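
If it helps with debugging this kind of "Key ... not found in checkpoint" error, the variables actually stored in a checkpoint can be listed with TensorFlow's checkpoint reader and compared against what the graph tries to restore (a sketch; the path is the default from this thread):

```python
import tensorflow as tf

# List every variable stored in the checkpoint so it can be compared against
# the variables the current graph (old vs. optimised trilinear) expects.
ckpt_path = tf.train.latest_checkpoint("train/FRC/model")
reader = tf.train.NewCheckpointReader(ckpt_path)
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)
```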

@PhungVanDuy commented May 9, 2018

@localminimum Thank you for your support, I solved it!

[screenshot]

@gowthamrang commented Jun 26, 2018

Great work folks! :) This information is very helpful. One other piece of information that would help some readers, I guess, is the time taken for inference. @jasonwbw, @ajay-sreeram (CPU), @chesterkuo? Thank you :)

@GuoYL36 commented Jul 19, 2018

Hi, I used all default parameters and got results worse than the ones presented above:
EM: 67.975, F1: 78.015, with the following parameters:
--hidden=96 --num_heads=1 --num_steps=35000
I don't know why this happens.

@GuoYL36 commented Jul 20, 2018

I tried modifying the character embedding size and trained again with the following parameters:
--hidden=96 --num_heads=1 --num_steps=35000 --char_emb_size=200 (the value used in the original paper)
And I got the results: EM: 69.196, F1: 78.66.
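
For clarity, a hedged sketch of that flag change, assuming the hyper-parameters are defined with tf.flags in config.py (the flag name is taken from the command line above; the default and help text are assumptions):

```python
import tensorflow as tf

flags = tf.flags
# Character embedding size: 200 matches the original QANet paper,
# 64 is the smaller value discussed in this thread.
flags.DEFINE_integer("char_emb_size", 200, "character embedding dimension")
config = flags.FLAGS
```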

@localminimum (Owner) commented Jul 20, 2018

Hi @webdaren, is there a reason why you trained for only 35,000 steps? All the listed results are from models trained for 60k steps or longer.

@GuoYL36 commented Jul 20, 2018

@localminimum Thank you for your answer! I just wanted to compare with the first row of the listed results. I got a worse result (EM: 67.975, F1: 78.015) than the listed one when I used the parameters of that first row (in particular, char_emb_size is 64 in config.py).
I also trained the model for 60k steps and got a result (EM: 67.8, F1: 77.82) worse than all the listed results. I will now train for 60k steps with the modified char_emb_size (the value from the original paper).
