training: IO is slow #2

monajalal · 2018-10-03T20:18:47Z

During training I keep getting the following message:
data reader: waiting for data loading (IO slow)

Also, at the beginning I got the following message:
imdb does not contain bounding boxes

Do you get these messages too?
And how can I speed up the data loading?

python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml

Here is the output of nvidia-smi while training is running:

[jalal@goku snmn]$ nvidia-smi
Wed Oct  3 16:17:54 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:05:00.0  On |                  N/A |
|  0%   31C    P2   159W / 250W |   5644MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:06:00.0 Off |                  N/A |
|  0%   26C    P8    11W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2214      G   /usr/bin/X                                   159MiB |
|    0      2946      C   python                                      4459MiB |
|    0      7217      G   ...quest-channel-token=3890773986348045915   935MiB |
|    0     12989      G   /usr/bin/gnome-shell                          86MiB |
+-----------------------------------------------------------------------------+

Please let me know if you have any suggestions.
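
The snapshot above shows 0% GPU-Util on GPU 0 while the python process holds 4459MiB, which would be consistent with the GPU idling while it waits for input. A single reading can be misleading, though, so here is a small sketch for sampling utilization repeatedly. It assumes `nvidia-smi` is on the PATH and relies on its `--query-gpu` CSV output; the helper names `parse_gpu_csv` and `sample_gpus` are hypothetical and not part of snmn.

```python
import subprocess

# nvidia-smi query that prints one "index, util, mem" CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

def parse_gpu_csv(text):
    """Parse 'index, util, mem' CSV lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        rows.append({"gpu": int(index), "util_pct": int(util), "mem_mib": int(mem)})
    return rows

def sample_gpus():
    """Run nvidia-smi once and return the parsed per-GPU readings."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)

# Example using the values from the snapshot above (no GPU needed):
print(parse_gpu_csv("0, 0, 5644\n1, 0, 2\n"))
```

Calling `sample_gpus()` every few seconds during training and seeing utilization stay near zero for long stretches would point toward the data loader rather than the model as the bottleneck.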


monajalal · 2018-10-04T15:57:34Z

It was slow, but the training eventually finished in under ~30 hrs with the following results:

exp: vqa_gt_layout, iter = 200000
	loss (vqa) = 0.050071, loss (layout) = 0.000070, loss (rec) = 0.000000, loss (sharpen) = 0.000000, sharpen_scale = 1.000000
	accuracy (cur) = 0.968750, accuracy (avg) = 0.977994
snapshot saved to ./exp_clevr_snmn/tfmodel/vqa_gt_layout/00200000

mynameischaos · 2019-01-04T10:07:35Z

How can this problem be solved?

monajalal · 2019-01-04T10:12:39Z

This might sound weird, but for us training it on a Tesla P100 reduced the time to 7 hrs, which is pretty good. Depending on what you want out of snmn, though, 30 hrs on a 1080Ti is not that big a deal unless you want to change the code frequently and retrain.
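
To put the two timings in perspective, here is a quick back-of-the-envelope calculation. It assumes both runs covered the same 200000 iterations shown in the training log earlier in this thread, which is not confirmed for the P100 run.

```python
ITERATIONS = 200_000  # from the training log; assumed identical for both runs

def iters_per_second(hours, iterations=ITERATIONS):
    """Average training throughput for a run of the given duration."""
    return iterations / (hours * 3600)

gtx_1080ti = iters_per_second(30)  # ~30 hrs reported on the 1080Ti
tesla_p100 = iters_per_second(7)   # ~7 hrs reported on the P100

print(f"1080Ti: {gtx_1080ti:.2f} it/s")              # 1080Ti: 1.85 it/s
print(f"P100:   {tesla_p100:.2f} it/s")              # P100:   7.94 it/s
print(f"speedup: {tesla_p100 / gtx_1080ti:.1f}x")    # speedup: 4.3x
```

A gap of roughly 4x may reflect differences in storage or CPU between the two machines as much as the GPUs themselves, since the warning in this issue points at data loading. The thread does not settle which factor dominated.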

mynameischaos · 2019-01-04T10:17:12Z

How many GPUs did you use, and what prefetch-num? I am also using a Tesla P100.
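
For readers unfamiliar with the term, a prefetch count usually means the size of a buffer that a background thread fills with batches while the GPU trains. The sketch below is a generic illustration of that pattern and not the repository's actual data reader; `PrefetchReader`, `load_batch`, and `prefetch_num` are hypothetical names. It prints a warning like the one in this issue whenever the training loop finds the buffer empty.

```python
import queue
import threading

class PrefetchReader:
    """Loads batches in a background thread, buffering up to prefetch_num."""

    def __init__(self, load_batch, num_batches, prefetch_num=8):
        self._queue = queue.Queue(maxsize=prefetch_num)
        self._num_batches = num_batches
        self._thread = threading.Thread(
            target=self._worker, args=(load_batch,), daemon=True)
        self._thread.start()

    def _worker(self, load_batch):
        for i in range(self._num_batches):
            self._queue.put(load_batch(i))  # blocks while the buffer is full

    def batches(self):
        for _ in range(self._num_batches):
            if self._queue.empty():
                # The consumer outpaced the loader, so it has to wait on IO.
                print("data reader: waiting for data loading (IO slow)")
            yield self._queue.get()  # blocks until a batch is ready

# Usage with a stand-in loader (a real one would read features from disk):
reader = PrefetchReader(
    load_batch=lambda i: {"batch_id": i}, num_batches=5, prefetch_num=2)
ids = [batch["batch_id"] for batch in reader.batches()]
print(ids)
```

A larger buffer only smooths out bursts. If the loader is slower than the training step on average, the buffer still drains and the warning keeps appearing, so faster storage or more loader threads would be needed.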

mynameischaos · 2019-01-07T13:41:07Z

> This might sound weird but for us training it on Tesla P100 reduced it to 7hrs which is pretty good. Though depending on what you want out of snmn 30hrs on 1080Ti is not that much of a big deal unless you want to change code frequently and retrain.

Hello, I would like to learn more details. Did you just git clone this code and run it in single-GPU mode (P100)? And did it really take only about 7 hrs?

monajalal closed this as completed Oct 4, 2018
