training: IO is slow #2

Closed
monajalal opened this issue Oct 3, 2018 · 5 comments

Comments

@monajalal

During training I keep getting the following message:
data reader: waiting for data loading (IO slow)

Also, at the beginning I got the following message:
imdb does not contain bounding boxes

Do you get these messages too?
If so, how can I improve this?

python exp_clevr_snmn/train_net_vqa.py --cfg exp_clevr_snmn/cfgs/vqa_gt_layout.yaml
[Screenshot from 2018-10-03 16-15-51]

Here is the output of nvidia-smi while the model is training:

[jalal@goku snmn]$ nvidia-smi
Wed Oct  3 16:17:54 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:05:00.0  On |                  N/A |
|  0%   31C    P2   159W / 250W |   5644MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:06:00.0 Off |                  N/A |
|  0%   26C    P8    11W / 250W |      2MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2214      G   /usr/bin/X                                   159MiB |
|    0      2946      C   python                                      4459MiB |
|    0      7217      G   ...quest-channel-token=3890773986348045915   935MiB |
|    0     12989      G   /usr/bin/gnome-shell                          86MiB |
+-----------------------------------------------------------------------------+

Please let me know if you have any suggestions.
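
For context, a message like "waiting for data loading (IO slow)" is what a prefetching reader would typically print when the training loop asks for a batch and the prefetch queue is empty. Below is a minimal sketch of that pattern, assuming a background thread that fills a bounded queue. The names `PrefetchReader`, `prefetch_num`, `load_fn`, and `slow_load` are hypothetical illustrations and are not taken from the snmn code.

```python
import queue
import threading
import time


class PrefetchReader:
    """Hypothetical prefetching reader (not the snmn implementation).

    A background thread loads batches into a bounded queue while the
    training loop consumes them from the other end.
    """

    def __init__(self, load_fn, num_batches, prefetch_num=8):
        self._load_fn = load_fn
        self._num_batches = num_batches
        self._queue = queue.Queue(maxsize=prefetch_num)
        self.stall_count = 0  # times the consumer found the queue empty
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def _worker(self):
        for i in range(self._num_batches):
            # put() blocks while the queue already holds prefetch_num batches
            self._queue.put(self._load_fn(i))
        self._queue.put(None)  # sentinel: no more data

    def batches(self):
        while True:
            if self._queue.empty():
                # The consumer is faster than the loader, so it has to wait.
                self.stall_count += 1
                print("data reader: waiting for data loading (IO slow)")
            batch = self._queue.get()  # blocks until a batch (or sentinel) arrives
            if batch is None:
                return
            yield batch


def slow_load(i):
    """Stand-in for reading one batch from disk (assumed 10 ms latency)."""
    time.sleep(0.01)
    return i
```

Iterating over `PrefetchReader(slow_load, num_batches=5).batches()` with a consumer that does no work makes the message appear, because loading is the bottleneck. In a reader of this kind, a larger `prefetch_num` only helps when loading is bursty; if the loader is slower than the GPU on average, faster storage or more loader threads are the usual remedies.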

@monajalal
Author

It was slow, but the training eventually finished in under ~30 hours with the following results:

exp: vqa_gt_layout, iter = 200000
	loss (vqa) = 0.050071, loss (layout) = 0.000070, loss (rec) = 0.000000, loss (sharpen) = 0.000000, sharpen_scale = 1.000000
	accuracy (cur) = 0.968750, accuracy (avg) = 0.977994
snapshot saved to ./exp_clevr_snmn/tfmodel/vqa_gt_layout/00200000

@mynameischaos

How can this problem be solved?

@monajalal
Author

This might sound weird, but for us training on a Tesla P100 reduced the time to 7 hours, which is pretty good. That said, depending on what you want out of snmn, 30 hours on a 1080 Ti is not a big deal unless you want to change the code frequently and retrain.
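
To put the two timings side by side, here is a rough throughput calculation. It is only a sketch: it assumes both runs covered the same 200000 iterations shown in the final log line, which was stated for the 1080 Ti run but is an assumption for the P100 run, and it treats the reported "~30hrs" and "7hrs" as exact.

```python
TOTAL_ITERS = 200_000  # from the final log line (iter = 200000)


def iters_per_second(total_iters, hours):
    """Average training throughput over a whole run."""
    return total_iters / (hours * 3600.0)


rate_1080ti = iters_per_second(TOTAL_ITERS, 30)  # ~30 hours reported
rate_p100 = iters_per_second(TOTAL_ITERS, 7)     # 7 hours reported (same iteration count assumed)
speedup = rate_p100 / rate_1080ti

print(f"1080 Ti: {rate_1080ti:.2f} it/s, P100: {rate_p100:.2f} it/s, speedup: {speedup:.1f}x")
```

Under those assumptions the 1080 Ti run averaged a little under 2 iterations per second and the P100 run a little under 8, roughly a 4.3x difference.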

@mynameischaos

How many GPUs did you use, and what prefetch-num? I also use a Tesla P100.

@mynameischaos

> This might sound weird but for us training it on Tesla P100 reduced it to 7hrs which is pretty good. Though depending on what you want out of snmn 30hrs on 1080Ti is not that much of a big deal unless you want to change code frequently and retrain.

Hello, I would like to learn more details. Did you just git clone this code and run it in single-GPU mode (P100)? And did it really take only about 7 hours?
