
Training too slow #2

Closed
ZhuFengdaaa opened this issue Jun 22, 2018 · 2 comments

Comments

@ZhuFengdaaa

My machine has 3 Titan 1080 Ti GPUs and 12 Intel i7 CPU cores, with 65 GB of total memory. However, the program takes more than 5800 s per epoch.
My command is python3 main.py --use_both True --use_vg True --batch_size 128, because a batch size of 256 runs out of memory.

epoch 1, time: 5844.42
        train_loss: 3.32, norm: 4.2468, score: 51.21
gradual warmup lr: 0.0010
epoch 2, time: 5844.72
        train_loss: 3.05, norm: 2.5201, score: 55.44
gradual warmup lr: 0.0014
epoch 3, time: 5839.73
        train_loss: 2.90, norm: 1.7370, score: 58.02
lr: 0.0014
epoch 4, time: 5835.09
        train_loss: 2.75, norm: 1.3749, score: 60.45
lr: 0.0014
epoch 5, time: 5837.11
        train_loss: 2.64, norm: 1.2232, score: 62.33
lr: 0.0014
epoch 6, time: 5829.90
        train_loss: 2.54, norm: 1.1545, score: 63.88
lr: 0.0014
epoch 7, time: 5832.88
        train_loss: 2.46, norm: 1.1238, score: 65.32
lr: 0.0014
epoch 8, time: 5834.77
        train_loss: 2.39, norm: 1.1157, score: 66.59
@jnhwkim
Owner

jnhwkim commented Jun 22, 2018

We used 4 Titan XPs with a batch size of 256 (about 4000 s/epoch), so your 5800 s seems to be within the expected range of running time. For the (possibly) degraded performance when the batch size is changed, please refer to this link.
p.s. Our training log is here. Please refer to it to compare the training curves.

@jnhwkim jnhwkim closed this as completed Jun 22, 2018
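
The log above shows a gradual learning-rate warmup, and the reply points at the usual coupling between batch size and learning rate. A minimal sketch of how such a warmup-plus-linear-scaling scheme is commonly wired up, assuming illustrative values for base_lr, base_batch_size, and warmup_epochs rather than this repository's actual settings:

```python
import torch

# Illustrative sketch only: base_lr, base_batch_size, warmup_epochs and the
# optimizer choice are assumptions, not this repository's actual settings.
base_lr = 0.002          # learning rate tuned for the reference batch size
base_batch_size = 256    # batch size the learning rate was tuned for
warmup_epochs = 4

def lr_for_epoch(epoch, batch_size):
    # Linear scaling heuristic: scale the learning rate with the batch size.
    scaled_lr = base_lr * batch_size / base_batch_size
    if epoch < warmup_epochs:
        # Gradual warmup: ramp linearly up to the target learning rate.
        return scaled_lr * (epoch + 1) / warmup_epochs
    return scaled_lr

model = torch.nn.Linear(10, 10)
optimizer = torch.optim.Adamax(model.parameters(), lr=base_lr)

for epoch in range(8):
    for group in optimizer.param_groups:
        group['lr'] = lr_for_epoch(epoch, batch_size=128)
    # ... one training epoch would run here ...
```

With batch_size=128 this settles at half the batch-size-256 learning rate after warmup, which is one common way to compensate when the batch size has to be reduced for memory reasons.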
@tchaton

tchaton commented Jan 9, 2019

Hello guys,

I have added some code there to be reviewed.
I am trying to optimize the code using torch.einsum.
However, I am not sure about everything there.
#15

Best,
T.C
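
For readers wondering what a torch.einsum rewrite aims at in general: a chain of unsqueeze/matmul calls computing a bilinear interaction can usually be collapsed into a single einsum call. A minimal, self-contained sketch with illustrative shapes and names, not code taken from this repository or from #15:

```python
import torch

# Illustrative only: a batched bilinear form x^T W y written two ways.
# Shapes: x is (B, Dx), y is (B, Dy), W is (Dx, Dy); the output is (B,).
B, Dx, Dy = 4, 8, 16
x = torch.randn(B, Dx)
y = torch.randn(B, Dy)
W = torch.randn(Dx, Dy)

# Explicit matmul version: (B, 1, Dx) @ (Dx, Dy) @ (B, Dy, 1) -> (B, 1, 1)
out_matmul = (x.unsqueeze(1) @ W @ y.unsqueeze(2)).reshape(B)

# The same contraction as a single einsum call.
out_einsum = torch.einsum('bi,ij,bj->b', x, W, y)

assert torch.allclose(out_matmul, out_einsum, atol=1e-5)
```

Whether the einsum form is actually faster depends on the tensor shapes and the backend; its main benefit is that the contraction pattern is stated explicitly in one place.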
