Server benchmarks showing the reduction in deep learning training time
Reducing Deep Learning Training Time

In my blog (xxx), I showed how much deep learning training time can be reduced by using enterprise-level servers instead of laptops or workstations. Here, I list the specs of the two servers (S1 and S2) and the workstation (WS) used in those comparisons, along with the measured time each transformer model took to train for 1 epoch.

Server Specs

Machine  CPU                                                  RAM     GPU
S1       2x Intel Xeon E5-2698 v4 @ 2.20 GHz, 20 cores each   512 GB  8x Nvidia V100 32 GB
S2       2x Intel Xeon Bronze 3106 @ 1.70 GHz, 8 cores each   64 GB   1x Nvidia V100 16 GB
WS       1x AMD Ryzen 7 2700X @ 3.70 GHz, 8 cores/16 threads  32 GB   1x Nvidia Titan RTX 24 GB

Model Training Times

Times shown for S1 and S2 are the average of three training runs. Times shown for WS are taken from Thilina Rajapakse's Medium article (
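Since the S1 and S2 figures are means of three runs, the averaging has to handle the min:sec format. A minimal Python sketch of one way to do it, assuming each run was recorded as a "min:sec" string; the helper names are hypothetical and not taken from the original benchmark scripts.

```python
def to_seconds(t: str) -> int:
    """Convert a 'min:sec' string such as '19:32' into whole seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)


def to_min_sec(total_seconds: float) -> str:
    """Format a duration in seconds as 'min:sec'.

    round() snaps to the nearest whole second (Python rounds exact halves to even).
    """
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes}:{seconds:02d}"


def average_runs(runs: list[str]) -> str:
    """Mean of several 'min:sec' timings of the same training job, as 'min:sec'."""
    if not runs:
        raise ValueError("need at least one run")
    mean = sum(to_seconds(r) for r in runs) / len(runs)
    return to_min_sec(mean)
```

Averaging in seconds, rather than averaging the minute and second fields separately, avoids carry errors: runs of 19:58 and 20:04 average to 20:01.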

Training time for 1 epoch (min:sec)

Model                                 S1      S2      WS
bert-base-cased                       19:32   23:16   22:17
roberta-base                          19:40   23:23   29:59
distilbert-base-uncased               10:37   12:26   15:34
xlnet-base-cased                      58:17   64:57   102:25
distilroberta-base                    11:02   12:47   15:59
bert-base-multilingual-cased          20:36   23:56   24:38
distilbert-base-multilingual-cased    11:46   13:08   18:49
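The table reports raw epoch times, while the point of the comparison is the reduction in training time. A minimal Python sketch, assuming the workstation (WS) is the baseline, that converts each min:sec entry to seconds and reports the percentage reduction for S1 and S2; the names TIMES and reduction_pct are illustrative, not part of the original benchmark.

```python
# Training time for 1 epoch (min:sec), copied from the table above.
TIMES = {
    "bert-base-cased": {"S1": "19:32", "S2": "23:16", "WS": "22:17"},
    "roberta-base": {"S1": "19:40", "S2": "23:23", "WS": "29:59"},
    "distilbert-base-uncased": {"S1": "10:37", "S2": "12:26", "WS": "15:34"},
    "xlnet-base-cased": {"S1": "58:17", "S2": "64:57", "WS": "102:25"},
    "distilroberta-base": {"S1": "11:02", "S2": "12:47", "WS": "15:59"},
    "bert-base-multilingual-cased": {"S1": "20:36", "S2": "23:56", "WS": "24:38"},
    "distilbert-base-multilingual-cased": {"S1": "11:46", "S2": "13:08", "WS": "18:49"},
}


def to_seconds(t: str) -> int:
    """Convert a 'min:sec' string into whole seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)


def reduction_pct(model: str, machine: str, baseline: str = "WS") -> float:
    """Percent reduction in epoch time on `machine` relative to `baseline`.

    Negative values mean `machine` was slower than the baseline.
    """
    base = to_seconds(TIMES[model][baseline])
    t = to_seconds(TIMES[model][machine])
    return 100.0 * (base - t) / base


if __name__ == "__main__":
    for model in TIMES:
        s1 = reduction_pct(model, "S1")
        s2 = reduction_pct(model, "S2")
        print(f"{model:<36} S1 {s1:6.1f}%  S2 {s2:6.1f}%")
```

For example, reduction_pct("xlnet-base-cased", "S1") is about 43.1%, the largest reduction in the table, while reduction_pct("bert-base-cased", "S2") is about -4.4%, meaning S2 was slightly slower than WS for that model.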
