In my blog (xxx), I showed how much deep learning training time can be reduced by using enterprise-level servers rather than laptops or workstations. Here, I show the specs of the two servers (S1 and S2) and the workstation (WS) used in the comparisons. I also list the actual training time for each BERT model for one epoch.
Machine Specs
Machine | CPU | RAM | GPU |
---|---|---|---|
S1 | 2x Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20 GHz, 20 cores | 512 GB | 8x Nvidia V100 32GB |
S2 | 2x Intel(R) Xeon(R) Bronze 3106 CPU @ 1.70 GHz, 8 cores | 64 GB | 1x Nvidia V100 16GB |
WS | 1x AMD Ryzen 7 2700X @ 3.70 GHz, 8 cores (16 threads) | 32 GB | 1x Nvidia Titan RTX |
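As a quick sanity check, the GPU and CPU details listed above can be read programmatically on each machine. The sketch below assumes PyTorch is installed with CUDA support; it is illustrative only and not part of the benchmark code.

```python
import os

import torch

# Print each GPU's name and memory, plus the logical CPU core count,
# for comparison with the specs table above.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
print(f"Logical CPU cores: {os.cpu_count()}")
```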
Model Training Times - Times shown for S1 and S2 are the average of 3 training runs; times shown for WS are taken from Thilina Rajapakse's Medium article (https://towardsdatascience.com/to-distil-or-not-to-distil-bert-roberta-and-xlnet-c777ad92f8). A sketch of how a single-epoch run can be timed follows the table.
Model | Machine | Training Time for 1 Epoch (min:sec) |
---|---|---|
bert-base-cased | S1 | 19:32 |
bert-base-cased | S2 | 23:16 |
bert-base-cased | WS | 22:17 |
roberta-base | S1 | 19:40 |
roberta-base | S2 | 23:23 |
roberta-base | WS | 29:59 |
distilbert-base-uncased | S1 | 10:37 |
distilbert-base-uncased | S2 | 12:26 |
distilbert-base-uncased | WS | 15:34 |
xlnet-base-cased | S1 | 58:17 |
xlnet-base-cased | S2 | 64:57 |
xlnet-base-cased | WS | 102:25 |
distilroberta-base | S1 | 11:02 |
distilroberta-base | S2 | 12:47 |
distilroberta-base | WS | 15:59 |
bert-base-multilingual-cased | S1 | 20:36 |
bert-base-multilingual-cased | S2 | 23:56 |
bert-base-multilingual-cased | WS | 24:38 |
distilbert-base-multilingual-cased | S1 | 11:46 |
distilbert-base-multilingual-cased | S2 | 13:08 |
distilbert-base-multilingual-cased | WS | 18:49 |
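For reference, here is a minimal sketch of how a single-epoch training run can be timed with the Simple Transformers library used in the Medium article above. The toy `train_df`, the choice of `bert-base-cased`, and the argument values are illustrative assumptions, not the exact setup behind the numbers in the table.

```python
import time

import pandas as pd
from simpletransformers.classification import ClassificationModel

# Toy stand-in for the real training set; Simple Transformers expects a
# DataFrame with "text" and "labels" columns for classification tasks.
train_df = pd.DataFrame(
    {"text": ["first example sentence", "second example sentence"], "labels": [0, 1]}
)

# Train bert-base-cased (any model name from the table works) for one epoch.
model = ClassificationModel(
    "bert",
    "bert-base-cased",
    args={"num_train_epochs": 1, "overwrite_output_dir": True},
    use_cuda=True,  # set to False on a machine without a GPU
)

start = time.time()
model.train_model(train_df)
elapsed = time.time() - start
print(f"Training time for 1 epoch: {int(elapsed // 60)}:{int(elapsed % 60):02d}")
```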