[DeepSpeed] restore memory for evaluation #10114

Merged: 4 commits, Feb 10, 2021

Conversation

stas00 (Contributor) commented on Feb 10, 2021

I spent some time trying to see if we could gain anything from DeepSpeed during inference. While there will be goodies in the future to make it useful there, at the moment we don't need it, so let's keep DeepSpeed cleanly contained to training for now.

This PR has a few small tweaks:

  • frees up all the memory used by DeepSpeed at the end of training (see the sketch after this list)
  • adds a clean way of skipping model.to(), but only when --do_train is used with deepspeed (this covers the case where you, @sgugger, were concerned about eval before train; no problem now)
  • adds a warning if a user tries to use --deepspeed without --do_train
  • reworks the test suite
  • applies consistent JSON config formatting
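
Roughly, the warning and the end-of-training cleanup look like this (a minimal sketch only, not the actual diff; the attribute and flag names follow the snippets quoted further down in this thread, everything else is illustrative):

    import logging

    logger = logging.getLogger(__name__)

    class TrainerSketch:
        # Illustrative only; attribute/flag names follow the snippets quoted in this PR.
        def __init__(self, args, model):
            self.args = args
            self.model = model
            self.model_wrapped = model
            self.deepspeed = self.optimizer = self.lr_scheduler = None
            if args.deepspeed and not args.do_train:
                logger.warning("--deepspeed is only used during training; pass --do_train to enable it.")

        def train(self):
            ...  # init_deepspeed(...) sets self.deepspeed / self.optimizer / self.lr_scheduler here
            # At the end of training, drop everything DeepSpeed created so its
            # GPU memory can be reclaimed before evaluation (cf. lines 1043-1046 below).
            self.deepspeed = None
            self.optimizer = None
            self.lr_scheduler = None
            self.model_wrapped = None  # refined later in this discussion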

@sgugger, @LysandreJik

sgugger (Collaborator) left a comment

Looks good to me, apart from the auto-clean, which is a bit too magic for my taste. Thanks!

Comment on lines 1043 to 1046
self.deepspeed = None
self.optimizer = None
self.lr_scheduler = None
self.model_wrapped = None
sgugger (Collaborator)

I'm not super fond of having this done automatically. It should be in a free_memory method of the Trainer (or a name you like better) that is explicitly called by the user between training and evaluation IMO.
This is also useful beyond deepspeed.
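
For illustration, such a helper might look roughly like this (hypothetical sketch; free_memory is only the name proposed above, not an existing Trainer method):

    def free_memory(self):
        # Hypothetical Trainer method, explicitly called by the user between
        # training and evaluation; not part of the actual API.
        self.deepspeed = None
        self.optimizer = None
        self.lr_scheduler = None
        self.model_wrapped = self.model
        # optionally also release cached CUDA blocks:
        # gc.collect(); torch.cuda.empty_cache()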

stas00 (Contributor, Author) commented on Feb 10, 2021

These are 2 different things.

  1. In the case of DeepSpeed this is a clean cut. We explicitly init all of these at the beginning of train:

         if self.args.deepspeed:
             model, optimizer, lr_scheduler = init_deepspeed(self, num_training_steps=max_steps)
             self.model = model.module
             self.model_wrapped = model  # will get further wrapped in DDP
             self.deepspeed = model  # DeepSpeedEngine object
             self.optimizer = optimizer
             self.lr_scheduler = lr_scheduler

So this PR explicitly cleans these up at the end of train. This is completely opaque to the user: they had no way to init those explicitly, and thus have no need to do anything special.

  2. In the generic case the story is different, because the user may supply her own optimizer/lr_scheduler, and in that case, yes, they need to have control over whether to clean up or not.

As you pointed out this would be useful to the user, but it's a different situation, so let's solve it separately?

stas00 (Contributor, Author)

Though I do think I need to fix this:

            self.model_wrapped = None

so that it restores self.model_wrapped to self.model instead.
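
That is, something along these lines (sketch of the corrected cleanup):

    self.model_wrapped = self.model  # point back at the bare model instead of None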

sgugger (Collaborator)

Ah yes, it's true that in this case they are instantiated inside the train method, so this makes sense.
