
handle torch_dtype in low cpu mem usage #16580

Merged

Conversation

patil-suraj (Contributor) commented Apr 4, 2022

What does this PR do?

The torch_dtype argument in from_pretrained is not respected when loading a model with low_cpu_mem_usage=True. This is because _load_pretrained_model_low_mem creates and assigns a new Parameter rather than loading directly into the model's state_dict, so torch_dtype is ignored, as can be seen here:

new_val = state_dict[k]

This PR casts each tensor in the state_dict to the dtype retrieved from the model's meta parameters.
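For illustration, a minimal sketch of the casting idea (the helper name and structure here are mine, not the actual code in _load_pretrained_model_low_mem):

```python
import torch

def cast_state_dict_to_model_dtype(model, state_dict):
    # Under low_cpu_mem_usage=True the model's parameters sit on the "meta"
    # device, so they already carry the target dtype (e.g. torch.float16 when
    # torch_dtype=torch.float16) without occupying real memory.
    meta_params = dict(model.named_parameters())
    for key, tensor in state_dict.items():
        if key in meta_params and tensor.is_floating_point():
            # cast the checkpoint tensor to the dtype the model expects
            state_dict[key] = tensor.to(meta_params[key].dtype)
    return state_dict
```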

Should be merged after #16548

HuggingFaceDocBuilderDev commented Apr 4, 2022

The documentation is not available anymore as the PR was closed or merged.

sgugger (Collaborator) left a comment


LGTM but make sure to have @stas00 review before merging as he knows this code better than I do!

stas00 (Contributor) commented Apr 4, 2022

The first chunk is definitely the right fix, unrelated to dtype

The second chunk is different. What you want is to move this code:

if from_pt:
    # restore default dtype
    if dtype_orig is not None:
        torch.set_default_dtype(dtype_orig)

to before this code:

# make sure token embedding weights are still tied if needed
model.tie_weights()

which would extend the scope of the non-default dtype to cover the loading. You can merge the two if from_pt blocks.

Once this is done, PyTorch will automatically use torch_dtype for tensor allocation.
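As a rough illustration of the intended effect (my own snippet, not from the PR), floating-point tensors allocated inside the widened scope pick up the default dtype:

```python
import torch

torch.set_default_dtype(torch.float16)   # opened by from_pretrained when torch_dtype is set
print(torch.empty(3).dtype)              # torch.float16
print(torch.tensor([1.0]).dtype)         # torch.float16
torch.set_default_dtype(torch.float32)   # the "restore default dtype" chunk closes the scope
```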

Please let me know if what I suggested is easy to follow. The first snippet I quoted is the closing of the special scope.

And the obvious question: should we add a new test?

patil-suraj (Contributor, Author)

@stas00 I already tried the approach you suggested, but it didn't work. It seems torch.set_default_dtype doesn't affect torch.load, so all params in the loaded state_dict keep the dtype with which they were saved, irrespective of torch.set_default_dtype.
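A quick way to see this (my own reproduction sketch, not code from the PR):

```python
import torch

# save an fp32 tensor, then reload it under a half-precision default dtype
torch.save({"w": torch.ones(4, dtype=torch.float32)}, "tmp.pt")

torch.set_default_dtype(torch.float16)
state_dict = torch.load("tmp.pt")
print(state_dict["w"].dtype)   # torch.float32 -- the saved dtype wins, not the default
torch.set_default_dtype(torch.float32)
```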

The first chunk is definitely the right fix, unrelated to dtype

This is actually fixed in #16548; I added it in this PR just to be able to test the changes.

And the obvious question: should we add a new test?

We should, IMO.
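Something along these lines could work as a starting point (a hypothetical sketch, including the choice of tiny checkpoint, not the test that was eventually added):

```python
import torch
from transformers import AutoModel

def test_low_cpu_mem_usage_respects_torch_dtype():
    # load with both options and check the requested dtype actually sticks
    model = AutoModel.from_pretrained(
        "hf-internal-testing/tiny-random-bert",  # hypothetical tiny checkpoint
        torch_dtype=torch.float16,
        low_cpu_mem_usage=True,
    )
    assert all(
        p.dtype == torch.float16
        for p in model.parameters()
        if p.is_floating_point()
    )
```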

stas00 (Contributor) commented Apr 5, 2022

I ran some experiments and you're correct, Suraj, on all counts.

It's a bummer: someone who is low on CPU memory may have enough to start model training in lower precision, but then be unable to finetune from a single-precision checkpoint on the Hub while targeting that same lower precision, since torch.load forces a full fp32 model copy. That requires 3x the memory of the half-precision model normally, and 2x with the _load_pretrained_model_low_mem hack.
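Back-of-the-envelope numbers for, say, a 1B-parameter model (my own illustration of the 3x / 2x claim):

```python
n_params = 1_000_000_000
gib = 2**30

fp16_model = n_params * 2 / gib   # ~1.9 GiB: the half-precision model you want to end up with
fp32_ckpt  = n_params * 4 / gib   # ~3.7 GiB: what torch.load materializes from an fp32 checkpoint

print(fp16_model + fp32_ckpt)     # ~5.6 GiB: regular path, model and state_dict both resident (~3x)
print(fp32_ckpt)                  # ~3.7 GiB: low-mem path, the model starts out empty/meta (~2x)
```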

I'm pretty sure we will have to abandon torch.load altogether as models get bigger, unless it becomes more flexible. There is no reason whatsoever to load all of a model's weights into memory at once, even more so when different GPUs should load different chunks of the model. It should be possible to load them in segments. But, of course, this is not the right forum to discuss that.

I made a feature request: pytorch/pytorch#75242

stas00 (Contributor) left a comment


Thank you for fixing this, Suraj.

Perhaps let's put out a call to the community to add tests for all these new features, and combinations of them, where they are lacking? i.e. a Good First Issue?

patil-suraj merged commit 21decb7 into huggingface:main on Apr 5, 2022
patil-suraj deleted the fix-dtype-for-low-mem-usage branch on April 5, 2022 10:26
stevhliu pushed a commit to stevhliu/transformers that referenced this pull request Apr 5, 2022
stevhliu added a commit that referenced this pull request Apr 5, 2022