Deal with shared memory scenarios #2136

muellerzr · 2023-11-09T17:49:42Z

What does this PR do?

When saving a model with safetensors, we need to handle tensors that share memory. The safetensors version of this handling relies on being passed the entire model rather than the state dict directly (see here), so for now its code is copied over as-is to remove the duplicate names.

(Every state_dict in torch is an OrderedDict, hence the check.)
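A minimal sketch of the scenario, assuming PyTorch 2.x: a toy model with tied weights ends up with two state-dict keys backed by the same memory, which is what safetensors refuses to save as-is. TiedModel and to_state_dict are hypothetical names, not code from this PR, and we are assuming "the check" is an isinstance test used to tell a state dict apart from a model.

```python
from collections import OrderedDict

import torch
from torch import nn


class TiedModel(nn.Module):
    """Hypothetical toy model whose output layer reuses the embedding weight."""

    def __init__(self, vocab: int = 10, dim: int = 4):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab, bias=False)
        # Tie the weights: both modules now hold the very same Parameter.
        self.head.weight = self.embed.weight


def to_state_dict(obj):
    # Hypothetical helper: accept either a model or a state dict.
    # Module.state_dict() returns an OrderedDict, so that is what we test for.
    if isinstance(obj, OrderedDict):
        return obj
    return obj.state_dict()


model = TiedModel()
sd = to_state_dict(model)
print(type(sd).__name__)  # OrderedDict
# Both names are present, and both point at the same underlying memory.
print(sd["embed.weight"].data_ptr() == sd["head.weight"].data_ptr())  # True
```

Note that a plain dict would fail the OrderedDict test and be treated as a model; checking against dict instead would accept both, since OrderedDict is a dict subclass.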

Fixes two failures on the transformers CI.


Who can review?


@SunMarc


SunMarc

LGTM! I know that in save_model in accelerate and in save_pretrained in transformers, we remove the shared tensors in a different way. I don't know how comparable that is to this method in safetensors, but my guess is that it is pretty similar.

muellerzr · 2023-11-09T18:34:14Z

Good call, I'll refactor this into a function so that it only warns and doesn't actually raise an error :)

I think logically it should work the same, bar the explicit bnb (bitsandbytes) part.
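The warn-only behaviour described in this comment could look roughly like the following. It is a sketch assuming PyTorch 2.x (for Tensor.untyped_storage()); clean_shared_tensors is a hypothetical name, and grouping by device, storage pointer and storage size is modelled loosely on how safetensors detects shared tensors, not a copy of the code merged in this PR. It also assumes every name in a shared group is a full alias of the same tensor (the tied-weights case).

```python
import logging
from collections import defaultdict

import torch

logger = logging.getLogger(__name__)


def clean_shared_tensors(state_dict):
    """Hypothetical helper: keep one name per shared storage and warn instead of raising.

    Assumes shared names are full aliases (tied weights); partially
    overlapping views of one storage are not handled here.
    """
    groups = defaultdict(list)
    for name, tensor in state_dict.items():
        if tensor.device.type == "meta":
            continue  # meta tensors have no real storage to compare
        storage = tensor.untyped_storage()
        if storage.data_ptr() == 0:
            continue  # empty storages report 0 and would falsely look shared
        groups[(tensor.device, storage.data_ptr(), storage.nbytes())].append(name)

    cleaned = dict(state_dict)  # shallow copy: the caller's dict is left untouched
    for names in groups.values():
        if len(names) < 2:
            continue
        keep, *dropped = sorted(names)
        for name in dropped:
            del cleaned[name]
        logger.warning(
            "Removed shared tensor(s) %s while saving, keeping %s. "
            "Re-tie them after loading.",
            dropped,
            keep,
        )
    return cleaned


shared = torch.zeros(3)
sd = {"embed.weight": shared, "head.weight": shared, "head.bias": torch.zeros(3)}
print(sorted(clean_shared_tensors(sd)))  # ['embed.weight', 'head.bias']
```

Keeping the alphabetically first name is an arbitrary choice in this sketch; a real implementation would more likely keep whichever name the model re-ties on load.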

SunMarc

Thx for iterating ! LGTM !

Deal with duplicates

5f20885

muellerzr requested a review from SunMarc November 9, 2023 17:49

SunMarc approved these changes Nov 9, 2023


refactor

96a7698

muellerzr requested a review from SunMarc November 10, 2023 15:05

muellerzr added 2 commits November 10, 2023 10:12

Merge branch 'main' into shared-mem

edd2236

Keep false for save

aff488b

SunMarc approved these changes Nov 10, 2023


muellerzr added 2 commits November 10, 2023 10:23

Clean

2ca0973

Better test for logs

32f7bd0

muellerzr merged commit fc0a43c into main Nov 10, 2023
26 checks passed

muellerzr deleted the shared-mem branch November 10, 2023 15:49

muellerzr mentioned this pull request Nov 20, 2023

RuntimeError "Some tensors share memory" occurred when saving checkpoints during LoRA huggingface/transformers#27613 (closed)
