
Improve information about group offloading and layerwise casting #11101

Merged: a-r-r-o-w merged 6 commits into main from improve-info-layerwise-and-group on Mar 24, 2025.



@a-r-r-o-w (Contributor)

No description provided.

@a-r-r-o-w requested a review from DN6 on March 18, 2025 06:42
Comment thread on docs/source/en/optimization/memory.md (outdated)
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread on docs/source/en/optimization/memory.md (outdated)
Comment thread on docs/source/en/optimization/memory.md (outdated)
Comment thread on docs/source/en/optimization/memory.md (outdated)

<Tip>

- Layerwise casting may not work with all models out of the box. Some forward implementations cast their inputs based on the dtype of a weight. Such implementations are not supported, because the current, simple implementation of layerwise casting assumes that the forward pass is independent of the weight precision and that inputs are always in `compute_dtype`. An example of an incompatible implementation can be found [here](https://github.com/huggingface/transformers/blob/7f5077e53682ca855afc826162b204ebf809f1f9/src/transformers/models/t5/modeling_t5.py#L294-L299).
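The assumption described in this tip can be illustrated with a toy, pure-Python sketch. It is not diffusers code: `Tensor`, `Linear`, `apply_toy_layerwise_casting`, and the two block classes are hypothetical stand-ins. float16 plays the storage dtype (emulated with `struct`'s half-precision format), Python's float64 plays the compute dtype, and the hook is assumed to upcast weights just before a layer's forward and downcast them right after. A parent block that reads its child's weight dtype before the child's hook has run sees the storage dtype, so its inputs leave `compute_dtype`. With real PyTorch tensors, as in the linked T5 code, this would more likely surface as a dtype-mismatch error in the matmul; the toy only shows the storage dtype leaking into the computation.

```python
import struct
from dataclasses import dataclass


def _round_to(values, dtype):
    """Round values to the precision of a toy dtype ("float16" or "float64")."""
    if dtype == "float16":
        # struct's "e" format packs an IEEE 754 half-precision float.
        return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]
    return [float(v) for v in values]  # Python floats are already float64


@dataclass
class Tensor:
    values: list
    dtype: str

    def to(self, dtype):
        return Tensor(_round_to(self.values, dtype), dtype)


class Linear:
    """Toy single-output linear layer: y = sum(w_i * x_i)."""

    def __init__(self, weight):
        self.weight = Tensor(weight, "float64")

    def forward(self, x):
        y = sum(w * v for w, v in zip(self.weight.values, x.values))
        return Tensor([y], "float64").to(x.dtype)  # output follows the input dtype

    def __call__(self, x):
        return self.forward(x)


def apply_toy_layerwise_casting(layer, storage_dtype, compute_dtype):
    """Hypothetical hook: keep weights in storage_dtype, upcast only during forward."""
    layer.weight = layer.weight.to(storage_dtype)
    original_forward = layer.forward

    def hooked_forward(x):
        layer.weight = layer.weight.to(compute_dtype)  # upcast just before compute
        try:
            return original_forward(x)
        finally:
            layer.weight = layer.weight.to(storage_dtype)  # back to storage precision

    layer.forward = hooked_forward  # instance attribute shadows the class method


class CompatibleBlock:
    def __init__(self, proj):
        self.proj = proj

    def __call__(self, x):
        return self.proj(x)  # inputs stay in the compute dtype


class WeightDependentBlock:
    def __init__(self, proj):
        self.proj = proj

    def __call__(self, x):
        # T5-style pattern: cast inputs to the *child's* weight dtype.
        # The child's hook has not run yet, so this reads the storage dtype.
        if x.dtype != self.proj.weight.dtype:
            x = x.to(self.proj.weight.dtype)
        return self.proj(x)


x = Tensor([0.1, 0.2, 0.3], "float64")
ok_layer, bad_layer = Linear([1.0, 1.0, 1.0]), Linear([1.0, 1.0, 1.0])
for layer in (ok_layer, bad_layer):
    apply_toy_layerwise_casting(layer, storage_dtype="float16", compute_dtype="float64")

ok = CompatibleBlock(ok_layer)(x)  # computed in float64
bad = WeightDependentBlock(bad_layer)(x)  # inputs silently rounded to float16
print(ok, bad)
```

Both layers end up back in float16 storage after the call; only the block that inspects a weight's dtype changes the precision of its inputs.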
Collaborator review comment:

This should also mention that layerwise casting can be disabled for specific modules using the skip patterns.


<Tip>

- Group offloading may not work with all models out of the box. If a model's forward implementation moves inputs to the device of a weight, this may clash with the way the offloading mechanism handles device placement.
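A similar toy sketch shows how weight-dependent device placement can clash with offloading. It is not diffusers code: devices are plain string labels (no GPU is needed), and `apply_toy_offloading` is a hypothetical hook that moves a single layer's weights to the onload device only for the duration of its forward pass. The sketch assumes inputs already live on the onload device and that the hook does not move them; diffusers' actual group offloading works on groups of modules and may handle inputs differently, so treat this only as an illustration of the failure mode.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    device: str  # only a label such as "cpu" or "cuda"; no real hardware is used

    def to(self, device):
        return Tensor(device)


class Linear:
    def __init__(self):
        self.weight = Tensor("cpu")

    def forward(self, x):
        if x.device != self.weight.device:
            raise RuntimeError(
                f"device mismatch: input on {x.device}, weight on {self.weight.device}"
            )
        return Tensor(x.device)

    def __call__(self, x):
        return self.forward(x)


def apply_toy_offloading(layer, onload_device, offload_device):
    """Hypothetical hook: onload weights before forward, offload them afterwards."""
    layer.weight = layer.weight.to(offload_device)
    original_forward = layer.forward

    def hooked_forward(x):
        layer.weight = layer.weight.to(onload_device)  # bring weights to the compute device
        try:
            return original_forward(x)
        finally:
            layer.weight = layer.weight.to(offload_device)  # free compute-device memory

    layer.forward = hooked_forward


class CompatibleBlock:
    def __init__(self, proj):
        self.proj = proj

    def __call__(self, x):
        return self.proj(x)


class DeviceDependentBlock:
    def __init__(self, proj):
        self.proj = proj

    def __call__(self, x):
        # Moves inputs to the child's weight device *before* the child's hook runs,
        # so it sees the offload device and sends the input there.
        x = x.to(self.proj.weight.device)
        return self.proj(x)


x = Tensor("cuda")
ok_layer, bad_layer = Linear(), Linear()
for layer in (ok_layer, bad_layer):
    apply_toy_offloading(layer, onload_device="cuda", offload_device="cpu")

out = CompatibleBlock(ok_layer)(x)

try:
    DeviceDependentBlock(bad_layer)(x)
    clashed = False
except RuntimeError as err:
    clashed = True
    print(err)
```

The compatible block runs on the onload device, while the device-dependent block fails because it moved its input to the offload device just before the weights were onloaded.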
Collaborator review comment:

This should also mention that it can be disabled for specific modules using the skip patterns.

Contributor (Author) reply:

We don't support skipping modules in group offloading. I'll mention it for layerwise casting, though.
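The skip-pattern mechanism discussed in this thread for layerwise casting can be sketched as a filter over module names. diffusers exposes a `skip_modules_pattern` argument for layerwise casting; this sketch assumes the patterns are regular expressions matched against dotted module names with `re.search`, and the helper `modules_to_cast` and the module names below are hypothetical.

```python
import re


def modules_to_cast(module_names, skip_modules_pattern):
    """Return the module names a layerwise-casting pass would still touch."""
    return [
        name
        for name in module_names
        if not any(re.search(pattern, name) for pattern in skip_modules_pattern)
    ]


names = [
    "patch_embed.proj",
    "blocks.0.norm1",
    "blocks.0.attn.to_q",
    "blocks.0.ff.net.0",
    "proj_out",
]
# Keep precision-sensitive layers (embeddings, norms, output projection) uncast.
cast = modules_to_cast(names, skip_modules_pattern=("patch_embed", "norm", "^proj_out$"))
print(cast)  # ['blocks.0.attn.to_q', 'blocks.0.ff.net.0']
```

Anchoring a pattern with `^...$` restricts it to an exact name, while an unanchored pattern such as `"norm"` matches any module whose name contains it.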

a-r-r-o-w and others added 2 commits March 18, 2025 14:33
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
@a-r-r-o-w requested a review from DN6 on March 18, 2025 09:11
@a-r-r-o-w (Contributor, Author)

Failing test looks unrelated

@a-r-r-o-w merged commit 1ddf3f3 into main on Mar 24, 2025
@a-r-r-o-w deleted the improve-info-layerwise-and-group branch on March 24, 2025 17:56