
Update transformers #539

Merged
maxjeblick merged 13 commits into main from max/update_transformers on Dec 19, 2023

Conversation

maxjeblick (Contributor)

This PR updates the transformers version to 4.36.

With this, flash attention is natively supported (and enabled by default); see https://twitter.com/efxmarty/status/1734931075367850385
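
(For context, a minimal sketch, not part of the PR, of checking that the environment provides what the native integration needs; the version threshold is the 4.36 named above, everything else is generic Python:)

    import importlib.util

    import transformers
    from packaging import version

    # transformers 4.36 is the first release with native flash-attention support.
    assert version.parse(transformers.__version__) >= version.parse("4.36.0")

    # The actual kernels still come from the separate flash-attn package.
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    print(f"flash-attn installed: {has_flash_attn}")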

@psinger (Collaborator) left a comment

Overall, looks good, thx.

But let's discuss the proper setting for the config.

@@ -677,17 +677,18 @@ def create_nlp_backbone(cfg, model_class=AutoModel) -> Any:
try:
    import flash_attn  # noqa: F401

    use_flash_attention_2 = cfg.training.use_flash_attention_2
    # see https://github.com/fxmarty/transformers/
psinger (Collaborator):

Is this the correct way to split a URL for the character limit?

maxjeblick (Contributor, Author):

I did it manually; happy to split it differently. Not sure if there's a commonly used convention for this.

    # see https://github.com/fxmarty/transformers/
    # blob/3f06a3a0aec8cc1ec3ad6bf66ebe277392c5ab37/
    # src/transformers/configuration_utils.py#L380
    config._attn_implementation_internal = "flash_attention_2"
maxjeblick (Contributor, Author):

_attn_implementation_internal is an internal variable that stores the implementation to use:

        # Attention implementation to use, if relevant.
        self._attn_implementation_internal = kwargs.pop("attn_implementation", None)

As you noticed, _attn_implementation is used within the model classes.
_attn_implementation itself is actually a property that cannot be set. I guess it was designed that way to ensure backwards compatibility (although it could also have been unified in the __init__ method).
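
(A simplified sketch of the pattern described above, not the actual transformers source: the value lives in a private attribute set in __init__, and the property only reads it, so callers have to assign the internal name directly.)

    class ConfigSketch:
        def __init__(self, **kwargs):
            # Attention implementation to use, if relevant.
            self._attn_implementation_internal = kwargs.pop("attn_implementation", None)

        @property
        def _attn_implementation(self):
            # Read-only view; no setter is defined, so it cannot be assigned directly.
            return self._attn_implementation_internal or "eager"

    config = ConfigSketch()
    config._attn_implementation_internal = "flash_attention_2"  # what the PR does
    print(config._attn_implementation)  # -> "flash_attention_2"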

psinger (Collaborator):

Got it, thx.
It seems one can pass it via the model constructor though:
https://huggingface.co/docs/transformers/perf_infer_gpu_one
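
(For reference, a minimal sketch of that constructor route as described in the linked docs; the model name is a placeholder, and flash attention requires a supported GPU plus the flash-attn package:)

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mistral-7B-v0.1"  # placeholder model

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,                # flash attention needs fp16/bf16
        attn_implementation="flash_attention_2",  # selects the flash-attention backend
    )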

@psinger (Collaborator) left a comment

@maxjeblick if you are confident this _internal setting is sufficient, let's merge
thx!!!

@maxjeblick merged commit 2b0b0c1 into main on Dec 19, 2023
5 checks passed
@maxjeblick deleted the max/update_transformers branch on December 19, 2023 at 14:48