Update transformers #539
Conversation
This reverts commit f65728f.
…dio into max/update_transformers
Overall, looks good, thx.
But let's discuss the proper setting for the config.
@@ -677,17 +677,18 @@ def create_nlp_backbone(cfg, model_class=AutoModel) -> Any:
    try:
        import flash_attn  # noqa: F401

        use_flash_attention_2 = cfg.training.use_flash_attention_2
        # see https://github.com/fxmarty/transformers/
Is this the correct way to split a URL for the char limit?
I did it manually, happy to split it differently. Not sure if there's a commonly used convention for this.
        # see https://github.com/fxmarty/transformers/
        # blob/3f06a3a0aec8cc1ec3ad6bf66ebe277392c5ab37/
        # src/transformers/configuration_utils.py#L380
        config._attn_implementation_internal = "flash_attention_2"
Shouldn't we rather set attn_implementation here?
This is also used there: https://github.com/huggingface/transformers/blob/v4.36.1/src/transformers/models/llama/modeling_llama.py#L756
Where exactly is _internal used then?
https://github.com/huggingface/transformers/blob/main/src/transformers/modeling_utils.py#L1277
_attn_implementation_internal is an internal variable to store the implementation to use:

    # Attention implementation to use, if relevant.
    self._attn_implementation_internal = kwargs.pop("attn_implementation", None)

As you noticed, _attn_implementation is used within the model classes. _attn_implementation itself is actually a property that cannot be set. I guess it was designed that way to ensure backwards compatibility (although it could also have been unified in the init method).
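To illustrate the pattern, here is a minimal hypothetical sketch (DummyConfig is made up, not the actual transformers source; the attribute names just mirror the linked configuration_utils.py):

    class DummyConfig:
        def __init__(self, **kwargs):
            # Attention implementation to use, if relevant.
            self._attn_implementation_internal = kwargs.pop("attn_implementation", None)

        @property
        def _attn_implementation(self):
            # Read-only view; falls back to "eager" when nothing was set.
            return self._attn_implementation_internal or "eager"

    cfg = DummyConfig()
    print(cfg._attn_implementation)  # "eager"
    cfg._attn_implementation_internal = "flash_attention_2"  # what the diff above does
    print(cfg._attn_implementation)  # "flash_attention_2"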
Got it, thx.
It seems one can pass it via the model constructor though:
https://huggingface.co/docs/transformers/perf_infer_gpu_one
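For reference, the linked docs page passes the implementation at load time via attn_implementation; a minimal sketch (the model id is only a placeholder, and flash-attn plus a supported GPU are required):

    import torch
    from transformers import AutoModelForCausalLM

    # Placeholder model id; any flash-attention-capable architecture works.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    )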
@maxjeblick if you are confident this _internal setting is sufficient, let's merge
thx!!!
This PR updates the transformers version to 4.36.
With this, flash attention is supported (and enabled by default) natively; see https://twitter.com/efxmarty/status/1734931075367850385
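A condensed, hypothetical sketch of how the changed backbone creation could tie together; only the lines shown in the diff above come from the PR, while the config loading, the ImportError fallback, and the cfg field names are assumptions:

    from typing import Any

    from transformers import AutoConfig, AutoModel

    def create_nlp_backbone(cfg, model_class=AutoModel) -> Any:
        # Loading the config from the configured backbone name is an assumption.
        config = AutoConfig.from_pretrained(cfg.llm_backbone)
        try:
            import flash_attn  # noqa: F401

            if cfg.training.use_flash_attention_2:
                # see https://github.com/fxmarty/transformers/blob/3f06a3a0aec8cc1ec3ad6bf66ebe277392c5ab37/src/transformers/configuration_utils.py#L380
                config._attn_implementation_internal = "flash_attention_2"
        except ImportError:
            # Without flash-attn installed, keep the default attention implementation.
            pass
        return model_class.from_config(config)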