
Quantization for FSDPA #967

Closed
dudilester wants to merge 4 commits

Conversation

dudilester (Contributor)

- Added use_flash_attention, flash_attention_causal_mask, and flash_attention_recompute to run_lm_eval
- Enforce the recompute flag on FSDPA quantization
- Allow quantization using HQT
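The three flags added to run_lm_eval could be exposed roughly as follows. This is a hypothetical sketch, not the actual optimum-habana implementation: the flag names come from the PR description, while the parser structure, help strings, and function name are assumptions.

```python
import argparse


def build_parser():
    """Sketch of a CLI exposing the flash-attention flags named in this PR.

    Hypothetical: only the three flag names are taken from the PR; the
    rest of this parser is illustrative, not the real run_lm_eval code.
    """
    parser = argparse.ArgumentParser(description="run_lm_eval (sketch)")
    parser.add_argument(
        "--use_flash_attention",
        action="store_true",
        help="Enable the fused scaled dot-product attention (FSDPA) path.",
    )
    parser.add_argument(
        "--flash_attention_causal_mask",
        action="store_true",
        help="Apply a causal mask inside the fused attention kernel.",
    )
    parser.add_argument(
        "--flash_attention_recompute",
        action="store_true",
        help="Recompute attention instead of caching it; per the PR, this "
             "flag is enforced when FSDPA quantization is enabled.",
    )
    return parser


args = build_parser().parse_args(
    ["--use_flash_attention", "--flash_attention_recompute"]
)
print(args.use_flash_attention,
      args.flash_attention_causal_mask,
      args.flash_attention_recompute)  # prints: True False True
```

Boolean `store_true` flags default to False, so omitting `--flash_attention_causal_mask` on the command line leaves it disabled.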

@dudilester (Contributor, Author)

These commits are related to version 1.16.0.
@libinta please add the synapse 1.16 dependency tag. Thanks.

@libinta libinta added the synapse 1.16_dependency label May 8, 2024
@wszczurekhabana wszczurekhabana mentioned this pull request May 10, 2024
@ssarkar2 ssarkar2 self-requested a review May 16, 2024 21:51
@ssarkar2 (Collaborator) left a comment


@dudilester Can we close this PR? It looks like an older version of #976.

@dudilester dudilester closed this May 19, 2024
Labels
synapse 1.16_dependency
Projects
None yet
Development
None yet

4 participants