
Quantization for FSDPA #967

Closed
dudilester wants to merge 4 commits

Conversation

dudilester (Contributor)

- Added use_flash_attention, flash_attention_causal_mask, and flash_attention_recompute to run_lm_eval
- Enforce the recompute flag on FSDPA quantization
- Allow quantization using HQT
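The three flags added to run_lm_eval could be exposed roughly as follows. This is a hypothetical sketch, not the actual optimum-habana implementation: the flag names come from the PR description, while the parser structure, help strings, and function name are assumptions.

```python
import argparse


def build_parser():
    """Sketch of a CLI exposing the flash-attention flags named in this PR.

    Hypothetical: only the three flag names are taken from the PR; the
    rest of this parser is illustrative, not the real run_lm_eval code.
    """
    parser = argparse.ArgumentParser(description="run_lm_eval (sketch)")
    parser.add_argument(
        "--use_flash_attention",
        action="store_true",
        help="Enable the fused scaled dot-product attention (FSDPA) path.",
    )
    parser.add_argument(
        "--flash_attention_causal_mask",
        action="store_true",
        help="Apply a causal mask inside the fused attention kernel.",
    )
    parser.add_argument(
        "--flash_attention_recompute",
        action="store_true",
        help="Recompute attention instead of caching it; per the PR, this "
             "flag is enforced when FSDPA quantization is enabled.",
    )
    return parser


args = build_parser().parse_args(
    ["--use_flash_attention", "--flash_attention_recompute"]
)
print(args.use_flash_attention,
      args.flash_attention_causal_mask,
      args.flash_attention_recompute)  # prints: True False True
```

Boolean `store_true` flags default to False, so omitting `--flash_attention_causal_mask` on the command line leaves it disabled.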

@dudilester (Contributor, Author)

These commits are related to version 1.16.0.
@libinta please add the synapse 1.16 dependency tag. Thanks.

@libinta libinta added the synapse 1.16_dependency label May 8, 2024
@wszczurekhabana wszczurekhabana mentioned this pull request May 10, 2024
@ssarkar2 ssarkar2 self-requested a review May 16, 2024 21:51
@ssarkar2 (Collaborator) left a comment


@dudilester Can we close this PR? It looks like an older version of #976.

@dudilester dudilester closed this May 19, 2024
Labels
synapse 1.16_dependency
Projects
None yet
Development
None yet

4 participants