Use FusedMovingAvgObsFakeQuantize instead of FakeQuantize for faster QAT (#14740)
Summary:
FusedMovingAvgObsFakeQuantize speeds up QAT by fusing the FakeQuantize and MovingAverageMinMaxObserver computations into a single CUDA op, which should yield a good speedup. This change updates the QAT qconfigs accordingly.
Tested on a llama model on HTP and observed a ~4x QAT speedup.
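For context, a minimal sketch of what swapping FakeQuantize for FusedMovingAvgObsFakeQuantize in a QAT qconfig looks like (this is an illustrative example, not the actual diff; the quant ranges and qschemes below are assumptions):

```python
import torch
from torch.ao.quantization import QConfig
from torch.ao.quantization.fake_quantize import FusedMovingAvgObsFakeQuantize
from torch.ao.quantization.observer import MovingAverageMinMaxObserver

# Activation fake-quant: fused observer + fake-quantize in one CUDA op,
# instead of a separate MovingAverageMinMaxObserver + FakeQuantize pair.
act_fake_quant = FusedMovingAvgObsFakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=0,
    quant_max=255,
    dtype=torch.quint8,
    qscheme=torch.per_tensor_affine,
)

# Weight fake-quant, symmetric signed 8-bit (assumed settings).
weight_fake_quant = FusedMovingAvgObsFakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=-128,
    quant_max=127,
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
)

# A QAT qconfig built from the fused modules; passing this to
# prepare_qat makes training use the fused op.
qat_qconfig = QConfig(activation=act_fake_quant, weight=weight_fake_quant)
```

The fused module has the same observable behavior as the observer/fake-quantize pair, so it is a drop-in replacement in the qconfig.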
Reviewed By: billmguo
Differential Revision: D83583655