
v1.16.0: Transformers 4.36 compatibility, extended ONNX support, Mixtral GPTQ

@fxmarty released this 13 Dec 18:23 · 108 commits to main since this release

Transformers 4.36 compatibility

Notably, the ONNX export now handles aten::scaled_dot_product_attention in a standardized way for the compatible models.
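
As a rough illustration, a plain export call such as the one below goes through the updated exporter; this is a minimal sketch, and the model id, task, and output directory are arbitrary placeholders rather than values from the release notes.

```python
# Minimal sketch of an ONNX export with the updated exporter.
# "gpt2", the task, and the output directory are illustrative choices.
from optimum.exporters.onnx import main_export

main_export(
    "gpt2",                  # any compatible Transformers model id
    output="gpt2_onnx",      # directory where the ONNX model is written
    task="text-generation",  # explicit task; "auto" also works
)
```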

Extended ONNX support: timm, sentence-transformers, Phi, ESM
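
For example, a sentence-transformers checkpoint can be exported and run with ONNX Runtime directly through the ORTModel classes. This is a sketch under the assumption that feature extraction is the desired task; the model id is an arbitrary example.

```python
# Sketch: export a sentence-transformers model to ONNX on the fly and run it
# with ONNX Runtime. The checkpoint below is only an example.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForFeatureExtraction

model_id = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)

inputs = tokenizer(["ONNX export example"], return_tensors="pt")
embeddings = model(**inputs).last_hidden_state  # token embeddings from the ONNX model
```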

GPTQ for Mixtral

Work in progress.

  • add modules_in_block_to_quantize arg for gptq by @SunMarc in #1585 (see the sketch below)
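
The new modules_in_block_to_quantize argument lets you restrict GPTQ to an explicit, ordered subset of submodules inside each decoder block, which is what the Mixtral work relies on. A hedged sketch follows; the module names, dataset, and bit width are illustrative assumptions, not values from the release notes.

```python
# Hedged sketch of the modules_in_block_to_quantize argument from #1585.
# The Mixtral module names listed here are illustrative, not a verified layout.
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

quantizer = GPTQQuantizer(
    bits=4,
    dataset="c4",
    # Quantize only these submodules in each decoder block, group by group
    # (names are placeholders for illustration).
    modules_in_block_to_quantize=[
        ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
        ["self_attn.o_proj"],
    ],
)
quantized_model = quantizer.quantize_model(model, tokenizer)
```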

What's Changed

Full Changelog: v1.15.0...v1.16.0