Version 2.32.0
- Bug fixes and improvements
ONNX
- Add C++ support for bfloat16 quantization (ca7d3e0)
- Fix large model support with protobuf 7.x (9ef2251)
- Skip QDQ pair scale/zp in duplicate_shared_initializers (05e8332)
- Handle Identity passthrough in duplicate_shared_initializers (1b27d98)
- Fix SpinQuant embed_tokens filter to exclude non-embedding Gathers (81b8041)
- Inline fused supergroups after encoding propagation (68fdcb6)
Torch
-