Intel® Extension for Transformers v1.4 Release

@kevinintel kevinintel released this 03 Apr 16:33
· 121 commits to main since this release
346211c

Highlights
Features
Productivity
Examples
Bug Fixing

Highlights

  • AutoRound is a state-of-the-art weight-only quantization (WOQ) algorithm for low-bit LLM inference. This release adds support for AutoRound quantization and for inference with INT4 models quantized by AutoRound.
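
For readers new to weight-only quantization, the following is a minimal sketch of group-wise symmetric INT4 quantization using plain round-to-nearest (RTN). It is not AutoRound itself, which tunes the rounding rather than rounding naively, and it is not this library's API. The function names and the `group_size` default are illustrative assumptions.

```python
from typing import List, Tuple


def quantize_int4_rtn(weights: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Quantize weights to the signed INT4 range [-8, 7], one scale per group.

    Illustrative RTN baseline only; hypothetical helper, not a library function.
    """
    q_values: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; avoid a zero scale.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            q_values.append(max(-8, min(7, q)))  # clamp to the INT4 range
    return q_values, scales


def dequantize_int4(q_values: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Reconstruct approximate float weights from INT4 values and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


weights = [0.7, -0.35, 0.0, 0.1, 1.4, -1.4, 0.2, 0.7]
q_values, scales = quantize_int4_rtn(weights)
restored = dequantize_int4(q_values, scales)
```

Only the weights are stored in low precision here; activations stay in floating point, which is what "weight-only" refers to. Each restored weight differs from the original by at most half of its group's scale.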

Features

Productivity

  • Add BM25 algorithm to retrievers (a19467d0)
  • Add perplexity evaluation during training (2858ed1)
  • Enhance embedding to support JIT models (588c60)
  • Update the character-checking function to support Chinese characters (0da63fe1)
  • Enlarge the context window for HPU graph recompile (dcaf17ac)
  • Support IPEX bf16 & fp32 optimization for embedding models (b51552)
  • Enable lm_eval during training (2de883)
  • Refine setup.py and requirements.txt (436847)
  • Improve WOQ model saving and loading (30d9d10, 1065d81c)
  • Add layerwise for WOQ RTN & GPTQ (15a848f3)
  • Update SparseGPT example (3ae0cd0)
  • Change the regular expression to support Unicode characters (fd2516b)
  • Check for and convert to contiguous tensors when saving a model (d21bb3e)
  • Support loading models from ModelScope using NeuralSpeed (20ae00)
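
As background for the BM25 retriever item in the list above, here is a minimal sketch of Okapi BM25 scoring. It is independent of the retriever implementation shipped in this release. The whitespace tokenization, the function name, and the `k1`/`b` defaults are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, documents: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25 (illustrative sketch)."""
    if not documents:
        return []
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq: Counter = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    scores: List[float] = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            # Length normalization penalizes documents longer than average.
            denom = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "intel extension for transformers",
    "weight only quantization for llm",
    "bm25 retriever ranks documents",
]
scores = bm25_scores("quantization llm", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```

Documents that share no terms with the query score zero, so a retriever built this way can drop them before any more expensive reranking.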

Examples

Bug Fixing

  • Fix CLM tasks when transformers >= 4.38.1 (98bfcf8)
  • Fix distilgpt2 TF signature issue (a7c15a9f)
  • Add an error response for when user input plus requested max tokens exceeds the model context window (ae91bf8)
  • Fix audio plugin sample code issue and provide a way to set the TTS/ASR model path (db7da09)
  • Fix modeling_auto trust_remote_code issue (3a0987)
  • Fix lm-eval NeuralSpeed model loading (cd6e488)
  • Fix weight-only config save issue (5c92fe31)
  • Fix index error in child-parent retriever (8797cfe)
  • Fix WOQ int8 weight unpacking (edede4)
  • Fix GPTQ desc_act and static_group (528d7de)
  • Fix request.client=None issue (494a571)
  • Fix WOQ Hugging Face model loading (01b1a44)
  • Fix SQ model restore loading (1e00f29)
  • Remove redundant parameters for WOQ saving config and fix GPTQ issue (ef0882f6)
  • Fix example error for Intel GPU WOQ (8fdde06)
  • Fix WOQ AutoRound last-layer quantization issue (d21bb3e)
  • Fix code-generation params (ab2fd05)
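
The context-window item in the list above describes a check that a serving layer can run before generation starts. The following is a hypothetical sketch of such a check, not the code from the referenced commit. The function name, the error code string, and the response shape are all assumptions.

```python
from typing import Dict, Optional


def check_context_window(
    num_input_tokens: int, max_new_tokens: int, context_window: int
) -> Optional[Dict[str, Dict[str, str]]]:
    """Return an error payload if the request cannot fit, otherwise None.

    Hypothetical helper for illustration; the payload layout is assumed.
    """
    requested = num_input_tokens + max_new_tokens
    if requested > context_window:
        return {
            "error": {
                "code": "context_length_exceeded",  # assumed error code
                "message": (
                    f"Input tokens ({num_input_tokens}) plus requested max tokens "
                    f"({max_new_tokens}) total {requested}, which exceeds the "
                    f"model context window of {context_window}."
                ),
            }
        }
    return None


ok = check_context_window(100, 50, 2048)
too_long = check_context_window(2000, 100, 2048)
```

Returning a structured error up front lets the client shorten the prompt or lower the token budget instead of receiving a truncated or failed generation.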

Validated Configurations

  • Python 3.8, 3.9, 3.10, 3.11
  • Ubuntu 20.04 & Windows 10
  • Intel® Extension for TensorFlow 2.13.0, 2.14.0
  • PyTorch 2.2.0+cpu, 2.1.0+cpu
  • Intel® Extension for PyTorch 2.2.0+cpu, 2.1.0+cpu

Thanks to these Contributors
Thanks for the contributions from dillonalaird, igeni, sramakintel, alexsin368, and huiyan2021.
You are welcome to contribute to the project and to report issues to us.