Intel® Extension for Transformers v1.4 Release · intel/intel-extension-for-transformers

Highlights
Features
Productivity
Examples
Bug Fixing

Highlights

AutoRound is a state-of-the-art weight-only quantization (WOQ) algorithm for low-bit LLM inference. This release adds support for AutoRound quantization and for inference with INT4 models quantized by AutoRound.
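To make "INT4 weight-only quantization" concrete, the following is a minimal sketch of 4-bit quantization for a single group of weights using plain round-to-nearest. It is not AutoRound (which additionally tunes how weights are rounded) and it does not use this library's API; the function names `quantize_group_int4` and `dequantize_group` are hypothetical and exist only for illustration.

```python
from typing import List, Tuple


def quantize_group_int4(weights: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric 4-bit round-to-nearest quantization of one weight group.

    Illustrative only: a baseline RTN scheme, not the AutoRound algorithm.
    """
    w_min, w_max = min(weights), max(weights)
    # 4 bits give 16 integer levels (0..15); fall back to 1.0 for a constant group.
    scale = (w_max - w_min) / 15 or 1.0
    zero_point = round(-w_min / scale)
    # Map each weight to an integer level and clamp it into the 4-bit range.
    q = [max(0, min(15, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_group(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floating-point weights from 4-bit levels."""
    return [(v - zero_point) * scale for v in q]


if __name__ == "__main__":
    group = [-0.8, -0.1, 0.0, 0.35, 0.7]
    q, scale, zp = quantize_group_int4(group)
    print("levels:", q, "scale:", scale, "zero point:", zp)
    print("restored:", dequantize_group(q, scale, zp))
```

Only the weights are stored as 4-bit levels (plus one scale and zero point per group); activations stay in floating point, which is what "weight-only" refers to.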

Features

Productivity

Add BM25 algorithm to retrievers (a19467d0)
Add perplexity evaluation during training (2858ed1)
Enhance embedding to support JIT models (588c60)
Update the character-checking function to allow Chinese characters (0da63fe1)
Enlarge the context window for HPU graph recompile (dcaf17ac)
Support IPEX bf16 & fp32 optimization for embedding models (b51552)
Enable lm_eval during training (2de883)
Refine setup.py and requirements.txt (436847)
Improve WOQ model saving and loading (30d9d10, 1065d81c)
Add layerwise for WOQ RTN & GPTQ (15a848f3)
Update SparseGPT example (3ae0cd0)
Change regular expression to support Unicode characters (fd2516b)
Check and convert to contiguous tensors when saving models (d21bb3e)
Support loading models from ModelScope using NeuralSpeed (20ae00)
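For readers unfamiliar with the BM25 ranking mentioned in the list above, here is a minimal, self-contained sketch of Okapi BM25 scoring over pre-tokenized documents. It is not the retriever code added in this release; the function name `bm25_scores` and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration, and the document list is assumed to be non-empty.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(
    query: List[str],
    docs: List[List[str]],
    k1: float = 1.5,
    b: float = 0.75,
) -> List[float]:
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)  # assumed > 0
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["int4", "quantization", "for", "llm"],
        ["bm25", "retriever", "for", "rag"],
        ["llm", "finetuning", "on", "gaudi2"],
    ]
    print(bm25_scores(["bm25", "retriever"], corpus))
```

Unlike embedding-based retrieval, this scoring is purely lexical: a document that shares no terms with the query always scores zero.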

Examples

Support microsoft/biogpt model (3e7e35)
Add finetuning example for gemma-2b on ARC (ffa8f3c6)
Add example of using RAG with an OpenAI LLM (3c5959)
Enable mistralai/Mixtral-8x7B-v0.1 LoRA finetuning on Gaudi2 (7539c35)
Enable image2text finetuning example on CPU (ef94aeaa)
Add LLaVA-NeXT (feff1ec0)

Bug Fixing

Fix CLM tasks when transformers >= 4.38.1 (98bfcf8)
Fix distilgpt2 TF signature issue (a7c15a9f)
Add an error response for when user input plus requested max tokens exceeds the model context window (ae91bf8)
Fix audio plugin sample code issue and provide a way to set the TTS/ASR model path (db7da09)
Fix modeling_auto trust_remote_code issue (3a0987)
Fix lm-eval model loading with NeuralSpeed (cd6e488)
Fix weight-only config save issue (5c92fe31)
Fix index error in child-parent retriever (8797cfe)
Fix WOQ int8 unpack weight (edede4)
Fix GPTQ desc_act and static_group (528d7de)
Fix request.client=None issue (494a571)
Fix WOQ Hugging Face model loading (01b1a44)
Fix SQ model restore loading (1e00f29)
Remove redundant parameters for WOQ saving config and fix GPTQ issue (ef0882f6)
Fix example error for Intel GPU WOQ (8fdde06)
Fix WOQ AutoRound last-layer quantization issue (d21bb3e)
Fix code-generation params (ab2fd05)
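The context-window error response listed above (ae91bf8) concerns requests whose prompt length plus requested output length would not fit in the model. A rough sketch of that kind of check follows; it is not the project's implementation, and the function name `check_context_window`, its parameters, and the message wording are all hypothetical.

```python
from typing import Optional


def check_context_window(
    prompt_tokens: int,
    max_new_tokens: int,
    context_window: int,
) -> Optional[str]:
    """Return an error message if the request cannot fit, otherwise None.

    Hypothetical helper for illustration; token counts are assumed to be
    computed by the caller with the model's own tokenizer.
    """
    total = prompt_tokens + max_new_tokens
    if total > context_window:
        return (
            f"Requested {total} tokens ({prompt_tokens} prompt + "
            f"{max_new_tokens} max new tokens) exceeds the model context "
            f"window of {context_window} tokens."
        )
    return None


if __name__ == "__main__":
    print(check_context_window(100, 50, 2048))
    print(check_context_window(2000, 100, 2048))
```

Rejecting such a request up front with a clear message is generally friendlier than letting generation fail or silently truncating the prompt.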

Validated Configurations

Thanks to these Contributors
Thanks for the contributions from dillonalaird, igeni, sramakintel, alexsin368 and huiyan2021.
Contributions to the project and issue reports are welcome.
