
Intel® Extension for Transformers v1.1.1 Release

@kevinintel kevinintel released this 06 Sep 16:37
· 1143 commits to main since this release
49336d3
  • Highlights
  • Bug Fixing & Improvements
  • Tests & Tutorials

Highlights
In this release, we improved NeuralChat, a customizable chatbot framework under Intel® Extension for Transformers. NeuralChat is now available for you to create your own chatbot within minutes on multiple architectures.
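Getting a chatbot running takes only a few lines. The following is a minimal sketch based on the NeuralChat examples around this release; it assumes `intel_extension_for_transformers` v1.1.1 is installed and that the default model can be downloaded, and the exact API names may differ in other versions:

```python
# Minimal NeuralChat sketch. Hedged: build_chatbot and predict follow
# the v1.1.x NeuralChat examples; verify against your installed version.
from intel_extension_for_transformers.neural_chat import build_chatbot

# With no config argument, build_chatbot() uses the framework defaults.
chatbot = build_chatbot()
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
print(response)
```

A custom model or plugins (e.g. retrieval) can be selected by passing a pipeline configuration to `build_chatbot` instead of relying on the defaults.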

Bug Fixing & Improvements

  • Fix the NeuralChat code structure and plugins (commit 486e9e)
  • Fix bug in retrieval chat (commit d2cee0)
  • NeuralChat inference now returns the correct input length to the user, without padding (commit 18be4c)
  • Fix issue where MPT does not support left padding (commit 24ae58)
  • Fix duplicate removal of dataset columns during concatenation (commit 67ce6e)
  • Fix DeepSpeed and use_cache issue (commit 4675d4)
  • Fix bugs in predict_stream (commit e1da7e)
  • Fix docker CPU issues (commit 8fa0dc)
  • Fix issue reading the HuggingFaceH4/oasst1_en dataset (commit 76ee68)
  • Modify Dockerfile for finetuning (commit 797aa2)
  • Fix LLaMA2 performance via static_shape in Optimum Habana (commit 481f38)
  • Remove redundant code and hardcoded values from NeuralChat (commit 0e1e4d, 037ce8, 10af3c)
  • Refine the NeuralChat finetuning config (commit e372cf)

Tests & Tutorials

  • Add inference test for LLaMA2 and MPT on Habana HPU (commit 5c4f5e)
  • Add inference test for LLaMA2 and MPT on Intel CPUs (commit ad4bec, 2f6188)
  • Add finetuning test for MPT (commit 72d81e, 423242)
  • Add GHA Unit Tests (commit 49336d)
  • Add NeuralChat finetuning tutorial for LLaMA2 and MPT (commit d156e9)
  • Add NeuralChat deployment tutorial for Intel CPU, Habana HPU, and Nvidia hardware (commit b36711)

Validated Configurations

  • CentOS 8.4 & Ubuntu 22.04
  • Python 3.9
  • PyTorch 2.0.0
  • TensorFlow 2.12.0

Acknowledgements
Thanks to sywangyi, jiafuzha, and itayariel for their contributions, and to everyone who participated in Intel® Extension for Transformers.