
Releases: intel/intel-extension-for-tensorflow

Intel® Extension for TensorFlow* 2.15.0.0

28 Mar 08:26

Major Features and Improvements

Intel® Extension for TensorFlow* extends the official TensorFlow capabilities, allowing TensorFlow workloads to run on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Xeon® Scalable Processors. This release includes the following major features and improvements:

  • Updated Support: Intel® Extension for TensorFlow* has been upgraded to support TensorFlow 2.15, which is the required TensorFlow version for this release.

  • Toolkit Support: Supports Intel® oneAPI Base Toolkit 2024.1.

  • NextPluggableDevice integration: Integrates NextPluggableDevice (an advanced generation of the PluggableDevice mechanism) as a new device type to enable seamless integration of new accelerator plugins; a quick device-visibility check is sketched after this list. For more details, see the NextPluggableDevice Overview.

  • Experimental support: Provides experimental support for the Intel GPU backend for OpenXLA, enabling the OpenXLA GPU backend in Intel® Extension for TensorFlow* via a PJRT plugin. For more details, see the OpenXLA documentation.

  • Compiler enablement: Enables the Clang compiler to build Intel® Extension for TensorFlow* CPU wheels starting with this release. The currently supported version is LLVM/Clang 17. The official wheels published on PyPI are built with Clang; however, users can choose to build wheels with the GCC compiler by following the steps in the Configure For CPU guide.

  • Performance optimization: Enables weight pre-pack support for Intel® Extension for TensorFlow* CPU to provide better performance and reduce the memory footprint of _ITEXMatMul and _ITEXFusedMatMul. For more details, see the Weight Pre-Pack documentation.

  • Package redefinition: Redefines the XPU package to support the GPU backend only, starting with this release. The official XPU wheels published on PyPI will support only the GPU backend, and the GPU wheels will be deprecated.

  • New Operations: Supports new OPs to cover the majority of TensorFlow 2.15 OPs.

  • Experimental support: Continues to provide experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 (with Ubuntu Linux installed) and on native Ubuntu Linux.
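
As referenced in the NextPluggableDevice item above, a quick way to confirm the plugin registered its devices is to list what TensorFlow can see. A minimal sketch, assuming Intel GPUs are exposed under the XPU device type as in prior releases:

```python
import tensorflow as tf  # TensorFlow 2.15 with intel-extension-for-tensorflow installed

# Intel GPUs are expected to appear under the "XPU" device type
# (an assumption based on the extension's device naming).
xpus = tf.config.list_physical_devices("XPU")
print("Visible XPU devices:", xpus)

if xpus:
    # Place a simple computation on the first plugin device.
    with tf.device("/XPU:0"):
        c = tf.matmul(tf.random.normal((1024, 1024)),
                      tf.random.normal((1024, 1024)))
    print("MatMul ran on:", c.device)
```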

Known Issues

  • TensorList limitation: TensorList is not supported with NextPluggableDevice in TensorFlow 2.15.
  • Allocation limitation of WSL: Windows Subsystem for Linux (WSL2) sets a maximum size for a single allocation on a single device, which may cause an out-of-memory error. Users can remove the limitation by setting the environment variable UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS=1, as shown in the sketch after this list.
  • FP64 support: FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with an exception such as 'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.
  • GLIBC++ mismatch: A GLIBC++ version mismatch may cause a workload to exit with the exception Can not find any devices. Run the itex/tools/python/env_check.py script to check the runtime environment on your host.
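
For the WSL2 allocation limitation above, the variable must be in the environment before the runtime initializes; a minimal sketch (the variable name comes from the note above, the XPU device type is an assumption):

```python
import os

# Must be set before TensorFlow/ITEX initializes its devices; lifts the
# single-allocation size cap imposed on WSL2 (see the note above).
os.environ["UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS"] = "1"

import tensorflow as tf  # import only after the variable is set

print(tf.config.list_physical_devices("XPU"))
```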

Other Information

  • Performance Data: Provides a Performance Data document that demonstrates training and inference performance, as well as accuracy results, for several popular AI workloads with Intel® Extension for TensorFlow* benchmarked on Intel GPUs.

Documentation

Intel® Extension for TensorFlow* 2.14.0.1

24 Nov 02:03

Major Features and Improvements

Intel® Extension for TensorFlow* extends official TensorFlow capabilities to run TensorFlow workloads on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Xeon® Scalable Processors. This release contains the following major features and improvements:

  • The Intel® Extension for TensorFlow* supported TensorFlow version has been upgraded to Google's released TensorFlow 2.14, which is the required TensorFlow version for this release.

  • Supports Intel® oneAPI Base Toolkit 2024.0.

  • Provides experimental support for selecting the CPU thread pool: either the OpenMP thread pool (default) or the Eigen thread pool. You can select the more efficient thread pool based on the workload and hardware configuration; a configuration sketch appears after this list. Refer to Selecting Thread Pool in Intel® Extension for TensorFlow* CPU for more details.

  • Enables FP8 functionality support for Transformer-like training models. Refer to FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU for more details.

  • Provides experimental support for a quantization front-end Python API, based on Intel® Neural Compressor.

  • Adds OPs performance optimizations:

    • Optimizes GroupNorm/Unique operators.
    • Optimizes Einsum/ScaledDotProductAttention with XeTLA enabled.
  • Supports new OPs to cover the majority of TensorFlow 2.14 OPs.

  • Continues to provide experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.

  • Moves the experimental support for Intel GPU backend for OpenXLA from the Intel® Extension for TensorFlow repository to the Intel® Extension for OpenXLA* repository. Refer to Intel® Extension for OpenXLA* for more details.
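
For the thread-pool selection above, the choice is made through an environment variable read at startup. A minimal sketch, assuming the variable name ITEX_OMP_THREADPOOL (this name is an assumption; consult Selecting Thread Pool in Intel® Extension for TensorFlow* CPU for the exact spelling and values):

```python
import os

# Hypothetical variable name (an assumption; see the linked guide):
# "1" (default) selects the OpenMP thread pool, "0" the Eigen thread pool.
# Must be set before TensorFlow/ITEX is imported.
os.environ["ITEX_OMP_THREADPOOL"] = "0"

import tensorflow as tf

# Run the CPU workload as usual; benchmark both settings to pick the
# more efficient pool for your workload and hardware configuration.
print(tf.reduce_sum(tf.random.normal((4096, 4096))))
```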

Known Issues

  • FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with an exception such as 'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.
  • A GLIBC++ version mismatch may cause a workload to exit with the exception Can not find any devices. Run the itex/tools/env_check.sh script to check the runtime environment on your host.

Documents

Intel® Extension for TensorFlow* 2.13.0.0

28 Jul 05:44

Major Features and Improvements

Intel® Extension for TensorFlow* extends official TensorFlow capability to run TensorFlow workloads on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Xeon® Scalable Processors. This release contains the following major features and improvements:

  • The Intel® Extension for TensorFlow* supported TensorFlow version was successfully upgraded to Google's latest released TensorFlow 2.13, which is the only supported TensorFlow version in this release.
  • Refined the Intel® Extension for TensorFlow* version to a four-digit format, v2.13.0.0, based on the three digits of stock TensorFlow v2.13.0, with the last digit incrementing per extension release. This makes the version mapping between Intel® Extension for TensorFlow* and stock TensorFlow easier to understand.
  • Unified the CPU and GPU backends into one XPU package, providing flexibility for users on different CPU or GPU hardware platforms.
  • Supported TensorFlow Serving running on top of Intel® Extension for TensorFlow* to provide a serving service in a production environment. Learn more in the TensorFlow Serving Installation Guide.
  • Enabled INT8 quantization via the oneDNN Graph API as the default solution on GPU in Intel® Extension for TensorFlow*, providing a better INT8 user experience together with Intel® Neural Compressor >= 2.2; a quantization sketch appears after this list.
  • Added OPs performance optimizations:
    • Enabled SYCL native BFloat16 data type support.
    • SpaceToBatchND/BatchToSpaceND: 1.1x ~ 1.8x improvement compared with the last release.
    • SelectOp: 1.3x ~ 1.7x improvement compared with the last release.
    • LstmEltwiseKernel: 1.28x ~ 1.7x improvement compared with the last release.
    • BucketizeOp: 4x improvement compared with the last release.
  • Supported new Ops to cover the majority of TensorFlow 2.13.0 Ops.
  • Dynamically loaded Intel® Advanced Vector Extensions (AVX2 and AVX-512) instructions by adapting to the user's hardware to maximize CPU performance.
  • Supported the FP16 data type with AMX simulation on 4th Gen Intel® Xeon® Scalable processors (code-named Sapphire Rapids).
  • This release started to provide product support for second generation Intel® Xeon® Scalable Processors and newer (such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids).
  • This release continued to provide experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
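
For the INT8 item above, a minimal post-training quantization sketch with Intel® Neural Compressor 2.x (API names follow INC 2.x and are best checked against the INC documentation; the model and calibration data below are placeholders):

```python
import tensorflow as tf
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.data import DataLoader
from neural_compressor.quantization import fit

# Placeholder FP32 model and synthetic calibration data; substitute your own.
model = tf.keras.applications.MobileNet(weights=None)
images = tf.random.normal((32, 224, 224, 3)).numpy()
labels = tf.zeros((32,), dtype=tf.int64).numpy()
calib_dataloader = DataLoader(framework="tensorflow",
                              dataset=list(zip(images, labels)),
                              batch_size=8)

# INC performs calibration and INT8 conversion; with ITEX installed, the
# quantized graph runs through the oneDNN Graph API path described above.
q_model = fit(model=model, conf=PostTrainingQuantConfig(),
              calib_dataloader=calib_dataloader)
q_model.save("./mobilenet_int8")
```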

Known Issues

  • FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with an exception such as 'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.
  • A GLIBC++ version mismatch may cause a workload to exit with the exception Can not found any devices. Run the itex/tools/env_check.sh script to check the runtime environment on your host.

Documents

Intel® Extension for TensorFlow* 1.2.0

27 Apr 02:15

Major Features and Improvements

Intel® Extension for TensorFlow* extends official TensorFlow capability to run TensorFlow workloads on Intel® Data Center GPU Max Series and Intel® Data Center GPU Flex Series. This release contains the following major features and improvements:

  • The TensorFlow version supported by Intel® Extension for TensorFlow* v1.2.0 was successfully upgraded to Google's latest released TensorFlow 2.12. Due to a breaking change in protobuf in TensorFlow 2.12, Intel® Extension for TensorFlow* is binary-compatible only with TensorFlow 2.12 in this release.

  • Adopted the uniform device API PJRT as the supported device plugin mechanism to implement experimental Intel GPU backend support for OpenXLA. Users can build Intel® Extension for TensorFlow* from source and run JAX front-end APIs with OpenXLA. Refer to OpenXLA Support GPU for more details.

  • Updated the oneDNN version to v3.1, which includes multiple functional and performance improvements for CPU and GPU implementations.

  • Supported the generative AI model Stable Diffusion and optimized the model for better performance. Get started with Stable Diffusion Inference for Text2Image on Intel GPU.

  • Supported XPUAutoShard in Intel® Extension for TensorFlow* as an experimental feature. Given a set of homogeneous XPU devices (e.g., 2 GPU tiles), XPUAutoShard automatically shards the input data and the TensorFlow graph, placing the data/graph shards on different GPU devices to maximize hardware usage. Refer to XPUAutoShard on GPU for more details.

  • Provided the Python API itex.experimental_ops_override() to automatically override some TensorFlow operators with customized operators under the itex.ops namespace while remaining compatible with existing trained parameters; a usage sketch appears after this list. Refer to the usage details for more information.

  • Added operator performance optimizations:

    • Optimized the ResizeNearestNeighborGrad/All/Any/Slice/SpaceToBatchND/BatchToSpaceND/BiasAddGrad operators.
    • Optimized math functions (e.g., tanh, rsqrt) with small shapes (e.g., size = 8192) on Intel® Data Center GPU Flex Series through vectorization.
    • Optimized the reduction series of ops by improving thread and memory utilization for column and row reductions separately.
  • Supported AOT (ahead-of-time compilation) on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Arc™ A-Series GPUs in the Intel® Extension for TensorFlow* package on the PyPI channel. You can also specify the hardware platform type when configuring your system for a source code build.

  • This release continued to provide experimental support for second generation Intel® Xeon® Scalable Processors and newer (such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids) and Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
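
A minimal sketch of the itex.experimental_ops_override() API described above (the model below is a placeholder; the override is called once, before layers are constructed):

```python
import intel_extension_for_tensorflow as itex
import tensorflow as tf

# Swap supported TensorFlow operators for the customized itex.ops
# versions; call once, before building any layers or models.
itex.experimental_ops_override()

# From here on, overridden ops (e.g., LayerNormalization) resolve to the
# ITEX implementations while remaining compatible with trained parameters.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="gelu"),
    tf.keras.layers.LayerNormalization(),
    tf.keras.layers.Dense(10),
])
print(model(tf.random.normal((4, 64))).shape)
```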

Bug Fixes and Other Changes

  • Upgraded pybind11 version to support Python 3.11 source build.
  • Initialized environment variables for the Intel® oneAPI Base Toolkit in the Docker container by default.

Known Issues

  • FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with an exception such as 'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.
  • TensorBoard cannot co-work with stock TensorFlow 2.12 due to two issues: tensorflow/tensorflow#60262 and tensorflow/profiler#602.
  • A GLIBC++ version mismatch may cause a workload to exit with the exception Can not found any devices. Run the itex/tools/env_check.sh script to check the runtime environment on your host.

Documents

Intel® Extension for TensorFlow* 1.1.0

06 Jan 09:13

Major Features and Improvements

Intel® Extension for TensorFlow* extends official TensorFlow capability to run TensorFlow workloads on Intel® Data Center GPU Max Series and Intel® Data Center GPU Flex Series. This release contains the following major features and improvements:

  • The Intel® Extension for TensorFlow* supported TensorFlow version was successfully upgraded to Google's latest released TensorFlow 2.11. In this release, Intel® Extension for TensorFlow* is binary-compatible with both TensorFlow 2.11 and TensorFlow 2.10.
  • Added Intel® Optimization for Horovod* to the Intel® Extension for TensorFlow* Docker container for Intel® Data Center GPU Max Series. Users only need to install the GPU driver on the host machine and launch the Docker container directly to run TensorFlow + Horovod distributed workloads; a minimal sketch appears after this list. Get started with the Docker Container Guide and the Horovod ResNet50 example.
  • Enhanced unit tests to cover the majority of TensorFlow Ops.
  • Added new OP support and performance optimizations:
    • Added double data type support for MatMul/BatchMatMul/BatchMatMulV2.
    • Enabled Eigen vectorized RNE conversion between packed BF16 and FP32 for element-wise ops.
    • Enabled the vectorization pass for the Sigmoid OP.
    • Optimized the ItexLSTM/NMS/ResizeNearestNeighbor OPs.
    • Added more fusion pattern support (Conv+BiasAdd+Relu+Add fusion, Conv+Mish fusion).
  • Enabled INT8 quantization via the oneDNN Graph API as the default solution on CPU in Intel® Extension for TensorFlow*, providing a better INT8 user experience together with Intel® Neural Compressor >= 2.0.
  • Added an environment check script for users to check software stack installation status, including the OS version, GPU driver, TensorFlow version, and other dependency versions in the Intel® oneAPI Base Toolkit.
  • This release continued to provide experimental support for second generation Intel® Xeon® Scalable Processors and newer (such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids) and Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
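
A minimal TensorFlow + Horovod training-step sketch for the distributed workloads mentioned above (standard Horovod API; pinning ranks to XPU devices is an assumption based on the extension's device naming, and the model is a placeholder):

```python
import horovod.tensorflow as hvd
import tensorflow as tf

hvd.init()

# Pin each Horovod rank to one Intel GPU; ITEX exposes them as "XPU"
# devices (an assumption based on the extension's device naming).
xpus = tf.config.list_physical_devices("XPU")
if xpus:
    tf.config.set_visible_devices(xpus[hvd.local_rank()], "XPU")

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
optimizer = tf.keras.optimizers.SGD(0.01 * hvd.size())

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                y, model(x), from_logits=True))
    # Average gradients across all ranks before applying them.
    tape = hvd.DistributedGradientTape(tape)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

print(train_step(tf.random.normal((8, 4)), tf.zeros((8,), tf.int64)))
```

Inside the container, such a script would be launched with, e.g., horovodrun -np 2 python train.py.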

Bug Fixes and Other Changes

  • Fixed several kernel bugs, including a NaN issue in the LogSoftmax OP and segmentation faults in the Unique/ParallelConcat OPs.
  • Added cast from INT64 to BF16.

Known Issues

  • FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with an exception such as 'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.

Documents

Intel® Extension for TensorFlow* 1.0.0

02 Nov 04:09

Major Features

Intel® Extension for TensorFlow* is an Intel-optimized Python package that extends the official TensorFlow capability of running TensorFlow workloads on Intel GPUs, and brings the first Intel GPU product, Intel® Data Center GPU Flex Series 170, into the TensorFlow open source community for AI workload acceleration. It is based on the TensorFlow PluggableDevice interface and provides full support for TensorFlow 2.10.

This release contains the following major features:

  • AOT (Ahead-of-time compilation)

    AOT compilation is a performance feature that removes just-in-time (JIT) compilation overhead during application launch. It can be enabled when configuring your system for a source code build. The Intel® Extension for TensorFlow* package on the PyPI channel is built with AOT enabled.

  • Graph Optimization

    Advanced Automatic Mixed Precision

    Advanced Automatic Mixed Precision implements low-precision data types (float16 or bfloat16) for further boosted performance and lower memory consumption. Get started with how to enable.

    Graph fusion

    Intel® Extension for TensorFlow* provides graph optimization that fuses specified operator patterns into a new single operator for better performance, such as Conv2D+ReLU and Linear+ReLU. Refer to the supported fusion list in Graph fusion.

  • Python API

    Public APIs that extend XPU operators are provided in the itex.ops namespace for better performance, including AdamWithWeightDecayOptimizer/gelu/LayerNormalization/ItexLSTM; a usage sketch appears after this feature list. Find more details in Intel® Extension for TensorFlow* ops.

  • Intel® Extension for TensorFlow* Profiler

    Intel® Extension for TensorFlow* provides support for the TensorFlow* Profiler to trace TensorFlow* model performance on Intel GPUs; a profiling sketch appears after this feature list. Refer to how to enable profiler for more details.

  • Docker Container Support

    The Intel® Extension for TensorFlow* Docker container is delivered with the Intel® oneAPI Base Toolkit and all other software stack components except the Intel GPU drivers. Users only need to install the GPU driver on the host machine before pulling and launching the Docker container directly. Get started with the Docker Container Guide.

  • FP32 Math Mode

    The ITEX_FP32_MATH_MODE setting controls whether FP32 computations may use TensorFloat-32 execution on GPU. Users can set ITEX_FP32_MATH_MODE (default FP32) to either TF32 or FP32; a configuration sketch covering this and ITEX_VERBOSE appears after this feature list. More details in ITEX_FP32_MATH_MODE.

  • Intel® Extension for TensorFlow* Verbose

    ITEX_VERBOSE is designed to help users get more Intel® Extension for TensorFlow* log messages at different log levels. More details in the ITEX_VERBOSE level introduction.

  • INT8 Quantization

    Intel® Extension for TensorFlow* co-works with Intel® Neural Compressor >= 1.14.1 to provide a compatible TensorFlow INT8 quantization solution with the same user experience.

  • Experimental Support

    This release provides experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux, and second generation Intel® Xeon® Scalable Processors and newer, such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids.
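
A minimal sketch of the itex.ops Python API listed above (input shapes are placeholders; the ItexLSTM constructor arguments are assumed to mirror tf.keras.layers.LSTM):

```python
import intel_extension_for_tensorflow as itex
import tensorflow as tf

x = tf.random.normal((8, 16, 32))  # (batch, time, features) placeholder

# Customized operators from the itex.ops namespace named above.
y = itex.ops.gelu(x)                         # fused GELU activation
z = itex.ops.LayerNormalization(axis=-1)(y)  # ITEX layer normalization

# Assumed to accept units like tf.keras.layers.LSTM (see the lead-in).
out = itex.ops.ItexLSTM(64)(z)
print(out.shape)
```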
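
For the profiler support above, tracing uses the standard TensorFlow Profiler API; a minimal sketch (any ITEX-specific enablement steps, such as required environment variables, are in the linked guide):

```python
import tensorflow as tf

# Standard TensorFlow Profiler API; with the extension installed, traces
# include Intel GPU activity (see the linked enablement guide).
tf.profiler.experimental.start("./logdir")
x = tf.random.normal((1024, 1024))
for _ in range(10):
    x = tf.matmul(x, x)
tf.profiler.experimental.stop()

# Inspect the trace with: tensorboard --logdir ./logdir
```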
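
A minimal configuration sketch for the ITEX_FP32_MATH_MODE and ITEX_VERBOSE settings described above (the variable names and the GPU values TF32/FP32 come from the descriptions above; set them before the extension initializes):

```python
import os

# Allow TensorFloat-32 execution for FP32 math on GPU (default is FP32).
os.environ["ITEX_FP32_MATH_MODE"] = "TF32"
# Increase the extension's log verbosity; level semantics are in the
# ITEX_VERBOSE introduction linked above.
os.environ["ITEX_VERBOSE"] = "1"

import tensorflow as tf  # import only after the variables are set
```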

Known Issues

  • FP64 is not natively supported by the Intel® Data Center GPU Flex Series platform. If you run any AI workload on that platform and receive an error message such as "[CRITICAL ERROR] Kernel 'XXX' removed due to usage of FP64 instructions unsupported by the targeted hardware", it means that a kernel requiring FP64 instructions was removed and not executed; hence, the accuracy of the whole workload is incorrect.

Documentation to get started