diff --git a/.azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt b/.azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
index c6c3cad9253..e779b035034 100644
--- a/.azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
+++ b/.azure-pipelines/scripts/codeScan/pyspelling/inc_dict.txt
@@ -2495,4 +2495,9 @@ DistillationConfig
 SelfKnowledgeDistillationLossConfig
 DistillationConfig
 KnowledgeDistillationLossConfig
-confs
\ No newline at end of file
+confs
+HBM
+Ponte
+SmoothQuant
+Vecchio
+WeChat
\ No newline at end of file
diff --git a/README.md b/README.md
index 7a5a2df60cd..ff1afab9507 100644
--- a/README.md
+++ b/README.md
@@ -94,12 +94,13 @@ inc_bench
 ### Validated Hardware Environment
 #### Intel® Neural Compressor supports CPUs based on [Intel 64 architecture or compatible processors](https://en.wikipedia.org/wiki/X86-64):
-* Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, and Icelake)
-* Future Intel Xeon Scalable processor (code name Sapphire Rapids)
+* Intel Xeon Scalable processor (formerly Skylake, Cascade Lake, Cooper Lake, Ice Lake, and Sapphire Rapids)
+* Intel Xeon CPU Max Series (formerly Sapphire Rapids HBM)
 
 #### Intel® Neural Compressor supports GPUs built on Intel's Xe architecture:
-* [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html)
+* Intel Data Center GPU Flex Series (formerly Arctic Sound-M)
+* Intel Data Center GPU Max Series (formerly Ponte Vecchio)
 
 #### Intel® Neural Compressor quantized ONNX models support multiple hardware vendors through ONNX Runtime:
@@ -116,6 +117,7 @@ inc_bench
 Framework
 TensorFlow
 Intel TensorFlow
+ Intel® Extension for TensorFlow*
 PyTorch
 Intel® Extension for PyTorch*
 ONNX Runtime
@@ -125,24 +127,25 @@ inc_bench
 Version
- 2.10.0
- 2.9.1
- 2.8.2
- 2.10.0
- 2.9.1
- 2.8.0
- 1.12.1+cpu
- 1.11.0+cpu
- 1.10.0+cpu
- 1.12.0
- 1.11.0
- 1.10.0
- 1.12.1
- 1.11.0
- 1.10.0
- 1.8.0
- 1.7.0
- 1.6.0
+ 2.11.0
+ 2.10.1
+ 2.9.3
+ 2.11.0
+ 2.10.0
+ 2.9.1
+ 1.0.0
+ 1.13.1+cpu
+ 1.12.1+cpu
+ 1.11.0+cpu
+ 1.13.0
+ 1.12.1
+ 1.11.0
+ 1.13.1
+ 1.12.1
+ 1.11.0
+ 1.9.1
+ 1.8.0
+ 1.7.0
@@ -151,13 +154,7 @@ inc_bench
 > Set the environment variable ``TF_ENABLE_ONEDNN_OPTS=1`` to enable oneDNN optimizations if you are using TensorFlow v2.6 to v2.8. oneDNN is the default for TensorFlow v2.9.
 
 ### Validated Models
-Intel® Neural Compressor validated 420+ [examples](./examples) for quantization with a performance speedup geomean of 2.2x and up to 4.2x on VNNI while minimizing accuracy loss. Over 30 pruning and knowledge distillation samples are also available. More details for validated models are available [here](./docs/source/validated_model_list.md).
-
-
-
- Architecture
-
-
+Intel® Neural Compressor validated the quantization of 10K+ models from popular model hubs (e.g., HuggingFace Transformers, Torchvision, TensorFlow Model Hub, ONNX Model Zoo) with a performance speedup of up to 4.2x on VNNI while minimizing accuracy loss. Over 30 pruning and knowledge distillation samples are also available. More details on validated typical models are available [here](./docs/source/validated_model_list.md).
 
 ## Documentation
@@ -169,52 +166,49 @@ Intel® Neural Compressor validated 420+ [examples](./examples) for quantization
- Architecture
- Examples
- GUI
- APIs
+ Architecture
+ Workflow
+ APIs
+ GUI
+ Notebook
+ Examples
+ Results
 Intel oneAPI AI Analytics Toolkit
- AI and Analytics Samples
-
-
-
-
- Basic API
-
-
-
-
- Transform
- Dataset
- Metric
- Objective
- Deep Dive
+ Python-based APIs
 Quantization
- Pruning(Sparsity)
- Knowledge Distillation
- Mixed Precision
- Orchestration
+ Advanced Mixed Precision
+ Pruning(Sparsity)
+ Distillation
+ Orchestration
 Benchmarking
- Distributed Training
- TensorBoard
+ Distributed Compression
+ Model Export
+
+
+
+
+ Neural Coder (Zero-code Optimization)
+
+
- Distillation for Quantization
- Neural Coder
+ Launcher
+ JupyterLab Extension
+ Visual Studio Code Extension
+ Supported Matrix
-
@@ -223,19 +217,20 @@
- Adaptor
- Strategy
+ Adaptor
+ Strategy
+ Distillation for Quantization
+ SmoothQuant (Coming Soon)
 
 ## Selected Publications/Events
-* [#MLefficiency — Optimizing transformer models for efficiency](https://medium.com/@kawapanion/mlefficiency-optimizing-transformer-models-for-efficiency-a9e230cff051)(Dec 2022)
-* [One-Click Acceleration of Hugging Face Transformers with Intel’s Neural Coder](https://medium.com/intel-analytics-software/one-click-acceleration-of-huggingface-transformers-with-optimum-intel-by-neural-coder-f35ca3b1a82f)(Dec 2022)
-* [One-Click Quantization of Deep Learning Models with the Neural Coder Extension](https://medium.com/intel-analytics-software/one-click-quantize-your-deep-learning-code-in-visual-studio-code-with-neural-coder-extension-8be1a0022c29)(Dec 2022)
-* [Accelerate Stable Diffusion with Intel Neural Compressor](https://medium.com/intel-analytics-software/accelerating-stable-diffusion-inference-through-8-bit-post-training-quantization-with-intel-neural-e28f3615f77c)(Dec 2022)
-* [Intel together with Tencent deepens the cooperation to build a cloud foundation for digital and intelligent industry](https://mp.weixin.qq.com/s/CPz9-5Nsh-5N9Q8-UmK--w) (Dec 2022)
-* [Running Fast Transformers on CPUs: Intel Approach Achieves Significant Speed Ups and SOTA Performance](https://medium.com/syncedreview/running-fast-transformers-on-cpus-intel-approach-achieves-significant-speed-ups-and-sota-448521704c5e) (Nov 2022)
+* Blog on Medium: [MLefficiency — Optimizing transformer models for efficiency](https://medium.com/@kawapanion/mlefficiency-optimizing-transformer-models-for-efficiency-a9e230cff051) (Dec 2022)
+* Blog on Medium: [One-Click Acceleration of Hugging Face Transformers with Intel’s Neural Coder](https://medium.com/intel-analytics-software/one-click-acceleration-of-huggingface-transformers-with-optimum-intel-by-neural-coder-f35ca3b1a82f) (Dec 2022)
+* Blog on Medium: [One-Click Quantization of Deep Learning Models with the Neural Coder Extension](https://medium.com/intel-analytics-software/one-click-quantize-your-deep-learning-code-in-visual-studio-code-with-neural-coder-extension-8be1a0022c29) (Dec 2022)
+* Blog on Medium: [Accelerate Stable Diffusion with Intel Neural Compressor](https://medium.com/intel-analytics-software/accelerating-stable-diffusion-inference-through-8-bit-post-training-quantization-with-intel-neural-e28f3615f77c) (Dec 2022)
+* Blog on WeChat: [Intel together with Tencent deepens the cooperation to build a cloud foundation for digital and intelligent industry](https://mp.weixin.qq.com/s/CPz9-5Nsh-5N9Q8-UmK--w) (Dec 2022)
 > View our [full publication list](./docs/source/publication_list.md).
diff --git a/docs/source/api-documentation/apis.rst b/docs/source/api-documentation/apis.rst
index 5148576eb9e..2028c25edb9 100644
--- a/docs/source/api-documentation/apis.rst
+++ b/docs/source/api-documentation/apis.rst
@@ -6,11 +6,7 @@ The following API information is available:
 .. toctree::
    :maxdepth: 1
 
-   newAPI
-   algorithm
-   strategy
+   new_api
    adaptor
-   pythonic
-   contrib
+   strategy
    model
-   utils
diff --git a/docs/source/api-documentation/newAPI.rst b/docs/source/api-documentation/newAPI.rst
deleted file mode 100644
index fb46af589c6..00000000000
--- a/docs/source/api-documentation/newAPI.rst
+++ /dev/null
@@ -1,12 +0,0 @@
-newAPI
-######
-
-The newAPI information is available:
-
-.. toctree::
-   :maxdepth: 1
-
-   newAPI.quantization
-   newAPI.benchmark
-   newAPI.objective
-   newAPI.training
diff --git a/docs/source/api-documentation/new_api.rst b/docs/source/api-documentation/new_api.rst
index 6d0319a6373..be3bc874733 100644
--- a/docs/source/api-documentation/new_api.rst
+++ b/docs/source/api-documentation/new_api.rst
@@ -1,10 +1,14 @@
-New API
+New user-facing APIs
 ###########
 
-The New API information is available:
+The new user-facing API information is available:
 
 .. toctree::
    :maxdepth: 1
 
-   new_api/config.rst
-   new_api/mix_precision.rst
+   new_api/quantization
+   new_api/mix_precision
+   new_api/benchmark
+   new_api/objective
+   new_api/training
+   new_api/config
\ No newline at end of file
diff --git a/docs/source/api-documentation/newAPI/benchmark.rst b/docs/source/api-documentation/new_api/benchmark.rst
similarity index 92%
rename from docs/source/api-documentation/newAPI/benchmark.rst
rename to docs/source/api-documentation/new_api/benchmark.rst
index 3f5a45ce3c9..c774174b482 100644
--- a/docs/source/api-documentation/newAPI/benchmark.rst
+++ b/docs/source/api-documentation/new_api/benchmark.rst
@@ -1,6 +1,6 @@
-Benchmark
-=========
-
-.. autoapisummary::
-
-   neural_compressor.benchmark
+Benchmark
+=========
+
+.. autoapisummary::
+
+   neural_compressor.benchmark
diff --git a/docs/source/api-documentation/new_api/config.rst b/docs/source/api-documentation/new_api/config.rst
index 4be3fd08d9a..49db6e04940 100644
--- a/docs/source/api-documentation/new_api/config.rst
+++ b/docs/source/api-documentation/new_api/config.rst
@@ -1,6 +1,6 @@
-Config
-==============
-
-.. autoapisummary::
-
+Config
+==============
+
+.. autoapisummary::
+
    neural_compressor.config
\ No newline at end of file
diff --git a/docs/source/api-documentation/new_api/mix_precision.rst b/docs/source/api-documentation/new_api/mix_precision.rst
index 3a9b35162d3..847cdd6c74c 100644
--- a/docs/source/api-documentation/new_api/mix_precision.rst
+++ b/docs/source/api-documentation/new_api/mix_precision.rst
@@ -1,6 +1,6 @@
-Mix Precision
-==============
-
-.. autoapisummary::
-
+Mix Precision
+==============
+
+.. autoapisummary::
+
    neural_compressor.mix_precision
\ No newline at end of file
diff --git a/docs/source/api-documentation/newAPI/objective.rst b/docs/source/api-documentation/new_api/objective.rst
similarity index 92%
rename from docs/source/api-documentation/newAPI/objective.rst
rename to docs/source/api-documentation/new_api/objective.rst
index 75211d8060c..e79ce8ff730 100644
--- a/docs/source/api-documentation/newAPI/objective.rst
+++ b/docs/source/api-documentation/new_api/objective.rst
@@ -1,6 +1,6 @@
-Objective
-=========
-
-.. autoapisummary::
-
-   neural_compressor.objective
+Objective
+=========
+
+.. autoapisummary::
+
+   neural_compressor.objective
diff --git a/docs/source/api-documentation/newAPI/quantization.rst b/docs/source/api-documentation/new_api/quantization.rst
similarity index 94%
rename from docs/source/api-documentation/newAPI/quantization.rst
rename to docs/source/api-documentation/new_api/quantization.rst
index ff783a8f725..68e9ae1544f 100644
--- a/docs/source/api-documentation/newAPI/quantization.rst
+++ b/docs/source/api-documentation/new_api/quantization.rst
@@ -1,6 +1,6 @@
-Quantization
-============
-
-.. autoapisummary::
-
+Quantization
+============
+
+.. autoapisummary::
+
    neural_compressor.quantization
\ No newline at end of file
diff --git a/docs/source/api-documentation/newAPI/training.rst b/docs/source/api-documentation/new_api/training.rst
similarity index 92%
rename from docs/source/api-documentation/newAPI/training.rst
rename to docs/source/api-documentation/new_api/training.rst
index be6e85f92f0..98b94828b09 100644
--- a/docs/source/api-documentation/newAPI/training.rst
+++ b/docs/source/api-documentation/new_api/training.rst
@@ -1,6 +1,6 @@
-Training
-========
-
-.. autoapisummary::
-
-   neural_compressor.training
+Training
+========
+
+.. autoapisummary::
+
+   neural_compressor.training
diff --git a/docs/source/benchmark.md b/docs/source/benchmark.md
index 08502323b6e..0ab89bc3548 100644
--- a/docs/source/benchmark.md
+++ b/docs/source/benchmark.md
@@ -1,4 +1,4 @@
-Benchmark
+Benchmarking
 ============
 1. [Introduction](#Introduction)
 2. [Benchmark Support Matrix](#Benchmark-Support-Matrix)
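One README hunk above carries the note that TensorFlow enables oneDNN optimizations by default only from v2.9 onward. A minimal shell sketch of what that note asks users on TensorFlow v2.6 to v2.8 to do (the `echo` probe is illustrative, not part of the README):

```shell
# Opt in to oneDNN graph optimizations before launching TensorFlow
# (needed only for TensorFlow v2.6-v2.8; oneDNN is the default from v2.9).
export TF_ENABLE_ONEDNN_OPTS=1
echo "TF_ENABLE_ONEDNN_OPTS=${TF_ENABLE_ONEDNN_OPTS}"
```

The variable must be set in the environment of the Python process before TensorFlow is imported, since the runtime reads it at initialization.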