
Commit

Merge branch 'main' into LayoutLMv3-TFLite-conversion-support
salmanmaq committed Apr 26, 2024
2 parents 382bda3 + c55f882 commit 4901d1d
Showing 48 changed files with 1,029 additions and 104 deletions.
10 changes: 10 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
Original file line number Diff line number Diff line change
@@ -20,3 +20,13 @@ Fixes # (issue)
- [ ] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?

## Who can review?

<!--
For faster review, we strongly recommend that you ping the following people:
- ONNX / ONNX Runtime : @fxmarty, @echarlaix, @JingyaHuang, @michaelbenayoun
- ONNX Runtime Training: @JingyaHuang
- BetterTransformer: @fxmarty
- GPTQ, quantization: @fxmarty, @SunMarc
- TFLite export: @michaelbenayoun
-->
20 changes: 18 additions & 2 deletions .github/workflows/build_main_documentation.yml
@@ -49,10 +49,14 @@ jobs:
repository: 'huggingface/optimum-amd'
path: optimum-amd

- uses: actions/checkout@v2
with:
repository: 'huggingface/optimum-tpu'
path: optimum-tpu

- name: Free disk space
run: |
df -h
sudo apt-get update
sudo apt-get purge -y '^apache.*'
sudo apt-get purge -y '^imagemagick.*'
sudo apt-get purge -y '^dotnet.*'
@@ -133,6 +137,8 @@ jobs:
run: |
cd optimum-furiosa
pip install .
sudo apt install software-properties-common
sudo add-apt-repository --remove https://packages.microsoft.com/ubuntu/22.04/prod
sudo apt update
sudo apt install -y ca-certificates apt-transport-https gnupg
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-key 5F03AFA423A751913F249259814F888B20B09A7E
@@ -150,6 +156,16 @@
mv furiosa-doc-build ../optimum
cd ..
- name: Make TPU documentation
run: |
sudo docker system prune -a -f
cd optimum-tpu
pip install -U pip
pip install . -f https://storage.googleapis.com/libtpu-releases/index.html
doc-builder build optimum.tpu docs/source/ --build_dir tpu-doc-build --version pr_$PR_NUMBER --version_tag_suffix "" --html --clean
mv tpu-doc-build ../optimum
cd ..
- name: Make AMD documentation
run: |
sudo docker system prune -a -f
@@ -171,7 +187,7 @@ jobs:
- name: Combine subpackage documentation
run: |
cd optimum
sudo python docs/combine_docs.py --subpackages nvidia amd intel neuron habana furiosa --version ${{ env.VERSION }}
sudo python docs/combine_docs.py --subpackages nvidia amd intel neuron tpu habana furiosa --version ${{ env.VERSION }}
cd ..
- name: Push to repositories
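The "Combine subpackage documentation" step above stitches each accelerator's HTML build into one tree before pushing. As a rough, stdlib-only sketch of that kind of merge (the helper function and the `<name>-doc-build` layout are assumptions for illustration, not the actual `docs/combine_docs.py`):

```python
import shutil
from pathlib import Path

def combine_subpackage_docs(build_root, subpackages, out_dir):
    """Copy each subpackage's built docs under one shared tree.

    Assumed layout: <build_root>/<name>-doc-build holds the HTML build
    for subpackage <name>; it is copied to <out_dir>/<name>.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name in subpackages:
        src = Path(build_root) / f"{name}-doc-build"
        if src.is_dir():  # tolerate subpackages that were not built
            shutil.copytree(src, out / name, dirs_exist_ok=True)
    return out
```

A call such as `combine_subpackage_docs(".", ["nvidia", "amd", "tpu"], "optimum")` would mirror the per-accelerator layout the workflow assembles.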
17 changes: 16 additions & 1 deletion .github/workflows/build_pr_documentation.yml
@@ -53,6 +53,11 @@ jobs:
repository: 'huggingface/optimum-amd'
path: optimum-amd

- uses: actions/checkout@v2
with:
repository: 'huggingface/optimum-tpu'
path: optimum-tpu

- name: Setup environment
run: |
pip uninstall -y doc-builder
@@ -91,6 +96,16 @@
sudo mv amd-doc-build ../optimum
cd ..
- name: Make TPU documentation
run: |
sudo docker system prune -a -f
cd optimum-tpu
pip install -U pip
pip install . -f https://storage.googleapis.com/libtpu-releases/index.html
doc-builder build optimum.tpu docs/source/ --build_dir tpu-doc-build --version pr_$PR_NUMBER --version_tag_suffix "" --html --clean
mv tpu-doc-build ../optimum
cd ..
- name: Make Optimum documentation
run: |
sudo docker system prune -a -f
@@ -101,7 +116,7 @@
- name: Combine subpackage documentation
run: |
cd optimum
sudo python docs/combine_docs.py --subpackages nvidia amd intel neuron habana furiosa --version pr_$PR_NUMBER
sudo python docs/combine_docs.py --subpackages nvidia amd intel neuron tpu habana furiosa --version pr_$PR_NUMBER
sudo mv optimum-doc-build ../
cd ..
4 changes: 2 additions & 2 deletions .github/workflows/dev_test_bettertransformer.yml
@@ -16,7 +16,7 @@ jobs:
- 3.8
os:
- ubuntu-20.04
- macos-latest
- macos-13
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
@@ -35,4 +35,4 @@
- name: Test with unittest
working-directory: tests
run: |
python -m unittest discover -s bettertransformer -p test_*.py
python -m unittest discover -s bettertransformer -p test_*.py
4 changes: 2 additions & 2 deletions .github/workflows/dev_test_dummy_inputs.yml
@@ -17,7 +17,7 @@ jobs:
- 3.9
os:
- ubuntu-20.04
- macos-latest
- macos-13
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
@@ -35,4 +35,4 @@
- name: Test with unittest
working-directory: tests
run: |
python -m unittest discover -s utils -p test_*.py
python -m unittest discover -s utils -p test_*.py
4 changes: 2 additions & 2 deletions .github/workflows/dev_test_fx.yml
@@ -17,7 +17,7 @@ jobs:
- 3.9
os:
- ubuntu-20.04
- macos-latest
- macos-13
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
@@ -35,4 +35,4 @@
- name: Test with unittest
working-directory: tests
run: |
python -m pytest fx/optimization/test_transformations.py --exitfirst
python -m pytest fx/optimization/test_transformations.py --exitfirst
4 changes: 2 additions & 2 deletions .github/workflows/dev_test_onnx.yml
@@ -17,7 +17,7 @@ jobs:
- 3.9
os:
- ubuntu-20.04
- macos-latest
- macos-13
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
@@ -34,4 +34,4 @@
- name: Test with unittest
working-directory: tests
run: |
python -m unittest discover -s onnx -p test_*.py
python -m unittest discover -s onnx -p test_*.py
4 changes: 2 additions & 2 deletions .github/workflows/dev_test_onnxruntime.yml
@@ -18,7 +18,7 @@ jobs:
os:
- ubuntu-20.04
- windows-2019
- macos-latest
- macos-13
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
@@ -36,4 +36,4 @@
working-directory: tests
run: |
python -m pytest -n auto -m "not run_in_series" onnxruntime
python -m pytest -m "run_in_series" onnxruntime
python -m pytest -m "run_in_series" onnxruntime
4 changes: 2 additions & 2 deletions .github/workflows/dev_test_optimum_common.yml
@@ -19,7 +19,7 @@ jobs:
os:
- ubuntu-20.04
- windows-2019
- macos-latest
- macos-13
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
@@ -42,4 +42,4 @@
as the staging tests cannot run in parallel.
export HUGGINGFACE_CO_STAGING=${{ matrix.python-version == 3.8 && matrix.os
== ubuntu-20.04 }}
python -m unittest discover -s tests -p test_*.py
python -m unittest discover -s tests -p test_*.py
2 changes: 1 addition & 1 deletion .github/workflows/test_bettertransformer.yml
@@ -16,7 +16,7 @@ jobs:
fail-fast: false
matrix:
python-version: [3.8, 3.9]
os: [ubuntu-20.04, macos-latest]
os: [ubuntu-20.04, macos-13]

runs-on: ${{ matrix.os }}
steps:
2 changes: 1 addition & 1 deletion .github/workflows/test_cli.yml
@@ -18,7 +18,7 @@ jobs:
fail-fast: false
matrix:
python-version: [3.8, 3.9]
os: [ubuntu-20.04, macos-latest]
os: [ubuntu-20.04, macos-13]

runs-on: ${{ matrix.os }}
steps:
2 changes: 1 addition & 1 deletion .github/workflows/test_dummy_inputs.yml
@@ -18,7 +18,7 @@ jobs:
fail-fast: false
matrix:
python-version: [3.8, 3.9]
os: [ubuntu-20.04, macos-latest]
os: [ubuntu-20.04, macos-13]

runs-on: ${{ matrix.os }}
steps:
2 changes: 1 addition & 1 deletion .github/workflows/test_fx.yml
@@ -16,7 +16,7 @@ jobs:
fail-fast: false
matrix:
python-version: [3.8, 3.9]
os: [ubuntu-20.04, macos-latest]
os: [ubuntu-20.04, macos-13]

runs-on: ${{ matrix.os }}
steps:
43 changes: 43 additions & 0 deletions .github/workflows/test_offline.yml
@@ -0,0 +1,43 @@
name: Offline usage / Python - Test

on:
push:
branches: [ main ]
pull_request:
branches: [ main ]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
cancel-in-progress: true

jobs:
build:
strategy:
fail-fast: false
matrix:
python-version: [3.9]
os: [ubuntu-20.04]

runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies for pytorch export
run: |
pip install .[tests,exporters,onnxruntime]
- name: Test with unittest
run: |
HF_HOME=/tmp/ huggingface-cli download hf-internal-testing/tiny-random-gpt2
HF_HOME=/tmp/ HF_HUB_OFFLINE=1 optimum-cli export onnx --model hf-internal-testing/tiny-random-gpt2 gpt2_onnx --task text-generation
huggingface-cli download hf-internal-testing/tiny-random-gpt2
HF_HUB_OFFLINE=1 optimum-cli export onnx --model hf-internal-testing/tiny-random-gpt2 gpt2_onnx --task text-generation
pytest tests/onnxruntime/test_modeling.py -k "test_load_model_from_hub and not from_hub_onnx" -s -vvvvv
HF_HUB_OFFLINE=1 pytest tests/onnxruntime/test_modeling.py -k "test_load_model_from_hub and not from_hub_onnx" -s -vvvvv
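The new workflow above pre-downloads a model, then re-runs the export with `HF_HUB_OFFLINE=1` to prove the cached copy is used without network access. A minimal sketch of the environment-variable gating pattern it exercises; the function and exception names here are hypothetical stand-ins, not the `huggingface_hub` implementation:

```python
import os

class OfflineModeError(RuntimeError):
    """Raised when a download is attempted while offline mode is enabled."""

def offline_mode_enabled():
    # HF_HUB_OFFLINE=1 is the flag the workflow toggles; this sketch
    # accepts a few truthy spellings.
    return os.environ.get("HF_HUB_OFFLINE", "").strip().lower() in {"1", "true", "yes"}

def fetch_or_cache(repo_id, cache):
    """Return a cached artifact; refuse to touch the network when offline."""
    if repo_id in cache:
        return cache[repo_id]
    if offline_mode_enabled():
        raise OfflineModeError(f"'{repo_id}' is not cached and HF_HUB_OFFLINE is set")
    cache[repo_id] = f"downloaded:{repo_id}"  # stand-in for a real download
    return cache[repo_id]
```

Seeding the cache first and then setting the flag reproduces, in miniature, the download-then-go-offline sequence the test steps follow.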
2 changes: 1 addition & 1 deletion .github/workflows/test_onnx.yml
@@ -16,7 +16,7 @@ jobs:
fail-fast: false
matrix:
python-version: [3.8, 3.9]
os: [ubuntu-20.04, macos-latest]
os: [ubuntu-20.04, macos-13]

runs-on: ${{ matrix.os }}
steps:
2 changes: 1 addition & 1 deletion .github/workflows/test_onnxruntime.yml
@@ -18,7 +18,7 @@ jobs:
fail-fast: false
matrix:
python-version: [3.8, 3.9]
os: [ubuntu-20.04, windows-2019, macos-latest]
os: [ubuntu-20.04, windows-2019, macos-13]

runs-on: ${{ matrix.os }}
steps:
2 changes: 1 addition & 1 deletion .github/workflows/test_optimum_common.yml
@@ -18,7 +18,7 @@ jobs:
fail-fast: false
matrix:
python-version: [3.8, 3.9]
os: [ubuntu-20.04, windows-2019, macos-latest]
os: [ubuntu-20.04, windows-2019, macos-13]

runs-on: ${{ matrix.os }}
steps:
34 changes: 20 additions & 14 deletions README.md
@@ -14,16 +14,18 @@ python -m pip install optimum

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

| Accelerator | Installation |
|:-----------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------|
| [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/overview) | `pip install --upgrade-strategy eager optimum[onnxruntime]` |
| [Intel Neural Compressor](https://huggingface.co/docs/optimum/intel/index) | `pip install --upgrade-strategy eager optimum[neural-compressor]`|
| [OpenVINO](https://huggingface.co/docs/optimum/intel/index) | `pip install --upgrade-strategy eager optimum[openvino,nncf]` |
| [AMD Instinct GPUs and Ryzen AI NPU](https://huggingface.co/docs/optimum/amd/index) | `pip install --upgrade-strategy eager optimum[amd]` |
| [Habana Gaudi Processor (HPU)](https://huggingface.co/docs/optimum/habana/index) | `pip install --upgrade-strategy eager optimum[habana]` |
| [FuriosaAI](https://huggingface.co/docs/optimum/furiosa/index) | `pip install --upgrade-strategy eager optimum[furiosa]` |

The `--upgrade-strategy eager` option is needed to ensure the different packages are upgraded to the latest possible version.
| Accelerator | Installation |
|:-----------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------|
| [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/overview) | `pip install --upgrade --upgrade-strategy eager optimum[onnxruntime]` |
| [Intel Neural Compressor](https://huggingface.co/docs/optimum/intel/index) | `pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]`|
| [OpenVINO](https://huggingface.co/docs/optimum/intel/index) | `pip install --upgrade --upgrade-strategy eager optimum[openvino]` |
| [NVIDIA TensorRT-LLM](https://huggingface.co/docs/optimum/main/en/nvidia_overview) | `docker run -it --gpus all --ipc host huggingface/optimum-nvidia` |
| [AMD Instinct GPUs and Ryzen AI NPU](https://huggingface.co/docs/optimum/amd/index) | `pip install --upgrade --upgrade-strategy eager optimum[amd]` |
| [AWS Trainium & Inferentia](https://huggingface.co/docs/optimum-neuron/index)                                            | `pip install --upgrade --upgrade-strategy eager optimum[neuronx]` |
| [Habana Gaudi Processor (HPU)](https://huggingface.co/docs/optimum/habana/index) | `pip install --upgrade --upgrade-strategy eager optimum[habana]` |
| [FuriosaAI](https://huggingface.co/docs/optimum/furiosa/index) | `pip install --upgrade --upgrade-strategy eager optimum[furiosa]` |

The `--upgrade --upgrade-strategy eager` option is needed to ensure the different packages are upgraded to the latest possible version.

To install from source:

@@ -45,6 +47,8 @@ python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/op
- TensorFlow Lite
- [OpenVINO](https://huggingface.co/docs/optimum/intel/inference)
- Habana first-gen Gaudi / Gaudi2, more details [here](https://huggingface.co/docs/optimum/main/en/habana/usage_guides/accelerate_inference)
- AWS Inferentia 2 / Inferentia 1, more details [here](https://huggingface.co/docs/optimum-neuron/en/guides/models)
- NVIDIA TensorRT-LLM, more details [here](https://huggingface.co/blog/optimum-nvidia)

The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimizations can be done both programmatically and with a command line.

@@ -66,7 +70,7 @@ The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimiz
Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade-strategy eager optimum[openvino,nncf]
pip install --upgrade --upgrade-strategy eager optimum[openvino]
```

It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO format easily:
@@ -75,7 +79,8 @@ It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO
optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english distilbert_sst2_ov
```

If you add `--int8`, the weights will be quantized to INT8. Static quantization can also be applied on the activations using [NNCF](https://github.com/openvinotoolkit/nncf), more information can be found in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov).
If you add `--weight-format int8`, the weights will be quantized to `int8`; check out our [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov#weight-only-quantization) for more detail on weight-only quantization. To apply quantization on both weights and activations, you can find more information [here](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov#static-quantization).
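To illustrate the idea behind weight-only `int8` quantization, here is a deliberately simplified symmetric per-tensor sketch (not NNCF's actual algorithm, which works per-channel with far more care):

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w ~= q * scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # map the largest magnitude onto the int8 range
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [qi * scale for qi in q]
```

Each weight is stored as one signed byte plus a shared scale, which is where the memory savings over `float32` come from; the reconstruction error per weight is bounded by half the scale.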


To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set `export=True` when loading your model.

@@ -100,7 +105,7 @@ You can find more examples in the [documentation](https://huggingface.co/docs/op
Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade-strategy eager optimum[neural-compressor]
pip install --upgrade --upgrade-strategy eager optimum[neural-compressor]
```

Dynamic quantization can be applied on your model:
@@ -190,14 +195,15 @@ optimum-cli export tflite \
We support many providers:

- Habana's Gaudi processors
- AWS Trainium instances, more details [here](https://huggingface.co/docs/optimum-neuron/en/guides/distributed_training)
- ONNX Runtime (optimized for GPUs)

### Habana

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade-strategy eager optimum[habana]
pip install --upgrade --upgrade-strategy eager optimum[habana]
```

```diff