Merged
Commits
31 commits
20e6ee1
remove ipex and ccl for xpu and cpu, since all ipex optimizations are
yao-matrix Dec 12, 2025
7d0cf21
fix style
yao-matrix Dec 12, 2025
a6037b4
xx
yao-matrix Dec 12, 2025
00a8d4e
Merge branch 'main' into remove-ipex
yao-matrix Dec 15, 2025
b20cb4c
Merge branch 'main' into remove-ipex
yao-matrix Dec 16, 2025
0b8f510
Merge branch 'main' into remove-ipex
yao-matrix Dec 16, 2025
692125e
xx
yao-matrix Dec 16, 2025
c9dce30
xx
yao-matrix Dec 17, 2025
dae56db
Merge branch 'main' into remove-ipex
yao-matrix Dec 17, 2025
7fdbd12
xx
yao-matrix Dec 17, 2025
e1cbfed
Merge branch 'main' into remove-ipex
yao-matrix Dec 17, 2025
49c8837
auto-round
yao-matrix Dec 17, 2025
a7fb9f7
Merge branch 'main' into remove-ipex
yao-matrix Dec 17, 2025
be84649
xx
yao-matrix Dec 17, 2025
b404ccc
Merge branch 'main' into remove-ipex
yao-matrix Dec 18, 2025
2c16e21
Merge branch 'main' into remove-ipex
yao-matrix Dec 19, 2025
2cc79d0
Merge branch 'main' into remove-ipex
yao-matrix Jan 5, 2026
e7cce37
Merge branch 'main' into remove-ipex
yao-matrix Jan 7, 2026
3309a48
Merge branch 'main' into remove-ipex
yao-matrix Jan 16, 2026
55da8d6
fix style
yao-matrix Jan 16, 2026
3eac2a2
Merge branch 'main' into remove-ipex
yao-matrix Jan 16, 2026
2929445
Merge branch 'main' into remove-ipex
yao-matrix Jan 26, 2026
f92b95b
Merge branch 'main' into remove-ipex
yao-matrix Jan 29, 2026
0e8b326
xx
yao-matrix Jan 29, 2026
f7f0927
xx
yao-matrix Jan 29, 2026
43a4796
Merge branch 'main' into remove-ipex
SunMarc Jan 30, 2026
a22818f
Merge branch 'main' into remove-ipex
SunMarc Jan 30, 2026
a39b97f
Merge branch 'main' into remove-ipex
yao-matrix Jan 30, 2026
eb60a71
Merge branch 'main' into remove-ipex
SunMarc Feb 2, 2026
3f7a784
Merge branch 'main' into remove-ipex
yao-matrix Feb 2, 2026
457f64b
Merge branch 'main' into remove-ipex
yao-matrix Feb 3, 2026
1 change: 0 additions & 1 deletion docs/source/en/perf_train_gpu_one.md
@@ -277,7 +277,6 @@ Refer to the table below to help you choose the right backend for your training scenario
| ofi | uses TorchScripts [optimize_for_inference](https://pytorch.org/docs/stable/generated/torch.jit.optimize_for_inference.html#torch-jit-optimize-for-inference) | inference |
| fx2trt | uses [Torch-TensorRT](https://pytorch.org/TensorRT/tutorials/getting_started_with_fx_path.html) | inference |
| onnxrt | uses [ONNX-RT](https://onnxruntime.ai/) for CPU and GPU inference | inference |
| ipex | uses [IPEX](https://github.com/intel/intel-extension-for-pytorch) for CPU inference | inference |

### Scaled dot product attention

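With the `ipex` row gone, the table still maps directly onto `torch.compile`'s backend argument. A minimal, hedged sketch — the checkpoint and the `onnxrt` choice are illustrative only, and the same name can also be passed through `TrainingArguments(torch_compile_backend=...)`:

```python
# Minimal sketch (not part of this diff): selecting one of the remaining
# torch.compile backends by name. "onnxrt" is just an example from the table.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint
compiled_model = torch.compile(model, backend="onnxrt")
```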
25 changes: 0 additions & 25 deletions docs/source/en/quantization/awq.md
@@ -225,31 +225,6 @@ model = AutoModelForCausalLM.from_pretrained(
)
```

## CPU

[Intel Extension for PyTorch (IPEX)](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/) is designed to enable performance optimizations on Intel hardware. Run the command below to install the latest version of autoawq with IPEX support.

```bash
pip install intel-extension-for-pytorch # for IPEX-GPU refer to https://intel.github.io/intel-extension-for-pytorch/xpu/2.5.10+xpu/
pip install git+https://github.com/casper-hansen/AutoAWQ.git
```

Set `version="ipex"` in [`AwqConfig`] to enable ExLlamaV2 kernels.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

device = "cpu" # set to "xpu" for Intel GPU
quantization_config = AwqConfig(version="ipex")

model = AutoModelForCausalLM.from_pretrained(
"TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ",
quantization_config=quantization_config,
device_map=device,
)
```

## Resources

Run the AWQ demo [notebook](https://colab.research.google.com/drive/1HzZH89yAXJaZgwJDhQj9LqSBux932BvY#scrollTo=Wwsg6nCwoThm) for more examples of how to quantize a model, push a quantized model to the Hub, and more.
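With the CPU/IPEX section removed above, the remaining documented path loads AWQ checkpoints with the library's default kernels. A hedged sketch reusing the checkpoint from the deleted section (device placement and generation settings are assumptions, not taken from the docs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ"  # checkpoint from the removed section
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # assumes a CUDA GPU

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0], skip_special_tokens=True))
```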
2 changes: 1 addition & 1 deletion docs/source/en/testing.md
@@ -516,7 +516,7 @@ Alternative backends may also require the replacement of device-specific functions

```python
import torch
import torch_npu # for xpu, replace it with `import intel_extension_for_pytorch`
import torch_npu
# !! Further additional imports can be added here !!

# Specify the device name (eg. 'cuda', 'cpu', 'npu', 'xpu', 'mps')
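For XPU, the spec file shown in this section now needs no extra import at all. A sketch, assuming the spec-file variables follow the pattern documented around this hunk (`DEVICE_NAME` plus optional function mappings):

```python
# spec.py -- hypothetical device spec for stock-PyTorch XPU; note that
# `import intel_extension_for_pytorch` is no longer required.
import torch

DEVICE_NAME = "xpu"

# Optional overrides, only needed when the defaults don't resolve for the backend.
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count
```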
1 change: 0 additions & 1 deletion src/transformers/file_utils.py
@@ -65,7 +65,6 @@
is_faiss_available,
is_g2p_en_available,
is_in_notebook,
is_ipex_available,
is_librosa_available,
is_onnx_available,
is_pandas_available,
2 changes: 0 additions & 2 deletions src/transformers/integrations/__init__.py
@@ -20,7 +20,6 @@
"aqlm": ["replace_with_aqlm_linear"],
"awq": [
"post_init_awq_exllama_modules",
"post_init_awq_ipex_modules",
"replace_quantization_scales",
"replace_with_awq_linear",
],
@@ -173,7 +172,6 @@
from .aqlm import replace_with_aqlm_linear
from .awq import (
post_init_awq_exllama_modules,
post_init_awq_ipex_modules,
replace_quantization_scales,
replace_with_awq_linear,
)
4 changes: 0 additions & 4 deletions src/transformers/integrations/tensor_parallel.py
@@ -65,10 +65,6 @@ def initialize_tensor_parallelism(

backend_map = {"cuda": "nccl", "cpu": "gloo", "xpu": "xccl", "hpu": "hccl"}
backend = backend_map.get(device_type)
if device_type == "cpu" and int(os.environ.get("CCL_WORKER_COUNT", "0")):
backend = "ccl"
if device_type == "xpu" and not is_torch_greater_or_equal("2.8", accept_dev=True):
backend = "ccl"

torch.distributed.init_process_group(backend=backend, rank=rank, world_size=world_size)
current_device = getattr(torch, device_type)
20 changes: 1 addition & 19 deletions src/transformers/testing_utils.py
@@ -105,7 +105,6 @@
is_hadamard_available,
is_hqq_available,
is_huggingface_hub_greater_or_equal,
is_ipex_available,
is_jinja_available,
is_jmespath_available,
is_jumanpp_available,
@@ -677,21 +676,6 @@ def require_torchcodec(test_case):
return unittest.skipUnless(is_torchcodec_available(), "test requires Torchcodec")(test_case)


def require_intel_extension_for_pytorch(test_case):
"""
Decorator marking a test that requires Intel Extension for PyTorch.

These tests are skipped when Intel Extension for PyTorch isn't installed or it does not match current PyTorch
version.

"""
return unittest.skipUnless(
is_ipex_available(),
"test requires Intel Extension for PyTorch to be installed and match current PyTorch version, see"
" https://github.com/intel/intel-extension-for-pytorch",
)(test_case)


def require_torchaudio(test_case):
"""
Decorator marking a test that requires torchaudio. These tests are skipped when torchaudio isn't installed.
@@ -895,9 +879,7 @@ def require_torch_xpu(test_case):
"""
Decorator marking a test that requires XPU (in PyTorch).

These tests are skipped when XPU backend is not available. XPU backend might be available either via stock
PyTorch (>=2.4) or via Intel Extension for PyTorch. In the latter case, if IPEX is installed, its version
must match match current PyTorch version.
These tests are skipped when XPU backend is not available.
"""
return unittest.skipUnless(is_torch_xpu_available(), "test requires XPU device")(test_case)

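A hypothetical test (not from the PR) showing how the slimmed-down decorator is used; the skip condition now depends only on stock PyTorch exposing an XPU device:

```python
import unittest

from transformers.testing_utils import require_torch_xpu, torch_device


class XpuSmokeTest(unittest.TestCase):
    @require_torch_xpu
    def test_tensor_lands_on_xpu(self):
        import torch

        # Only runs when torch.xpu reports an available device.
        x = torch.ones(2, 2, device=torch_device)
        self.assertEqual(x.device.type, "xpu")
```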
2 changes: 1 addition & 1 deletion src/transformers/trainer.py
@@ -2313,7 +2313,7 @@ def _inner_training_loop(

# as the model is wrapped, don't use `accelerator.prepare`
# this is for unhandled cases such as
# FSDP-XLA, SageMaker MP/DP, DataParallel, IPEX
# FSDP-XLA, SageMaker MP/DP, DataParallel
use_accelerator_prepare = model is self.model

if use_accelerator_prepare and self.is_fsdp_enabled:
4 changes: 2 additions & 2 deletions src/transformers/training_args.py
@@ -391,7 +391,7 @@ class TrainingArguments:
the [TF32](https://huggingface.co/docs/transformers/perf_train_gpu_one#tf32) documentation. This is an
experimental API and it may change.
ddp_backend (`str`, *optional*):
The backend to use for distributed training. Must be one of `"nccl"`, `"mpi"`, `"ccl"`, `"gloo"`, `"hccl"`.
The backend to use for distributed training. Must be one of `"nccl"`, `"mpi"`, `"xccl"`, `"gloo"`, `"hccl"`.
dataloader_drop_last (`bool`, *optional*, defaults to `False`):
Whether to drop the last incomplete batch (if the length of the dataset is not divisible by the batch size)
or not.
@@ -1037,7 +1037,7 @@ class TrainingArguments:
default=None,
metadata={
"help": "The backend to be used for distributed training",
"choices": ["nccl", "gloo", "mpi", "ccl", "hccl", "cncl", "mccl"],
"choices": ["nccl", "gloo", "mpi", "xccl", "hccl", "cncl", "mccl"],
},
)
debug: str | list[DebugOption] = field(
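A short sketch of the updated option (output path and batch size are placeholders): Intel XPU runs now pick the stock `xccl` backend, since `ccl` is no longer an accepted choice:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",               # placeholder path
    ddp_backend="xccl",             # stock PyTorch backend for XPU; replaces the removed "ccl"
    per_device_train_batch_size=8,  # placeholder value
)
```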
2 changes: 0 additions & 2 deletions src/transformers/utils/__init__.py
@@ -118,7 +118,6 @@
is_av_available,
is_bitsandbytes_available,
is_bs4_available,
is_ccl_available,
is_coloredlogs_available,
is_compressed_tensors_available,
is_cuda_platform,
@@ -150,7 +149,6 @@
is_hqq_available,
is_huggingface_hub_greater_or_equal,
is_in_notebook,
is_ipex_available,
is_jinja_available,
is_jmespath_available,
is_jumanpp_available,
48 changes: 4 additions & 44 deletions src/transformers/utils/import_utils.py
@@ -241,25 +241,21 @@ def is_torch_npu_available(check_device=False) -> bool:
@lru_cache
def is_torch_xpu_available(check_device: bool = False) -> bool:
"""
Checks if XPU acceleration is available either via native PyTorch (>=2.6),
`intel_extension_for_pytorch` or via stock PyTorch (>=2.4) and potentially
if a XPU is in the environment.
Checks if XPU acceleration is available via stock PyTorch (>=2.6) and
potentially if a XPU is in the environment.
"""
if not is_torch_available():
return False

torch_version = version.parse(get_torch_version())
if torch_version.major == 2 and torch_version.minor < 6:
if is_ipex_available():
import intel_extension_for_pytorch # noqa: F401
elif torch_version.major == 2 and torch_version.minor < 4:
return False
return False

import torch

if check_device:
try:
# Will raise a RuntimeError if no XPU is found
# Will raise a RuntimeError if no XPU is found
_ = torch.xpu.device_count()
return torch.xpu.is_available()
except RuntimeError:
@@ -848,29 +844,6 @@ def is_ninja_available() -> bool:
return True


@lru_cache
def is_ipex_available(min_version: str = "") -> bool:
def get_major_and_minor_from_version(full_version):
return str(version.parse(full_version).major) + "." + str(version.parse(full_version).minor)

ipex_available, ipex_version = _is_package_available("intel_extension_for_pytorch", return_version=True)

if not is_torch_available() or not ipex_available:
return False

torch_major_and_minor = get_major_and_minor_from_version(get_torch_version())
ipex_major_and_minor = get_major_and_minor_from_version(ipex_version)
if torch_major_and_minor != ipex_major_and_minor:
logger.warning_once(
f"Intel Extension for PyTorch {ipex_major_and_minor} needs to work with PyTorch {ipex_major_and_minor}.*,"
f" but PyTorch {get_torch_version()} is found. Please switch to the matching version and run again."
)
return False
if min_version:
return version.parse(ipex_version) >= version.parse(min_version)
return True


@lru_cache
def is_bitsandbytes_available(min_version: str = BITSANDBYTES_MIN_VERSION) -> bool:
is_available, bitsandbytes_version = _is_package_available("bitsandbytes", return_version=True)
@@ -1161,11 +1134,6 @@ def is_uroman_available() -> bool:
return _is_package_available("uroman")


@lru_cache
def is_ccl_available() -> bool:
return _is_package_available("torch_ccl") or _is_package_available("oneccl_bindings_for_pytorch")


@lru_cache
def is_sudachi_available() -> bool:
return _is_package_available("sudachipy")
@@ -1774,13 +1742,6 @@ def is_training_run_on_sagemaker() -> bool:
runtime after installation.
"""

# docstyle-ignore
CCL_IMPORT_ERROR = """
{0} requires the torch ccl library but it was not found in your environment. You can install it with pip:
`pip install oneccl_bind_pt -f https://developer.intel.com/ipex-whl-stable`
Please note that you may need to restart your runtime after installation.
"""

# docstyle-ignore
ESSENTIA_IMPORT_ERROR = """
{0} requires essentia library. But that was not found in your environment. You can install them with pip:
@@ -1869,7 +1830,6 @@ def is_training_run_on_sagemaker() -> bool:
("vision", (is_vision_available, VISION_IMPORT_ERROR)),
("scipy", (is_scipy_available, SCIPY_IMPORT_ERROR)),
("accelerate", (is_accelerate_available, ACCELERATE_IMPORT_ERROR)),
("oneccl_bind_pt", (is_ccl_available, CCL_IMPORT_ERROR)),
("cython", (is_cython_available, CYTHON_IMPORT_ERROR)),
("rjieba", (is_rjieba_available, RJIEBA_IMPORT_ERROR)),
("peft", (is_peft_available, PEFT_IMPORT_ERROR)),
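Assumed usage (not part of the diff) of the simplified check: XPU detection now relies solely on stock PyTorch >= 2.6, with no `intel_extension_for_pytorch` import attempted:

```python
import torch

from transformers.utils import is_torch_xpu_available

device = torch.device("xpu") if is_torch_xpu_available(check_device=True) else torch.device("cpu")
print(device)
```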
25 changes: 14 additions & 11 deletions tests/generation/test_fsdp.py
@@ -17,7 +17,7 @@
from collections.abc import Callable
from typing import Any

from transformers import is_torch_available, is_torch_xpu_available
from transformers import is_torch_available
from transformers.testing_utils import (
TestCasePlus,
backend_device_count,
@@ -28,19 +28,12 @@
torch_device,
torchrun,
)
from transformers.utils import is_ccl_available, is_ipex_available


if is_torch_available():
import functools

import torch

if is_torch_xpu_available():
if is_ipex_available():
import intel_extension_for_pytorch # noqa: F401
if is_ccl_available():
import oneccl_bindings_for_pytorch # noqa: F401
import torch.distributed
from torch.distributed._composable.fsdp import fully_shard, register_fsdp_forward_method
from torch.distributed.device_mesh import init_device_mesh
@@ -154,12 +147,22 @@ def test_generic_task_model_can_be_sharded(self):
from torch.distributed.fsdp import fully_shard
from transformers import AutoModelForTokenClassification

current_accelerator = torch.accelerator.current_accelerator(check_available=True)
accelerator_type = "cpu" if current_accelerator is None else current_accelerator.type
torch_accelerator_module = getattr(torch, accelerator_type, torch.cuda)

backend = "gloo"
if accelerator_type == "cuda":
backend = "nccl"
elif accelerator_type == "xpu":
backend = "xccl"

torch.distributed.init_process_group(
backend="nccl" if torch.cuda.is_available() else "gloo", init_method="env://"
backend=backend, init_method="env://"
)
rank = torch.distributed.get_rank()
if torch.cuda.is_available():
torch.cuda.set_device(rank)
if torch_accelerator_module.is_available():
torch_accelerator_module.set_device(rank)

# Make sure it works
model = AutoModelForTokenClassification.from_pretrained("Qwen/Qwen2-0.5B")
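The same device-agnostic selection, pulled out of the test as a standalone sketch (assumes a PyTorch build recent enough to ship `torch.accelerator`):

```python
import torch

current = torch.accelerator.current_accelerator(check_available=True)
accelerator_type = "cpu" if current is None else current.type

backend = {"cuda": "nccl", "xpu": "xccl"}.get(accelerator_type, "gloo")
print(f"{accelerator_type} -> {backend}")  # e.g. "xpu -> xccl", or "cpu -> gloo"
```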
7 changes: 1 addition & 6 deletions tests/generation/test_utils.py
@@ -26,7 +26,6 @@

import numpy as np
import pytest
from packaging import version
from parameterized import parameterized

from transformers import (
@@ -55,7 +54,7 @@
slow,
torch_device,
)
from transformers.utils import is_ipex_available, is_sklearn_available, is_torchdynamo_exporting
from transformers.utils import is_sklearn_available, is_torchdynamo_exporting
from transformers.utils.generic import is_flash_attention_requested


@@ -539,10 +538,6 @@ def test_beam_search_generate_dict_outputs_use_cache(self):
@require_torch_multi_accelerator
@pytest.mark.generate
def test_model_parallel_beam_search(self):
if "xpu" in torch_device:
if not (is_ipex_available("2.5") or version.parse(torch.__version__) >= version.parse("2.6")):
self.skipTest(reason="device_map='auto' does not work with XPU devices")

for model_class in self.all_generative_model_classes:
if model_class._no_split_modules is None:
continue
2 changes: 1 addition & 1 deletion tests/pipelines/test_pipelines_audio_classification.py
@@ -150,7 +150,7 @@ def test_small_model_pt_fp16(self):
output = audio_classifier(audio, top_k=4)

# Expected outputs are collected running the test on torch 2.6 in few scenarios.
# Running on CUDA T4/A100 and on XPU PVC (note: using stock torch xpu, NOT using IPEX):
# Running on CUDA T4/A100 and on XPU PVC:
EXPECTED_OUTPUT = [
{"score": 0.0833, "label": "go"},
{"score": 0.0833, "label": "off"},
7 changes: 4 additions & 3 deletions tests/quantization/autoround/test_auto_round.py
@@ -15,13 +15,14 @@
import tempfile
import unittest

import pytest

from transformers import AutoModelForCausalLM, AutoRoundConfig, AutoTokenizer
from transformers.testing_utils import (
backend_empty_cache,
backend_synchronize,
require_accelerate,
require_auto_round,
require_intel_extension_for_pytorch,
require_torch_accelerator,
require_torch_gpu,
require_torch_multi_accelerator,
@@ -106,7 +107,7 @@ def test_quantized_model_bf16(self):
output = quantized_model.generate(**input_ids, max_new_tokens=40, do_sample=False)
self.assertIn(self.tokenizer.decode(output[0], skip_special_tokens=True), self.EXPECTED_OUTPUTS)

@require_intel_extension_for_pytorch
Member: maybe we can skip those tests or require a specific tag to be enabled ? It will be too slow on our CI to run the model on our cpu.

Contributor Author: @SunMarc, done, thx

@pytest.mark.skip(reason="This test is temporarily disabled because the CI machine's CPU is slow")
def test_quantized_model_on_cpu(self):
"""
Simple test that checks if the quantized model is working properly
Expand Down Expand Up @@ -174,7 +175,7 @@ def test_convert_from_gptq(self):
inputs = tokenizer(text, return_tensors="pt").to(model.device)
tokenizer.decode(model.generate(**inputs, max_new_tokens=5)[0])

@require_intel_extension_for_pytorch
Member: same here

Contributor Author: done

@pytest.mark.skip(reason="This test is temporarily disabled because the CI machine's CPU is slow")
def test_convert_from_awq_cpu(self):
"""
Simple test that checks if auto-round work properly with awq format
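For reference, the opt-in-tag alternative floated in the review thread (but not what was merged) could look roughly like this; the environment variable and decorator name are hypothetical:

```python
import os
import unittest

RUN_CPU_QUANT_TESTS = os.environ.get("RUN_CPU_QUANT_TESTS", "0") == "1"
require_cpu_quant_opt_in = unittest.skipUnless(
    RUN_CPU_QUANT_TESTS, "set RUN_CPU_QUANT_TESTS=1 to run slow CPU quantization tests"
)


class CpuQuantTests(unittest.TestCase):
    @require_cpu_quant_opt_in
    def test_quantized_model_on_cpu(self):
        ...  # would load and exercise the quantized model on CPU
```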