Merged
Commits
31 commits
20e6ee1
remove ipex and ccl for xpu and cpu, since all ipex optimizations are
yao-matrix Dec 12, 2025
7d0cf21
fix style
yao-matrix Dec 12, 2025
a6037b4
xx
yao-matrix Dec 12, 2025
00a8d4e
Merge branch 'main' into remove-ipex
yao-matrix Dec 15, 2025
b20cb4c
Merge branch 'main' into remove-ipex
yao-matrix Dec 16, 2025
0b8f510
Merge branch 'main' into remove-ipex
yao-matrix Dec 16, 2025
692125e
xx
yao-matrix Dec 16, 2025
c9dce30
xx
yao-matrix Dec 17, 2025
dae56db
Merge branch 'main' into remove-ipex
yao-matrix Dec 17, 2025
7fdbd12
xx
yao-matrix Dec 17, 2025
e1cbfed
Merge branch 'main' into remove-ipex
yao-matrix Dec 17, 2025
49c8837
auto-round
yao-matrix Dec 17, 2025
a7fb9f7
Merge branch 'main' into remove-ipex
yao-matrix Dec 17, 2025
be84649
xx
yao-matrix Dec 17, 2025
b404ccc
Merge branch 'main' into remove-ipex
yao-matrix Dec 18, 2025
2c16e21
Merge branch 'main' into remove-ipex
yao-matrix Dec 19, 2025
2cc79d0
Merge branch 'main' into remove-ipex
yao-matrix Jan 5, 2026
e7cce37
Merge branch 'main' into remove-ipex
yao-matrix Jan 7, 2026
3309a48
Merge branch 'main' into remove-ipex
yao-matrix Jan 16, 2026
55da8d6
fix style
yao-matrix Jan 16, 2026
3eac2a2
Merge branch 'main' into remove-ipex
yao-matrix Jan 16, 2026
2929445
Merge branch 'main' into remove-ipex
yao-matrix Jan 26, 2026
f92b95b
Merge branch 'main' into remove-ipex
yao-matrix Jan 29, 2026
0e8b326
xx
yao-matrix Jan 29, 2026
f7f0927
xx
yao-matrix Jan 29, 2026
43a4796
Merge branch 'main' into remove-ipex
SunMarc Jan 30, 2026
a22818f
Merge branch 'main' into remove-ipex
SunMarc Jan 30, 2026
a39b97f
Merge branch 'main' into remove-ipex
yao-matrix Jan 30, 2026
eb60a71
Merge branch 'main' into remove-ipex
SunMarc Feb 2, 2026
3f7a784
Merge branch 'main' into remove-ipex
yao-matrix Feb 2, 2026
457f64b
Merge branch 'main' into remove-ipex
yao-matrix Feb 3, 2026
1 change: 0 additions & 1 deletion docs/source/en/perf_train_gpu_one.md
@@ -277,7 +277,6 @@ Refer to the table below to help you choose the right backend for your training scenario
| ofi | uses TorchScripts [optimize_for_inference](https://pytorch.org/docs/stable/generated/torch.jit.optimize_for_inference.html#torch-jit-optimize-for-inference) | inference |
| fx2trt | uses [Torch-TensorRT](https://pytorch.org/TensorRT/tutorials/getting_started_with_fx_path.html) | inference |
| onnxrt | uses [ONNX-RT](https://onnxruntime.ai/) for CPU and GPU inference | inference |
| ipex | uses [IPEX](https://github.com/intel/intel-extension-for-pytorch) for CPU inference | inference |

### Scaled dot product attention

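With the `ipex` row gone, the table still maps directly onto `torch.compile`'s backend argument. A minimal, hedged sketch — the checkpoint and the `onnxrt` choice are illustrative only, and the same name can also be passed through `TrainingArguments(torch_compile_backend=...)`:

```python
# Minimal sketch (not part of this diff): selecting one of the remaining
# torch.compile backends by name. "onnxrt" is just an example from the table.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint
compiled_model = torch.compile(model, backend="onnxrt")
```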
25 changes: 0 additions & 25 deletions docs/source/en/quantization/awq.md
@@ -225,31 +225,6 @@ model = AutoModelForCausalLM.from_pretrained(
)
```

## CPU

[Intel Extension for PyTorch (IPEX)](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/) is designed to enable performance optimizations on Intel hardware. Run the command below to install the latest version of autoawq with IPEX support.

```bash
pip install intel-extension-for-pytorch # for IPEX-GPU refer to https://intel.github.io/intel-extension-for-pytorch/xpu/2.5.10+xpu/
pip install git+https://github.com/casper-hansen/AutoAWQ.git
```

Set `version="ipex"` in [`AwqConfig`] to enable ExLlamaV2 kernels.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

device = "cpu" # set to "xpu" for Intel GPU
quantization_config = AwqConfig(version="ipex")

model = AutoModelForCausalLM.from_pretrained(
"TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ",
quantization_config=quantization_config,
device_map=device,
)
```

## Resources

Run the AWQ demo [notebook](https://colab.research.google.com/drive/1HzZH89yAXJaZgwJDhQj9LqSBux932BvY#scrollTo=Wwsg6nCwoThm) for more examples of how to quantize a model, push a quantized model to the Hub, and more.
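With the CPU/IPEX section removed above, the remaining documented path loads AWQ checkpoints with the library's default kernels. A hedged sketch reusing the checkpoint from the deleted section (device placement and generation settings are assumptions, not taken from the docs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ"  # checkpoint from the removed section
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # assumes a CUDA GPU

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0], skip_special_tokens=True))
```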
2 changes: 1 addition & 1 deletion docs/source/en/testing.md
@@ -516,7 +516,7 @@ Alternative backends may also require the replacement of device-specific functions

```python
import torch
import torch_npu # for xpu, replace it with `import intel_extension_for_pytorch`
import torch_npu
# !! Further additional imports can be added here !!

# Specify the device name (eg. 'cuda', 'cpu', 'npu', 'xpu', 'mps')
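For XPU, the spec file shown in this section now needs no extra import at all. A sketch, assuming the spec-file variables follow the pattern documented around this hunk (`DEVICE_NAME` plus optional function mappings):

```python
# spec.py -- hypothetical device spec for stock-PyTorch XPU; note that
# `import intel_extension_for_pytorch` is no longer required.
import torch

DEVICE_NAME = "xpu"

# Optional overrides, only needed when the defaults don't resolve for the backend.
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count
```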
1 change: 0 additions & 1 deletion src/transformers/file_utils.py
@@ -65,7 +65,6 @@
is_faiss_available,
is_g2p_en_available,
is_in_notebook,
is_ipex_available,
is_librosa_available,
is_onnx_available,
is_pandas_available,
2 changes: 0 additions & 2 deletions src/transformers/integrations/__init__.py
@@ -20,7 +20,6 @@
"aqlm": ["replace_with_aqlm_linear"],
"awq": [
"post_init_awq_exllama_modules",
"post_init_awq_ipex_modules",
"replace_quantization_scales",
"replace_with_awq_linear",
],
@@ -173,7 +172,6 @@
from .aqlm import replace_with_aqlm_linear
from .awq import (
post_init_awq_exllama_modules,
post_init_awq_ipex_modules,
replace_quantization_scales,
replace_with_awq_linear,
)
4 changes: 0 additions & 4 deletions src/transformers/integrations/tensor_parallel.py
@@ -65,10 +65,6 @@ def initialize_tensor_parallelism(

backend_map = {"cuda": "nccl", "cpu": "gloo", "xpu": "xccl", "hpu": "hccl"}
backend = backend_map.get(device_type)
if device_type == "cpu" and int(os.environ.get("CCL_WORKER_COUNT", "0")):
backend = "ccl"
if device_type == "xpu" and not is_torch_greater_or_equal("2.8", accept_dev=True):
backend = "ccl"

torch.distributed.init_process_group(backend=backend, rank=rank, world_size=world_size)
current_device = getattr(torch, device_type)
20 changes: 1 addition & 19 deletions src/transformers/testing_utils.py
@@ -105,7 +105,6 @@
is_hadamard_available,
is_hqq_available,
is_huggingface_hub_greater_or_equal,
is_ipex_available,
is_jinja_available,
is_jmespath_available,
is_jumanpp_available,
@@ -677,21 +676,6 @@ def require_torchcodec(test_case):
return unittest.skipUnless(is_torchcodec_available(), "test requires Torchcodec")(test_case)


def require_intel_extension_for_pytorch(test_case):
"""
Decorator marking a test that requires Intel Extension for PyTorch.

These tests are skipped when Intel Extension for PyTorch isn't installed or it does not match current PyTorch
version.

"""
return unittest.skipUnless(
is_ipex_available(),
"test requires Intel Extension for PyTorch to be installed and match current PyTorch version, see"
" https://github.com/intel/intel-extension-for-pytorch",
)(test_case)


def require_torchaudio(test_case):
"""
Decorator marking a test that requires torchaudio. These tests are skipped when torchaudio isn't installed.
@@ -895,9 +879,7 @@ def require_torch_xpu(test_case):
"""
Decorator marking a test that requires XPU (in PyTorch).

These tests are skipped when XPU backend is not available. XPU backend might be available either via stock
PyTorch (>=2.4) or via Intel Extension for PyTorch. In the latter case, if IPEX is installed, its version
must match match current PyTorch version.
These tests are skipped when XPU backend is not available.
"""
return unittest.skipUnless(is_torch_xpu_available(), "test requires XPU device")(test_case)

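A hypothetical test (not from the PR) showing how the slimmed-down decorator is used; the skip condition now depends only on stock PyTorch exposing an XPU device:

```python
import unittest

from transformers.testing_utils import require_torch_xpu, torch_device


class XpuSmokeTest(unittest.TestCase):
    @require_torch_xpu
    def test_tensor_lands_on_xpu(self):
        import torch

        # Only runs when torch.xpu reports an available device.
        x = torch.ones(2, 2, device=torch_device)
        self.assertEqual(x.device.type, "xpu")
```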
2 changes: 1 addition & 1 deletion src/transformers/trainer.py
@@ -2313,7 +2313,7 @@ def _inner_training_loop(

# as the model is wrapped, don't use `accelerator.prepare`
# this is for unhandled cases such as
# FSDP-XLA, SageMaker MP/DP, DataParallel, IPEX
# FSDP-XLA, SageMaker MP/DP, DataParallel
use_accelerator_prepare = model is self.model

if use_accelerator_prepare and self.is_fsdp_enabled:
4 changes: 2 additions & 2 deletions src/transformers/training_args.py
@@ -391,7 +391,7 @@ class TrainingArguments:
the [TF32](https://huggingface.co/docs/transformers/perf_train_gpu_one#tf32) documentation. This is an
experimental API and it may change.
ddp_backend (`str`, *optional*):
The backend to use for distributed training. Must be one of `"nccl"`, `"mpi"`, `"ccl"`, `"gloo"`, `"hccl"`.
The backend to use for distributed training. Must be one of `"nccl"`, `"mpi"`, `"xccl"`, `"gloo"`, `"hccl"`.
dataloader_drop_last (`bool`, *optional*, defaults to `False`):
Whether to drop the last incomplete batch (if the length of the dataset is not divisible by the batch size)
or not.
@@ -1037,7 +1037,7 @@ class TrainingArguments:
default=None,
metadata={
"help": "The backend to be used for distributed training",
"choices": ["nccl", "gloo", "mpi", "ccl", "hccl", "cncl", "mccl"],
"choices": ["nccl", "gloo", "mpi", "xccl", "hccl", "cncl", "mccl"],
},
)
debug: str | list[DebugOption] = field(
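A short sketch of the updated option (output path and batch size are placeholders): Intel XPU runs now pick the stock `xccl` backend, since `ccl` is no longer an accepted choice:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",               # placeholder path
    ddp_backend="xccl",             # stock PyTorch backend for XPU; replaces the removed "ccl"
    per_device_train_batch_size=8,  # placeholder value
)
```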
2 changes: 0 additions & 2 deletions src/transformers/utils/__init__.py
@@ -118,7 +118,6 @@
is_av_available,
is_bitsandbytes_available,
is_bs4_available,
is_ccl_available,
is_coloredlogs_available,
is_compressed_tensors_available,
is_cuda_platform,
@@ -150,7 +149,6 @@
is_hqq_available,
is_huggingface_hub_greater_or_equal,
is_in_notebook,
is_ipex_available,
is_jinja_available,
is_jmespath_available,
is_jumanpp_available,
48 changes: 4 additions & 44 deletions src/transformers/utils/import_utils.py
@@ -241,25 +241,21 @@ def is_torch_npu_available(check_device=False) -> bool:
@lru_cache
def is_torch_xpu_available(check_device: bool = False) -> bool:
"""
Checks if XPU acceleration is available either via native PyTorch (>=2.6),
`intel_extension_for_pytorch` or via stock PyTorch (>=2.4) and potentially
if a XPU is in the environment.
Checks if XPU acceleration is available via stock PyTorch (>=2.6) and
potentially if a XPU is in the environment.
"""
if not is_torch_available():
return False

torch_version = version.parse(get_torch_version())
if torch_version.major == 2 and torch_version.minor < 6:
if is_ipex_available():
import intel_extension_for_pytorch # noqa: F401
elif torch_version.major == 2 and torch_version.minor < 4:
return False
return False

import torch

if check_device:
try:
# Will raise a RuntimeError if no XPU is found
# Will raise a RuntimeError if no XPU is found
_ = torch.xpu.device_count()
return torch.xpu.is_available()
except RuntimeError:
@@ -848,29 +844,6 @@ def is_ninja_available() -> bool:
return True


@lru_cache
def is_ipex_available(min_version: str = "") -> bool:
def get_major_and_minor_from_version(full_version):
return str(version.parse(full_version).major) + "." + str(version.parse(full_version).minor)

ipex_available, ipex_version = _is_package_available("intel_extension_for_pytorch", return_version=True)

if not is_torch_available() or not ipex_available:
return False

torch_major_and_minor = get_major_and_minor_from_version(get_torch_version())
ipex_major_and_minor = get_major_and_minor_from_version(ipex_version)
if torch_major_and_minor != ipex_major_and_minor:
logger.warning_once(
f"Intel Extension for PyTorch {ipex_major_and_minor} needs to work with PyTorch {ipex_major_and_minor}.*,"
f" but PyTorch {get_torch_version()} is found. Please switch to the matching version and run again."
)
return False
if min_version:
return version.parse(ipex_version) >= version.parse(min_version)
return True


@lru_cache
def is_bitsandbytes_available(min_version: str = BITSANDBYTES_MIN_VERSION) -> bool:
is_available, bitsandbytes_version = _is_package_available("bitsandbytes", return_version=True)
@@ -1161,11 +1134,6 @@ def is_uroman_available() -> bool:
return _is_package_available("uroman")


@lru_cache
def is_ccl_available() -> bool:
return _is_package_available("torch_ccl") or _is_package_available("oneccl_bindings_for_pytorch")


@lru_cache
def is_sudachi_available() -> bool:
return _is_package_available("sudachipy")
@@ -1774,13 +1742,6 @@ def is_training_run_on_sagemaker() -> bool:
runtime after installation.
"""

# docstyle-ignore
CCL_IMPORT_ERROR = """
{0} requires the torch ccl library but it was not found in your environment. You can install it with pip:
`pip install oneccl_bind_pt -f https://developer.intel.com/ipex-whl-stable`
Please note that you may need to restart your runtime after installation.
"""

# docstyle-ignore
ESSENTIA_IMPORT_ERROR = """
{0} requires essentia library. But that was not found in your environment. You can install them with pip:
@@ -1869,7 +1830,6 @@ def is_training_run_on_sagemaker() -> bool:
("vision", (is_vision_available, VISION_IMPORT_ERROR)),
("scipy", (is_scipy_available, SCIPY_IMPORT_ERROR)),
("accelerate", (is_accelerate_available, ACCELERATE_IMPORT_ERROR)),
("oneccl_bind_pt", (is_ccl_available, CCL_IMPORT_ERROR)),
("cython", (is_cython_available, CYTHON_IMPORT_ERROR)),
("rjieba", (is_rjieba_available, RJIEBA_IMPORT_ERROR)),
("peft", (is_peft_available, PEFT_IMPORT_ERROR)),
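Assumed usage (not part of the diff) of the simplified check: XPU detection now relies solely on stock PyTorch >= 2.6, with no `intel_extension_for_pytorch` import attempted:

```python
import torch

from transformers.utils import is_torch_xpu_available

device = torch.device("xpu") if is_torch_xpu_available(check_device=True) else torch.device("cpu")
print(device)
```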
25 changes: 14 additions & 11 deletions tests/generation/test_fsdp.py
@@ -17,7 +17,7 @@
from collections.abc import Callable
from typing import Any

from transformers import is_torch_available, is_torch_xpu_available
from transformers import is_torch_available
from transformers.testing_utils import (
TestCasePlus,
backend_device_count,
@@ -28,19 +28,12 @@
torch_device,
torchrun,
)
from transformers.utils import is_ccl_available, is_ipex_available


if is_torch_available():
import functools

import torch

if is_torch_xpu_available():
if is_ipex_available():
import intel_extension_for_pytorch # noqa: F401
if is_ccl_available():
import oneccl_bindings_for_pytorch # noqa: F401
import torch.distributed
from torch.distributed._composable.fsdp import fully_shard, register_fsdp_forward_method
from torch.distributed.device_mesh import init_device_mesh
@@ -154,12 +147,22 @@ def test_generic_task_model_can_be_sharded(self):
from torch.distributed.fsdp import fully_shard
from transformers import AutoModelForTokenClassification

current_accelerator = torch.accelerator.current_accelerator(check_available=True)
accelerator_type = "cpu" if current_accelerator is None else current_accelerator.type
torch_accelerator_module = getattr(torch, accelerator_type, torch.cuda)

backend = "gloo"
if accelerator_type == "cuda":
backend = "nccl"
elif accelerator_type == "xpu":
backend = "xccl"

torch.distributed.init_process_group(
backend="nccl" if torch.cuda.is_available() else "gloo", init_method="env://"
backend=backend, init_method="env://"
)
rank = torch.distributed.get_rank()
if torch.cuda.is_available():
torch.cuda.set_device(rank)
if torch_accelerator_module.is_available():
torch_accelerator_module.set_device(rank)

# Make sure it works
model = AutoModelForTokenClassification.from_pretrained("Qwen/Qwen2-0.5B")
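The same device-agnostic selection, pulled out of the test as a standalone sketch (assumes a PyTorch build recent enough to ship `torch.accelerator`):

```python
import torch

current = torch.accelerator.current_accelerator(check_available=True)
accelerator_type = "cpu" if current is None else current.type

backend = {"cuda": "nccl", "xpu": "xccl"}.get(accelerator_type, "gloo")
print(f"{accelerator_type} -> {backend}")  # e.g. "xpu -> xccl", or "cpu -> gloo"
```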
7 changes: 1 addition & 6 deletions tests/generation/test_utils.py
@@ -26,7 +26,6 @@

import numpy as np
import pytest
from packaging import version
from parameterized import parameterized

from transformers import (
@@ -55,7 +54,7 @@
slow,
torch_device,
)
from transformers.utils import is_ipex_available, is_sklearn_available, is_torchdynamo_exporting
from transformers.utils import is_sklearn_available, is_torchdynamo_exporting
from transformers.utils.generic import is_flash_attention_requested


@@ -539,10 +538,6 @@ def test_beam_search_generate_dict_outputs_use_cache(self):
@require_torch_multi_accelerator
@pytest.mark.generate
def test_model_parallel_beam_search(self):
if "xpu" in torch_device:
if not (is_ipex_available("2.5") or version.parse(torch.__version__) >= version.parse("2.6")):
self.skipTest(reason="device_map='auto' does not work with XPU devices")

for model_class in self.all_generative_model_classes:
if model_class._no_split_modules is None:
continue
2 changes: 1 addition & 1 deletion tests/pipelines/test_pipelines_audio_classification.py
@@ -150,7 +150,7 @@ def test_small_model_pt_fp16(self):
output = audio_classifier(audio, top_k=4)

# Expected outputs are collected running the test on torch 2.6 in few scenarios.
# Running on CUDA T4/A100 and on XPU PVC (note: using stock torch xpu, NOT using IPEX):
# Running on CUDA T4/A100 and on XPU PVC:
EXPECTED_OUTPUT = [
{"score": 0.0833, "label": "go"},
{"score": 0.0833, "label": "off"},
7 changes: 4 additions & 3 deletions tests/quantization/autoround/test_auto_round.py
@@ -15,13 +15,14 @@
import tempfile
import unittest

import pytest

from transformers import AutoModelForCausalLM, AutoRoundConfig, AutoTokenizer
from transformers.testing_utils import (
backend_empty_cache,
backend_synchronize,
require_accelerate,
require_auto_round,
require_intel_extension_for_pytorch,
require_torch_accelerator,
require_torch_gpu,
require_torch_multi_accelerator,
@@ -106,7 +107,7 @@ def test_quantized_model_bf16(self):
output = quantized_model.generate(**input_ids, max_new_tokens=40, do_sample=False)
self.assertIn(self.tokenizer.decode(output[0], skip_special_tokens=True), self.EXPECTED_OUTPUTS)

@require_intel_extension_for_pytorch
Member: maybe we can skip those tests or require a specific tag to be enabled ? It will be too slow on our CI to run the model on our cpu.

Contributor Author: @SunMarc, done, thx

@pytest.mark.skip(reason="This test is temporarily disabled because the CI machine's CPU is slow")
def test_quantized_model_on_cpu(self):
"""
Simple test that checks if the quantized model is working properly
Expand Down Expand Up @@ -174,7 +175,7 @@ def test_convert_from_gptq(self):
inputs = tokenizer(text, return_tensors="pt").to(model.device)
tokenizer.decode(model.generate(**inputs, max_new_tokens=5)[0])

@require_intel_extension_for_pytorch
Member: same here

Contributor Author: done

@pytest.mark.skip(reason="This test is temporarily disabled because the CI machine's CPU is slow")
def test_convert_from_awq_cpu(self):
"""
Simple test that checks if auto-round work properly with awq format
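For reference, the opt-in-tag alternative floated in the review thread (but not what was merged) could look roughly like this; the environment variable and decorator name are hypothetical:

```python
import os
import unittest

RUN_CPU_QUANT_TESTS = os.environ.get("RUN_CPU_QUANT_TESTS", "0") == "1"
require_cpu_quant_opt_in = unittest.skipUnless(
    RUN_CPU_QUANT_TESTS, "set RUN_CPU_QUANT_TESTS=1 to run slow CPU quantization tests"
)


class CpuQuantTests(unittest.TestCase):
    @require_cpu_quant_opt_in
    def test_quantized_model_on_cpu(self):
        ...  # would load and exercise the quantized model on CPU
```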