Efficient Training on CPU

This guide focuses on training large models efficiently on CPU.

Mixed precision with IPEX

Mixed precision uses single (fp32) and half-precision (bf16/fp16) data types in a model to accelerate training or inference while still preserving much of the single-precision accuracy. Modern CPUs such as 3rd and 4th Gen Intel® Xeon® Scalable processors natively support bf16, so you should get more performance out of the box by enabling mixed precision training with bf16.

To further maximize training performance, you can use Intel® Extension for PyTorch (IPEX), a library built on PyTorch that adds CPU instruction set architecture (ISA) level support, such as Intel® Advanced Vector Extensions 512 Vector Neural Network Instructions (Intel® AVX512-VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX), for an extra performance boost on Intel CPUs. However, CPUs with only AVX2 (e.g., AMD or older Intel CPUs) are not guaranteed to perform better under IPEX.
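To see which of these ISA features a machine exposes, one option on Linux is to read the `flags` line of `/proc/cpuinfo`. The following is a minimal sketch, not part of IPEX or Transformers: the helper name `detect_isa_features` is hypothetical, and the flag names (`avx2`, `avx512_vnni`, `avx512_bf16`, `amx_bf16`) are the Linux kernel's spellings, assumed here to correspond to the features named above.

```python
from pathlib import Path
from typing import Dict, Set

# Assumed mapping from feature name to the Linux /proc/cpuinfo flag spelling.
ISA_FLAGS = {
    "AVX2": "avx2",
    "AVX512-VNNI": "avx512_vnni",
    "AVX512-BF16": "avx512_bf16",
    "AMX-BF16": "amx_bf16",
}


def detect_isa_features(cpuinfo_text: str) -> Dict[str, bool]:
    """Report which ISA features appear in the text of /proc/cpuinfo."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return {name: flag in flags for name, flag in ISA_FLAGS.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    # On non-Linux systems the file is absent and every feature reports "no".
    text = cpuinfo.read_text() if cpuinfo.exists() else ""
    for name, present in detect_isa_features(text).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

A machine that reports only AVX2 falls into the case described above, where a speedup from IPEX is not guaranteed.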

Auto Mixed Precision (AMP) for CPU backends has been enabled since PyTorch 1.10. IPEX also supports AMP for bf16 on CPUs, along with bf16 operator optimizations, and this support has been partially upstreamed to the main PyTorch branch. You can get better performance and user experience with IPEX AMP.

For more detailed information, see Auto Mixed Precision.
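As an illustration of what CPU AMP does outside of Trainer, here is a minimal sketch of one bf16 training step using stock PyTorch's `torch.autocast`. The toy model, tensor shapes, and learning rate are placeholders chosen for the example, not values from this guide.

```python
import torch

torch.manual_seed(0)

# Toy model and data; sizes are arbitrary placeholders.
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(8, 16)
targets = torch.randint(0, 4, (8,))

model.train()
# Inside the context, eligible ops such as linear run in bf16.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(inputs)
    # Cast up so the loss is computed in fp32.
    loss = torch.nn.functional.cross_entropy(logits.float(), targets)

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(logits.dtype)        # activations produced under autocast
print(model.weight.dtype)  # parameters are still stored in fp32
```

With IPEX installed, its documentation describes additionally passing the model and optimizer through `ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)` before the training loop; that call is left out here so the sketch runs with PyTorch alone.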

IPEX installation:

IPEX releases follow PyTorch releases. To install via pip:

PyTorch Version	IPEX version
2.1.x	2.1.100+cpu
2.0.x	2.0.100+cpu
1.13	1.13.0+cpu
1.12	1.12.300+cpu

Run pip list | grep torch to find your pytorch_version, then look up the matching IPEX version_name in the table above.

pip install intel_extension_for_pytorch==<version_name> -f https://developer.intel.com/ipex-whl-stable-cpu

You can check the latest versions in ipex-whl-stable-cpu if needed.

For other installation approaches, see IPEX installation.
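The version lookup can also be scripted. The sketch below hard-codes the four pairs from the table above and builds the matching pip command; the helper names `ipex_version_for` and `pip_command` are hypothetical, and PyTorch versions outside the table simply return `None` rather than a guess.

```python
from typing import Optional

# Pairs copied from the table above: PyTorch release series -> IPEX version.
IPEX_FOR_TORCH = {
    "2.1": "2.1.100+cpu",
    "2.0": "2.0.100+cpu",
    "1.13": "1.13.0+cpu",
    "1.12": "1.12.300+cpu",
}

WHEEL_INDEX = "https://developer.intel.com/ipex-whl-stable-cpu"


def ipex_version_for(torch_version: str) -> Optional[str]:
    """Return the IPEX version for a PyTorch version such as '2.1.2+cpu'."""
    release = torch_version.split("+")[0]      # drop a local tag such as '+cpu'
    series = ".".join(release.split(".")[:2])  # keep only major.minor
    return IPEX_FOR_TORCH.get(series)


def pip_command(torch_version: str) -> Optional[str]:
    """Build the pip install command, or None if the version is not in the table."""
    ipex_version = ipex_version_for(torch_version)
    if ipex_version is None:
        return None
    return f"pip install intel_extension_for_pytorch=={ipex_version} -f {WHEEL_INDEX}"


print(pip_command("2.1.2+cpu"))
```

In practice the input string could come from `torch.__version__` instead of a literal.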

Usage in Trainer

To enable auto mixed precision with IPEX in Trainer, add use_ipex, bf16, and use_cpu to the training command arguments.

Take the Transformers question-answering use case as an example.

Training with IPEX using BF16 auto mixed precision on CPU:

python run_qa.py \
    --model_name_or_path google-bert/bert-base-uncased \
    --dataset_name squad \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 12 \
    --learning_rate 3e-5 \
    --num_train_epochs 2 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --output_dir /tmp/debug_squad/ \
    --use_ipex \
    --bf16 \
    --use_cpu

If you want to enable use_ipex and bf16 in your script, add these parameters to TrainingArguments like this:

training_args = TrainingArguments(
    output_dir=args.output_path,
+   bf16=True,
+   use_ipex=True,
+   use_cpu=True,
    **kwargs
)

Practice example

Blog: Accelerating PyTorch Transformers with Intel Sapphire Rapids
