
[FIX] allow Accelerator to prepare models in eval mode for XPU&CPU #2426

Merged
merged 1 commit into huggingface:main on Feb 9, 2024

Conversation

@faaany faaany (Contributor) commented Feb 8, 2024

Problem

When running nlp_example.py on Intel GPUs and CPUs, the call to accelerator.prepare fails with the following traceback:

Traceback (most recent call last):
  File "/soft/fanli/accelerate/examples/nlp_example.py", line 209, in <module>
    main()
  File "/soft/fanli/accelerate/examples/nlp_example.py", line 205, in main
    training_function(config, args)
  File "/soft/fanli/accelerate/examples/nlp_example.py", line 154, in training_function
    model, optimizer, train_dataloader, eval_dataloader, lr_scheduler = accelerator.prepare(
  File "/soft/fanli/accelerate/src/accelerate/accelerator.py", line 1217, in prepare
    args = self._prepare_ipex(*args)
  File "/soft/fanli/accelerate/src/accelerate/accelerator.py", line 1762, in _prepare_ipex
    model, optimizer = torch.xpu.optimize(
  File "/home/fan/anaconda3/envs/study2/lib/python3.10/site-packages/intel_extension_for_pytorch/xpu/utils.py", line 237, in optimize
    return frontend.optimize(
  File "/home/fan/anaconda3/envs/study2/lib/python3.10/site-packages/intel_extension_for_pytorch/frontend.py", line 339, in optimize
    assert optimizer is None, "The optimizer should not be given for inference mode"
AssertionError: The optimizer should not be given for inference mode

This is a bug: ipex.optimize expects the model to be in training mode; otherwise it assumes the user is running inference and asserts that no optimizer is passed (the assertion in frontend.py shown in the traceback above).
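To make the failure concrete, here is a minimal reproduction outside of Accelerate (a sketch, assuming intel_extension_for_pytorch is installed; the model and optimizer are placeholders):

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# An eval-mode model plus an optimizer trips the assertion quoted above,
# because ipex.optimize() treats an eval-mode model as an inference request:
model.eval()
# model, optimizer = ipex.optimize(model, optimizer=optimizer)  # AssertionError

# With the model in training mode, the same call succeeds and returns
# the optimized (model, optimizer) pair:
model.train()
model, optimizer = ipex.optimize(model, optimizer=optimizer)
```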

Another issue is that the dtype passed to ipex.optimize() is fp32, while ipex.optimize() expects dtype to be either bf16 or fp16 and defaults to None. So when no mixed precision is used for training (currently only bf16 is supported), dtype should be left at its default value of None.
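In other words, dtype should only ever be set when mixed precision is actually enabled. A sketch, where `mixed_precision` stands in for Accelerate's mixed-precision setting:

```python
# Leave dtype at its default (None) unless bf16 mixed precision is enabled;
# passing torch.float32 here is exactly what ipex.optimize() does not expect.
dtype = torch.bfloat16 if mixed_precision == "bf16" else None
model = ipex.optimize(model, dtype=dtype)
```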

What does this PR do?

Fixes the bug in _prepare_ipex and corrects the dtype passed to ipex.optimize() / torch.xpu.optimize(). With this fix, the example code runs on both CPU and XPU, in single-device and distributed modes.
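Conceptually, the change amounts to branching on the model's mode instead of always forwarding the optimizer. Below is a simplified sketch of the idea, not the exact patch; `_ipex_optimize` is a hypothetical helper, with torch.xpu.optimize as the XPU entry point from the traceback and ipex.optimize as the CPU one:

```python
import torch
import intel_extension_for_pytorch as ipex

def _ipex_optimize(model, optimizer, mixed_precision, use_xpu):
    # Hypothetical helper illustrating the fixed logic in _prepare_ipex.
    optimize = torch.xpu.optimize if use_xpu else ipex.optimize
    dtype = torch.bfloat16 if mixed_precision == "bf16" else None

    if model.training:
        # Training: optimize the model and its optimizer together.
        model, optimizer = optimize(model, optimizer=optimizer, dtype=dtype)
    else:
        # Eval mode: IPEX treats this as inference, so no optimizer may be passed.
        model = optimize(model, dtype=dtype)
    return model, optimizer
```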

Who can review?

@muellerzr or @sywangyi

@faaany faaany (Contributor, Author) commented Feb 8, 2024

@yao-matrix

@faaany faaany changed the title [FIX] enable nlp_example.py to run on XPU for both single and distributed modes [FIX] allow accelerator.prepare to work for models in eval mode on XPU (single&distributed) Feb 8, 2024
@faaany faaany changed the title [FIX] allow accelerator.prepare to work for models in eval mode on XPU (single&distributed) [FIX] allow Accelerator to prepare models in eval mode for XPU&CPU Feb 8, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@muellerzr muellerzr (Collaborator) left a comment


Thanks for the fix!

@muellerzr muellerzr merged commit 9c1d5ba into huggingface:main Feb 9, 2024
21 of 23 checks passed