
[Patch] Add loss for ORT inference #152

Merged
merged 15 commits into from Apr 29, 2022

Conversation

JingyaHuang
Collaborator

What does this PR do?

  • Wrap OnnxConfig with wrap_onnx_config_for_loss to obtain the loss when using ORTTrainer in the inference_with_ort=True mode.
  • Enable DeepSpeed for ONNX Runtime training. (Tested with ZeRO stage 2; full support is in progress.)
  • Clean up unused dependencies in ORTTrainer.
  • Update the CI of onnxruntime training.
  • Update the associated tests.
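The wrapping idea in the first bullet can be illustrated with a minimal sketch. This is not optimum's actual implementation; DummyBertOnnxConfig and the wrapper body are illustrative assumptions. The point is the pattern: the wrapper extends the base config's inputs with labels and its outputs with a loss, so the exported ONNX graph computes the loss itself and ORTTrainer can read it back from an InferenceSession.

```python
from collections import OrderedDict

class DummyBertOnnxConfig:
    """Stand-in for a task-specific OnnxConfig (illustration only)."""
    @property
    def inputs(self):
        return OrderedDict([("input_ids", {0: "batch", 1: "sequence"}),
                            ("attention_mask", {0: "batch", 1: "sequence"})])
    @property
    def outputs(self):
        return OrderedDict([("logits", {0: "batch"})])

class OnnxConfigWithLossSketch:
    """Wraps a base config so the exported graph also takes labels and emits a loss."""
    def __init__(self, base_config):
        self._base = base_config
    @property
    def inputs(self):
        inputs = OrderedDict(self._base.inputs)
        inputs["labels"] = {0: "batch"}   # labels are needed to compute the loss
        return inputs
    @property
    def outputs(self):
        outputs = OrderedDict(self._base.outputs)
        outputs["loss"] = {}              # scalar loss emitted by the graph
        return outputs

def wrap_onnx_config_for_loss(base_config):
    # Same name as the helper mentioned above, but this body is a sketch.
    return OnnxConfigWithLossSketch(base_config)

wrapped = wrap_onnx_config_for_loss(DummyBertOnnxConfig())
print(list(wrapped.outputs))  # ['logits', 'loss']
```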

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@lewtun
Member

lewtun commented Apr 26, 2022

This line is why the doc build is currently failing: https://github.com/huggingface/optimum/pull/152/files#diff-3e928ce0b52f617b86cd0df9399c6bbb5804d6269c6a04b486613eb929449256R25

Until now we never actually imported OnnxConfigWithLoss etc., and doing so triggers an import error because transformers is pinned to <4.17 in the doc build (so that both the intel and onnxruntime packages can live in the same env):

from optimum.onnxruntime.trainer import ORTTrainer

ImportError: cannot import name 'TensorType' from 'transformers.utils' (/home/lewis/miniconda3/envs/optimum/lib/python3.8/site-packages/transformers/utils/__init__.py)

The error arises because TensorType was moved from transformers.file_utils to transformers.utils in >=4.17

The solution is to refactor the doc build so that we build intel and onnxruntime in separate envs. Doing so will also allow us to build the Graphcore & other hardware partner docs as well. I don't have bandwidth for this right now, but happy to review a PR if someone else has time to tackle this!
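Besides splitting the envs, a common way to tolerate this kind of relocation (TensorType moving from transformers.file_utils to transformers.utils) is a fallback import. The helper below is a generic sketch, not part of optimum; the commented TensorType line shows how it would apply here.

```python
import importlib

def import_from_first(name, *module_paths):
    """Return attribute `name` from the first listed module that provides it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError):
            continue  # try the next candidate module
    raise ImportError(f"cannot import {name!r} from any of {module_paths}")

# For the move described above (transformers >= 4.17 vs older):
# TensorType = import_from_first("TensorType",
#                                "transformers.utils",        # new location
#                                "transformers.file_utils")   # old location
```

The plain try/except form (`try: from transformers.utils import TensorType; except ImportError: from transformers.file_utils import TensorType`) works just as well; the helper only centralizes the pattern.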

@lewtun
Member

lewtun commented Apr 26, 2022

FYI @JingyaHuang if you want to test that the docs build locally you can run:

pip install -e '.[dev,intel,onnxruntime]'
pip install git+https://github.com/huggingface/doc-builder.git
doc-builder build optimum docs/source --build_dir test-docs --version v1.0.0 --clean

You'll need a Linux machine for this, since the intel extra cannot be installed on macOS.

Review threads (resolved) on: optimum/onnxruntime/trainer.py, tests/onnxruntime/test_onnxruntime_train.py
@echarlaix
Collaborator

This PR looks great!

@JingyaHuang JingyaHuang merged commit 1c4b5b1 into main Apr 29, 2022
@JingyaHuang JingyaHuang deleted the patch-trainer-loss branch April 29, 2022 21:24