Running ORTModule with other EPs from ORT #78

Closed
chethanpk opened this issue Aug 30, 2021 · 11 comments

@chethanpk

I am building a new wheel with the OneDNN EP enabled, using the ONNX Runtime training build. After that is installed, I install torch_ort and then run the configure step, but it does not seem to work (I get the same error asking me to run configure again). From the instructions, I see that there is no recipe for this combination. Is this possible, or is there another way for me to build a custom wheel and use it to train a BERT model with OneDNN and ORT?
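A minimal sketch (not from the original post) of how one can check whether such a custom wheel actually exposes the OneDNN (DNNL) provider, using the standard onnxruntime API; the --use_dnnl build flag mentioned in the comment is an assumption about how the wheel was built:

```python
# Sketch (not from the original post): check whether the locally built wheel
# actually exposes the OneDNN (DNNL) execution provider.
import onnxruntime as ort

print(ort.__version__)
# For a wheel built with --use_dnnl, 'DnnlExecutionProvider' should appear here.
print(ort.get_available_providers())
```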

@natke
Collaborator

natke commented Aug 30, 2021

Hi @chethanpk, can you please post the output of the configure step?

@chethanpk
Author

chethanpk commented Aug 30, 2021

@natke
C:\Users\WOS>python -m torch_ort.configure
running build
running build_ext
C:\Python37\lib\site-packages\torch\utils\cpp_extension.py:305: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
building 'aten_op_executor' extension
Emitting ninja build file C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Python37\lib\site-packages\torch\lib /LIBPATH:C:\Python37\libs /LIBPATH:C:\Python37\PCbuild\amd64 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\ATLMFC\lib\x64" "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\lib\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\lib\um\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\ucrt\x64" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\um\x64" c10.lib torch.lib torch_cpu.lib torch_python.lib /EXPORT:PyInit_aten_op_executor C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.obj /OUT:build\lib.win-amd64-3.7\aten_op_executor.cp37-win_amd64.pyd /IMPLIB:C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.cp37-win_amd64.lib
Creating library C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.cp37-win_amd64.lib and object C:\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\build\temp.win-amd64-3.7\Release\Python37\lib\site-packages\onnxruntime\training\ortmodule\torch_cpp_extensions\aten_op_executor\aten_op_executor.cp37-win_amd64.exp
Generating code
Finished generating code

@chethanpk
Author

Hi @natke, did you get a chance to take a look at this?

@baijumeswani
Collaborator

Hi @chethanpk, we currently do not have support for running torch_ort.configure on a Windows machine. Have you given this a try on a Linux machine?

@chethanpk
Author

@baijumeswani I will try it on Linux and let you know.

@chethanpk
Author

@baijumeswani I tried it on Linux and was able to complete the training without errors from ORTModule. However, it is not using the OneDNN EP; it falls back to the default CPU EP.
Is there a way to configure it to use the OneDNN EP?
The ONNX Runtime installation was done using the wheel I built with OneDNN enabled.
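A minimal sketch of the kind of setup described above; the model, data, and hyperparameters below are placeholders rather than the actual BERT training script:

```python
# Sketch of the setup described above; the model and optimizer are stand-ins
# for the BERT training script, not taken from this thread.
import torch
from torch_ort import ORTModule

model = ORTModule(torch.nn.Linear(768, 2))   # stand-in for the BERT model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

x = torch.randn(8, 768)
loss = model(x).sum()
loss.backward()
optimizer.step()
# Even with a DNNL-enabled wheel installed, this path still runs on the default
# CPU execution provider; ORTModule does not expose a way to select the DNNL EP.
```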

@baijumeswani
Collaborator

Thanks @chethanpk for reporting this. On further investigation, it appears that we currently only support the CUDA and ROCm execution providers through ORTModule. I will ask internally to see if/how we can support this.
https://github.com/microsoft/onnxruntime/blob/master/orttraining/orttraining/python/training/ortmodule/_graph_execution_manager.py#L248-L250
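For readers following along: the linked lines boil down to device-based provider selection, roughly like the paraphrase below (a sketch, not the actual ONNX Runtime source), which is why no path ever picks DnnlExecutionProvider:

```python
# Paraphrased sketch of the provider-selection logic referenced above
# (not the actual ONNX Runtime source).
import torch

def _select_providers(device: torch.device):
    if device.type == "cuda":
        # ROCm builds of PyTorch reuse the 'cuda' device type.
        if torch.version.hip is not None:
            return ["ROCMExecutionProvider"]
        return ["CUDAExecutionProvider"]
    # Everything else falls back to the CPU EP; DNNL is never selected.
    return ["CPUExecutionProvider"]
```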

@wschin

wschin commented Nov 16, 2021

Just a minor update: supporting other EPs in ORTModule is on our to-do list, but we don't have a timeline for it yet.

@chethanpk
Author

Is there any update on this? I am currently working around it by building the wheel with the DNNL EP and forcing it to be used by default, but we need proper support so that anyone else can build and use it directly.
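The workaround described above presumably amounts to putting DNNL ahead of CPU in the provider list; a hypothetical illustration of that idea (not the poster's actual patch), using the standard InferenceSession API from a DNNL-enabled wheel:

```python
# Hypothetical illustration of the workaround described above (not the
# poster's actual patch): with a DNNL-enabled wheel, ONNX Runtime accepts an
# explicit provider list, so forcing DNNL means listing it ahead of CPU
# wherever the session is created.
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",  # placeholder path
    providers=["DnnlExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # DnnlExecutionProvider should be listed first
```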

@natke natke self-assigned this Mar 10, 2022
@natke
Collaborator

natke commented Mar 10, 2022

Hi @chethanpk, I'm the PM for this package. Could you reach out to me at nakersha@microsoft.com so we can have a conversation about your use case?

@baijumeswani
Collaborator

Closing this issue now. Please re-open it if we can provide more assistance through this channel.
