
Loading traced pytorch model to C++ #124009

Closed
ZarinaMaks opened this issue Apr 13, 2024 · 5 comments
Labels
intel This tag is for PR from Intel
module: cpu CPU specific problem (e.g., perf, algorithm)
module: mkl Related to our MKL support
module: windows Windows support for PyTorch
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

ZarinaMaks commented Apr 13, 2024

🐛 Describe the bug

Will be grateful for any help :(

I'm having a problem with using a PyTorch 2.2.2+cpu pretrained model in C++ via LibTorch 2.2.2 (CPU version).
I'm also using Visual Studio 2022 for the C++ project, which is located on the D:\ drive. I've added LibTorch to CMake, and it successfully loads the model and works until the line:

at::Tensor output = module.forward(inputs).toTensor();

After this line it crashes with the error: INTEL MKL ERROR: The specified module could not be found. mkl_avx2.1.dll. Intel MKL FATAL ERROR: Cannot load mkl_avx2.1.dll or mkl_def.1.dll. I haven't figured out what is wrong with LibTorch here. It seems the problem is in some part of the library related to Python, which is strange because the model itself loads in C++ via LibTorch successfully.

The .cpp code is here:

#include <torch/script.h>

#include <iostream>
#include <string>
#include <vector>
...
    // Load the traced TorchScript module from disk.
    std::string path = "traced_energynn_model.pt";
    torch::jit::script::Module module;
    try {
        module = torch::jit::load(path);
    }
    catch (const c10::Error& e) {
        std::cerr << "error loading the model\n";
        return -1;
    }
    std::cout << "Model traced_energynn_model loaded fine\n";

    // Build a single (1, 2) float input and run a forward pass.
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::randn({ 1, 2 }));

    at::Tensor output = module.forward(inputs).toTensor();  // crashes here with the MKL error
    std::cout << output << "\n";
...

The way I saved the model in python 3.9:

import torch
from random import randrange

# Pick a random row from the test set and use it as the example input for tracing.
x = X_test.reset_index(drop=True).iloc[randrange(len(X_test))]
example = torch.tensor([x]).to(torch.float32)

traced_model = torch.jit.trace(model, example)
traced_model.save("traced_energynn_model.pt")
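
A quick sanity check (a minimal sketch, not part of the original script) is to re-load the traced file and run it once in Python with the same (1, 2) float input shape used on the C++ side; if this works, the crash is isolated to the C++/MKL runtime rather than the traced model itself:

import torch

# Sketch: re-load the traced module and run a single forward pass in Python.
reloaded = torch.jit.load("traced_energynn_model.pt")
out = reloaded(torch.randn(1, 2))
print(out)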

Versions

Collecting environment information...
PyTorch version: 2.2.2+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Home
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.28.1
Libc version: N/A

Python version: 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1660 Ti
Nvidia driver version: 516.94
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=2592
DeviceID=CPU0
Family=198
L2CacheSize=1536
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2592
Name=Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] flake8==4.0.1
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.25.2
[pip3] numpydoc==1.4.0
[pip3] torch==2.2.2
[pip3] torchvision==0.17.1
[conda] blas 1.0 mkl
[conda] mkl 2021.4.0 haa95532_640
[conda] mkl-service 2.4.0 py39h2bbff1b_0
[conda] mkl_fft 1.3.1 py39h277e83a_0
[conda] mkl_random 1.2.2 py39hf11a4ad_0
[conda] numpy 1.25.2 pypi_0 pypi
[conda] numpydoc 1.4.0 py39haa95532_0
[conda] torch 2.2.2 pypi_0 pypi
[conda] torchvision 0.17.1 pypi_0 pypi

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @vladimir-aubrecht @iremyux @Blackhex @cristianPanaite @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@jgong5 jgong5 added module: windows Windows support for PyTorch module: cpu CPU specific problem (e.g., perf, algorithm) labels Apr 15, 2024
@zou3519 zou3519 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Apr 15, 2024
@LLimerence

Hello, have you solved this problem?

xuhancn (Collaborator) commented Apr 19, 2024

I checked the pytorch builder scripts: https://github.com/search?q=repo%3Apytorch%2Fbuilder+path%3A%2F%5Ewindows%5C%2Finternal%5C%2F%2F+mkl&type=code

It seems the Windows libtorch packages do not copy all of the MKL dependent DLLs, while the current PyTorch Linux builds already link MKL statically.
@ZarinaMaks could you please share example code so that I can reproduce the issue?
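
For anyone who wants to confirm this locally, a small sketch along these lines lists which MKL DLLs a libtorch download actually ships (the path below is a hypothetical placeholder for wherever libtorch was unzipped):

from pathlib import Path

# Sketch: list the MKL DLLs bundled in a libtorch distribution.
# "D:/libs/libtorch/lib" is an assumed location; replace with your unzip path.
libtorch_lib = Path("D:/libs/libtorch/lib")
print(sorted(p.name for p in libtorch_lib.glob("mkl*.dll")))
# If mkl_avx2.1.dll / mkl_def.1.dll are not in the list, MKL's runtime dispatch
# cannot find them and fails with the error reported above.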

xuhancn (Collaborator) commented Apr 19, 2024

pytorch/builder#1790 @jgong5 Please review and comment on this PR.

malfet pushed a commit to pytorch/builder that referenced this issue Apr 24, 2024
From pytorch issue pytorch/pytorch#124009, I found that libtorch seems to use the shared MKL library and is missing some MKL DLL files.
1. Currently the PyTorch Linux builds already use the static MKL library.
2. Windows can also support the static MKL library; I have validated this in pytorch/pytorch#116946

So, this PR switches PyTorch to use the static MKL library.
I have tested the PR on my local PC.
malfet pushed a commit to pytorch/builder that referenced this issue Apr 26, 2024
Resubmit #1790 with the fix from PR #1797.

From pytorch issue pytorch/pytorch#124009, I found that libtorch seems to use the shared MKL library and is missing some MKL DLL files.
1. Currently the PyTorch Linux builds already use the static MKL library.
2. Windows can also support the static MKL library; I have validated this in pytorch/pytorch#116946

Tested in https://github.com/pytorch/pytorch/actions/runs/8836875904/job/24264643410
@xuhancn xuhancn added the intel This tag is for PR from Intel label Apr 26, 2024
xuhancn (Collaborator) commented Apr 27, 2024

Hi @ZarinaMaks
I have submitted some PRs to fix this issue: #124925 & pytorch/builder#1798
Could you please help test the latest nightly build?
Install command:

python -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cpu --upgrade --force-reinstall
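
A minimal way to verify the nightly wheel (a sketch, not an official test) is to print the build configuration and run an MKL-backed op:

import torch

# Sketch: confirm the nightly build installed and exercise a BLAS/MKL code path.
print(torch.__version__)
print(torch.__config__.show())   # shows how BLAS/MKL was linked into this build
x = torch.randn(256, 256)
print((x @ x).sum())             # CPU matmul goes through the MKL/BLAS backend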

Thanks.

@xuhancn xuhancn added the module: mkl Related to our MKL support label Apr 27, 2024
leslie-fang-intel (Collaborator) commented

Closing this issue since the fixing PR has landed. Feel free to re-open if any further discussion is needed.
