
towards windows support, switch to onnxruntime v1.9.0 #11

Merged
merged 13 commits into main on Oct 28, 2021

Conversation

jw3126
Copy link
Owner

@jw3126 jw3126 commented Oct 26, 2021

No description provided.

@jw3126
Copy link
Owner Author

jw3126 commented Oct 26, 2021

@DrChainsaw I am trying to add Windows support for you. Sadly, I get an Access denied error. I am neither a Windows expert, nor do I even have access to a Windows machine. Could you try this branch locally on your machine?

@DrChainsaw
Copy link

Thanks for giving it a shot!

I get the same error when trying the branch locally. I have never encountered this error before, but a quick web search pointed me to this issue: JuliaLang/julia#38993

When inspecting the permissions of the DLL file, it is indeed not executable by any user, with pretty much the same checkboxes as in the 1.6 picture from this post: JuliaLang/julia#38993 (comment)

The tl;dr seems to be that this is an issue with the supplier of the binary (onnxruntime in this case).
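If this recurs, a workaround on the Julia side could be to relax the permissions after the artifact is installed. A minimal sketch, assuming the directory argument is whatever artifact path the error reports (the function name is illustrative, not part of ONNXRunTime):

```julia
# Hedged sketch: mark every DLL under an artifact directory as
# readable/executable, mirroring what manually re-ticking the Windows
# permission checkboxes achieves.
function fix_dll_permissions(dir::AbstractString)
    for (root, _, files) in walkdir(dir)
        for f in files
            endswith(lowercase(f), ".dll") || continue
            chmod(joinpath(root, f), 0o755)  # rwxr-xr-x
        end
    end
end
```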

@DrChainsaw
Copy link

Manually setting read permissions for onnxruntime.dll gives me test failures like this:

increment2x3.onnx: Error During Test at E:\Programs\julia\.julia\packages\ONNXRunTime\KFR3P\test\test_highlevel.jl:9
 Got exception outside of a @test
 Load model from ??????????????????????????????????????????? failed:Load model ??????????????????????????????????????????? failed. File doesn't exist

 Stacktrace:
   [1] check_and_release
     @ E:\Programs\julia\.julia\packages\ONNXRunTime\KFR3P\src\capi.jl:452 [inlined]
   [2] into_julia(#unused#::Type{ONNXRunTime.CAPI.OrtSession}, api::ONNXRunTime.CAPI.OrtApi, objptr::Base.RefValue{Ptr{Nothing}}, status_ptr::Ptr{Nothing}, gchandles::Vector{Any})
     @ ONNXRunTime.CAPI E:\Programs\julia\.julia\packages\ONNXRunTime\KFR3P\src\capi.jl:416
   [3] CreateSession(api::ONNXRunTime.CAPI.OrtApi, env::ONNXRunTime.CAPI.OrtEnv, path::String, options::ONNXRunTime.CAPI.OrtSessionOptions)
     @ ONNXRunTime.CAPI E:\Programs\julia\.julia\packages\ONNXRunTime\KFR3P\src\capi.jl:513
   [4] load_inference(path::String; execution_provider::Symbol, envname::String, timer::TimerOutputs.TimerOutput)
     @ ONNXRunTime E:\Programs\julia\.julia\packages\ONNXRunTime\KFR3P\src\highlevel.jl:72
   [5] macro expansion
     @ E:\Programs\julia\.julia\packages\ONNXRunTime\KFR3P\test\test_highlevel.jl:11 [inlined]
   [6] macro expansion
     @ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\Test\src\Test.jl:1282 [inlined]
   [7] macro expansion
     @ E:\Programs\julia\.julia\packages\ONNXRunTime\KFR3P\test\test_highlevel.jl:10 [inlined]
   [8] macro expansion
     @ C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\Test\src\Test.jl:1282 [inlined]
   [9] top-level scope
     @ E:\Programs\julia\.julia\packages\ONNXRunTime\KFR3P\test\test_highlevel.jl:9
  [10] include(fname::String)
     @ Base.MainInclude .\client.jl:451
  [11] top-level scope
     @ E:\Programs\julia\.julia\packages\ONNXRunTime\KFR3P\test\runtests.jl:1
  [12] include(fname::String)
     @ Base.MainInclude .\client.jl:451
  [13] top-level scope
     @ none:6
  [14] eval
     @ .\boot.jl:373 [inlined]
  [15] exec_options(opts::Base.JLOptions)
     @ Base .\client.jl:268
  [16] _start()

Path itself seems to be ok though:

julia> import ONNXRunTime

julia> ONNXRunTime.testdatapath("increment2x3.onnx")
"E:\\Programs\\julia\\.julia\\packages\\ONNXRunTime\\KFR3P\\src\\..\\test\\data\\increment2x3.onnx"

julia> ONNXRunTime.testdatapath("increment2x3.onnx") |> isfile
true

@jw3126
Copy link
Owner Author

jw3126 commented Oct 26, 2021

Thanks a lot for investigating @DrChainsaw . Could you manually download + unzip the official binary https://github.com/microsoft/onnxruntime/releases/download/v1.9.0/onnxruntime-win-x64-1.9.0.zip and place it in the artifact directory? Does it work for you?

@DrChainsaw
Copy link

Manually unzipping the above files in the artifact directory (overwriting what was there) seems to have given the files the right permissions. I still get the error above when running tests though.

@jw3126
Copy link
Owner Author

jw3126 commented Oct 26, 2021

Can you try an official binary from another release like 1.8.1?

@DrChainsaw
Copy link

DrChainsaw commented Oct 26, 2021

Same problem as above :(

The Load model from ??????????????????????????????????????????? failed:Load model ??????????????????????????????????????????? failed. File doesn't exist, that is. The file seems to have the right permissions.

@jw3126
Copy link
Owner Author

jw3126 commented Oct 26, 2021

Is there a binary that ships with the python onnxruntime? Could you try that one?

@DrChainsaw
Copy link

I tried looking for something that looked like a bundled install, but I couldn't find anything:

/e/Programs/julia/.julia/conda/3/Lib/site-packages/onnxruntime$ find . -name onnxruntime.dll
/e/Programs/julia/.julia/conda/3/Lib/site-packages/onnxruntime$ find . -name *.dll
./capi/onnxruntime_providers_shared.dll

Do you have any ideas where it could have ended up? I'll keep looking around a bit more in the meantime.

@jw3126
Copy link
Owner Author

jw3126 commented Oct 26, 2021

Strange, I would have expected it in the same directory as onnxruntime_providers_shared.dll.

@DrChainsaw
Copy link

Does not look anything like what is in the artifact directory:

/e/Programs/julia/.julia/conda/3/Lib/site-packages/onnxruntime/capi$ ls
__init__.py  _ld_preload.py    onnxruntime_collect_build_info.py    onnxruntime_providers_shared.dll  onnxruntime_validation.py  version_info.py
__pycache__  _pybind_state.py  onnxruntime_inference_collection.py  onnxruntime_pybind11_state.pyd    training

Doesn't the error look like onnxruntime somehow gets a corrupted path? Or could this be because of some kind of incorrect binary problem?

@jw3126
Copy link
Owner Author

jw3126 commented Oct 26, 2021

Doesn't the error look like onnxruntime somehow gets a corrupted path? Or could this be because of some kind of incorrect binary problem?

I don't know. You can try to insert some print statements into function CreateSession(...) in src/capi.jl to take a look at the path.
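For reference, @show echoes both the expression and its value, so a couple of lines like the following would reveal exactly what reaches the C API (the path below is illustrative only):

```julia
# Illustrative debugging lines of the kind suggested above:
path = joinpath("test", "data", "increment2x3.onnx")  # example path only
@show path             # echoes the path string as Julia sees it
@show isfile(path)     # does Julia itself see the file?
@show codeunits(path)  # byte-level view, useful when an encoding issue is suspected
```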

@jw3126
Copy link
Owner Author

jw3126 commented Oct 26, 2021

@jw3126
Copy link
Owner Author

jw3126 commented Oct 26, 2021

You could hardcode the path in that function, playing around with the separator.

@jw3126
Copy link
Owner Author

jw3126 commented Oct 26, 2021

Maybe start with a path that does not contain any separator.

@DrChainsaw
Copy link

Sorry, got distracted with other stuff.

I tried outputting the path and trying other separators and relative paths. The number of question marks seems to change with the number of characters in the path, but it is always the same error :(

Here is one example where the path is printed out in the function you pointed out.

julia> OX.load_inference("Artifacts.toml"; execution_provider=:cpu)
path = "Artifacts.toml"
ERROR: Load model from ??????? failed:Load model ??????? failed. File doesn't exist

Stacktrace:
 [1] check_and_release
   @ E:\Programs\julia\.julia\dev\ONNXRunTime\src\capi.jl:452 [inlined]
 [2] into_julia(#unused#::Type{ONNXRunTime.CAPI.OrtSession}, api::ONNXRunTime.CAPI.OrtApi, objptr::Base.RefValue{Ptr{Nothing}}, status_ptr::Ptr{Nothing}, gchandles::Vector{Any})
   @ ONNXRunTime.CAPI E:\Programs\julia\.julia\dev\ONNXRunTime\src\capi.jl:416
 [3] CreateSession(api::ONNXRunTime.CAPI.OrtApi, env::ONNXRunTime.CAPI.OrtEnv, path::String, options::ONNXRunTime.CAPI.OrtSessionOptions)
   @ ONNXRunTime.CAPI E:\Programs\julia\.julia\dev\ONNXRunTime\src\capi.jl:514
 [4] load_inference(path::String; execution_provider::Symbol, envname::String, timer::TimerOutputs.TimerOutput)
   @ ONNXRunTime E:\Programs\julia\.julia\dev\ONNXRunTime\src\highlevel.jl:72
 [5] top-level scope
   @ REPL[25]:1

I also checked this out of pure desperation:

julia> Base.unsafe_string(Cstring(pointer(OX.testdatapath("increment2x3.onnx"))))
"E:\\Programs\\julia\\.julia\\dev\\ONNXRunTime\\src\\..\\test\\data\\increment2x3.onnx"

@jw3126
Copy link
Owner Author

jw3126 commented Oct 27, 2021

@DrChainsaw I tried switching to wchar strings, as was pointed out in microsoft/onnxruntime#9568.
Can you try whether this works for you? I also tried to fix the permissions issue.
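For context, onnxruntime's C API takes paths as ORTCHAR_T*, which is wchar_t (UTF-16) on Windows; handing it a UTF-8 Cstring would explain the garbled ??????? path. A minimal sketch of the conversion (Julia's Cwstring machinery does essentially this automatically in a ccall):

```julia
# Hedged sketch: transcode a Julia String (UTF-8) to the NUL-terminated
# UTF-16 buffer that the Windows onnxruntime API expects for paths.
path = "increment2x3.onnx"
wide = transcode(UInt16, path)  # UTF-16 code units
push!(wide, 0x0000)             # NUL terminator for the C side
```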

@jw3126
Copy link
Owner Author

jw3126 commented Oct 27, 2021

I have a hard time setting the permissions correctly. @DrChainsaw, which permissions did you set on which files to avoid the Access denied error?

@DrChainsaw
Copy link

Permissions look right now when I examine them in windows, but it still segfaults when running tests. Maybe a new issue unrelated to permissions?

It seems like the Cwstring change is a step in the right direction though, as I could run load_inference before using the new binaries. I will try to manually install the binaries again and see if it still works.

@DrChainsaw
Copy link

OK, it's just the tests that segfault; the following command works in my REPL with the binaries from Artifacts.toml:

julia> path = OX.testdatapath("increment2x3.onnx")
"E:\\Programs\\julia\\.julia\\dev\\ONNXRunTime\\src\\..\\test\\data\\increment2x3.onnx"

julia> OX.load_inference(path; execution_provider=:cpu)
ONNXRunTime.InferenceSession(ONNXRunTime.CAPI.OrtApi(Ptr{Nothing} @0x00007ff97dea30c0, etc.... for about 3 screens
────────────────────────────────────────────────────────────
                           Time                   Allocations      
                   ──────────────────────   ───────────────────────
 Tot / % measured:      1576s / 0.00%           1.54GiB / 0.00%

 Section   ncalls     time   %tot     avg     alloc   %tot      avg
 ──────────────────────────────────────────────────────────────────
 ──────────────────────────────────────────────────────────────────)

I'll run the tests manually one by one to see if I can pinpoint anything.

@DrChainsaw
Copy link

The first attempt at inference with the first model fails :(

PS E:\Programs\julia\.julia\dev\ONNXRunTime> julia --project

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.    
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.0-rc1 (2021-09-12)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release  
|__/                   |

julia> using Test

julia> using ONNXRunTime

julia> const OX = ONNXRunTime
ONNXRunTime

julia> using ONNXRunTime: juliatype

julia>   path = OX.testdatapath("increment2x3.onnx")
"E:\\Programs\\julia\\.julia\\dev\\ONNXRunTime\\src\\..\\test\\data\\increment2x3.onnx"

julia>         model = OX.load_inference(path, execution_provider=:cpu);

julia>   @test OX.input_names(model) == ["input"]
Test Passed
  Expression: OX.input_names(model) == ["input"]
   Evaluated: ["input"] == ["input"]

julia>    @test OX.output_names(model) == ["output"]
Test Passed
  Expression: OX.output_names(model) == ["output"]
   Evaluated: ["output"] == ["output"]

julia>      input = randn(Float32, 2,3)
2×3 Matrix{Float32}:
 -0.44249  0.779925  1.60568
 -0.67955  1.03036   0.459724

julia>      #= this works             =# model(Dict("input" => randn(Float32, 2,3)), ["output"])
PS E:\Programs\julia\.julia\dev\ONNXRunTime> $LASTEXITCODE
-1073741819 # I think this means segfault
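That guess checks out: reinterpreting the exit code as an unsigned 32-bit NTSTATUS value gives 0xc0000005, which is STATUS_ACCESS_VIOLATION, i.e. a segfault:

```julia
# Decode the Windows exit code: -1073741819 viewed as a UInt32 is
# 0xc0000005, the NTSTATUS code for STATUS_ACCESS_VIOLATION (a segfault).
code = Int32(-1073741819)
hex = string(reinterpret(UInt32, code), base = 16)  # "c0000005"
```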

@jw3126
Copy link
Owner Author

jw3126 commented Oct 27, 2021

Thanks for going through this. Can you try to pinpoint the segfault:

julia> using ONNXRunTime.CAPI

julia> using ONNXRunTime: testdatapath

julia> api = GetApi();

julia> env = CreateEnv(api, name="myenv");

julia> so = CreateSessionOptions(api);

julia> path = testdatapath("increment2x3.onnx");

julia> session = CreateSession(api, env, path, so);

julia> mem = CreateCpuMemoryInfo(api);

julia> input_array = randn(Float32, 2,3)
2×3 Matrix{Float32}:
 -0.888022   1.80374    0.49838
 -1.54438   -0.361334  -1.8961

julia> input_tensor = CreateTensorWithDataAsOrtValue(api, mem, vec(input_array), size(input_array)
);

julia> run_options = CreateRunOptions(api);

julia> input_names = ["input"];

julia> output_names = ["output"];

julia> inputs = [input_tensor];

julia> outputs = Run(api, session, run_options, input_names, inputs, output_names);

julia> output_tensor = only(outputs);

julia> output_array = GetTensorMutableData(api, output_tensor);

@DrChainsaw
Copy link

Glad I can help.

julia> using ONNXRunTime.CAPI

julia> using ONNXRunTime: testdatapath

julia> api = GetApi();

julia> env = CreateEnv(api, name="myenv");

julia> so = CreateSessionOptions(api);

julia> path = testdatapath("increment2x3.onnx");

julia> session = CreateSession(api, env, path, so);

julia> mem = CreateCpuMemoryInfo(api);

julia> input_array = randn(Float32, 2,3)
2×3 Matrix{Float32}:
 -1.20585   0.302946  0.355726
  0.425077  0.606975  0.337844

julia> input_tensor = CreateTensorWithDataAsOrtValue(api, mem, vec(input_array), size(input_array)
       );

julia> run_options = CreateRunOptions(api);

julia> input_names = ["input"];

julia> output_names = ["output"];

julia> inputs = [input_tensor];

julia> outputs = Run(api, session, run_options, input_names, inputs, output_names);

julia> output_tensor = only(outputs);

julia> output_array = GetTensorMutableData(api, output_tensor);
PS E:\Programs\julia\.julia\dev\ONNXRunTime> $LASTEXITCODE
-1073741819

@jw3126
Copy link
Owner Author

jw3126 commented Oct 28, 2021

Thanks! Let's try to narrow it down further:

using ONNXRunTime.CAPI
using ONNXRunTime: testdatapath

api = GetApi();
mem = CreateCpuMemoryInfo(api);
arr = randn(Float32, 2,3)
tensor = CreateTensorWithDataAsOrtValue(api, mem, vec(arr), size(arr))
arr2 = GetTensorMutableData(api, tensor);
@show arr
@show arr2

@jw3126
Copy link
Owner Author

jw3126 commented Oct 28, 2021

@DrChainsaw Windows CI now passes! Can you check if CUDA support also works on Windows?

@jw3126 jw3126 merged commit f967105 into main Oct 28, 2021
@jw3126 jw3126 mentioned this pull request Oct 28, 2021
@DrChainsaw
Copy link

Tests pass with CUDA on my machine:

     Testing Running tests...
Test Summary: | Pass  Total
high level    |  101    101
Test Summary: | Pass  Total
Session       |   17     17
Test Summary:    | Pass  Total
tensor roundtrip |    9      9
[ Info: Found a working CUDA.jl package, running CUDA tests
  Downloaded artifact: onnxruntime_gpu
  Downloaded artifact: onnxruntime_gpu
Test Summary:   | Pass  Total
CUDA high level |    3      3
Test Summary:  | Pass  Total
CUDA low level |   11     11
     Testing ONNXRunTime tests passed 
