
Inconsistency detected by ld.so: dl-version.c: 224: _dl_check_map_versions: Assertion `needed != NULL' failed! #9754

Closed
Feywell opened this issue Nov 13, 2021 · 19 comments · Fixed by #17365
Labels
api (issues related to all other APIs: C, C++, Python, etc.)

Comments

@Feywell commented Nov 13, 2021

Describe the bug
I use onnxruntime-gpu to run inference on my own ONNX model. It works well when the input data is on the CPU device,
but an error is thrown when the input data is on the GPU device.

This works:
ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(img.numpy())
This fails:
ortvalue = onnxruntime.OrtValue.ortvalue_from_numpy(img_lq.numpy(), device_type="cuda", device_id=0)
The error is:

Inconsistency detected by ld.so: dl-version.c: 224: _dl_check_map_versions: Assertion `needed != NULL' failed!
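
For reference, a self-contained version of the snippet above (the array shape is arbitrary; img/img_lq in the original come from the reporter's own pipeline):

import numpy as np
import onnxruntime

x = np.random.rand(1, 3, 224, 224).astype(np.float32)

# CPU OrtValue -- works:
cpu_value = onnxruntime.OrtValue.ortvalue_from_numpy(x)

# CUDA OrtValue -- triggers the ld.so assertion when the CUDA provider's
# shared-library dependencies cannot be resolved:
gpu_value = onnxruntime.OrtValue.ortvalue_from_numpy(x, device_type="cuda", device_id=0)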


System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • ONNX Runtime installed from (source or binary): pypi
  • ONNX Runtime version: onnxruntime-gpu 1.9.0
  • Python version: Python 3.6.13
  • Visual Studio version (if applicable):
  • GCC/Compiler version (if compiling from source): GCC 7.3.0
  • CUDA/cuDNN version: cudatoolkit 10.1.243
  • GPU model and memory: gtx 2080ti


Expected behavior
Any help with using the GPU version to run inference on an ONNX model?
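
A minimal sketch of GPU-side inference with the CUDA execution provider and IO binding (the model path and tensor names below are placeholders, not taken from this issue):

import numpy as np
import onnxruntime

# "model.onnx", "input" and "output" are placeholder names.
sess = onnxruntime.InferenceSession(
    "model.onnx", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])

x = onnxruntime.OrtValue.ortvalue_from_numpy(
    np.random.rand(1, 3, 224, 224).astype(np.float32), "cuda", 0)

io = sess.io_binding()
io.bind_ortvalue_input("input", x)   # input stays in GPU memory
io.bind_output("output", "cuda")     # let ORT allocate the output on the GPU
sess.run_with_iobinding(io)
result = io.copy_outputs_to_cpu()[0]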

Screenshots: [attached image showing the assertion failure]

@yuslepukhin (Member)

I have searched the internet and this appears to be relevant.

@yuslepukhin (Member)

Other results suggest that it may be due to a missing library. To track that down you would have to use ldd to inspect the library dependencies and find out what is missing on the system. Unfortunately, it may be some obscure library.
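
For example, something along these lines (assuming the usual onnxruntime-gpu wheel layout under site-packages; note that, as discussed later in this thread, ldd itself may complain about the patched onnxruntime libraries):

# Find where the wheel is installed, then list unresolved shared-library dependencies.
python -c "import onnxruntime, os; print(os.path.dirname(onnxruntime.__file__))"
ldd <site-packages>/onnxruntime/capi/libonnxruntime_providers_cuda.so | grep "not found"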

@snnn (Member) commented Nov 16, 2021

It was caused by the patchelf tool we use.
An alternative solution: Patch https://github.com/pypa/auditwheel, add a custom policy file to whitelist CUDA libraries. Then we can remove our hacks in setup.py.

The warnings generated by ldd don't affect normal use; the only impact is that you can't run ldd against the library.

@adk9 (Contributor) commented Nov 24, 2021

I'm also running into the same issue with onnxruntime-training (1.9.0) and onnxruntime-gpu (1.9.0) wheels installed from PyPI, trying to train a simple model using the CUDA EP. @snnn, the suggested fix above is not clear to me; can you elaborate on it?

@snnn (Member) commented Nov 24, 2021

First, the onnxruntime Python packages, "onnxruntime" and "onnxruntime-gpu", follow the manylinux2014 (PEP 599) standard. But the GPU one, onnxruntime-gpu, isn't fully compliant.

The PEP 599 policy says: "The wheel's binary executables or shared objects may not link against externally-provided libraries except those in the following list"

  • libgcc_s.so.1
  • libstdc++.so.6
  • libm.so.6
  • libdl.so.2
  • librt.so.1
  • libc.so.6
  • libnsl.so.1
  • libutil.so.1
  • libpthread.so.0
  • libresolv.so.2
  • libX11.so.6
  • libXext.so.6
  • libXrender.so.1
  • libICE.so.6
  • libSM.so.6
  • libGL.so.1
  • libgobject-2.0.so.0
  • libgthread-2.0.so.0
  • libglib-2.0.so.0

But we need CUDA, and CUDA isn't in the list. By the way, if you run ldd against onnxruntime's CPU-only package, "onnxruntime", you won't see this error.

The policy was designed so that every external dependency gets packed into the wheel file. However, we can't do that, because:

  1. It would make the package too large. PyPI has a 100 MB size limit per wheel file.
  2. It could cause license issues. I'm not sure we could redistribute Nvidia's CUDA libraries.

So we did a dirty hack. Before packing the wheel, we patch the .so file to pretend it doesn't depend on CUDA, in order to get past manylinux's auditwheel tool. Then we pack the wheel and load the CUDA libraries manually at runtime. The error message you saw is caused by the tool we use to patch the *.so files: patchelf. If we didn't use that tool, we wouldn't have this issue.

Alternatively, we could modify the policy: patch the auditwheel tool and add a custom policy file to whitelist the CUDA libraries. The file is https://github.com/pypa/auditwheel/blob/main/src/auditwheel/policy/manylinux-policy.json . See #144 for more information.

(Hi @adk9, the above answer only applies to the onnxruntime inference packages. The onnxruntime-training package is built in a special way that I'm not familiar with.)
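
A minimal diagnostic sketch of the "load the CUDA libraries manually" part described above, assuming the SONAMEs used by CUDA 11.x / cuDNN 8 builds (adjust the names for your CUDA and cuDNN versions):

import ctypes

# Example SONAMEs only; substitute the ones your onnxruntime-gpu build expects.
for soname in ("libcudart.so.11.0", "libcublas.so.11", "libcurand.so.10",
               "libcufft.so.10", "libcudnn.so.8"):
    try:
        ctypes.CDLL(soname)
        print("loaded", soname)
    except OSError as err:
        print("FAILED to load", soname, "->", err)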

@GuillaumeTong

Hi @snnn thank you for the detailed explanation of the problem. If I understand correctly, this is basically a problem that stems from not being able to specify CUDA as a pip dependency?

I am afraid that as a small potato, not familiar with the deeper workings of pip and auditwheel, I am unable to implement the fix you are describing.

Could the problem be fixed by manually installing a compatible version of CUDA?

Otherwise, could you give a more line-by-line set of instructions on which files to edit and what to do after the edit (I take it I need to build the wheel after editing the rules? I have never done that before)?

@snnn (Member) commented Feb 14, 2022

Could the problem be fixed by manually installing a compatible version of CUDA?

Yes. Then ldd will still not work, but the onnxruntime python package should be good.

@GuillaumeTong commented Feb 15, 2022

In my specific case, after looking at the ONNX Runtime requirements again, I noticed that I might be missing cuDNN.
I tried installing libcudnn8 and libcudnn8-dev, and I was able to run my code successfully. I'm not quite sure whether libcudnn8-dev was necessary or if libcudnn8 alone would have been sufficient.
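
For reference, on Ubuntu with NVIDIA's apt repository configured, that install is roughly (package names as in the comment above; whether the -dev package is strictly needed is unclear):

sudo apt-get update
sudo apt-get install libcudnn8 libcudnn8-dev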

@VikasOjha666 commented Apr 13, 2022

@GuillaumeTong @snnn I am facing the same issue. My CUDA version is 10.2; I followed the same cuDNN installation instructions for CUDA 10.2 and am still getting the same error. My OS is Ubuntu 18.04 on AWS with a T4 GPU. What could be the reason? I also have CUDA 11.6 on the system. In my case the problem shows up with the onnxruntime-training installation. Below is the code that triggers the error:

from onnxruntime import OrtValue
import numpy as np
x = OrtValue.ortvalue_from_numpy(np.random.rand(3), 'cuda')

It works for:
x = OrtValue.ortvalue_from_numpy(np.random.rand(3), 'cpu')

@snnn (Member) commented Apr 14, 2022

I'm not familiar with the onnxruntime-training package. @askhade, could you please help? I think there might be some code in PyTorch that runs this "ldd" command. Do you know how to reproduce it?

@farzanehnakhaee70

Upgrading CUDA to a version compatible with onnxruntime-gpu also solved my issue. Thanks a lot.

@camblomquist

I'm in a bit of a similar pickle here, though it might be one outside the scope of this issue or project. The environment includes Ubuntu 18.04, CUDA 11.4, cuDNN 8.2.4, Python 3.6, and onnxruntime-gpu from pip. I'm packaging the project as a onedir executable using PyInstaller.
On the build machine, it appears to work as expected. When taking the packaged executable onto a different machine, I get hit with the "Inconsistency detected" assertion failure. This other machine is also running CUDA 11.4 but uses a different GPU. For reasons, I'm trying to avoid modifying the system itself on this machine, so it needs to work using the libraries included by PyInstaller.
PyInstaller appears to be properly including all of the relevant CUDA libraries, and I've manually specified the inclusion of the onnxruntime_provider_cuda.so and *_shared.so files. For giggles, removing the CUDA libraries from the directory does correctly cause the program to error out due to missing files rather than assertions.
When running with LD_DEBUG=all, the last line before the assertion failure is Checking for version 'libcufft.so.10'..., and the rest of it just says that the library is required by the onnxruntime provider library. I don't know if this information is of any use.

@camblomquist

My apologies for the ramble. Desperation tends to do that. I had resolved the issue on my own. It turns out that PyInstaller was not including all of the necessary CUDA libraries. Including them manually allowed onnxruntime to start up (and then crash when it couldn't find the cudnn_*_infer libraries, but that error was transparent.)
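
For anyone in the same spot: one way to include such libraries explicitly is the binaries list of the PyInstaller .spec file. A hedged fragment (library paths and SONAMEs below are illustrative, not taken from this comment):

# Fragment of a PyInstaller .spec file; adjust paths to your CUDA/cuDNN install.
a = Analysis(
    ["app.py"],
    binaries=[
        ("/usr/local/cuda/lib64/libcufft.so.10", "."),
        ("/usr/local/cuda/lib64/libcublas.so.11", "."),
        ("/usr/lib/x86_64-linux-gnu/libcudnn.so.8", "."),
    ],
)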

I will say though that it is incredibly frustrating to have spent the time on what ended up being a fairly simple issue. I understand that the import hack is done to avoid the eyes of the auditor, but this same hack made it much more difficult to realize that it was just a matter of a missing dependency. I know I'm barking up the wrong tree here since I could've just not used Python, but this type of error would've happened much sooner in the pipeline and likely had a more useful error message in any compiled language.

@biendltb commented Aug 4, 2022

If you are running with the TensorrtExecutionProvider, reinstalling the libnvinfer libraries solved the issue in my case:

  • Uninstall libnvinfer libraries:
sudo apt-get purge "libnvinfer*"
  • Install all the libnvinfer libraries for your CUDA version. Run apt-cache policy libnvinfer8 to check the available versions.

For cuda-11.4 and libnvinfer 8.2.5.1:

sudo apt install libnvinfer8=8.2.5-1+cuda11.4 libnvinfer-plugin8=8.2.5-1+cuda11.4 libnvparsers8=8.2.5-1+cuda11.4 libnvonnxparsers8=8.2.5-1+cuda11.4 libnvinfer-dev=8.2.5-1+cuda11.4 libnvinfer-plugin-dev=8.2.5-1+cuda11.4 libnvparsers-dev=8.2.5-1+cuda11.4 libnvonnxparsers-dev=8.2.5-1+cuda11.4 cuda-cudart-dev-11-4 libcublas-dev-11-4

For cuda-11.6 and libnvinfer 8.4.3 (also tested with cuda-11.8):

sudo apt install libnvinfer8=8.4.3-1+cuda11.6 libnvinfer-plugin8=8.4.3-1+cuda11.6 libnvparsers8=8.4.3-1+cuda11.6 libnvonnxparsers8=8.4.3-1+cuda11.6 libnvinfer-dev=8.4.3-1+cuda11.6 libnvinfer-plugin-dev=8.4.3-1+cuda11.6 libnvinfer-bin=8.4.3-1+cuda11.6 libnvparsers-dev=8.4.3-1+cuda11.6 libnvonnxparsers-dev=8.4.3-1+cuda11.6 cuda-cudart-dev-11-6 libcublas-dev-11-6

Hope it helps.

@sophies927 added the "api" label and removed the "api:Python" label on Aug 12, 2022
@gorkemgoknar

I hit a similar issue.
Apparently you need to use the onnxruntime-gpu version that matches your system's CUDA installation.
I have CUDA 10.2 installed; after installing onnxruntime-gpu==1.6 I no longer saw this error.
Onnxruntime requirements:
https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements

@Apisteftos

[quotes @snnn's earlier explanation of the manylinux2014 / PEP 599 policy and the patchelf hack in full]

Can you provide a bit more help on how to patch the auditwheel tool and create a custom policy file to whitelist the CUDA libraries? What are the steps? I have the same issue: I downgraded my CUDA version from 11.8 to 11.6, but the problem remains. I am trying to run inference on some images, but it works only on the CPU and unfortunately not on CUDA.

@yongjer commented Jun 17, 2023

I have the same issue. I'm using a Docker environment:

FROM nvcr.io/nvidia/pytorch:23.02-py3
RUN apt-get update && pip install \
    transformers \
    datasets \
    accelerate \
    optimum[onnxruntime-gpu] \
    diffusers \
    evaluate \
    jupyter \
    notebook \
    && rm -rf /var/lib/apt/lists/

@mattip commented Jul 27, 2023

Stumbled across this issue. FWIW, newer auditwheel has an option to exclude shared objects that will be provided in a different manner. This is the PR that added the --exclude option, pypa/auditwheel#368, specifically for the use case described here:

$ auditwheel repair --help
usage: auditwheel repair [-h] [--plat PLATFORM] [-L LIB_SDIR] [-w WHEEL_DIR] [--no-update-tags] [--strip] [--exclude EXCLUDE]
                         [--only-plat]
                         WHEEL_FILE [WHEEL_FILE ...]

Vendor in external shared library dependencies of a wheel.
If multiple wheels are specified, an error processing one
wheel will abort processing of subsequent wheels.
...
 --exclude EXCLUDE     Exclude SONAME from grafting into the resulting wheel (can be specified multiple times)
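
A hypothetical invocation for the case discussed in this thread (the wheel path and excluded SONAMEs are illustrative):

auditwheel repair dist/onnxruntime_gpu-*.whl \
    --plat manylinux2014_x86_64 \
    --exclude libcublas.so.11 --exclude libcudnn.so.8 \
    -w wheelhouse/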

@snnn (Member) commented Aug 15, 2023

I will need to rework PR #1282.

snnn added a commit that referenced this issue Sep 8, 2023
### Description
Resolve #9754
snnn added a commit that referenced this issue Sep 15, 2023
### Description
Resolve #9754
snnn added a commit that referenced this issue Sep 18, 2023
### Description
1. Delete Prefast tasks (#17522)
2. Disable yum update (#17551)
3. Avoid calling patchelf (#17365 and #17562) so that we can validate
the above fix

The main problem I'm trying to solve is: our GPU package depends on both
CUDA 11.x and CUDA 12.x. However, it's not easy to see this information
because ldd doesn't work with the shared libraries we generate (see issue
#9754). So the patchelf changes are useful for validating that the
"Disable yum update" change was successful. As you can see, we call "yum
update" from multiple places; without some kind of validation it's hard
to say whether I have covered all of them.
The Prefast change is needed because I'm going to update the VM images
in the next few weeks, in case we need to publish a patch release
after that.

### Motivation and Context
Without this fix we will mix CUDA 11.x and CUDA 12.x libraries, and it will
crash every time we use TensorRT.
kleiti pushed a commit to kleiti/onnxruntime that referenced this issue Mar 22, 2024