Segmentation fault when running test code #2
Yes, it may be that your EGL / GL libraries do not match the nvidia driver. Could you run the following in python:
and let me know the output? This will show what shared libraries the rasteriser is linking to. Also, try running the square_test script using gdb, to get a native stack trace for the segfault:
Thanks! |
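The library check requested above could be scripted along these lines. This is only a sketch: `ldd` itself is a standard tool, but the helper names are made up for illustration and the path to `librasterise.so` has to be supplied by hand, since the exact snippet from the original comment is not preserved in this thread.

```python
import subprocess


def parse_ldd_output(text):
    """Map each library name in `ldd` output to the path it resolved to.

    Entries with no resolved path (linux-vdso, "not found") map to None.
    """
    resolved = {}
    for line in text.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # e.g. the ld-linux line, which has no "=>"
        name, _, target = line.partition("=>")
        # Drop the trailing load address, e.g. "(0x00007f...)".
        path = target.strip().split(" (")[0].strip()
        if path.startswith("(") or path == "not found":
            path = ""
        resolved[name.strip()] = path or None
    return resolved


def linked_libraries(lib_path):
    """Run `ldd` on lib_path (hypothetical helper) and parse the result."""
    output = subprocess.check_output(["ldd", lib_path], universal_newlines=True)
    return parse_ldd_output(output)


# Usage (the path is an assumption; point it at your own build):
# for name, path in sorted(linked_libraries("/path/to/librasterise.so").items()):
#     print(name, "->", path)
```

Looking at where `libEGL` and `libGL`/`libOpenGL` resolve to in that listing is what shows whether the nvidia libraries or some other vendor's are being picked up.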
Hi again @YuDeng, |
Hi @pmh47
And running the square_test.py gets: GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1 Thread 126 "python" received signal SIGSEGV, Segmentation fault.
Quit anyway? (y or n) y It seems that the problem comes from OpenGL? |
Hi @YuDeng, Thanks for the info. Yes, it's an OpenGL issue; it is crashing during the very first call into an OpenGL function. However, all the libraries shown by ldd seem correct. It may be due to conflicts between GLVND and legacy GL libraries on your system, but I can't replicate the exact problem. Did you try running cmake with -DOpenGL_GL_PREFERENCE=GLVND? You could also try making the following changes to csrc/CMakeLists.txt to force use of GLVND libraries (requires cmake 3.10 or newer):
|
Hi @YuDeng, Have you solved it? I ran into the same problem, and after I ran cmake with -DOpenGL_GL_PREFERENCE=GLVND as suggested by @pmh47, a new error appeared: "extensions eglQueryDevicesEXT, eglQueryDeviceAttribEXT and eglGetPlatformDisplayEXT not available". I have been struggling to run the code for several days; could you help me resolve it? |
Hi @dongdu3,
and ensure nvidia libraries are shown. |
Hi @pmh47, Thanks for your quick reply. I have considered that, and the libEGL being used is from the libglvnd library I built from source ("https://github.com/NVIDIA/libglvnd"). However, when I switch it to nvidia's libEGL, it runs into a segmentation fault (core dumped), just like YuDeng's description. BTW, my testing environment is cuda9.0, nvidia-384.130, and tensorflow-gpu1.8(1.7). When I tried the code on another PC (cuda8.0, nvidia-375, tensorflow-gpu1.4), all test files (such as square_test.py) passed using nvidia's libEGL. Does the version of cuda or the nvidia driver matter? |
Hi @dongdu3, In this case I think the segmentation fault is 'better' than the error about extensions, as it means at least EGL is exposing the necessary functions, even if the OpenGL context it creates is not then working properly. I'm not sure about using the source version of libglvnd -- the one that comes with the nvidia driver package may do something magic to interface with their driver backend. What OS are you using? I'll try to replicate your exact setup. I've tested successfully with cuda 9.0 in the past, and newer drivers, without problems. |
Hi @pmh47, Thanks again. My OS is ubuntu16.04.3. |
Hi @pmh47, I succeeded this time after uninstalling libglvnd, and now ldd librasterise.so outputs: linux-vdso.so.1 => (0x00007ffdb19ba000) I would like to study your paper and code to render depth maps or normal maps of meshes for another differentiable loss during training. Thank you for your kindness and patience. |
Hello, @dongdu3 , |
Same question. My report:
@pmh47 Could you help me? |
I have also tried this suggestion: But the installation fails. |
@dongdu3 Could you tell me how to check whether I have libglvnd installed, and how to uninstall it? |
@li19960612 Have you solved this? |
@mehameha998 I have not solved this problem yet. I think GL and EGL have a version conflict, so I changed my CMakeLists.txt to link both libGL and libEGL to the nvidia versions, but it does not work. I don't know what I should do. |
@mehameha998 Do you want to install octopus with dirt? If so, would you mind leaving some contact information so that we can discuss our research? |
@li19960612 You should link libGL and libEGL, or libOpenGL and libEGL, not the _nvidia-xxx versions, but you should ensure that the version of those libraries installed with the nvidia driver is found in the linker path before any other version (e.g. mesa). Let me know the output of
and
and what version/distro of linux you are using. |
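To make the "found in the linker path before any other version" point concrete, here is a simplified sketch of a first-match-wins search over an ordered list of directories. The function name is hypothetical and the directories in the usage comment are only examples; the real dynamic linker also consults rpath entries, `LD_LIBRARY_PATH` and the `ld.so` cache, so treat this as an illustration of the ordering idea, not a replacement for `ldd`.

```python
import os


def first_match(lib_name, search_dirs):
    """Return the first directory in search_dirs containing lib_name, or None.

    Earlier directories win, mirroring how a search path is consulted in order.
    """
    for directory in search_dirs:
        if os.path.exists(os.path.join(directory, lib_name)):
            return directory
    return None


# Usage (example directories only; adjust to your own system):
# print(first_match("libEGL.so.1", [
#     "/usr/lib/nvidia-384",
#     "/usr/lib/x86_64-linux-gnu/mesa-egl",
# ]))
```

If the directory that wins is not the one installed by the nvidia driver, that is a hint that another vendor's library may be shadowing it.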
@mehameha998 The other way of doing it, with |
@mehameha998 I have solved the "segmentation fault" problem: I changed the link from "libGL.so" to "libOpenGL.so". But I don't know why that fixed it. |
@pmh47 Thank you for your patience! It worked once I changed my link from "libGL.so" to "libOpenGL.so". I will study your paper and code carefully after that. |
@li19960612 @pmh47 Thanks! I changed the OpenGL/EGL library path to the Nvidia directory, and then it works. |
How do I change the link from "libGL.so" to "libOpenGL.so"? |
I got the same problem I think, when I try to build the docker image with
I get:
|
@francoisruty Unfortunately I don't have docker on this machine to test it myself just now.
If that doesn't work, please send the output of running the following in python in the container:
and
|
I tried the changes you mentioned, still doesn't work, here is the 2 outputs you requested:
and
|
@francoisruty This is specific to tensorflow 1.14 (and maybe later); 1.13 or 1.12 work. I can only reproduce it in the dockerfile for now; it works fine in a regular venv. Tracked as #34 |
roger that |
Hi @pmh47, The gdb output is: Thread 47 "python2.7" received signal SIGSEGV, Segmentation fault.
Quit anyway? (y or n) I linked libEGL/GL to nvidia disk, but it didn't work anyway. |
@lyrgwlr
This configuration seems to be more reliable on modern systems. If it still doesn't work, please paste the output of ldd when using this changed version. |
@pmh47 Some ERRORs happened: CMake Error at CMakeLists.txt:42 (add_library): -- Generating done And I ran the make command: Its output is: |
@lyrgwlr Something strange has happened there; it's as if cmake has not even looked for the OpenGL::OpenGL and OpenGL::EGL components. What version of cmake are you using? 3.10 or newer is required for |
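The cmake version requirement mentioned in this thread (3.10 or newer) can be checked with a short script. This is a sketch under the assumption that `cmake --version` prints a line of the form `cmake version X.Y.Z`; the function names are made up for illustration.

```python
import re
import subprocess


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("unrecognised cmake --version output")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def cmake_is_new_enough(text, minimum=(3, 10)):
    """True if the reported version is at least `minimum`."""
    return parse_cmake_version(text)[:len(minimum)] >= minimum


# Usage:
# output = subprocess.check_output(["cmake", "--version"], universal_newlines=True)
# print(parse_cmake_version(output), cmake_is_new_enough(output))
```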
@pmh47 And the cmake result is : -- Configuring incomplete, errors occurred! |
@lyrgwlr Please do |
@pmh47 I think libEGL.so exists. |
@lyrgwlr I'm not sure why cmake is failing to find it then. Please paste what |
@pmh47 Is that all right? |
Hi, I have the same problem. When I run the test code, it shows a segmentation fault (core dumped). I ran the following in python: and it gives the following information:
0 And running 'gdb -ex r -ex bt -ex q --args python tests/square_test.py', GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1 Any advice would be greatly appreciated |
@pmh47 I have the same issue when running dirt on our cluster. There, libegl_nvidia.so is stored in the /usr/lib/x86_64-linux-gnu folder together with the other GL .so files. Any idea how I can make Ubuntu ignore the /usr/lib path without modifying files inside /etc/ld.so.conf.d/ (which I don't have permission to do)? Thanks! |
@YinghaoHuang91 If it's building ok but failing at runtime, you may be able to set |
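The next reply in this thread reports setting `LD_PRELOAD` to `libEGL.so.1`. One way to try that without touching system configuration is to launch the test in a child process with a modified environment. This is a sketch only: the helper name is hypothetical, the library path in the usage comment is a placeholder, and whether preloading helps on a given system is not something this thread establishes.

```python
import os
import subprocess


def env_with_preload(lib_path, base_env=None):
    """Return a copy of the environment with lib_path prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # The dynamic linker accepts a colon-separated list in LD_PRELOAD.
    env["LD_PRELOAD"] = lib_path if not existing else lib_path + ":" + existing
    return env


# Usage (placeholder path; use the libEGL.so.1 that ships with your driver):
# subprocess.call(
#     ["python", "tests/square_test.py"],
#     env=env_with_preload("/path/to/libEGL.so.1"),
# )
```

Setting the variable only for the child process keeps the rest of the session unaffected, which is useful on a shared cluster where system files cannot be edited.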
@pmh47 Still no luck. I set LD_PRELOAD to the libEGL.so.1, it's in the same folder as libEGL_nvidia.so.0. This is what I see when running gdb: [Switching to Thread 0x1554bc773700 (LWP 579851)] This is the output from tensorflow: import subprocess Any idea how to solve this annoying issue? |
Hi, when I run python tests/square_test.py, it outputs: |
Hi, I've compiled your code successfully on my ubuntu server. But when I run the test code, it shows a segmentation fault (core dumped). It seems that something is wrong with EGL. I'm new to OpenGL, and I searched for solutions on the internet but didn't find a suitable one. Could you tell me how to fix this problem?
Thank you!