
Segmentation fault when running test code #2

Closed
YuDeng opened this issue Sep 20, 2018 · 43 comments

@YuDeng

YuDeng commented Sep 20, 2018

Hi, I've compiled your code successfully on my Ubuntu server, but when I run the test code, it shows a segmentation fault (core dumped). It seems that something is wrong with EGL. I'm new to OpenGL, and I searched for solutions on the internet but didn't find a proper fix. Could you tell me how to resolve this problem?
Thank you!

@pmh47
Owner

pmh47 commented Sep 20, 2018

Yes, it may be that your EGL / GL libraries do not match the nvidia driver.

Could you run the following in python:

import dirt.rasterise_ops
import subprocess
subprocess.call(['ldd', dirt.rasterise_ops._lib_path + '/librasterise.so'])

and let me know the output? This will show what shared libraries the rasteriser is linking to.

Also, try running the square_test script using gdb, to get a native stack trace for the segfault:

gdb -ex r -ex bt -ex q --args python tests/square_test.py

Thanks!
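As an aside, the `ldd` check above can be wrapped in a small helper that prints only the GL/EGL/nvidia-related entries, which are the ones that matter for this bug. This is a hypothetical convenience sketch, not part of dirt; `check_library` and `gl_related_lines` are names introduced here for illustration:

```python
import re
import subprocess

# Matches library names like libEGL, libGL, libOpenGL, libGLdispatch, libnvidia-*, libglvnd
GL_PATTERN = re.compile(r'lib(EGL|GL|OpenGL|nvidia|glvnd)\S*', re.IGNORECASE)

def gl_related_lines(ldd_output):
    """Return only the ldd output lines that mention GL/EGL/nvidia libraries."""
    return [line.strip() for line in ldd_output.splitlines() if GL_PATTERN.search(line)]

def check_library(path):
    """Run ldd on a shared library and print its GL-related dependencies."""
    out = subprocess.run(['ldd', path], capture_output=True, text=True).stdout
    for line in gl_related_lines(out):
        print(line)
```

Used as `check_library(dirt.rasterise_ops._lib_path + '/librasterise.so')`, every printed entry should resolve into the nvidia driver directory rather than a mesa path.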

@pmh47
Owner

pmh47 commented Sep 25, 2018

Hi again @YuDeng,
In case you still have this issue, note that another user fixed a very similar problem by adding
-DOpenGL_GL_PREFERENCE=GLVND to the cmake invocation. Whether this helps seems to depend on the exact system configuration, and it's only relevant for cmake 3.10 and newer.

@YuDeng
Author

YuDeng commented Oct 8, 2018

Hi @pmh47
when I run subprocess.call, it gives the following information:

linux-vdso.so.1 =>  (0x00007fff54b15000)
libEGL.so.1 => /usr/lib/nvidia-384/libEGL.so.1 (0x00007f89fed7b000)
libGL.so.1 => /usr/lib/nvidia-384/libGL.so.1 (0x00007f89fea39000)
libtensorflow_framework.so => /home/yudeng/anaconda3/lib/python3.5/site-packages/tensorflow/libtensorflow_framework.so (0x00007f89fdbe9000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f89fd9e1000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f89fd7c4000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f89fd5c0000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f89fd23e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f89fcf35000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f89fcd1f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f89fc955000)
/lib64/ld-linux-x86-64.so.2 (0x00007f89ff26f000)
libGLdispatch.so.0 => /usr/lib/nvidia-384/libGLdispatch.so.0 (0x00007f89fc687000)
libnvidia-tls.so.384.130 => /usr/lib/nvidia-384/tls/libnvidia-tls.so.384.130 (0x00007f89fc483000)
libnvidia-glcore.so.384.130 => /usr/lib/nvidia-384/libnvidia-glcore.so.384.130 (0x00007f89fa5c7000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f89fa28d000)
libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f89fa07b000)
libcublas.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcublas.so.8.0 (0x00007f89f7036000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f89f61b8000)
libcudnn.so.6 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.6 (0x00007f89ecc56000)
libcufft.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcufft.so.8.0 (0x00007f89e3e07000)
libcurand.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcurand.so.8.0 (0x00007f89dfe91000)
libcudart.so.8.0 => /usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudart.so.8.0 (0x00007f89dfc2b000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f89dfa09000)
libnvidia-fatbinaryloader.so.384.130 => /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.130 (0x00007f89df7b7000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f89df5b3000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f89df3ad000)

And running the square_test.py gets:

GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
Starting program: /home/yudeng/anaconda3/bin/python square_test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[... repeated "[New Thread ...]" / "[Thread ... exited]" gdb messages omitted ...]
2018-10-08 15:02:43.765194: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
[... repeated "[New Thread ...]" gdb messages omitted ...]
2018-10-08 15:02:44.054129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: Tesla M40 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:05:00.0
totalMemory: 22.40GiB freeMemory: 848.19MiB
[New Thread 0x7ffedffff700 (LWP 27095)]
[New Thread 0x7ffedf7fe700 (LWP 27096)]
2018-10-08 15:02:44.245877: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 1 with properties:
name: Tesla M40 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:08:00.0
totalMemory: 22.40GiB freeMemory: 848.12MiB
[New Thread 0x7ffedeffd700 (LWP 27097)]
[New Thread 0x7ffede7fc700 (LWP 27098)]
2018-10-08 15:02:44.426323: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 2 with properties:
name: Tesla M40 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:0d:00.0
totalMemory: 22.40GiB freeMemory: 22.29GiB
[New Thread 0x7ffeddffb700 (LWP 27099)]
[New Thread 0x7ffedd7fa700 (LWP 27100)]
2018-10-08 15:02:44.654860: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 3 with properties:
name: Tesla M40 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:13:00.0
totalMemory: 22.40GiB freeMemory: 848.19MiB
[New Thread 0x7ffedcff9700 (LWP 27101)]
[New Thread 0x7ffebbfff700 (LWP 27102)]
2018-10-08 15:02:44.912216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 4 with properties:
name: Tesla M40 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:83:00.0
totalMemory: 22.40GiB freeMemory: 848.19MiB
[New Thread 0x7ffebb7fe700 (LWP 27104)]
[New Thread 0x7ffebaffd700 (LWP 27105)]
2018-10-08 15:02:45.138821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 5 with properties:
name: Tesla M40 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:89:00.0
totalMemory: 22.40GiB freeMemory: 848.19MiB
[New Thread 0x7ffeba7fc700 (LWP 27106)]
[New Thread 0x7ffeb9ffb700 (LWP 27107)]
2018-10-08 15:02:45.358027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 6 with properties:
name: Tesla M40 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:8e:00.0
totalMemory: 22.40GiB freeMemory: 22.29GiB
[New Thread 0x7ffeb97fa700 (LWP 27108)]
[New Thread 0x7ffeb8ff9700 (LWP 27109)]
2018-10-08 15:02:45.578721: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 7 with properties:
name: Tesla M40 24GB major: 5 minor: 2 memoryClockRate(GHz): 1.112
pciBusID: 0000:91:00.0
totalMemory: 22.40GiB freeMemory: 22.29GiB
2018-10-08 15:02:45.583515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Device peer to peer matrix
2018-10-08 15:02:45.583794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1051] DMA: 0 1 2 3 4 5 6 7
2018-10-08 15:02:45.583805: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 0: Y Y Y Y N N N N
2018-10-08 15:02:45.583809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 1: Y Y Y Y N N N N
2018-10-08 15:02:45.583814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 2: Y Y Y Y N N N N
2018-10-08 15:02:45.583818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 3: Y Y Y Y N N N N
2018-10-08 15:02:45.583823: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 4: N N N N Y Y Y Y
2018-10-08 15:02:45.583827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 5: N N N N Y Y Y Y
2018-10-08 15:02:45.583831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 6: N N N N Y Y Y Y
2018-10-08 15:02:45.583836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1061] 7: N N N N Y Y Y Y
2018-10-08 15:02:45.583853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla M40 24GB, pci bus id: 0000:05:00.0, compute capability: 5.2)
2018-10-08 15:02:45.583861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:1) -> (device: 1, name: Tesla M40 24GB, pci bus id: 0000:08:00.0, compute capability: 5.2)
2018-10-08 15:02:45.583868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:2) -> (device: 2, name: Tesla M40 24GB, pci bus id: 0000:0d:00.0, compute capability: 5.2)
2018-10-08 15:02:45.583874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:3) -> (device: 3, name: Tesla M40 24GB, pci bus id: 0000:13:00.0, compute capability: 5.2)
2018-10-08 15:02:45.583881: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:4) -> (device: 4, name: Tesla M40 24GB, pci bus id: 0000:83:00.0, compute capability: 5.2)
2018-10-08 15:02:45.583887: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:5) -> (device: 5, name: Tesla M40 24GB, pci bus id: 0000:89:00.0, compute capability: 5.2)
2018-10-08 15:02:45.583893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:6) -> (device: 6, name: Tesla M40 24GB, pci bus id: 0000:8e:00.0, compute capability: 5.2)
2018-10-08 15:02:45.583899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:7) -> (device: 7, name: Tesla M40 24GB, pci bus id: 0000:91:00.0, compute capability: 5.2)
[... repeated "[New Thread ...]" gdb messages omitted ...]
2018-10-08 15:02:47.067126: I /home/disk/diskc/yudeng/face_reconstruction/dirt/csrc/gl_common.h:66] selected egl device #0 to match cuda device #0 for thread 0x7ffd727fc700

Thread 126 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffd727fc700 (LWP 27154)]
0x00007fff69479ae9 in glGetString () from /usr/lib/nvidia-384/libGL.so.1
#0 0x00007fff69479ae9 in glGetString () from /usr/lib/nvidia-384/libGL.so.1
#1 0x00007fff69937196 in gl_common::initialise_context(int) () from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#2 0x00007fff69937dd3 in RasteriseOpGpu::initialise_per_thread_objects(RasteriseOpGpu::PerThreadObjects&, HWC const&, CUctx_st* const&) ()
from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#3 0x00007fff69938697 in RasteriseOpGpu::Compute(tensorflow::OpKernelContext*)::{lambda(RasteriseOpGpu::PerThreadObjects&)#1}::operator()(RasteriseOpGpu::PerThreadObjects&) const ()
from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#4 0x00007fff6993aba4 in std::_Function_handler<void (RasteriseOpGpu::PerThreadObjects&), RasteriseOpGpu::Compute(tensorflow::OpKernelContext*)::{lambda(RasteriseOpGpu::PerThreadObjects&)#1}>::_M_invoke(std::_Any_data const&, RasteriseOpGpu::PerThreadObjects&) () from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#5 0x00007fff6993b6e1 in std::function<void (RasteriseOpGpu::PerThreadObjects&)>::operator()(RasteriseOpGpu::PerThreadObjects&) const ()
from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#6 0x00007fff6993ae71 in GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::dispatch_blocking(std::function<void (RasteriseOpGpu::PerThreadObjects&)> const&)::{lambda()#1}::operator()() const ()
from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#7 0x00007fff6993c1d6 in std::_Function_handler<bool (), GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::dispatch_blocking(std::function<void (RasteriseOpGpu::PerThreadObjects&)> const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#8 0x00007fff69941ef2 in std::function<bool ()>::operator()() const () from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#9 0x00007fff699417d8 in GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::thread_fn(moodycamel::BlockingConcurrentQueue<std::function<bool ()>, moodycamel::ConcurrentQueueDefaultTraits>&) ()
from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#10 0x00007fff6994118f in GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::GlThread()::{lambda()#1}::operator()() const () from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#11 0x00007fff699445fc in void std::_Bind_simple<GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::GlThread()::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) ()
from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#12 0x00007fff699444f2 in std::_Bind_simple<GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::GlThread()::{lambda()#1} ()>::operator()() ()
from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#13 0x00007fff69943df2 in std::thread::_Impl<std::_Bind_simple<GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::GlThread()::{lambda()#1} ()> >::_M_run() ()
from /home/yudeng/anaconda3/lib/python3.5/site-packages/dirt/librasterise.so
#14 0x00007fff8c0d4c80 in ?? () from /home/yudeng/anaconda3/bin/../lib/libstdc++.so.6
#15 0x00007ffff76d16ba in start_thread (arg=0x7ffd727fc700) at pthread_create.c:333
#16 0x00007ffff6aef41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
A debugging session is active.

Inferior 1 [process 27013] will be killed.

Quit anyway? (y or n) y

It seems that the problem comes from OpenGL?

@pmh47
Copy link
Owner

pmh47 commented Oct 8, 2018

Hi @YuDeng,

Thanks for the info. Yes, it's an OpenGL issue; it is crashing during the very first call into an OpenGL function. However, all the libraries shown by ldd seem correct.

It may be due to conflicts between GLVND and legacy GL libraries on your system, but I can't replicate the exact problem.

Did you try running cmake with -DOpenGL_GL_PREFERENCE=GLVND?

You could also try making the following changes to csrc/CMakeLists.txt to force use of GLVND libraries (requires cmake 3.10 or newer):

  • change line 5 to find_package(OpenGL REQUIRED COMPONENTS OpenGL EGL)
  • comment line 9
  • change line 42 to target_link_libraries(rasterise PRIVATE OpenGL::OpenGL OpenGL::EGL ${Tensorflow_LIBRARY})
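Put together, the relevant parts of csrc/CMakeLists.txt would then look roughly like the following. This is a sketch based only on the three changes above; the line numbers and the `Tensorflow_LIBRARY` variable come from dirt's existing build files, and the exact surrounding content may differ:

```cmake
# Line 5: request the GLVND components explicitly (requires cmake >= 3.10)
find_package(OpenGL REQUIRED COMPONENTS OpenGL EGL)

# Line 9: the previous legacy OpenGL line stays commented out

# Line 42: link the GLVND imported targets instead of the legacy libGL
target_link_libraries(rasterise PRIVATE OpenGL::OpenGL OpenGL::EGL ${Tensorflow_LIBRARY})
```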

@dongdu3

dongdu3 commented Nov 21, 2018

Hi @YuDeng ,

Have you solved it? I met the same problem as you, and after I ran cmake with -DOpenGL_GL_PREFERENCE=GLVND as suggested by @pmh47, a new error appeared: "extensions eglQueryDevicesEXT, eglQueryDeviceAttribEXT and eglGetPlatformDisplayEXT not available". I have been struggling to run the code for several days; could you help me address it?

@pmh47
Owner

pmh47 commented Nov 21, 2018

Hi @dongdu3,
The error "extensions eglQueryDevicesEXT, eglQueryDeviceAttribEXT and eglGetPlatformDisplayEXT not available" likely means you are not linking nvidia's libEGL, but a different one, e.g. mesa. Please check the output of

import dirt.rasterise_ops
import subprocess
subprocess.call(['ldd', dirt.rasterise_ops._lib_path + '/librasterise.so'])

and ensure nvidia libraries are shown.
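The missing functions correspond to EGL client extensions (roughly EGL_EXT_platform_device, EGL_EXT_device_query, and EGL_EXT_device_enumeration), so one can also probe the loaded libEGL directly via ctypes. This is a hedged diagnostic sketch, not part of dirt; it assumes a Linux system with some libEGL.so.1 on the loader path, and `eglQueryString(EGL_NO_DISPLAY, EGL_EXTENSIONS)` returns NULL on EGL implementations older than 1.5 without client-extension support:

```python
import ctypes
import ctypes.util

EGL_EXTENSIONS = 0x3055          # EGL token for the extensions string
EGL_NO_DISPLAY = ctypes.c_void_p(0)

# Client extensions that back eglQueryDevicesEXT / eglQueryDeviceAttribEXT /
# eglGetPlatformDisplayEXT (approximate mapping)
REQUIRED = ["EGL_EXT_platform_device", "EGL_EXT_device_query", "EGL_EXT_device_enumeration"]

def missing_extensions(extension_string, required=REQUIRED):
    """Return the required EGL client extensions absent from an extension string."""
    available = set(extension_string.split())
    return [ext for ext in required if ext not in available]

def query_client_extensions():
    """Query the EGL client extension string from whichever libEGL gets loaded."""
    egl = ctypes.CDLL(ctypes.util.find_library("EGL") or "libEGL.so.1")
    egl.eglQueryString.restype = ctypes.c_char_p
    egl.eglQueryString.argtypes = [ctypes.c_void_p, ctypes.c_int]
    result = egl.eglQueryString(EGL_NO_DISPLAY, EGL_EXTENSIONS)
    return (result or b"").decode()
```

If `missing_extensions(query_client_extensions())` is non-empty, the loaded libEGL is not nvidia's (or is too old), which matches the error message above.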

@dongdu3

dongdu3 commented Nov 21, 2018

Hi @pmh47 ,

Thanks for your quick reply. I have considered that; the libEGL in use is from libglvnd, which I built from source (https://github.com/NVIDIA/libglvnd). However, when I switch to nvidia's libEGL, it runs into a segmentation fault (core dumped), just like YuDeng's description.

BTW, my testing environment is cuda 9.0, nvidia-384.130, and tensorflow-gpu 1.8 (1.7). When I tried the code on another PC (cuda 8.0, nvidia-375, tensorflow-gpu 1.4), all test files (such as square_test.py) passed using nvidia's libEGL. Does the version of cuda or the nvidia driver matter?

@pmh47
Owner

pmh47 commented Nov 21, 2018

Hi @dongdu3,

In this case I think the segmentation fault is 'better' than the error about extensions, as it means at least EGL is exposing the necessary functions, even if the OpenGL context it creates is not then working properly. I'm not sure about using the source version of libglvnd -- the one that comes with the nvidia driver package may do something magic to interface with their driver backend.

What OS are you using? I'll try to replicate your exact setup. I've tested successfully with cuda 9.0 in the past, and newer drivers, without problems.

@dongdu3

dongdu3 commented Nov 21, 2018

Hi @pmh47

Thanks again. My OS is Ubuntu 16.04.3.

@dongdu3

dongdu3 commented Nov 22, 2018

Hi @pmh47,

It works this time, after uninstalling libglvnd, and now ldd librasterise.so outputs:

linux-vdso.so.1 => (0x00007ffdb19ba000)
libOpenGL.so.0 => /usr/lib/nvidia-384/libOpenGL.so.0 (0x00007f72787bd000)
libEGL.so.1 => /usr/lib/nvidia-384/libEGL.so.1 (0x00007f72785b8000)
libtensorflow_framework.so => /home/administrator/.local/lib/python2.7/site-packages/tensorflow/libtensorflow_framework.so (0x00007f727769e000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7277496000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7277279000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7277075000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7276cf3000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f72769ea000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f72767d4000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f727640a000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7278d7a000)
libGLdispatch.so.0 => /usr/lib/nvidia-384/libGLdispatch.so.0 (0x00007f727613c000)
libcublas.so.9.0 => /usr/local/cuda-9.0/lib64/libcublas.so.9.0 (0x00007f72729fa000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f7271b7c000)
libcudnn.so.7 => /usr/local/cuda-9.0/lib64/libcudnn.so.7 (0x00007f72606e5000)
libcufft.so.9.0 => /usr/local/cuda-9.0/lib64/libcufft.so.9.0 (0x00007f7258644000)
libcurand.so.9.0 => /usr/local/cuda-9.0/lib64/libcurand.so.9.0 (0x00007f72546e0000)
libcudart.so.9.0 => /usr/local/cuda-9.0/lib64/libcudart.so.9.0 (0x00007f7254473000)
libnvidia-fatbinaryloader.so.384.130 => /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.130 (0x00007f7254221000)

I would like to study your paper and code to render depth maps or normal maps of meshes for another differentiable loss during training. Thank you for your kindness and patience.

@gh18l

gh18l commented Apr 12, 2019

Hello, @dongdu3 ,
I met the same problem as you (a segmentation fault when running the test code), and my environment is the same as yours: cuda 9.0, cudnn 7, tensorflow-gpu 1.8, nvidia-384.130, and my system has no libglvnd. Have you solved the problem? Could you describe the solution more concretely? Thank you!

@gsygsy96

gsygsy96 commented Apr 13, 2019

Same question. My report:

import dirt.rasterise_ops
import subprocess
subprocess.call(['ldd', dirt.rasterise_ops._lib_path + '/librasterise.so'])
linux-vdso.so.1 => (0x00007fffc93f8000)
libEGL.so.1 => /usr/lib/nvidia-384/libEGL.so.1 (0x00007f0c686e1000)
libGL.so.1 => /usr/lib/nvidia-384/libGL.so.1 (0x00007f0c6839f000)
libtensorflow_framework.so => /home/guan/miniconda2/envs/coma/lib/python2.7/site-packages/tensorflow/libtensorflow_framework.so (0x00007f0c675ec000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0c673e4000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0c671c7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0c66fc3000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0c66c41000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0c66938000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0c66722000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0c66358000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0c68c6f000)
libGLdispatch.so.0 => /usr/lib/nvidia-384/libGLdispatch.so.0 (0x00007f0c6608a000)
libnvidia-tls.so.384.130 => /usr/lib/nvidia-384/tls/libnvidia-tls.so.384.130 (0x00007f0c65e86000)
libnvidia-glcore.so.384.130 => /usr/lib/nvidia-384/libnvidia-glcore.so.384.130 (0x00007f0c63fca000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f0c63c90000)
libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f0c63a7e000)
libcublas.so.8.0 => /home/guan/miniconda2/envs/coma/lib/python2.7/site-packages/tensorflow/../../../libcublas.so.8.0 (0x00007f0c60a36000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f0c5fbb8000)
libcudnn.so.7 => /home/guan/miniconda2/envs/coma/lib/python2.7/site-packages/tensorflow/../../../libcudnn.so.7 (0x00007f0c53762000)
libcufft.so.8.0 => /home/guan/miniconda2/envs/coma/lib/python2.7/site-packages/tensorflow/../../../libcufft.so.8.0 (0x00007f0c4a911000)
libcurand.so.8.0 => /home/guan/miniconda2/envs/coma/lib/python2.7/site-packages/tensorflow/../../../libcurand.so.8.0 (0x00007f0c4699a000)
libcudart.so.8.0 => /home/guan/miniconda2/envs/coma/lib/python2.7/site-packages/tensorflow/../../../libcudart.so.8.0 (0x00007f0c46732000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f0c46510000)
libnvidia-fatbinaryloader.so.384.130 => /usr/lib/nvidia-384/libnvidia-fatbinaryloader.so.384.130 (0x00007f0c462be000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f0c460ba000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f0c45eb4000)

@pmh47 Could you help me?

@gsygsy96

And I have tried this suggestion:
change line 5 to find_package(OpenGL REQUIRED COMPONENTS OpenGL EGL)
comment line 9
change line 42 to target_link_libraries(rasterise PRIVATE OpenGL::OpenGL OpenGL::EGL ${Tensorflow_LIBRARY})

But the installation fails.
Error:
Could NOT find OpenGL (missing: OPENGL_opengl_LIBRARY)

@gsygsy96

@dongdu3 Could you tell me whether I have libglvnd installed, and how to uninstall it?

@gsygsy96

@li19960612 Have you solved this ?

@gh18l

gh18l commented Apr 15, 2019

@mehameha998 I have not solved this problem yet. I think GL and EGL have some version conflict, so I changed my CMakeLists.txt to link the nvidia versions of both libGL and libEGL, but it does not work. I don't know what I should do.

@gh18l

gh18l commented Apr 15, 2019

@mehameha998 Do you want to install octopus with dirt? If so, would you mind leaving contact information so that we can communicate about research?

@pmh47
Owner

pmh47 commented Apr 15, 2019

@li19960612 You should link libGL and libEGL, or libOpenGL and libEGL, not the _nvidia-xxx versions, but you should ensure that the version of those libraries installed with the nvidia driver is found in the linker path before any other version (e.g. mesa).

Let me know the output of

import dirt.rasterise_ops
import subprocess
subprocess.call(['ldd', dirt.rasterise_ops._lib_path + '/librasterise.so'])

and

ls -l /usr/lib/*/*GL*

and what version/distro of linux you are using.

@pmh47
Owner

pmh47 commented Apr 15, 2019

@mehameha998
The output of ldd that you posted above looks correct. Please run ls -l /usr/lib/*/*GL* and send me the output.

The other way of doing it, with OpenGL REQUIRED COMPONENTS OpenGL EGL, probably fails because cmake cannot find the OpenGL library in the nvidia folder. You could try putting -D_OPENGL_LIB_PATH=/usr/lib/nvidia-384 in the cmake command.

@gh18l

gh18l commented Apr 19, 2019

@mehameha998 I have solved the "segmentation fault" problem: I changed the link from "libGL.so" to "libOpenGL.so". But I don't know why that fixed it.

@gh18l

gh18l commented Apr 19, 2019

@pmh47 Thank you for your patience! It works when I change my link from "libGL.so" to "libOpenGL.so". I will study your paper and code carefully after this.

@gsygsy96

gsygsy96 commented Apr 19, 2019

@li19960612 @pmh47 Thanks! I changed the OpenGL/EGL library path to the nvidia directory, and then it works.

@pmh47 pmh47 closed this as completed Apr 26, 2019
@BigDataHa

@pmh47 Thank you for your patience! I success when I change my link "libGL.so" to "libOpenGL.so". I will study your paper and code conscientiously after that.

How do I change the link from "libGL.so" to "libOpenGL.so"?

@francoisruty

I think I got the same problem, when I try to build the Docker image with

sudo docker build -t dirt --build-arg CUDA_BASE_VERSION=10.0 --build-arg CUDNN_VERSION=7.6.0.64 --build-arg UBUNTU_VERSION=16.04 --build-arg TENSORFLOW_VERSION=1.14.0 .

I get:

Step 21/21 : RUN python ~/dirt/tests/square_test.py
 ---> Running in cbd152971ada
WARNING: Logging before flag parsing goes to stderr.
W0701 11:01:22.309566 139893748598528 deprecation_wrapper.py:119] From /root/dirt/tests/square_test.py:41: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-07-01 11:01:22.310481: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-07-01 11:01:22.546655: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-01 11:01:22.547662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2019-07-01 11:01:22.547941: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-01 11:01:22.549120: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-01 11:01:22.550130: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-01 11:01:22.550417: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-01 11:01:22.551827: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-01 11:01:22.552891: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-01 11:01:22.555843: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-01 11:01:22.555942: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-01 11:01:22.556813: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-01 11:01:22.557597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-01 11:01:22.557940: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-01 11:01:22.633165: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-01 11:01:22.633981: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56c1730 executing computations on platform CUDA. Devices:
2019-07-01 11:01:22.633997: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2019-07-01 11:01:22.653678: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2019-07-01 11:01:22.654188: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x57327f0 executing computations on platform Host. Devices:
2019-07-01 11:01:22.654201: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-07-01 11:01:22.654384: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-01 11:01:22.656951: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
2019-07-01 11:01:22.656978: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-01 11:01:22.656989: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-07-01 11:01:22.656998: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-07-01 11:01:22.657006: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-07-01 11:01:22.657014: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-07-01 11:01:22.657030: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-07-01 11:01:22.657049: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-07-01 11:01:22.657105: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-01 11:01:22.657782: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-01 11:01:22.658422: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-07-01 11:01:22.658442: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-07-01 11:01:22.659297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-01 11:01:22.659308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-07-01 11:01:22.659313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-07-01 11:01:22.659567: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-01 11:01:22.660236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-01 11:01:22.660891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7605 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
Segmentation fault (core dumped)
The command '/bin/sh -c python ~/dirt/tests/square_test.py' returned a non-zero code: 139

@pmh47
Copy link
Owner

pmh47 commented Jul 1, 2019

@francoisruty Unfortunately I don't have Docker on this machine to test it myself just now.
Could you make the changes to csrc/CMakeLists.txt described above and see if that helps? Specifically,

  • change line 5 to find_package(OpenGL REQUIRED COMPONENTS OpenGL EGL)
  • comment line 9
  • replace ${EGL_LIBRARIES} ${OPENGL_LIBRARIES} in target_link_libraries with OpenGL::OpenGL OpenGL::EGL (either line 42 or 56 depending which commit you're using)
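Taken together, the modified portion of csrc/CMakeLists.txt would look roughly like this (a sketch only: exact line positions and the contents of the commented-out lookup vary between commits):

```cmake
# Line 5: ask FindOpenGL for the GLVND imported targets
find_package(OpenGL REQUIRED COMPONENTS OpenGL EGL)

# Line 9: the manual EGL library lookup is commented out
# find_library(...)

# Line 42 or 56: link against the imported targets instead of the raw variables
target_link_libraries(rasterise OpenGL::OpenGL OpenGL::EGL ${Tensorflow_LINK_FLAGS})
```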

If that doesn't work, please send the output of running the following in python in the container:

import dirt.rasterise_ops
import subprocess
subprocess.call(['ldd', dirt.rasterise_ops._lib_path + '/librasterise.so'])

and

ls -l /usr/lib/*/*GL*

@francoisruty
Copy link

I tried the changes you mentioned, but it still doesn't work. Here are the two outputs you requested:

>>> import dirt.rasterise_ops
>>> import subprocess
>>> subprocess.call(['ldd', dirt.rasterise_ops._lib_path + '/librasterise.so'])
	linux-vdso.so.1 =>  (0x00007fff1a52b000)
	libEGL.so.1 => /usr/local/lib/x86_64-linux-gnu/libEGL.so.1 (0x00007fc5d2d79000)
	libGL.so.1 => /usr/local/lib/x86_64-linux-gnu/libGL.so.1 (0x00007fc5d2ae9000)
	libtensorflow_framework.so.1 => not found
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc5d28e1000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc5d26c4000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc5d24c0000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc5d213e000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc5d1e35000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc5d1c1f000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc5d1855000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc5d334c000)
	libGLdispatch.so.0 => /usr/local/lib/x86_64-linux-gnu/libGLdispatch.so.0 (0x00007fc5d159a000)
	libGLX.so.0 => /usr/local/lib/x86_64-linux-gnu/libGLX.so.0 (0x00007fc5d135e000)
	libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007fc5d1024000)
	libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007fc5d0e02000)
	libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007fc5d0bfe000)
	libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007fc5d09f8000)
0
>>> 

and

root@ea8361b7877a:/# ls -l /usr/lib/*/*GL*
lrwxrwxrwx 1 root root      23 Jun 30 13:02 /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 -> libEGL_nvidia.so.418.39
-rw-r--r-- 1 root root 1210304 Feb 10 01:15 /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.418.39
lrwxrwxrwx 1 root root      29 Jun 30 13:02 /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.418.39
-rw-r--r-- 1 root root   60832 Feb 10 01:13 /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.418.39
lrwxrwxrwx 1 root root      26 Jun 30 13:02 /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.418.39
-rw-r--r-- 1 root root  110784 Feb 10 01:13 /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.418.39
lrwxrwxrwx 1 root root      23 Jun 30 13:02 /usr/lib/x86_64-linux-gnu/libGLX_indirect.so.0 -> libGLX_nvidia.so.418.39
lrwxrwxrwx 1 root root      23 Jun 30 13:02 /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0 -> libGLX_nvidia.so.418.39
-rw-r--r-- 1 root root 1275408 Feb 10 01:11 /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.418.39

@pmh47
Copy link
Owner

pmh47 commented Jul 2, 2019

@francoisruty This is specific to tensorflow 1.14 (and maybe later); 1.13 or 1.12 work. I can only reproduce it in the dockerfile for now; it works fine in a regular venv. Tracked as #34

@francoisruty
Copy link

roger that

@lyrgwlr
Copy link

lyrgwlr commented Jul 30, 2019

Hi @pmh47 ,
I ran into the segmentation fault too, and I tried setting the cmake option -DOpenGL_GL_PREFERENCE=GLVND, but it didn't work.
The ldd output is:
linux-vdso.so.1 => (0x00007ffc87fab000)
libEGL.so.1 => /usr/lib/nvidia-384/libEGL.so.1 (0x00007f1793081000)
libGL.so.1 => /usr/lib/nvidia-384/libGL.so.1 (0x00007f1792d3f000)
libtensorflow_framework.so => not found
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1792b37000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f179291a000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1792716000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1792394000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f179208b000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1791e75000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1791aab000)
/lib64/ld-linux-x86-64.so.2 (0x00007f179357e000)
libGLdispatch.so.0 => /usr/lib/nvidia-384/libGLdispatch.so.0 (0x00007f17917dd000)
libnvidia-tls.so.384.130 => /usr/lib/nvidia-384/tls/libnvidia-tls.so.384.130 (0x00007f17915d9000)
libnvidia-glcore.so.384.130 => /usr/lib/nvidia-384/libnvidia-glcore.so.384.130 (0x00007f178f71d000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f178f3e3000)
libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f178f1d1000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f178efaf000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f178edab000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f178eba5000)

The gdb output is:
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python2.7...(no debugging symbols found)...done.
Starting program: /usr/bin/python2.7 tests/square_test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff43da700 (LWP 26383)]
[New Thread 0x7ffff3bd9700 (LWP 26384)]
[New Thread 0x7fffef3d8700 (LWP 26385)]
[New Thread 0x7fffecbd7700 (LWP 26386)]
[New Thread 0x7fffea3d6700 (LWP 26387)]
[New Thread 0x7fffe7bd5700 (LWP 26388)]
[New Thread 0x7fffe53d4700 (LWP 26389)]
[Thread 0x7fffea3d6700 (LWP 26387) exited]
[Thread 0x7fffe53d4700 (LWP 26389) exited]
[Thread 0x7fffe7bd5700 (LWP 26388) exited]
[Thread 0x7fffecbd7700 (LWP 26386) exited]
[Thread 0x7fffef3d8700 (LWP 26385) exited]
[Thread 0x7ffff3bd9700 (LWP 26384) exited]
[Thread 0x7ffff43da700 (LWP 26383) exited]
[New Thread 0x7fffe53d4700 (LWP 26393)]
[New Thread 0x7fffe7bd5700 (LWP 26394)]
[New Thread 0x7fffea3d6700 (LWP 26395)]
[New Thread 0x7fffecbd7700 (LWP 26396)]
[New Thread 0x7fff7a1d2700 (LWP 26397)]
[New Thread 0x7fff779d1700 (LWP 26398)]
[New Thread 0x7fff751d0700 (LWP 26399)]
[New Thread 0x7fff6397f700 (LWP 26400)]
2019-07-30 14:33:25.461830: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
[New Thread 0x7fff6317e700 (LWP 26401)]
[New Thread 0x7fff6297d700 (LWP 26402)]
[New Thread 0x7fff6217c700 (LWP 26403)]
[New Thread 0x7fff6197b700 (LWP 26404)]
[New Thread 0x7fff6117a700 (LWP 26405)]
[New Thread 0x7fff60979700 (LWP 26406)]
[New Thread 0x7fff43fff700 (LWP 26407)]
[New Thread 0x7fff437fe700 (LWP 26408)]
[New Thread 0x7fff42ffd700 (LWP 26416)]
[New Thread 0x7fff427fc700 (LWP 26417)]
[New Thread 0x7fff41ffb700 (LWP 26418)]
[New Thread 0x7fff417fa700 (LWP 26419)]
2019-07-30 14:33:26.184380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2019-07-30 14:33:26.184415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-07-30 14:33:26.453332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-30 14:33:26.453370: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2019-07-30 14:33:26.453382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2019-07-30 14:33:26.453623: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10396 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
[New Thread 0x7fff25d96700 (LWP 26420)]
[New Thread 0x7fff25595700 (LWP 26421)]
[New Thread 0x7fff24d94700 (LWP 26422)]
[New Thread 0x7ffeffb18700 (LWP 26423)]
[New Thread 0x7ffef7fff700 (LWP 26424)]
[New Thread 0x7ffeff317700 (LWP 26425)]
[New Thread 0x7ffefeb16700 (LWP 26426)]
[New Thread 0x7ffefe315700 (LWP 26427)]
[New Thread 0x7ffefdb14700 (LWP 26428)]
[New Thread 0x7ffefd313700 (LWP 26429)]
[New Thread 0x7ffefcb12700 (LWP 26430)]
[Thread 0x7ffefcb12700 (LWP 26430) exited]
[New Thread 0x7ffefcb12700 (LWP 26431)]
[Thread 0x7ffefcb12700 (LWP 26431) exited]
[New Thread 0x7ffefcb12700 (LWP 26432)]
[Thread 0x7ffefcb12700 (LWP 26432) exited]
[New Thread 0x7ffefcb12700 (LWP 26433)]
[Thread 0x7ffefcb12700 (LWP 26433) exited]
[New Thread 0x7ffefcb12700 (LWP 26434)]
[Thread 0x7ffefcb12700 (LWP 26434) exited]
[New Thread 0x7ffefcb12700 (LWP 26435)]
[Thread 0x7ffefcb12700 (LWP 26435) exited]
[New Thread 0x7ffefcb12700 (LWP 26436)]
[Thread 0x7ffefcb12700 (LWP 26436) exited]
[New Thread 0x7ffefcb12700 (LWP 26437)]
[Thread 0x7ffefcb12700 (LWP 26437) exited]
[New Thread 0x7ffefcb12700 (LWP 26438)]
2019-07-30 14:33:26.648868: I /home/cv/wlr/dirt/csrc/gl_common.h:66] selected egl device #0 to match cuda device #0 for thread 0x7ffefcb12700

Thread 47 "python2.7" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffefcb12700 (LWP 26438)]
0x00007fff671ccae9 in glGetString () from /usr/lib/nvidia-384/libGL.so.1
#0 0x00007fff671ccae9 in glGetString () from /usr/lib/nvidia-384/libGL.so.1
#1 0x00007fff6768b00a in gl_common::initialise_context(int) () from /home/cv/wlr/dirt/dirt/librasterise.so
#2 0x00007fff6768bbf5 in RasteriseOpGpu::initialise_per_thread_objects(RasteriseOpGpu::PerThreadObjects&, HWC const&, CUctx_st* const&) () from /home/cv/wlr/dirt/dirt/librasterise.so
#3 0x00007fff6768c4b9 in RasteriseOpGpu::Compute(tensorflow::OpKernelContext*)::{lambda(RasteriseOpGpu::PerThreadObjects&)#1}::operator()(RasteriseOpGpu::PerThreadObjects&) const ()
from /home/cv/wlr/dirt/dirt/librasterise.so
#4 0x00007fff6768ed22 in std::_Function_handler<void (RasteriseOpGpu::PerThreadObjects&), RasteriseOpGpu::Compute(tensorflow::OpKernelContext*)::{lambda(RasteriseOpGpu::PerThreadObjects&)#1}>::_M_invoke(std::_Any_data const&, RasteriseOpGpu::PerThreadObjects&) () from /home/cv/wlr/dirt/dirt/librasterise.so
#5 0x00007fff6768f85f in std::function<void (RasteriseOpGpu::PerThreadObjects&)>::operator()(RasteriseOpGpu::PerThreadObjects&) const () from /home/cv/wlr/dirt/dirt/librasterise.so
#6 0x00007fff6768efef in GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::dispatch_blocking(std::function<void (RasteriseOpGpu::PerThreadObjects&)> const&)::{lambda()#1}::operator()() const () from /home/cv/wlr/dirt/dirt/librasterise.so
#7 0x00007fff67690354 in std::_Function_handler<bool (), GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::dispatch_blocking(std::function<void (RasteriseOpGpu::PerThreadObjects&)> const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /home/cv/wlr/dirt/dirt/librasterise.so
#8 0x00007fff6769604a in std::function<bool ()>::operator()() const () from /home/cv/wlr/dirt/dirt/librasterise.so
#9 0x00007fff67695930 in GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::thread_fn(moodycamel::BlockingConcurrentQueue<std::function<bool ()>, moodycamel::ConcurrentQueueDefaultTraits>&) () from /home/cv/wlr/dirt/dirt/librasterise.so
#10 0x00007fff676952e7 in GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::GlThread()::{lambda()#1}::operator()() const () from /home/cv/wlr/dirt/dirt/librasterise.so
#11 0x00007fff67698754 in void std::_Bind_simple<GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::GlThread()::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) ()
from /home/cv/wlr/dirt/dirt/librasterise.so
#12 0x00007fff6769864a in std::_Bind_simple<GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::GlThread()::{lambda()#1} ()>::operator()() ()
from /home/cv/wlr/dirt/dirt/librasterise.so
#13 0x00007fff67697f4a in std::thread::_Impl<std::_Bind_simple<GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::GlThread()::{lambda()#1} ()> >::_M_run() ()
from /home/cv/wlr/dirt/dirt/librasterise.so
#14 0x00007fffac776c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#15 0x00007ffff7bc16ba in start_thread (arg=0x7ffefcb12700) at pthread_create.c:333
#16 0x00007ffff78f741d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
A debugging session is active.

    Inferior 1 [process 26379] will be killed.

Quit anyway? (y or n)

I linked libEGL/libGL to the nvidia directory, but it still didn't work.
I'm struggling a lot. Please help.

@pmh47
Copy link
Owner

pmh47 commented Jul 30, 2019

@lyrgwlr
In csrc/CMakeLists.txt, make the following changes:

  • change line 5 to find_package(OpenGL REQUIRED COMPONENTS OpenGL EGL)
  • comment line 9
  • change line 52 to target_link_libraries(rasterise OpenGL::OpenGL OpenGL::EGL ${Tensorflow_LINK_FLAGS})

This configuration seems to be more reliable on modern systems. If it still doesn't work, please paste the output of ldd when using this changed version.
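When comparing ldd outputs across configurations, a small helper that extracts just the GL-related resolutions can make the difference easier to see. This is a generic sketch, not part of DIRT; the sample text below is hypothetical:

```python
import re

def gl_resolutions(ldd_output):
    """Extract (soname, resolved path) pairs for GL/EGL libraries
    from the text printed by `ldd some_library.so`."""
    pairs = []
    for line in ldd_output.splitlines():
        # Match lines of the form "libXXX.so => /path/to/libXXX.so (0x...)"
        # whose soname contains "GL" (covers libGL, libEGL, libGLdispatch, ...).
        match = re.match(r"\s*(\S*(?:EGL|GL)\S*)\s*=>\s*(\S+)", line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

# Hypothetical sample of ldd output:
sample = (
    "\tlibEGL.so.1 => /usr/lib/nvidia-384/libEGL.so.1 (0x00007f1793081000)\n"
    "\tlibGL.so.1 => /usr/lib/nvidia-384/libGL.so.1 (0x00007f1792d3f000)\n"
    "\tlibm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f179208b000)\n"
)
print(gl_resolutions(sample))
```

If the EGL/GL lines resolve somewhere other than the NVIDIA driver directory, that is usually the mismatch to fix.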

@lyrgwlr
Copy link

lyrgwlr commented Aug 3, 2019

@pmh47
After making the changes you described, I ran the cmake command:
cmake ../csrc/ -DCMAKE_LIBRARY_PATH=/usr/lib/nvidia-384

Some errors occurred:
-- The CXX compiler identification is GNU 5.4.0
-- The CUDA compiler identification is NVIDIA 9.0.176
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda-9.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-9.0/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found OpenGL: /usr/lib/nvidia-384/libGL.so
-- Configuring done
CMake Error at CMakeLists.txt:42 (add_library):
Target "rasterise" links to target "OpenGL::OpenGL" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?

CMake Error at CMakeLists.txt:42 (add_library):
Target "rasterise" links to target "OpenGL::EGL" but the target was not
found. Perhaps a find_package() call is missing for an IMPORTED target, or
an ALIAS target is missing?

-- Generating done
-- Build files have been written to: /home/cv/wlr/dirt/build

And I ran the make command:
make

Its output is:
/usr/bin/ld: cannot find -lOpenGL::OpenGL
/usr/bin/ld: cannot find -lOpenGL::EGL
collect2: error: ld returned 1 exit status
CMakeFiles/rasterise.dir/build.make:225: recipe for target '/home/cv/wlr/dirt/dirt/librasterise.so' failed
make[2]: *** [/home/cv/wlr/dirt/dirt/librasterise.so] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/rasterise.dir/all' failed
make[1]: *** [CMakeFiles/rasterise.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

@pmh47
Copy link
Owner

pmh47 commented Aug 3, 2019

@lyrgwlr Something strange has happened there; it's as if cmake has not even looked for the OpenGL::OpenGL and OpenGL::EGL components. What version of cmake are you using? 3.10 or newer is required for find_package(OpenGL REQUIRED COMPONENTS OpenGL EGL) to work, but I'd expect an error from earlier versions.

@lyrgwlr
Copy link

lyrgwlr commented Aug 6, 2019

@pmh47
I ran into the same problem as @mehameha998 .
After updating my cmake to 3.14, an error occurred at the cmake step.
My cmake command is:
cmake -D_OPENGL_LIB_PATH=/usr/lib/nvidia-384 ../csrc

And the cmake output is:
-- The CXX compiler identification is GNU 5.4.0
-- The CUDA compiler identification is NVIDIA 9.0.176
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda-9.0/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-9.0/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
CMake Error at /usr/local/share/cmake-3.14/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
Could NOT find OpenGL (missing: EGL)
Call Stack (most recent call first):
/usr/local/share/cmake-3.14/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
/usr/local/share/cmake-3.14/Modules/FindOpenGL.cmake:397 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
CMakeLists.txt:5 (find_package)

-- Configuring incomplete, errors occurred!
See also "/home/cv/wlr/dirt/build/CMakeFiles/CMakeOutput.log".

@pmh47
Copy link
Owner

pmh47 commented Aug 6, 2019

@lyrgwlr Please do ls -l /usr/lib/nvidia-384 and paste the output here. Possibly libEGL.so is missing.

@lyrgwlr
Copy link

lyrgwlr commented Aug 7, 2019

@pmh47
The output is here:
-rw-r--r-- 1 root root 0 3月 7 23:46 alt_ld.so.conf
drwxr-xr-x 2 root root 4096 3月 28 11:43 bin
-rw-r--r-- 1 root root 42 3月 7 23:46 ld.so.conf
lrwxrwxrwx 1 root root 24 3月 7 23:46 libEGL_nvidia.so.0 -> libEGL_nvidia.so.384.130
-rw-r--r-- 1 root root 1313208 3月 21 2018 libEGL_nvidia.so.384.130
lrwxrwxrwx 1 root root 17 3月 7 23:46 libEGL.so -> libEGL.so.384.130
lrwxrwxrwx 1 root root 17 3月 7 23:46 libEGL.so.1 -> libEGL.so.384.130
-rw-r--r-- 1 root root 73328 3月 21 2018 libEGL.so.1.1.0
-rw-r--r-- 1 root root 20192 3月 21 2018 libEGL.so.384.130
-rw-r--r-- 1 root root 711864 3月 21 2018 libGLdispatch.so.0
lrwxrwxrwx 1 root root 30 3月 7 23:46 libGLESv1_CM_nvidia.so.1 -> libGLESv1_CM_nvidia.so.384.130
-rw-r--r-- 1 root root 54392 3月 21 2018 libGLESv1_CM_nvidia.so.384.130
lrwxrwxrwx 1 root root 17 3月 7 23:46 libGLESv1_CM.so -> libGLESv1_CM.so.1
lrwxrwxrwx 1 root root 21 3月 28 11:43 libGLESv1_CM.so.1 -> libGLESv1_CM.so.1.2.0
-rw-r--r-- 1 root root 43696 3月 21 2018 libGLESv1_CM.so.1.2.0
lrwxrwxrwx 1 root root 27 3月 7 23:46 libGLESv2_nvidia.so.2 -> libGLESv2_nvidia.so.384.130
-rw-r--r-- 1 root root 86232 3月 21 2018 libGLESv2_nvidia.so.384.130
lrwxrwxrwx 1 root root 14 3月 7 23:46 libGLESv2.so -> libGLESv2.so.2
lrwxrwxrwx 1 root root 18 3月 28 11:43 libGLESv2.so.2 -> libGLESv2.so.2.1.0
-rw-r--r-- 1 root root 83280 3月 21 2018 libGLESv2.so.2.1.0
lrwxrwxrwx 1 root root 10 3月 7 23:46 libGL.so -> libGL.so.1
lrwxrwxrwx 1 root root 16 3月 7 23:46 libGL.so.1 -> libGL.so.384.130
-rw-r--r-- 1 root root 665720 3月 21 2018 libGL.so.1.7.0
-rw-r--r-- 1 root root 1291320 3月 21 2018 libGL.so.384.130
lrwxrwxrwx 1 root root 24 3月 7 23:46 libGLX_indirect.so.0 -> libGLX_nvidia.so.384.130
lrwxrwxrwx 1 root root 24 3月 7 23:46 libGLX_nvidia.so.0 -> libGLX_nvidia.so.384.130
-rw-r--r-- 1 root root 1291320 3月 21 2018 libGLX_nvidia.so.384.130
lrwxrwxrwx 1 root root 11 3月 7 23:46 libGLX.so -> libGLX.so.0
-rw-r--r-- 1 root root 65840 3月 21 2018 libGLX.so.0
lrwxrwxrwx 1 root root 15 3月 7 23:46 libnvcuvid.so -> libnvcuvid.so.1
lrwxrwxrwx 1 root root 21 3月 7 23:46 libnvcuvid.so.1 -> libnvcuvid.so.384.130
-rw-r--r-- 1 root root 2407728 3月 21 2018 libnvcuvid.so.384.130
lrwxrwxrwx 1 root root 18 3月 7 23:46 libnvidia-cfg.so -> libnvidia-cfg.so.1
lrwxrwxrwx 1 root root 24 3月 7 23:46 libnvidia-cfg.so.1 -> libnvidia-cfg.so.384.130
-rw-r--r-- 1 root root 166152 3月 21 2018 libnvidia-cfg.so.384.130
lrwxrwxrwx 1 root root 23 3月 7 23:46 libnvidia-compiler.so -> libnvidia-compiler.so.1
lrwxrwxrwx 1 root root 29 3月 7 23:46 libnvidia-compiler.so.1 -> libnvidia-compiler.so.384.130
-rw-r--r-- 1 root root 48462448 3月 21 2018 libnvidia-compiler.so.384.130
-rw-r--r-- 1 root root 28210936 3月 21 2018 libnvidia-eglcore.so.384.130
lrwxrwxrwx 1 root root 30 3月 7 23:46 libnvidia-egl-wayland.so.1 -> libnvidia-egl-wayland.so.1.0.1
-rw-r--r-- 1 root root 31152 3月 21 2018 libnvidia-egl-wayland.so.1.0.1
lrwxrwxrwx 1 root root 21 3月 7 23:46 libnvidia-encode.so -> libnvidia-encode.so.1
lrwxrwxrwx 1 root root 27 3月 7 23:46 libnvidia-encode.so.1 -> libnvidia-encode.so.384.130
-rw-r--r-- 1 root root 164512 3月 21 2018 libnvidia-encode.so.384.130
-rw-r--r-- 1 root root 313704 3月 21 2018 libnvidia-fatbinaryloader.so.384.130
lrwxrwxrwx 1 root root 18 3月 7 23:46 libnvidia-fbc.so -> libnvidia-fbc.so.1
lrwxrwxrwx 1 root root 24 3月 7 23:46 libnvidia-fbc.so.1 -> libnvidia-fbc.so.384.130
-rw-r--r-- 1 root root 106600 3月 21 2018 libnvidia-fbc.so.384.130
-rw-r--r-- 1 root root 30027616 3月 21 2018 libnvidia-glcore.so.384.130
-rw-r--r-- 1 root root 511576 3月 21 2018 libnvidia-glsi.so.384.130
lrwxrwxrwx 1 root root 18 3月 7 23:46 libnvidia-ifr.so -> libnvidia-ifr.so.1
lrwxrwxrwx 1 root root 24 3月 7 23:46 libnvidia-ifr.so.1 -> libnvidia-ifr.so.384.130
-rw-r--r-- 1 root root 207880 3月 21 2018 libnvidia-ifr.so.384.130
lrwxrwxrwx 1 root root 17 3月 7 23:46 libnvidia-ml.so -> libnvidia-ml.so.1
lrwxrwxrwx 1 root root 23 3月 7 23:46 libnvidia-ml.so.1 -> libnvidia-ml.so.384.130
-rw-r--r-- 1 root root 1312544 3月 21 2018 libnvidia-ml.so.384.130
lrwxrwxrwx 1 root root 35 3月 28 11:43 libnvidia-ptxjitcompiler.so.1 -> libnvidia-ptxjitcompiler.so.384.130
-rw-r--r-- 1 root root 10323240 3月 21 2018 libnvidia-ptxjitcompiler.so.384.130
-rw-r--r-- 1 root root 13080 3月 21 2018 libnvidia-tls.so.384.130
lrwxrwxrwx 1 root root 24 3月 28 11:43 libnvidia-wfb.so.1 -> libnvidia-wfb.so.384.130
-rw-r--r-- 1 root root 295416 12月 14 2012 libnvidia-wfb.so.384.130
lrwxrwxrwx 1 root root 14 3月 7 23:46 libOpenGL.so -> libOpenGL.so.0
-rw-r--r-- 1 root root 211728 3月 21 2018 libOpenGL.so.0
drwxr-xr-x 2 root root 4096 3月 28 11:43 tls
drwxr-xr-x 2 root root 4096 3月 28 11:43 vdpau
drwxr-xr-x 2 root root 4096 3月 28 11:43 xorg

I think libEGL.so does exist.

@pmh47
Copy link
Owner

pmh47 commented Aug 7, 2019

@lyrgwlr I'm not sure why cmake is failing to find it then. Please paste what cat CMakeCache.txt | grep -i EGL (run in the build folder) shows; it might reveal something. CMake's FindOpenGL may be interacting with your system in a strange way. As a hack, you can try directly linking the correct library: remove EGL from line 5 of CMakeLists (so FindOpenGL no longer searches for it), and at line 52, replace OpenGL::EGL by /usr/lib/nvidia-384/libEGL.so.1.1.0 (or maybe /usr/lib/nvidia-384/libEGL.so.384.130 -- I don't know why both are present).
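Concretely, that hack would amount to something like the following in csrc/CMakeLists.txt (a sketch only: the libEGL path is an assumption that must match the actual driver install, and line positions vary between commits):

```cmake
# Line 5: EGL removed from the component list, so FindOpenGL no longer searches for it
find_package(OpenGL REQUIRED COMPONENTS OpenGL)

# Line 52: link the driver's EGL library directly instead of OpenGL::EGL
target_link_libraries(rasterise OpenGL::OpenGL /usr/lib/nvidia-384/libEGL.so.1.1.0 ${Tensorflow_LINK_FLAGS})
```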

@lyrgwlr
Copy link

lyrgwlr commented Aug 7, 2019

@pmh47
Thanks very much!! Finally I solved this.
I used your method:
directly linking the correct library: remove EGL from line 5 of CMakeLists (so FindOpenGL no longer searches for it), and at line 52, replace OpenGL::EGL by /usr/lib/nvidia-384/libEGL.so.1.1.0
And I can successfully install dirt now.
My test code output is:
2019-08-07 18:09:41.537339: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-07 18:09:41.680515: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1411] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2019-08-07 18:09:41.680547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1490] Adding visible gpu devices: 0
2019-08-07 18:09:46.700712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-07 18:09:46.700759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] 0
2019-08-07 18:09:46.700773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0: N
2019-08-07 18:09:46.707158: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1103] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10396 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-08-07 18:09:47.253460: I /home/cv/wlr/dirt/csrc/gl_common.h:66] selected egl device #0 to match cuda device #0 for thread 0x7f717fb18700
2019-08-07 18:09:47.545617: I /home/cv/wlr/dirt/csrc/gl_common.h:84] successfully created new GL context on thread 0x7f717fb18700 (EGL = 1.4, GL = 4.5.0 NVIDIA 384.130, renderer = GeForce GTX 1080 Ti/PCIe/SSE2)
2019-08-07 18:09:47.832571: I /home/cv/wlr/dirt/csrc/rasterise_egl.cpp:266] reinitialised framebuffer with size 128 x 128
successful: all pixels agree

Is that all right?
Thanks again for your such quick reply.

@Eby-123
Copy link

Eby-123 commented Nov 11, 2019

Hi, I have the same problem. When I run the test code, it shows a segmentation fault (core dumped).

I ran the following in Python:
import dirt.rasterise_ops
import subprocess
subprocess.call(['ldd', dirt.rasterise_ops._lib_path + '/librasterise.so'])

it gives the following information:

linux-vdso.so.1 =>  (0x00007ffe5dcad000)
libEGL.so.1 => /usr/lib/nvidia-410/libEGL.so.1 (0x00007f556f82f000)
libGL.so.1 => /usr/lib/nvidia-410/libGL.so.1 (0x00007f556f4f1000)
libtensorflow_framework.so => not found
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f556f2e9000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f556f0cc000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f556eec8000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f556eb46000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f556e83d000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f556e627000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f556e25d000)
/lib64/ld-linux-x86-64.so.2 (0x00007f556fd4a000)
libGLdispatch.so.0 => /usr/lib/nvidia-410/libGLdispatch.so.0 (0x00007f556df8f000)
libnvidia-tls.so.410.48 => /usr/lib/nvidia-410/tls/libnvidia-tls.so.410.48 (0x00007f556dd8b000)
libnvidia-glcore.so.410.48 => /usr/lib/nvidia-410/libnvidia-glcore.so.410.48 (0x00007f556c1d1000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f556be97000)
libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007f556bc85000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f556ba63000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f556b85f000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f556b659000)

0
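As an aside, this link-map check can be scripted. Below is a small diagnostic sketch (the `resolved_egl` helper is mine, not part of dirt) that parses `ldd` output and reports which libEGL a shared library resolves to; for dirt, a driver-provided path (one containing "nvidia") is what you want to see for `librasterise.so`:

```python
import subprocess
import sys

def resolved_egl(lib_path):
    """Return the path that libEGL.so resolves to for the given shared
    library (per ldd), or None if it does not link EGL at all."""
    out = subprocess.run(["ldd", lib_path], capture_output=True, text=True).stdout
    for line in out.splitlines():
        # e.g. "libEGL.so.1 => /usr/lib/nvidia-410/libEGL.so.1 (0x...)"
        if "libEGL.so" in line and "=>" in line:
            rhs = line.split("=>", 1)[1].strip()
            return None if rhs.startswith("not found") else rhs.split()[0]
    return None

# Point this at librasterise.so on your system; sys.executable is just
# a demo input that works anywhere.
print(resolved_egl(sys.executable))
```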

And this is the output of running 'gdb -ex r -ex bt -ex q --args python tests/square_test.py':

GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
Starting program: /home/wuyibin/anaconda3/envs/py36/bin/python tests/square_test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
/home/wuyibin/anaconda3/envs/py36/bin/python: can't open file 'tests/square_test.py': [Errno 2] No such file or directory
[Inferior 1 (process 23034) exited with code 02]
No stack.

Any advice would be greatly appreciated.

@pmh47
Owner

pmh47 commented Nov 11, 2019

@Eby-123 Please try the suggestions in this comment above.

@YinghaoHuang91

@pmh47 I have the same issue when running dirt on our cluster. There, libEGL_nvidia.so is stored in the /usr/lib/x86_64-linux-gnu folder together with the other GL .so files. Any idea how I can make Ubuntu ignore the /usr/lib path without modifying files inside /etc/ld.so.conf.d/ (which I don't have permission to do)? Thanks!

@pmh47
Owner

pmh47 commented Oct 1, 2020

@YinghaoHuang91 If it's building ok but failing at runtime, you may be able to set LD_PRELOAD to the correct libEGL.so.1 (i.e. the one provided by nvidia).
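For anyone trying that workaround from Python, here is a minimal sketch; the `env_with_preload` helper and the driver path are mine, as placeholders — substitute the libEGL.so.1 actually shipped with your nvidia driver:

```python
import os
import subprocess
import sys

def env_with_preload(lib_path, base=None):
    """Return a copy of the environment with lib_path prepended to LD_PRELOAD,
    so the dynamic loader resolves libEGL to that copy before anything else."""
    env = dict(os.environ if base is None else base)
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = lib_path if not existing else lib_path + ":" + existing
    return env

# Placeholder path; adjust for your driver version and location:
# subprocess.run([sys.executable, "tests/square_test.py"],
#                env=env_with_preload("/usr/lib/nvidia-410/libEGL.so.1"))
```

Prepending (rather than overwriting) keeps any LD_PRELOAD entries already set by the cluster environment.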

@YinghaoHuang91

@pmh47 Still no luck. I set LD_PRELOAD to libEGL.so.1, which is in the same folder as libEGL_nvidia.so.0. This is what I see when running gdb:

[Switching to Thread 0x1554bc773700 (LWP 579851)]
0x00001554bc2f4dcf in ?? () from /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0
#0 0x00001554bc2f4dcf in ?? () from /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0
#1 0x00001554bc280f89 in ?? () from /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0
#2 0x000015550cb378d6 in gl_common::initialise_context(int) () from /lustre/work/yhuang2/Projects//DIRT_1001/dirt/dirt/librasterise.so
#3 0x000015550cb3885b in RasteriseOpGpu::initialise_per_thread_objects(RasteriseOpGpu::PerThreadObjects&, HWC const&, CUctx_st* const&) () from /lustre/work/yhuang2/Projects/DIRT_1001/dirt/dirt/librasterise.so
#4 0x000015550cb3915b in RasteriseOpGpu::Compute(tensorflow::OpKernelContext*)::{lambda(RasteriseOpGpu::PerThreadObjects&)#1}::operator()(RasteriseOpGpu::PerThreadObjects&) const () from /lustre/work/yhuang2/Projects/DIRT_1001/dirt/dirt/librasterise.so
#5 0x000015550cb3c0f0 in std::_Function_handler<void (RasteriseOpGpu::PerThreadObjects&), RasteriseOpGpu::Compute(tensorflow::OpKernelContext*)::{lambda(RasteriseOpGpu::PerThreadObjects&)#1}>::_M_invoke(std::_Any_data const&, RasteriseOpGpu::PerThreadObjects&) ()
from /lustre/work/yhuang2/Projects/DIRT_1001/dirt/dirt/librasterise.so
#6 0x000015550cb3d0db in std::function<void (RasteriseOpGpu::PerThreadObjects&)>::operator()(RasteriseOpGpu::PerThreadObjects&) const () from /lustre/work/yhuang2/Projects//DIRT_1001/dirt/dirt/librasterise.so
#7 0x000015550cb3c375 in GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::dispatch_blocking(std::function<void (RasteriseOpGpu::PerThreadObjects&)> const&)::{lambda()#1}::operator()() const ()
from /lustre/work/yhuang2/Projects//DIRT_1001/dirt/dirt/librasterise.so
#8 0x000015550cb3dbed in std::_Function_handler<bool (), GlDispatcherRasteriseOpGpu::PerThreadObjects::GlThread::dispatch_blocking(std::function<void (RasteriseOpGpu::PerThreadObjects&)> const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
from /lustre/work/yhuang2/Projects/DIRT_1001/dirt/dirt/librasterise.so

This is the output of the ldd check:

import dirt.rasterise_ops
import subprocess
subprocess.call(['ldd', dirt.rasterise_ops._lib_path + '/librasterise.so'])
linux-vdso.so.1 (0x00007ffc75bf1000)
/is/cluster/work/yhuang2/Projects//DIRT_1001/dirt_old/TMP_PATH/libEGL.so.1 (0x000014b52d8e0000)
libOpenGL.so.0 => /usr/lib/x86_64-linux-gnu/libOpenGL.so.0 (0x000014b52d883000)
libtensorflow_framework.so.1 => not found
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x000014b52d876000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x000014b52d853000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x000014b52d84d000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x000014b52d66c000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000014b52d51d000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000014b52d500000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000014b52d30e000)
/lib64/ld-linux-x86-64.so.2 (0x000014b52d9f9000)
libGLdispatch.so.0 => /usr/lib/x86_64-linux-gnu/libGLdispatch.so.0 (0x000014b52d256000)
0

Any idea how to solve this annoying issue?

@jy0119

jy0119 commented Nov 24, 2020

Hi, when I run python tests/square_test.py, it outputs:
2020-11-24 11:24:04.813466: F /home/zz/NOL/dirt/csrc/gl_common.h:65] none of 1 egl devices matches the active cuda device
Aborted (core dumped)
I didn't find a solution on the Internet. Could you tell me how to fix this problem? Thank you very much!
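That error means EGL device enumeration found one device, but it was not the GPU that CUDA is using. One common cause is a missing or broken nvidia EGL vendor registration for glvnd, so enumeration only sees a non-nvidia device. A diagnostic sketch for checking the registration — the ICD path is the usual Ubuntu location (other distros may differ), and the helper is mine, not part of dirt:

```python
import json
import os

# Usual registration file on Ubuntu; other distros may place it elsewhere.
NVIDIA_ICD = "/usr/share/glvnd/egl_vendor.d/10_nvidia.json"

def nvidia_egl_icd(icd_path=NVIDIA_ICD):
    """Return the EGL vendor library registered for nvidia, or None if the
    ICD file is missing -- in which case EGL cannot enumerate the GPU."""
    if not os.path.exists(icd_path):
        return None
    with open(icd_path) as f:
        return json.load(f).get("ICD", {}).get("library_path")

# On a working system this prints something like "libEGL_nvidia.so.0";
# None suggests the nvidia EGL ICD is not installed or registered.
print(nvidia_egl_icd())
```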
