Error importing lmbspecialops #3

Closed
marcobius opened this issue Aug 8, 2017 · 22 comments

@marcobius

Hello I'm trying to install/use the library and I'm having some issues.

My configuration is:

Ubuntu 16.04 LTS
tensorflow 1.2.1
cmake 3.5.1
python 3.5.2
cuda 8.0.61

I've cloned your project into ~/src/lmbspecialops, and when I try to import the lib I get the error included below.
Any help will be appreciated; I'm afraid I've used up my whole arsenal.
Thanks in advance.


bou@bou-yoga:~/src$ export PYTHONPATH="lmbspecialops/python"
bou@bou-yoga:~/src$ export | grep PYTHON*
declare -x PYTHONPATH="lmbspecialops/python"
bou@bou-yoga:~/src$ python3
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lmbspecialops
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bou/src/lmbspecialops/python/lmbspecialops.py", line 28, in <module>
    lmbspecialopslib = tf.load_op_library(_lib_path)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: /home/bou/src/lmbspecialops/build/lib/lmbspecialops.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv
@benjaminum
Collaborator

Hi,
do you have multiple tensorflow versions installed or did you change the tensorflow version after building lmbspecialops?

Can you try to reproduce this in a virtualenv?
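
A quick way to check for duplicate installs is to scan the interpreter's search path for every directory that contains the package. This is only a minimal sketch: the helper name find_package_dirs is made up for illustration, and it only sees the search path of the interpreter that runs it, so it would have to be run once per interpreter (python, python3, the virtualenv's python).

```python
import os
import sys


def find_package_dirs(package_name, search_paths=None):
    """Return every directory on the search path that contains the package.

    Hypothetical helper, not part of lmbspecialops or TensorFlow.
    """
    if search_paths is None:
        search_paths = sys.path
    found = []
    for entry in search_paths:
        # An empty entry in sys.path stands for the current directory.
        candidate = os.path.join(entry or os.getcwd(), package_name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            real = os.path.realpath(candidate)
            if real not in found:
                found.append(real)
    return found


if __name__ == "__main__":
    # More than one line of output means more than one install is visible.
    for location in find_package_dirs("tensorflow"):
        print(location)
```

If more than one location is printed, the build may have used headers from a different install than the one Python imports.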

@marcobius
Author

marcobius commented Aug 8, 2017

Hi, I've tried both.
You were right about multiple versions of tensorflow being installed, so I followed these steps:

  1. uninstall all versions of tensorflow
  2. create a new virtualenv (virtualenv -p python3 test-lmb)
  3. activate the environment
  4. install tensorflow-gpu (pip3 install tensorflow-gpu)
  5. build your lib following readme instructions
  6. test it by importing lmbspecialops from python

The error is again the same:

(test-lmb) bou@bou-yoga:~/src$ python
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lmbspecialops
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bou/src/lmbspecialops/python/lmbspecialops.py", line 28, in <module>
    lmbspecialopslib = tf.load_op_library(_lib_path)
  File "/home/bou/src/test-lmb/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 64, in load_op_library
    None, None, error_msg, error_code)
tensorflow.python.framework.errors_impl.NotFoundError: /home/bou/src/lmbspecialops/build/lib/lmbspecialops.so: undefined symbol: _ZN10tensorflow11GetNodeAttrERKNS_9AttrSliceENS_11StringPieceEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

any idea..?

EDIT: just in case, I've tried to install tensorflow (without -gpu) and still same error.

@benjaminum
Collaborator

The symbol name looks strange. Can you try importing one of the libs that come with tf?

e.g.

import os
import tensorflow as tf
tfroot = os.path.join(tf.sysconfig.get_lib(), '..')
imgops = tf.load_op_library(os.path.join(tfroot,'contrib/image/python/ops/_image_ops.so'))

@benjaminum
Collaborator

Someone who is getting the same error told me that this problem could be solved by using a newer cmake version.

We are using: https://cmake.org/files/v3.7/cmake-3.7.1-Linux-x86_64.tar.gz

@marcobius
Author

Yes!
Now I can import lmbspecialops! Thanks!

But I get this when testing your example code:

Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lmbspecialops
>>> import tensorflow as tf
>>> import numpy as np
>>> 
>>> tf.InteractiveSession()
<tensorflow.python.client.session.InteractiveSession object at 0x7ff3140238d0>
>>> 
>>> A = tf.constant([1,2,np.nan])
>>> B = lmbspecialops.replace_nonfinite(A)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'lmbspecialops' has no attribute 'replace_nonfinite'
>>> print(B.eval()) # prints [1, 2, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'B' is not defined

@benjaminum
Collaborator

From your output it looks like no code in lmbspecialops.py is executed.

Can you give me the __dict__ of the lmbspecialops module?

import lmbspecialops
for k,v in lmbspecialops.__dict__.items():
    print(k,v)
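
One thing worth ruling out here (an assumption, not something confirmed in this thread) is that Python picked up the lmbspecialops checkout directory as a namespace package instead of python/lmbspecialops.py. A directory without an __init__.py imports without error but has no file of its own and none of the ops. The sketch below reports where a module was loaded from; describe_import is a hypothetical helper name.

```python
import importlib


def describe_import(module_name):
    """Report where a module was loaded from and whether it is a namespace package."""
    module = importlib.import_module(module_name)
    # Namespace packages have no __file__ (or __file__ set to None).
    origin = getattr(module, "__file__", None)
    search_path = list(getattr(module, "__path__", []))
    return {
        "file": origin,
        "path": search_path,
        # A directory without __init__.py has a __path__ but no file,
        # so none of the code in lmbspecialops.py would have run.
        "is_namespace_package": origin is None and bool(search_path),
    }
```

Calling describe_import("lmbspecialops") in the failing session would show whether "file" points at .../python/lmbspecialops.py or is empty.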

@globalcaos

globalcaos commented Aug 11, 2017

Same error here. My configuration is:

Linux Mint 18.1 Serena
TensorFlow 1.3.0-rc2
Cmake 3.7.1
Python 3.5.2
Cuda 8.0.61
CuDNN 6.0.21

I replaced import lmbspecialops as sops with import lmbspecialops.python.lmbspecialops as sops, and the AttributeError disappeared.

However, I am back to the original error:

/usr/bin/python3.5 /home/globalcaos/Documents/demon/examples/example.py
Traceback (most recent call last):
  File "/home/globalcaos/Documents/demon/examples/example.py", line 18, in <module>
    from depthmotionnet.networks_original import *
  File "/home/globalcaos/Documents/demon/python/depthmotionnet/networks_original.py", line 18, in <module>
    from .blocks_original import *
  File "/home/globalcaos/Documents/demon/python/depthmotionnet/blocks_original.py", line 19, in <module>
    from .helpers import *
  File "/home/globalcaos/Documents/demon/python/depthmotionnet/helpers.py", line 19, in <module>
    import lmbspecialops.python.lmbspecialops as sops
  File "/home/globalcaos/Documents/demon/lmbspecialops/python/lmbspecialops.py", line 13, in <module>
    lmbspecialopslib = tf.load_op_library(_lib_path)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/usr/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    c_api.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: /home/globalcaos/Documents/demon/lmbspecialops/build/lib/lmbspecialops.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv

What version of TensorFlow are you guys running?

@marcobius
Author

Ok, I can import now and it executes the example.
Sadly, I don't know exactly what my error was. I guess I may have forgotten to recompile lmbspecialops the last time I installed tensorflow, but I'm not sure.

Thanks for your help!!

@marcobius
Author

marcobius commented Aug 12, 2017

What version of TensorFlow are you guys running?

Till now, tensorflow 1.2.1.
My next step is to compile tensorflow from sources and see..

@benjaminum
Collaborator

What version of TensorFlow are you guys running?

We have used tensorflow versions 1.0, 1.1 and 1.2.1 with lmbspecialops. We didn't test 1.3 yet.

@globalcaos did you try compiling lmbspecialops with tensorflow 1.2.1?
When configuring the build (cmake ..), cmake prints the path to the tensorflow include directory.
Can you check if the path corresponds to the correct tensorflow version?

The errors above seem to be related either to a _GLIBCXX_USE_CXX11_ABI mismatch (the value passed with -D_GLIBCXX_USE_CXX11_ABI when compiling) or to an include mismatch.
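
The mangled symbol name in the error message gives a hint about which of the two it is. With GCC 5 and newer, symbols that involve std::string under the new ABI carry the marker __cxx11 in their mangled name. The sketch below only does that string check; diagnose_undefined_symbol is a hypothetical helper, and a missing marker proves nothing either way.

```python
import re


def diagnose_undefined_symbol(error_message):
    """Extract the undefined symbol from a loader error and give a rough hint."""
    match = re.search(r"undefined symbol: (\S+)", error_message)
    if match is None:
        return None, "no undefined symbol found in the message"
    symbol = match.group(1)
    if "__cxx11" in symbol:
        # The op library asks for a new-ABI std::string symbol that the
        # loaded TensorFlow library does not export.
        hint = "op library built with the new C++11 ABI; likely ABI mismatch"
    else:
        # No std::string in the signature, so the name is the same under
        # both ABIs. A header/version mismatch is one possible cause.
        hint = "inconclusive; check the TensorFlow include path and version"
    return symbol, hint
```

Applied to the two errors in this thread, the GetNodeAttr symbol contains __cxx11 and the CheckOpMessageBuilder symbol does not.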

@globalcaos

globalcaos commented Aug 16, 2017

Did you mean ccmake??

This is the output:

BUILD_WITH_CUDA ON
CMAKE_BUILD_TYPE Release
CMAKE_INSTALL_PREFIX /usr/local
CUDA_HOST_COMPILER /usr/bin/cc
CUDA_SDK_ROOT_DIR CUDA_SDK_ROOT_DIR-NOTFOUND
CUDA_TOOLKIT_ROOT_DIR /usr/local/cuda
CUDA_USE_STATIC_CUDA_RUNTIME ON
CUDA_rt_LIBRARY /usr/lib/x86_64-linux-gnu/librt.so
GENERATE_KEPLER_SM30_CODE OFF
GENERATE_KEPLER_SM35_CODE ON
GENERATE_KEPLER_SM37_CODE ON
GENERATE_MAXWELL_SM50_CODE OFF
GENERATE_MAXWELL_SM52_CODE ON
GENERATE_PASCAL_SM61_CODE ON
GENERATE_PTX30_CODE OFF
GENERATE_PTX61_CODE OFF

I will try to define the CUDA_SDK_ROOT_DIR

@benjaminum
Collaborator

ccmake unfortunately does not show the variable with the tensorflow include path.

If you run cmake .. again from the build directory it should print the tensorflow include dir in the first line.

CUDA_SDK_ROOT_DIR should not be important. It works for me without defining it.

@globalcaos

globalcaos commented Aug 17, 2017

You were right, @benjaminum: the TensorFlow include path is wrong. It should point to the tensorflow installed under Python 3.

Here is what I get:
globalcaos@CrazyFire ~/Documents/demon/lmbspecialops/build $ cmake ..
-- /usr/local/lib/python2.7/dist-packages/tensorflow/include
-- found test test_LeakyRelu
-- found test test_Median3x3Downsample
-- found test test_ReplaceNonfinite
-- found test test_ScaleInvariantGradient
-- Configuring done
-- Generating done
-- Build files have been written to: /home/globalcaos/Documents/demon/lmbspecialops/build

How do I specify it to point to the other one? What if I change the symbolic link python?
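
The mismatch above can be spotted mechanically by comparing the pythonX.Y component of the include directory that cmake prints with the interpreter that will import the library. This is a rough sketch with hypothetical helper names; it assumes the include path contains a pythonX.Y directory, as it does in the output above.

```python
import re
import sys


def python_version_in_path(path):
    """Return (major, minor) from a 'pythonX.Y' path component, or None."""
    match = re.search(r"python(\d+)\.(\d+)", path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def include_dir_matches_interpreter(include_dir, interpreter_version=None):
    """Check that the include dir belongs to the interpreter that will run the code."""
    if interpreter_version is None:
        interpreter_version = sys.version_info[:2]
    found = python_version_in_path(include_dir)
    return found is not None and found == tuple(interpreter_version)


# The path printed by cmake above belongs to Python 2.7, not 3.5.
print(include_dir_matches_interpreter(
    "/usr/local/lib/python2.7/dist-packages/tensorflow/include", (3, 5)))
```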

@marcobius
Author

I guess the problem is the tensorflow version.
I just tried 1.3.0rc2 (compiled locally) and got the same undefined symbol error.
After uninstalling it and installing tensorflow 1.3.0 (via pip3, without compiling), it runs perfectly.

@marcobius
Author

Hello again,
now I'm getting a new error when running the tests. (I don't know if I should open a new issue.)

When I run test_LeakyRelu.py, I see the message Creating TensorFlow device twice, then four matrices are printed, and after that I get:

2017-08-19 17:48:47.120384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0)
terminate called after throwing an instance of 'std::runtime_error'
  what():  /home/bou/src/lmbspecialops/src/leakyrelu_cuda.cu:192: cuda error: invalid device function

Is it an issue with the driver?

@benjaminum
Collaborator

How do I specify it to point to the other one? What if I change the symbolic link python?

@globalcaos you can try to manually set PYTHON_EXECUTABLE and then rerun the configure step. If you use a virtualenv, activate it before running cmake.
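
One way to make sure cmake sees the same interpreter that will later import the library is to build the configure command from that interpreter itself. The sketch below assumes it is run with the target python (for example inside the activated virtualenv) and that cmake is on the PATH; the function names are made up for illustration.

```python
import subprocess
import sys


def configure_command(source_dir="..", python_executable=None):
    """Build the cmake configure command for a given Python interpreter."""
    if python_executable is None:
        # Default to the interpreter running this script.
        python_executable = sys.executable
    return ["cmake", source_dir, "-DPYTHON_EXECUTABLE=" + python_executable]


def run_configure(build_dir):
    """Run the configure step inside the build directory (requires cmake)."""
    return subprocess.run(configure_command(), cwd=build_dir, check=True)
```

After reconfiguring, the first line printed by cmake should show the tensorflow include directory of that interpreter.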

@benjaminum
Collaborator

Is it an issue with the driver?

@marcobius invalid device function can mean that there is no code for your specific gpu.
Try to check with ccmake whether code generation for your gpu is enabled. Your GeForce 940MX needs SM50.

Can you open a new issue if this does not solve the problem?
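
As a rough illustration of that check, the sketch below maps a GPU's compute capability to the matching GENERATE_*_CODE option and looks it up in a ccmake-style listing of "NAME VALUE" lines. The option names are the ones shown in the ccmake output earlier in this thread; the helper names are hypothetical.

```python
# Compute capability (major, minor) -> build option from the ccmake listing.
SM_OPTIONS = {
    (3, 0): "GENERATE_KEPLER_SM30_CODE",
    (3, 5): "GENERATE_KEPLER_SM35_CODE",
    (3, 7): "GENERATE_KEPLER_SM37_CODE",
    (5, 0): "GENERATE_MAXWELL_SM50_CODE",
    (5, 2): "GENERATE_MAXWELL_SM52_CODE",
    (6, 1): "GENERATE_PASCAL_SM61_CODE",
}


def parse_cache_listing(text):
    """Parse 'NAME VALUE' lines, as copied from ccmake, into a dict."""
    settings = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            settings[parts[0]] = parts[1]
    return settings


def option_for_capability(capability):
    """Return the option name for a compute capability, or None if unknown."""
    return SM_OPTIONS.get(capability)


def is_code_generated(capability, settings):
    """True if the option matching this compute capability is switched ON."""
    option = option_for_capability(capability)
    return option is not None and settings.get(option) == "ON"
```

For a compute capability 5.0 device, is_code_generated((5, 0), settings) returning False would be consistent with the invalid device function error.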

@marcobius
Author

Thanks @benjaminum, it was exactly what you said.
The SM50 code generation was disabled.

@globalcaos

globalcaos commented Aug 22, 2017

Thanks @benjaminum, it finally works!

Things that did not work:

  • Building TensorFlow 1.3.0 or 1.3.0-rc2 (latest git pull today) from sources (lmbspecialops would cause the undefined symbol error or not even compile, respectively)

Things that worked:

  • Installing TensorFlow 1.3.0 with sudo -H pip3 install tensorflow-gpu, then compiling lmbspecialops
  • Compiling lmbspecialops specifying python version with cmake .. -DPYTHON_EXECUTABLE=/usr/bin/python3 -DCUDA_SDK_ROOT_DIR=/usr/local/cuda/samples
  • I had to install QT5 and VTK for the voxel cloud visualization to work
  • I also had to install a bunch of libraries imported in the example.py from demon with pip3 and a bunch of packages with sudo apt-get install libjpeg-dev libxxf86vm1 libxxf86vm-dev libxi-dev mesa-common-dev libxext-dev libpng-dev libimlib2-dev libglew-dev libxrender-dev libxrandr-dev libglm-dev libxt-dev (some libraries are not necessary, I think, but I don't care which ones)

Things that still bug me:

  • TensorFlow shows warnings that this version was not compiled for my CPU, and that building it from sources should accelerate some CPU processes. However, when I do so, lmbspecialops stops working.

Hope it helps!

@kmyi

kmyi commented Oct 30, 2017

Hi Guys,

Just figured out the reason for this.
The pip version of tensorflow is compiled with the old C++ ABI (i.e. pre-GCC 5.x), and that matches the CMake option here. This option should be turned off if you compile tensorflow from source, since in that case you are using the GCC 5.x ABI.

Cheers!
Kwang
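
To see which ABI an installed TensorFlow expects, later TensorFlow releases expose their compile flags through tf.sysconfig.get_compile_flags(). That function is an assumption for the versions discussed in this thread and may not be available in 1.2 or 1.3. The sketch below keeps the flag parsing separate so it can be checked without TensorFlow; both function names are hypothetical.

```python
def cxx11_abi_from_flags(flags):
    """Return the _GLIBCXX_USE_CXX11_ABI value from a list of compiler flags."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in flags:
        if flag.startswith(prefix):
            return int(flag[len(prefix):])
    # The flag is absent, so the compiler default applies.
    return None


def tensorflow_cxx11_abi():
    """Query the installed TensorFlow (needs a release with get_compile_flags)."""
    import tensorflow as tf
    return cxx11_abi_from_flags(tf.sysconfig.get_compile_flags())
```

A result of 0 means the old ABI, so the op library should also be built with -D_GLIBCXX_USE_CXX11_ABI=0; a result of 1 means the new ABI.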

@rnunziata

getting a similar issue:

ImportError: ../lmbspecialops/build/lib/lmbspecialops.so: undefined symbol: _ZN10tensorflow11GetNodeAttrERKNS_9AttrSliceENS_11StringPieceEPNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

I installed tensorflow-gpu in a python3 virtualenv, with the current download of demon.

@dingshenglan

Yes!
Now I can import lmbspecialops! Thanks!

But I get this when testing your example code:

Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lmbspecialops
>>> import tensorflow as tf
>>> import numpy as np
>>> 
>>> tf.InteractiveSession()
<tensorflow.python.client.session.InteractiveSession object at 0x7ff3140238d0>
>>> 
>>> A = tf.constant([1,2,np.nan])
>>> B = lmbspecialops.replace_nonfinite(A)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'lmbspecialops' has no attribute 'replace_nonfinite'
>>> print(B.eval()) # prints [1, 2, 0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'B' is not defined

I ran into the same problem and installed cmake 7.1, but that didn't solve it. Could you please tell me how to deal with it?
