How to obtain the .so files? #23

clauslang · 2018-04-21T11:40:11Z

When I try to run the code according to the README instructions, I get an error that certain .so files are not found. The necessary .cc and .h files are indeed in the ops directory, but there are no .so or .o files.

How can I obtain them? Or are they supposed to be generated first somehow?

(Maybe I have a more general misunderstanding: what does "ops" actually stand for?)

clauslang · 2018-04-21T11:41:54Z

Example exception stacktrace:

Traceback (most recent call last):
  File "/Users/clauslang/UnFlow/src/e2eflow/ops.py", line 61, in <module>
    op_lib = tf.load_op_library(lib_path)
  File "/Users/clauslang/UnFlow/src/unflow_venv/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/Users/clauslang/UnFlow/src/unflow_venv/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: dlopen(./backward_warp_op.so, 6): image not found

simonmeister · 2018-04-21T18:58:46Z

Hi, the custom TensorFlow operations should be compiled automatically to produce the .so files if they are missing when run.py is executed. It worked for me with TensorFlow 1.7 and Ubuntu 17.10. Which command did you run to get this output?
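A rough sketch of that compile-on-demand pattern, under stated assumptions: `ensure_op_library` and its `compile_fn` callback are hypothetical names, not the actual code in e2eflow/ops.py, and `compile_fn` stands in for whatever runs nvcc and g++. The idea it illustrates is that a failed build is reported at build time, rather than later as an "image not found" or "cannot open shared object file" error from `tf.load_op_library`.

```python
import os


def ensure_op_library(so_path, compile_fn):
    """Return so_path, building it first with compile_fn if it does not exist yet.

    Hypothetical helper: compile_fn is any callable that is expected to create
    so_path (e.g. by running nvcc and g++). It is only called when the file is missing.
    """
    if not os.path.exists(so_path):
        compile_fn()
        if not os.path.exists(so_path):
            # Fail here with a clear message instead of a later dlopen "not found" error.
            raise RuntimeError("compilation did not produce " + so_path)
    return so_path

# In a real script, the returned path would then be passed to tf.load_op_library().
```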

DongJT1996 · 2018-04-22T14:33:51Z

Did you solve the problem?

clauslang · 2018-04-23T09:34:05Z

Partially. One problem was that I hadn't installed the CUDA toolkit, so the nvcc command wasn't found. Maybe it's obvious, but it could be added to the dependencies:

sudo apt install nvidia-cuda-toolkit
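Before installing a second toolkit, it may be worth checking whether an existing CUDA install already provides nvcc. A minimal sketch, assuming the conventional /usr/local/cuda location and that the install's own bin directory should take precedence over PATH; `find_nvcc` is a hypothetical helper, not part of UnFlow:

```python
import os
import shutil


def find_nvcc(cuda_home="/usr/local/cuda"):
    """Return a path to nvcc, or None if it cannot be found.

    Assumption: prefer the nvcc inside an existing CUDA install (cuda_home/bin)
    and only fall back to whatever nvcc is on PATH.
    """
    candidate = os.path.join(cuda_home, "bin", "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return shutil.which("nvcc")  # None if nvcc is not on PATH either
```

If this returns a path inside the CUDA install, adding that bin directory to PATH is usually enough for the build to find nvcc.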

I get the same error now, but for a different reason:

~/UnFlow/src$ python run.py --help
WARNING:tensorflow:From /home/clauslang/UnFlow/unflow_venv/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
In file included from /home/clauslang/UnFlow/unflow_venv/lib/python3.5/site-packages/tensorflow/include/tensorflow/core/util/cuda_kernel_helper.h:21:0,
                 from backward_warp_op.cu.cc:8:
/home/clauslang/UnFlow/unflow_venv/lib/python3.5/site-packages/tensorflow/include/tensorflow/core/util/cuda_device_functions.h:32:31: fatal error: cuda/include/cuda.h: No such file or directory
compilation terminated.
Traceback (most recent call last):
  File "/home/clauslang/UnFlow/src/e2eflow/ops.py", line 61, in <module>
    op_lib = tf.load_op_library(lib_path)
  File "/home/clauslang/UnFlow/unflow_venv/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/home/clauslang/UnFlow/unflow_venv/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./backward_warp_op.so: cannot open shared object file: No such file or directory

I use tensorflow 1.7 and Cuda 9.0 on Ubuntu 16.04.

gsaibro · 2018-04-23T11:51:36Z

I have the same issue:

Traceback (most recent call last):
  File "/media/gsaibro/DATA/InternshipIrcad/FlowNet2/UnFlow-master/src/e2eflow/ops.py", line 81, in <module>
    op_lib = tf.load_op_library(lib_path)
  File "/home/gsaibro/anaconda3/envs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/home/gsaibro/anaconda3/envs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./backward_warp_op.so: undefined symbol: _ZTIN10tensorflow8OpKernelE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 19, in <module>
    from e2eflow.core.train import Trainer
  File "/media/gsaibro/DATA/InternshipIrcad/FlowNet2/UnFlow-master/src/e2eflow/core/train.py", line 12, in <module>
    from ..ops import forward_warp
  File "/media/gsaibro/DATA/InternshipIrcad/FlowNet2/UnFlow-master/src/e2eflow/ops.py", line 87, in <module>
    op_lib = tf.load_op_library(lib_path)
  File "/home/gsaibro/anaconda3/envs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/home/gsaibro/anaconda3/envs/tensorflowGPU/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./backward_warp_op.so: undefined symbol: _ZTIN10tensorflow8OpKernelE

gsaibro · 2018-04-23T14:05:20Z

@simonmeister Hi Simon, how did you install TensorFlow? From source, or using something like Anaconda?

clauslang · 2018-04-23T14:15:34Z

Following the discussion in tensorflow/tensorflow#15002, I removed the -D GOOGLE_CUDA=1 option from the nvcc command (line 43 in ops.py) and was thus able to produce the backward_warp_op.so file.

Now I get a problem similar to @gsaibro's:

~/UnFlow/src$ python run.py --help
Traceback (most recent call last):
  File "/home/clauslang/UnFlow/src/e2eflow/ops.py", line 63, in <module>
    op_lib = tf.load_op_library(lib_path)
  File "/home/clauslang/UnFlow/unflow_venv/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 58, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/home/clauslang/UnFlow/unflow_venv/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./backward_warp_op.so: undefined symbol: _Z16BackwardWarpGradRKN5Eigen9GpuDeviceENS_9TensorMapINS_6TensorIKfLi4ELi1ElEELi16ENS_11MakePointerEEES8_S8_NS3_INS4_IfLi4ELi1ElEELi16ES7_EE

simonmeister · 2018-04-23T15:01:42Z

I used pip to install tensorflow-gpu. @clauslang I get the same issue without GOOGLE_CUDA, since the CUDA code is not compiled in that case. When I keep the flag, it works for me.

simonmeister · 2018-04-23T15:06:19Z

@clauslang It seems that cuda.h is not found. The current code expects CUDA to be in /usr/local/cuda. I am not sure whether that is where apt puts it. In most cases it's better to use the installer from the NVIDIA site to get a clean install.
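The `fatal error: cuda/include/cuda.h` message above can be read this way: TensorFlow's header includes the relative path `cuda/include/cuda.h`, so some directory on the compiler's include path must contain a folder literally named `cuda` with `include/cuda.h` inside it. With CUDA in /usr/local/cuda, that directory is /usr/local. The sketch below checks this for a given install location; `cuda_include_root` is a hypothetical helper, and it assumes the build passes its result to the compiler as `-I`.

```python
import os


def cuda_include_root(cuda_home="/usr/local/cuda"):
    """Return the directory to pass as -I so that "cuda/include/cuda.h" resolves.

    The include root must contain a directory literally named "cuda"
    that has include/cuda.h inside it.
    """
    cuda_home = os.path.normpath(cuda_home)
    header = os.path.join(cuda_home, "include", "cuda.h")
    if not os.path.isfile(header):
        raise FileNotFoundError("no include/cuda.h under " + cuda_home)
    if os.path.basename(cuda_home) != "cuda":
        # e.g. /usr/local/cuda-9.0: "cuda/include/cuda.h" cannot resolve against its
        # parent, so a symlink named "cuda" pointing at this directory would be needed.
        raise ValueError(cuda_home + " is not named 'cuda'; add a 'cuda' symlink to it")
    return os.path.dirname(cuda_home)
```

For a standard install this returns /usr/local, matching the location the current code expects.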

clauslang · 2018-04-23T15:19:43Z

Thanks, @simonmeister, for the clarification! I got a bit confused there: I did have cuda installed, but thought I had to install nvcc on top of that (instead of just pointing to the correct cuda install location).

For now, @gsaibro, I resolved the issue by removing the -D GOOGLE_CUDA=1 flag from both the nvcc and the gcc commands. Removing it only from the nvcc command indeed results in the same error.
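The undefined `BackwardWarpGrad` symbol fits what is described here: if only one of the two commands defines GOOGLE_CUDA, the C++ side can end up referencing a GPU function that the nvcc object never provides. One way to rule out such a mismatch is to derive both command lines from a single switch. This is a hypothetical sketch, not the command in ops.py: `build_commands` is an illustrative name, the /usr/local/cuda/lib64 location is an assumption, and `tf_cflags`/`tf_lflags` are assumed to come from `tf.sysconfig.get_compile_flags()` and `tf.sysconfig.get_link_flags()` in TensorFlow versions that provide them.

```python
import os


def build_commands(name, tf_cflags, tf_lflags, use_cuda=True, cuda_home="/usr/local/cuda"):
    """Return (nvcc_cmd, gxx_cmd) argument lists for one custom op called `name`.

    The GOOGLE_CUDA define comes from the single `use_cuda` switch, so the
    .cu.cc object and the .cc file always agree on whether GPU kernels exist.
    """
    defines = ["-D", "GOOGLE_CUDA=1"] if use_cuda else []
    # Step 1: compile the CUDA source into a position-independent object file.
    nvcc_cmd = (["nvcc", "-std=c++11", "-c", "-o", name + ".cu.o", name + ".cu.cc"]
                + tf_cflags + defines + ["-x", "cu", "-Xcompiler", "-fPIC"])
    # Step 2: compile the C++ op and link it with that object into a shared library.
    cuda_libs = ["-L", os.path.join(cuda_home, "lib64"), "-lcudart"] if use_cuda else []
    gxx_cmd = (["g++", "-std=c++11", "-shared", "-o", name + ".so", name + ".cc", name + ".cu.o"]
               + tf_cflags + defines + ["-fPIC"] + cuda_libs + tf_lflags)
    return nvcc_cmd, gxx_cmd
```

Each list could then be run with `subprocess.check_call(cmd)`, which raises `CalledProcessError` on a non-zero exit status, so a failed nvcc step stops the build instead of surfacing later as a missing or incomplete .so.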

gsaibro · 2018-04-23T18:00:13Z

Thanks @simonmeister and @clauslang.
By removing '-D GOOGLE_CUDA=1' I can get past this part, but I still get stuck at the downsampling step.

Using '-D GOOGLE_CUDA=1' and setting the environment variables, I got a little further, but then hit the error shown below when trying to build correlation_op.cu.cc. Would you have any guess about what is causing it, @simonmeister? Thanks.

(tensorflow) gsaibro@IHUW074 /media/gsaibro/DATA/InternshipIrcad/FlowNet2/UnFlow-master/src $ python run.py
WARNING:tensorflow:From /home/gsaibro/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
/home/gsaibro/anaconda3/envs/tensorflow/lib/python3.5/site-packages/tensorflow/include/google/protobuf/arena_impl.h(57): warning: integer conversion resulted in a change of sign