resolving undefined symbol errors caused by _GLIBCXX_USE_CXX11_ABI #12

Closed
weinman opened this issue Nov 1, 2018 · 2 comments
Labels
enhancement New feature or request

Comments


weinman commented Nov 1, 2018

I am not sure if this is just an issue with a build for python2 version of tensorflow, or there are larger matters at play, but I wanted to report a workaround for undefined symbol errors in the default build process and the test of the custom op.

My platform:
Python 2.7
Ubuntu 18.04
Tensorflow 1.10.1
g++ 6

Of course, to build with python2.7, I had to change buildTF.sh so that it invoked python rather than python3. The compile succeeds, but with several warnings:

<command-line>:0:0: warning: "__GLIBCXX_USE_CXX11_ABI" redefined

This was the first hint something might be awry.

Upon executing tf.load_op_library in the testCustomOp.py script, I then received the following error:

Traceback (most recent call last):
  File "testCustomOp.py", line 85, in <module>
    testMiniExample()
  File "testCustomOp.py", line 62, in testMiniExample
    res=testCustomOp(mat, corpus, chars, wordChars)
  File "testCustomOp.py", line 16, in testCustomOp
    word_beam_search_module = tf.load_op_library('../cpp/proj/TFWordBeamSearch.so')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: ../cpp/proj/TFWordBeamSearch.so: undefined symbol: _ZN10tensorflow11GetNodeAttrERKNS_9AttrSliceENS_11StringPieceEPSs

When I remove the errant command-line define -D_GLIBCXX_USE_CXX11_ABI=0 from the compile lines, the build succeeds without warning and the test runs as expected.
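
The missing symbol itself hints at the ABI mismatch. In the name mangling that g++ uses, the trailing `PSs` denotes a pointer to the old-ABI `std::string`, whereas a library built with the new ABI would mangle the same parameter through the `std::__cxx11` namespace (which appears as `St7__cxx11`). A minimal sketch of that check follows; the function name is hypothetical, and the test is only a substring heuristic, so a symbol with no string types at all also reports `False`.

```python
# Substring that g++ emits when a mangled name involves std::__cxx11 types
# (the new libstdc++ ABI, i.e. _GLIBCXX_USE_CXX11_ABI=1).
CXX11_MARKER = "St7__cxx11"


def uses_cxx11_abi(mangled_symbol):
    """Heuristic: True if the mangled name mentions the std::__cxx11 namespace."""
    return CXX11_MARKER in mangled_symbol


# Symbol from the traceback above; it ends in "PSs" (old-ABI std::string*).
missing = "_ZN10tensorflow11GetNodeAttrERKNS_9AttrSliceENS_11StringPieceEPSs"
print(uses_cxx11_abi(missing))  # prints False
```

Read this way, the op library built with the define set to 0 asks for an old-ABI symbol, which a TensorFlow build using the new ABI would not export.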

I'm not sure why the define was there in the first place, but I thought I'd add a report. My tactic involved pulling out the define with an in-script environment variable. If this is a wider issue for others, perhaps such a configuration would make it easy to manually change whether it is defined or not, based on the specific platform's requirements. (I can submit a PR if you wish).
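
One way the environment-variable approach described above could look is sketched here. This is not the actual change made to buildTF.sh: the variable name `USE_CXX11_ABI` and the helper `abi_define` are hypothetical, chosen only for illustration. When the variable is unset, no define is emitted and the compiler default applies.

```python
import os


def abi_define(env=None, var_name="USE_CXX11_ABI"):
    """Return the -D flag as a one-element list, or [] if the variable is unset.

    var_name is a hypothetical name, not one used by the project's build script.
    """
    env = os.environ if env is None else env
    value = env.get(var_name, "")
    if value == "":
        return []  # leave the compiler's default ABI untouched
    if value not in ("0", "1"):
        raise ValueError("%s must be 0 or 1, got %r" % (var_name, value))
    return ["-D_GLIBCXX_USE_CXX11_ABI=" + value]


print(abi_define({"USE_CXX11_ABI": "0"}))  # prints ['-D_GLIBCXX_USE_CXX11_ABI=0']
print(abi_define({}))                      # prints []
```

The returned list can be appended to the other g++ arguments, so the same script serves platforms that need the define and those that do not.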

Thanks for developing and sharing this. Congratulations on the best paper award as well (how I found out about your repo). I am looking forward to some fruitful experiments blending this with my own CTC-based OCR.

Owner

githubharald commented Nov 1, 2018

Hi,

The way to compile a custom op for TF changes from one TF version to the next, which is a bit troublesome.
The reason for the define "-D_GLIBCXX_USE_CXX11_ABI=0" is described in [1]: it is needed if the gcc version is >=5.

I just tried to compile the custom op with TF 1.10.1 (pre-built, taken from [2]), Python 2.7 and g++ 5.4.0. I only changed "python3" to "python2" in the build script and both the build and the TF test script worked without any problems.
You are using a g++ 6 compiler, I'm using 5.4 - maybe that's the relevant difference?

I'm planning to add a FAQ section. To keep things simple, I will mention your solution there (and link to this issue) rather than handle each special case in the build script.

However, I'm glad you found a solution and I'm looking forward to hearing about your experiments. In [3] you can see how to integrate it into a TF model, compared with the CTC decoders shipped with TF. If you would like to share your results with me or have any questions, you can also contact me by mail (the address can be found in the author section of the paper).

Best regards
Harald

[1] taken from https://www.tensorflow.org/extend/adding_an_op: "Note on gcc version >=5: gcc uses the new C++ ABI since version 5. The binary pip packages available on the TensorFlow website are built with gcc4 that uses the older ABI. If you compile your op library with gcc>=5, add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line to make the library compatible with the older abi."

[2] https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.10.1-cp27-none-linux_x86_64.whl

[3] https://github.com/githubharald/SimpleHTR/blob/master/src/Model.py#L103

githubharald added the enhancement label Nov 1, 2018
Author

weinman commented Nov 1, 2018

I agree the culprit is likely the gcc compiler (which is the cause of all this ABI mess in the first place). I custom compiled my own TF and wanted to be sure I used the same compiler for this op—maybe that consistency means the define wasn't necessary.

When I installed the precompiled version you linked to, the compile (using g++ 6.4.0) and run worked either way (with the define or without).

So it seems the define is necessary only for certain combinations of compiler and TF build.
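
Rather than guessing per platform, the ABI setting of the installed TensorFlow can be read from `tf.sysconfig.get_compile_flags()`, which includes a `-D_GLIBCXX_USE_CXX11_ABI=N` entry. The sketch below is one possible way to use it and is not part of this repository's build script; the helper names are hypothetical, and the sample flag list is illustrative rather than output from a real install.

```python
def abi_from_flags(flags):
    """Return 0 or 1 from a -D_GLIBCXX_USE_CXX11_ABI=N flag, or None if absent."""
    prefix = "-D_GLIBCXX_USE_CXX11_ABI="
    for flag in flags:
        if flag.startswith(prefix):
            return int(flag[len(prefix):])
    return None


def installed_tf_abi():
    """ABI value of the installed TensorFlow, taken from its compile flags."""
    import tensorflow as tf  # imported lazily so abi_from_flags works without TF
    return abi_from_flags(tf.sysconfig.get_compile_flags())


# Illustrative flag list (the include path is a placeholder).
sample = ["-I/path/to/tensorflow/include", "-D_GLIBCXX_USE_CXX11_ABI=0"]
print(abi_from_flags(sample))  # prints 0
```

On a machine with TensorFlow installed, `installed_tf_abi()` reports which value the op should be compiled with, and passing the full flag list to g++ keeps the op consistent with that particular TF build.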

Thanks for the link in [3]. That's super helpful!
