Installation issue: py-tensorflow #14488
You can try opening an issue on the TensorFlow repo, but they've been pretty unresponsive to my issues so far... |
Thanks. I'll see how far I can get and will report back. TF only lists build instructions for Ubuntu; other systems are not supported. Searching for the error there did not help much. |
This looks very similar to the issue I ran into bazelbuild/bazel#10437. |
Thanks for sharing. Looks complicated. Also 28 days without response. Does not look promising to get this building on CentOS soon. |
I am able to build v1.14 for both Python 2 and 3 using an old version of the tf package on CentOS 7. While this issue is sorted out, you might try using the old tf package file located in the branch https://github.com/Sinan81/spack/tree/old_tensorflow_v1.14_builds I am also able to build tf@2.1 with python3 using this. The only requirement for building these versions is to use jdk@1.8 as the java provider. |
Thanks. I tried it with
==> Installing py-tensorflow
==> Searching for binary cache of py-tensorflow
==> No binary for py-tensorflow found: installing from source
==> Error: PermissionError: [Errno 13] Permission denied: '/cache'
/opt/spack/var/spack/repos/builtin/packages/py-tensorflow/package.py:189, in setup_build_environment:
186 # stay at least also OSX compatible
187 tmp_path = '/cache/spack/tf'
188 # tmp_path = env.get('SPACK_TMPDIR', '/tmp/spack') + '/tf'  # TODO
>> 189 mkdirp(tmp_path)
190 env.set('TEST_TMPDIR', tmp_path)
191 env.set('HOME', tmp_path) |
That is just a temporary path used by bazel in building TF. just set it to a path where you have write permission and sufficient space (up to 10GB?) and make sure that path is not an NFS share. If you are building this on your laptop, then any path under your home directory should work. |
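As a sketch of that advice, here is a small helper (hypothetical, not part of the actual recipe) that picks a writable scratch directory and fails early with a clear error instead of dying deep inside bazel:

```python
import os
import tempfile

def pick_tf_tmpdir():
    # Hypothetical helper illustrating the advice above: prefer an
    # explicit SPACK_TMPDIR override, otherwise fall back to a
    # per-user temp dir (which should be local disk, not an NFS share).
    base = os.environ.get("SPACK_TMPDIR",
                          os.path.join(tempfile.gettempdir(), "spack"))
    path = os.path.join(base, "tf")
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError("no write access to " + path)
    return path
```

Any path under your home directory would satisfy this check on a laptop; on a cluster, point SPACK_TMPDIR at local scratch.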
@Sinan81 thanks, that was stupid from my side. Unfortunately I'll arrive at the same errors again :/ Did you use any specific python version? 3 errors found in build log:
65 WARNING: /tmp/patrick/spack-stage-py-tensorflow-1.14.0-y5h6dpw45cndcpi2wn5uppnuhc2fd4pg/spack-src/tensorflow/contrib/BUILD:12:1: in py_library rule //tensorflow/contrib:contrib_py: target '//tensorflow/contrib:contrib_py' depends on deprecated target '//tensorflow/contrib/distributions:distributions_py': TensorFlow Distributions has migrated to TensorFlow Probability (https://github.com/tensorflow/probability). Deprecated copies remaining in tf.contrib.distributions are unmaintained, unsupported, and will be removed by late 2018. You should update all usage of `tf.contrib.distributions` to `tfp.distributions`.
66 INFO: Analyzed target //tensorflow/tools/pip_package:build_pip_package (374 packages loaded, 18256 targets configured).
67 INFO: Found 1 target...
68 [0 / 6] [-----] Expanding template tensorflow/tools/pip_package/simple_console
69 ERROR: /home/patrick/tf/_bazel_patrick/593a39a5b039aa2dbd755c47f5736f44/external/nasm/BUILD.bazel:8:1: C++ compilation of rule '@nasm//:nasm' failed (Exit 1)
70 In file included from external/nasm/output/outcoff.c:52:
>> 71 /opt/spack/opt/spack/linux-centos7-x86_64/gcc-9.2.0/python-3.6.8-knusttxlspjruvcbrfgyuyrlyfvdspk2/include/python3.6m/eval.h:10:12: error: unknown type name 'PyObject'
72 10 | PyAPI_FUNC(PyObject *) PyEval_EvalCode(PyObject *, PyObject *, PyObject *);
73 | ^~~~~~~~
>> 74 /opt/spack/opt/spack/linux-centos7-x86_64/gcc-9.2.0/python-3.6.8-knusttxlspjruvcbrfgyuyrlyfvdspk2/include/python3.6m/eval.h:12:12: error: unknown type name 'PyObject'
75 12 | PyAPI_FUNC(PyObject *) PyEval_EvalCodeEx(PyObject *co,
76 | ^~~~~~~~
>> 77 /opt/spack/opt/spack/linux-centos7-x86_64/gcc-9.2.0/python-3.6.8-knusttxlspjruvcbrfgyuyrlyfvdspk2/include/python3.6m/eval.h:21:12: error: unknown type name 'PyObject'
78 21 | PyAPI_FUNC(PyObject *) _PyEval_CallTracing(PyObject *func, PyObject *args);
79 | ^~~~~~~~
80 Target //tensorflow/tools/pip_package:build_pip_package failed to build
81 Use --verbose_failures to see the command lines of failed build steps.
82 INFO: Elapsed time: 418.588s, Critical Path: 2.54s
83 INFO: 13 processes: 13 local. |
I used python@3.7.4 and 2.7.16 (and also gcc@7.4.0, bazel@0.25.2, jdk@1.8.0_202_b08). Oddly enough when I do |
Now I confirm that I can still build v1.14, though I ended up disabling a build option (mkl_dnn); I also had to use cuda@10.0 as opposed to 10.1 (which yielded a header-not-found error). The updated tensorflow package file is now pushed to
P.S. For some reason, it seems
Just in case you are wondering about the specifics of the installation, here is the spec output:
$ spack find -l tensorflow
==> 4 installed packages
-- linux-centos7-x86_64 / gcc@7.4.0 -----------------------------
r4xyzrp tensorflow@1.14.0
uxgzcdj tensorflow@1.14.0
zm35imj tensorflow@1.14.0
t3ok3ag tensorflow@2.1.0-rc0
sbulut@ws-067 ~
$ spack spec tensorflow/r4xyzrp
Input spec
--------------------------------
tensorflow@1.14.0%gcc@7.4.0+cuda~gcp+nccl arch=linux-centos7-x86_64
^cuda@10.0.130%gcc@7.4.0 arch=linux-centos7-x86_64
^cudnn@7.5.1-10.0-x86_64%gcc@7.4.0 arch=linux-centos7-x86_64
^nccl@2.4.8-1%gcc@7.4.0 patches=42778c78eb9875dacddf5eca20f7f6a077773fcbee41e51174f81b3143684b6d arch=linux-centos7-x86_64
^py-absl-py@0.1.6%gcc@7.4.0 arch=linux-centos7-x86_64
^py-six@1.12.0%gcc@7.4.0 arch=linux-centos7-x86_64
^python@3.7.4%gcc@7.4.0+bz2+ctypes+dbm+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4~uuid+zlib arch=linux-centos7-x86_64
^bzip2@1.0.8%gcc@7.4.0+shared arch=linux-centos7-x86_64
^expat@2.2.9%gcc@7.4.0+libbsd arch=linux-centos7-x86_64
^libbsd@0.9.1%gcc@7.4.0 arch=linux-centos7-x86_64
^gdbm@1.18.1%gcc@7.4.0 arch=linux-centos7-x86_64
^readline@8.0%gcc@7.4.0 arch=linux-centos7-x86_64
^ncurses@6.1%gcc@7.4.0~symlinks~termlib arch=linux-centos7-x86_64
^gettext@0.20.1%gcc@7.4.0+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-centos7-x86_64
^libxml2@2.9.9%gcc@7.4.0~python arch=linux-centos7-x86_64
^libiconv@1.16%gcc@7.4.0 arch=linux-centos7-x86_64
^xz@5.2.4%gcc@7.4.0 arch=linux-centos7-x86_64
^zlib@1.2.11%gcc@7.4.0+optimize+pic+shared arch=linux-centos7-x86_64
^tar@1.32%gcc@7.4.0 arch=linux-centos7-x86_64
^libffi@3.2.1%gcc@7.4.0 arch=linux-centos7-x86_64
^openssl@1.1.1b%gcc@7.4.0+systemcerts arch=linux-centos7-x86_64
^sqlite@3.30.0%gcc@7.4.0+column_metadata+fts~functions+rtree arch=linux-centos7-x86_64
^py-astor@0.8.0%gcc@7.4.0 arch=linux-centos7-x86_64
^py-future@0.17.1%gcc@7.4.0 arch=linux-centos7-x86_64
^py-gast@0.3.2%gcc@7.4.0 arch=linux-centos7-x86_64
^py-grpcio@1.25.0%gcc@7.4.0 arch=linux-centos7-x86_64
^c-ares@1.15.0%gcc@7.4.0 build_type=RelWithDebInfo arch=linux-centos7-x86_64
^py-h5py@2.9.0%gcc@7.4.0~mpi arch=linux-centos7-x86_64
^hdf5@1.10.5%gcc@7.4.0+cxx~debug~fortran+hl~mpi+pic+shared~szip~threadsafe arch=linux-centos7-x86_64
^py-numpy@1.17.3%gcc@7.4.0+blas+lapack arch=linux-centos7-x86_64
^openblas@0.3.6%gcc@7.4.0+avx2~avx512 cpu_target=auto ~ilp64+pic+shared threads=none ~virtual_machine arch=linux-centos7-x86_64
^py-keras-applications@1.0.8%gcc@7.4.0 arch=linux-centos7-x86_64
^py-keras-preprocessing@1.1.0%gcc@7.4.0 arch=linux-centos7-x86_64
^py-mock@3.0.5%gcc@7.4.0 arch=linux-centos7-x86_64
^py-protobuf@3.6.0%gcc@7.4.0~cpp arch=linux-centos7-x86_64
^py-setuptools@41.0.1%gcc@7.4.0 arch=linux-centos7-x86_64
^py-termcolor@1.1.0%gcc@7.4.0 arch=linux-centos7-x86_64
^py-wheel@0.33.1%gcc@7.4.0 arch=linux-centos7-x86_64
Concretized
--------------------------------
tensorflow@1.14.0%gcc@7.4.0+cuda~gcp+nccl arch=linux-centos7-x86_64
^cuda@10.0.130%gcc@7.4.0 arch=linux-centos7-x86_64
^cudnn@7.5.1-10.0-x86_64%gcc@7.4.0 arch=linux-centos7-x86_64
^nccl@2.4.8-1%gcc@7.4.0 patches=42778c78eb9875dacddf5eca20f7f6a077773fcbee41e51174f81b3143684b6d arch=linux-centos7-x86_64
^py-absl-py@0.1.6%gcc@7.4.0 arch=linux-centos7-x86_64
^py-six@1.12.0%gcc@7.4.0 arch=linux-centos7-x86_64
^python@3.7.4%gcc@7.4.0+bz2+ctypes+dbm+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tix~tkinter~ucs4~uuid+zlib arch=linux-centos7-x86_64
^bzip2@1.0.8%gcc@7.4.0+shared arch=linux-centos7-x86_64
^expat@2.2.9%gcc@7.4.0+libbsd arch=linux-centos7-x86_64
^libbsd@0.9.1%gcc@7.4.0 arch=linux-centos7-x86_64
^gdbm@1.18.1%gcc@7.4.0 arch=linux-centos7-x86_64
^readline@8.0%gcc@7.4.0 arch=linux-centos7-x86_64
^ncurses@6.1%gcc@7.4.0~symlinks~termlib arch=linux-centos7-x86_64
^gettext@0.20.1%gcc@7.4.0+bzip2+curses+git~libunistring+libxml2+tar+xz arch=linux-centos7-x86_64
^libxml2@2.9.9%gcc@7.4.0~python arch=linux-centos7-x86_64
^libiconv@1.16%gcc@7.4.0 arch=linux-centos7-x86_64
^xz@5.2.4%gcc@7.4.0 arch=linux-centos7-x86_64
^zlib@1.2.11%gcc@7.4.0+optimize+pic+shared arch=linux-centos7-x86_64
^tar@1.32%gcc@7.4.0 arch=linux-centos7-x86_64
^libffi@3.2.1%gcc@7.4.0 arch=linux-centos7-x86_64
^openssl@1.1.1b%gcc@7.4.0+systemcerts arch=linux-centos7-x86_64
^sqlite@3.30.0%gcc@7.4.0+column_metadata+fts~functions+rtree arch=linux-centos7-x86_64
^py-astor@0.8.0%gcc@7.4.0 arch=linux-centos7-x86_64
^py-future@0.17.1%gcc@7.4.0 arch=linux-centos7-x86_64
^py-gast@0.3.2%gcc@7.4.0 arch=linux-centos7-x86_64
^py-grpcio@1.25.0%gcc@7.4.0 arch=linux-centos7-x86_64
^c-ares@1.15.0%gcc@7.4.0 build_type=RelWithDebInfo arch=linux-centos7-x86_64
^py-h5py@2.9.0%gcc@7.4.0~mpi arch=linux-centos7-x86_64
^hdf5@1.10.5%gcc@7.4.0+cxx~debug~fortran+hl~mpi+pic+shared~szip~threadsafe arch=linux-centos7-x86_64
^py-numpy@1.17.3%gcc@7.4.0+blas+lapack arch=linux-centos7-x86_64
^openblas@0.3.6%gcc@7.4.0+avx2~avx512 cpu_target=auto ~ilp64+pic+shared threads=none ~virtual_machine arch=linux-centos7-x86_64
^py-keras-applications@1.0.8%gcc@7.4.0 arch=linux-centos7-x86_64
^py-keras-preprocessing@1.1.0%gcc@7.4.0 arch=linux-centos7-x86_64
^py-mock@3.0.5%gcc@7.4.0 arch=linux-centos7-x86_64
^py-protobuf@3.6.0%gcc@7.4.0~cpp arch=linux-centos7-x86_64
^py-setuptools@41.0.1%gcc@7.4.0 arch=linux-centos7-x86_64
^py-termcolor@1.1.0%gcc@7.4.0 arch=linux-centos7-x86_64
^py-wheel@0.33.1%gcc@7.4.0 arch=linux-centos7-x86_64
sbulut@ws-067 ~
$
|
I made a minor hacky edit to run on a workstation; the build went past the original error, but the final build (which took almost 2 hours!) failed with some strange errors.
Strange link errors in
and finally, I have no clue why this happened :
|
For reference I've added my build spec and the output of I see that |
In the latest version of the tensorflow package, hardware optimizations are handled properly. For now, you might want to disable all optional flags just to confirm first that the package builds. I would also suggest trying a build with cuda@10.0 (which yields libcudart.10.0). |
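For example, a spec along these lines would disable the optional features and pin cuda@10.0 (the variant names here are assumptions, not taken from the thread; check `spack info py-tensorflow` for the ones your recipe actually exposes):

```shell
# Hypothetical spec: turn off optional variants first, pin cuda@10.0.
# Verify variant names with `spack info py-tensorflow` before using.
spack install py-tensorflow@1.14.0 ~mkl ~gcp ~nccl +cuda ^cuda@10.0.130
```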
@s-sajid-ali Do you have an avx512 capable cpu? If so, a patch might be needed. |
@s-sajid-ali when I tried to compile v1.15.0 I got the same error as yours despite the fact that I was using cuda@10.0. The problem is that somehow tf can't locate libcudart although it's provided in spack cuda package. Let me look into that. |
Yes, I have access to KNL (and Skylake) nodes on which I plan to compile TF from source. Would it not be possible to achieve this by just changing the
Both of these systems have no GPUs, so I'm not particularly bothered by the |
Thanks for the pointer! It looks like this was fixed in |
@Sinan81 Thanks for looking again! I am still unable to build with the same error. Just FYI, I am building with |
@Sinan81: I tried your build recipe on two different workstations with
If
PS: Trying a simple CPU-only benchmark (on a Broadwell CPU) showed a ~230% speedup for the Spack-built TF compared to the intel-conda build (which was built for AVX systems)! |
The million-dollar question is what is done differently in my version of the TF package vs. the latest one, so that
Don't remember the last time I tried building
That's a big difference. In a sense I would expect it, since conda wouldn't use the latest vectorization and SIMD instructions, so that the package stays usable for a wider audience. This difference should be even bigger for the official Spack TF package, since it has built-in micro-arch optimizations. |
I built All I can say is that for some reason the crosstool compiler toolchain is somehow better than a host only compiler toolchain. The exact command that fails (both at develop with
Thanks for looking into this! It would be great if this is an easy fix. EDIT : I've also tried not injecting the miniconda python path in the list of Let me know if you want me to share any build logs or config files, I'd be happy to post them here. |
I also saw a number of similar issues while building py-tensorflow with the latest recipes (all of which have been reported here already). Just out of curiosity, I took the latest recipe of py-tensorflow, used it in our quite old Spack fork, and got the following: Honestly, I am surprised by the fact that it successfully installed
|
If you look at #15698, specifically the changes to lib/spack/env/cc - that should make this problem (well, the PyObject thing at least) go away as the wrong eval.h is being used here due to jumbling up of the ordering of include paths. Can you let me know if changing that around leads to a successful build for you? |
Just as a point of reference, I was able to build tensorflow without cuda and nccl with the patch to lib/spack/env/cc. I also needed to add the attached -lrt patch because I am building in a centos6 container. It looks like tensorflow 2.2 may include this: I built with gcc 8.3, bazel 0.29.1, and python 3.8.2. |
Spack attempts to inject its paths just before the actual system include directories. Currently, if a build uses -isystem, Spack's headers are injected using -I, which effectively inserts them before any specified -isystem headers. This leads to unexpected failures. Here, we assume that if a build attempts to use -isystem (implying that the underlying compiler supports the flag), we switch to injecting the paths at the end of the -isystem paths. There is a potential concern if the build includes system paths in -I *and* uses -isystem, but that is probably an unlikely usage. See: https://gcc.gnu.org/onlinedocs/gcc/Directory-Options.html Fixes spack#14488, spack#14234
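The reordering this describes can be sketched in isolation (a simplified model for illustration, not the real lib/spack/env/cc wrapper code):

```python
def inject_spack_includes(args, spack_dirs):
    # Simplified model of the include-ordering fix described above:
    # if the build already uses -isystem, append Spack's header dirs
    # as -isystem entries *after* the existing ones, so they cannot
    # shadow headers the build placed earlier (the PyObject failure);
    # otherwise fall back to plain -I at the front, as before.
    flag = "-isystem" if "-isystem" in args else "-I"
    extra = []
    for d in spack_dirs:
        extra += [flag, d]
    if flag == "-isystem":
        # locate the last existing -isystem, then skip it and its
        # path argument so the new pairs land after both
        last = len(args) - 1 - args[::-1].index("-isystem")
        cut = last + 2
        return args[:cut] + extra + args[cut:]
    return extra + args
```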
Which variable does Bazel use as its TMPDIR? |
The logic for setting the temporary directory can be seen at the following location in py-tensorflow's recipe: |
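Bazel honors TEST_TMPDIR for its scratch root, and the recipe excerpt quoted earlier in the thread also points HOME at the same path. A minimal sketch of that logic (the default base path here is an assumption; use any local, writable, non-NFS directory):

```python
import os

def setup_build_environment_sketch(env, tmp_base="/tmp/spack"):
    # Sketch of the recipe logic quoted earlier in the thread: bazel
    # reads TEST_TMPDIR for its temporary/output root, and HOME is
    # pointed at the same scratch dir so bazel stays out of the real
    # home directory. tmp_base is an assumed default.
    tmp_path = os.path.join(tmp_base, "tf")
    os.makedirs(tmp_path, exist_ok=True)
    env["TEST_TMPDIR"] = tmp_path
    env["HOME"] = tmp_path
    return tmp_path
```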
I wish there was some way we could use the same stage directory that Spack uses for building, but it won't work if that directory uses NFS. |
Thanks, @adamjstewart. I could change the tmp dir from the TensorFlow recipe, but now I'm seeing errors like "Spack compiler must be run from Spack! Input 'SPACK_ENV_PATH' is missing." |
Are you using a non-Spack-installed bazel? Spack adds a patch to bazel to allow our compiler wrappers to work. Apparently there's a better way to do this by making some kind of Spack toolchain for bazel, but I haven't had the time to investigate. |
@adamjstewart I have tried both a non-Spack bazel and a Spack-installed bazel, but nothing has worked for me yet. |
Hmm, the |
Steps to reproduce the issue
Platform and user environment
Please report your OS here:
Compiler: gcc@9.2.0
Additional information
Adding py-cython as a build dep, as suggested in Add new TensorFlow package #13112 (comment), did not help.
cc @adamjstewart