INSTALL does not have complete cmake instructions for BlueGeneQ #390

Closed
apeyser opened this issue Jun 8, 2016 · 7 comments

Comments

@apeyser
Contributor

apeyser commented Jun 8, 2016

Does not work:

cmake .. -DCMAKE_TOOLCHAIN_FILE=Platform/BlueGeneQ_XLC -DCMAKE_INSTALL_PREFIX=../install -Dstatic-libraries=OFF
module load gsl

is needed... but still does not work, because LTDL_LIBRARIES are not found.

Additionally the INSTALL says:
"It is recommended to build statically on larger BlueGene/Q systems."

Who says so? That hasn't been an issue on BGQ systems for more than a year. The problem is that the CMake code is incomplete/incorrect/hacked, and not any essential problem with dynamic loading on BGQ --- even the performance issues have been vastly ameliorated.

This becomes a higher priority issue vis-a-vis cmake, given that "cmake is bad" will become an issue on new architectures.

@tammoippen
Contributor

Hi @apeyser,

Additionally the INSTALL says:
"It is recommended to build statically on larger BlueGene/Q systems."
Who says so?

See the official Compiling and Tuning Applications on JUQUEEN site:

Support for Shared libraries and dynamic executables

Blue Gene/Q offers the possibility to create shared libraries and dynamic executables, but in general it is not recommended to use shared libraries on JUQUEEN because loading the shared libraries can delay the startup of the dynamically linked application considerably, especially when using large partitions. Therefore, please use shared libraries only if there is no other possibility.
See here for further information how to generate shared libraries and create dynamic executables in C and Fortran.

Also, the XLC compilers compile statically by default.

On the LTDL issue: If -Dstatic-libraries=ON, as suggested in the INSTALL, libltdl is not needed. BTW: NEST uses libltdl to load external user modules dynamically. I agree that there should be a paragraph discussing this:

  1. If the user wants to load external modules dynamically, she also has to compile NEST dynamically and have libltdl installed. Maybe add install instructions similar to GSL on K.
  2. Otherwise, the user can compile either statically or dynamically. If she decides to go dynamic, e.g. to run PyNEST, then she should add the flag -Dwith-ltdl=OFF.
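
As a rough sketch of the cases above, the flag choice could be written down as a small shell helper. The function name `nest_bgq_flags` and its mode names are hypothetical, and `-Dwith-ltdl=ON` is assumed to be the counterpart of the `-Dwith-ltdl=OFF` flag used in this thread; the helper only prints flags and does not run cmake.

```shell
#!/bin/sh
# Hypothetical helper: print the NEST configure flags for one of three cases.
#   modules - external user modules are loaded at run time (needs libltdl)
#   dynamic - dynamic build without libltdl, e.g. for PyNEST
#   static  - static build as suggested in INSTALL; libltdl is not needed
nest_bgq_flags() {
    case "$1" in
        modules)
            # Assumption: -Dwith-ltdl=ON is the explicit "use libltdl" setting.
            echo "-Dstatic-libraries=OFF -Dwith-ltdl=ON"
            ;;
        dynamic)
            echo "-Dstatic-libraries=OFF -Dwith-ltdl=OFF"
            ;;
        static)
            echo "-Dstatic-libraries=ON"
            ;;
        *)
            echo "usage: nest_bgq_flags modules|dynamic|static" >&2
            return 1
            ;;
    esac
}
```

The output could then be appended to the usual configure line, for example `cmake .. -DCMAKE_TOOLCHAIN_FILE=Platform/BlueGeneQ_XLC $(nest_bgq_flags dynamic)`.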

@apeyser
Contributor Author

apeyser commented Jun 8, 2016

@tammoippen
Thanks for identifying who's to blame for the incomplete advice.

NB: That's old, and for most applications it's an urban myth. XLC compiles by default statically to hack around bad build scripts from the early days on BGQ when GPFS hadn't been tuned properly, which then required counter-hacks to turn dynamic linking back on --- the usual bad design leading to further bad design. If you're running code that uses 8 out of 64 cores / node, worrying about a 10 second delay on startup to load dynamic libraries is worse than crazy.

NB: In fact, a thesis was just given out that shows that shared library load time is independent of number of tasks, once the partition distribution was accounted for.

Ok, I'll test that I can do what you suggest, and then we should add that to the INSTALL.

NB: By "the cmake code" I mean the cmake files shipped with cmake --- BGQ comprises 30+ files, and pkg-config is ~1000 lines of code. That's crazy!

@apeyser
Contributor Author

apeyser commented Jun 8, 2016

@tammoippen
So with:

cmake .. -DCMAKE_TOOLCHAIN_FILE=Platform/BlueGeneQ_XLC -DCMAKE_INSTALL_PREFIX=../install -Dstatic-libraries=ON -Dwith-python=/bgsys/local/python3/3.4.2/bin/python3 -Dwith-ltdl=OFF

I get:

/opt/ibmcmp/vacpp/bg/12.1/bin/.orig/bgxlc++_r: 1501-289 (W) Option -Wall was incorrectly specified. The option will be ignored.
/bgsys/drivers/toolchain/V1R2M4_base/gnu-linux/lib/gcc/powerpc64-bgq-linux/4.4.7/../../../../powerpc64-bgq-linux/lib/libpthread.a(pthread_create.o):(.toc+0xa0): undefined reference to `_dl_stack_flags'
"/homeb/slns/slns007/local/nest/pynest/pynestkernel.cpp", line 27656.32: 1540-1102 (W) "x" might be used before it is set.
/bgsys/drivers/toolchain/V1R2M4_base/gnu-linux/lib/gcc/powerpc64-bgq-linux/4.4.7/../../../../powerpc64-bgq-linux/lib/libpthread.a(nptl-init.o): In function `__pthread_initialize_minimal_internal':
/bgsys/drivers/V1R2M4/ppc64/toolchain/gnu/glibc-2.12.2/nptl/nptl-init.c:277: undefined reference to `__libc_setup_tls'
/bgsys/drivers/toolchain/V1R2M4_base/gnu-linux/lib/gcc/powerpc64-bgq-linux/4.4.7/../../../../powerpc64-bgq-linux/lib/libpthread.a(nptl-init.o):(.toc+0x8): undefined reference to `_dl_cpuclock_offset'
/bgsys/drivers/toolchain/V1R2M4_base/gnu-linux/lib/gcc/powerpc64-bgq-linux/4.4.7/../../../../powerpc64-bgq-linux/lib/libpthread.a(nptl-init.o):(.toc+0x80): undefined reference to `_dl_init_static_tls'
/bgsys/drivers/toolchain/V1R2M4_base/gnu-linux/lib/gcc/powerpc64-bgq-linux/4.4.7/../../../../powerpc64-bgq-linux/lib/libpthread.a(nptl-init.o):(.toc+0x90): undefined reference to `_dl_wait_lookup_done'
make[2]: *** [nest/nest] Error 1
make[1]: *** [nest/CMakeFiles/nest.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
    1500-029: (W) WARNING: subprogram __pyx_fatalerror(const char *, ...) could not be inlined into __pyx_memoryview_fromslice(__Pyx_memviewslice, int, PyObject *(*)(char *), int (*)(char *, PyObject *), int).
    1500-029: (W) WARNING: subprogram __pyx_fatalerror(const char *, ...) could not be inlined into __Pyx_XDEC_MEMVIEW(__Pyx_memviewslice *, int, int).
    1500-030: (I) INFORMATION: __pyx_pf_12pynestkernel_10NESTEngine_4init(__pyx_obj_12pynestkernel_NESTEngine *, PyObject *, PyObject *): Additional optimization may be attained by recompiling and specifying MAXMEM option with a value greater than 8192.

So it looks like the pthread tool chain that I'm picking up references some dl pieces.
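
To see at a glance which symbols are unresolved, the linker messages can be filtered out of the build output. This is only a sketch: the function name `undefined_symbols` is made up here, and it assumes the log uses the GNU ld message format shown above.

```shell
#!/bin/sh
# Hypothetical helper: read build output on stdin and print each symbol
# named in an "undefined reference to `sym'" message once, sorted.
undefined_symbols() {
    sed -n "s/.*undefined reference to \`\([^']*\)'.*/\1/p" | sort -u
}
```

For example, `make 2>&1 | undefined_symbols` would list the `_dl_*` and `__libc_setup_tls` names from the log above, one per line.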

@tammoippen
Contributor

@apeyser
INSTALL:142:

[...] If you need PyNEST on BlueGene/Q, you have to compile dynamically, i.e. -Dstatic-libraries=OFF [...]

pynestkernel.so has to be a shared library for Python to load it. You can only build shared libraries with XLC if additional flags are set, and those flags are set when we choose -Dstatic-libraries=OFF, in the same way as in the original BG/Q toolchain files.

Compiling with python3 on JUQUEEN would look like this:
First, cythonize pynestkernel.pyx:

  cd <nest-src>/pynest
  /bgsys/local/python3/3.4.2/bin/cythonize pynestkernel.pyx

Then, configure NEST for BG/Q as a dynamic build with PyNEST and without libltdl:

cd <nest-bld-dir>
cmake <nest-src> \
      -DCMAKE_TOOLCHAIN_FILE=Platform/BlueGeneQ_XLC \
      -DCMAKE_INSTALL_PREFIX=../install \
      -Dstatic-libraries=OFF \
      -Dcythonize-pynest=OFF \
      -Dwith-python=/bgsys/local/python3/3.4.2/bin/python3 \
      -DPYTHON_LIBRARY=/bgsys/local/python3/3.4.2/lib/libpython3.4m.a \
      -DPYTHON_INCLUDE_DIR=/bgsys/local/python3/3.4.2/include/python3.4m \
      -Dwith-ltdl=OFF
make
make install
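
Since -Dcythonize-pynest=OFF relies on the cythonize step having been done by hand, a small guard before configuring may save a failed build. This is a sketch only: the function name `check_cythonized` is hypothetical, and it assumes the generated file is pynest/pynestkernel.cpp, as the path in the compiler warning earlier in this thread suggests.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if the cythonized source is present
# under the given NEST source directory.
check_cythonized() {
    src="$1"
    if [ -f "$src/pynest/pynestkernel.cpp" ]; then
        return 0
    fi
    echo "pynest/pynestkernel.cpp not found in $src; run cythonize first" >&2
    return 1
}
```

It could be called as `check_cythonized <nest-src> && cmake <nest-src> ...` so that the configure step only runs once the generated file is in place.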

I run it like this:

source <prefix>/bin/nest_vars.sh
runjob --exp-env HOME --exp-env PYTHONPATH --np 1 --ranks-per-node 1 : /bgsys/local/python3/3.4.2/bin/python3 test.py

Loading the PyNEST .so works, but some dependencies are missing, e.g. scimath, which causes loading PyNEST to fail completely.

@apeyser
Contributor Author

apeyser commented Jun 15, 2016

@tammoippen
"Loading the PyNEST .so works, but some dependencies are missing, e.g. scimath, which causes loading PyNEST to fail completely."

I've gotten to where you are.
Nope, it's not missing dependencies, but a broken environment. The question now is, since it was working before cmake, whether it is a cmake change or a library change (did some library deep inside get broken, leading to these symptoms?).

So, the old autoconf system needs to be built and tested, which is a painful process on JUQUEEN.

@heplesser
Contributor

@apeyser Does this problem still persist?

@heplesser
Contributor

Closing due to inactivity.


3 participants