Compilation on cray machines [SOLVED] #57

Closed
azrael417 opened this issue Oct 12, 2016 · 19 comments

@azrael417

azrael417 commented Oct 12, 2016

I have several issues compiling GRID on a cray machine.

  1. the automatically generated Makefile in lib does not get the correct include paths:
    make[2]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
    CXX Init.o
    CXX PerfCount.o
    CXX algorithms/approx/MultiShiftFunction.o
    CXX Log.o
    CXX qcd/action/fermion/CayleyFermion5D.o
    CXX qcd/action/fermion/ContinuedFractionFermion5D.o
    CXX qcd/action/fermion/PartialFractionFermion5D.o
    CXX qcd/action/fermion/WilsonFermion.o
    CXX qcd/action/fermion/WilsonKernels.o
    CXX qcd/action/fermion/WilsonFermion5D.o
    CXX qcd/action/fermion/WilsonKernelsAsm.o
    CXX qcd/action/fermion/WilsonKernelsHand.o
    In file included from ../../src/lib/qcd/action/fermion/CayleyFermion5D.cc:32:0:
    ../../src/lib/Grid.h:62:46: fatal error: Grid/serialisation/Serialisation.h: No such file or directory
    #include <Grid/serialisation/Serialisation.h>
    I could fix that by manually adding -I flags to the generated Makefile.

  2. the lib compilation uses gcc/g++ and not the compiler I selected. I want to use the cray wrappers cc/CC to enable cray mpi, but I got:
    make[1]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
    CXX Init.o
    In file included from ../../src/include/Grid/Communicator.h:31:0,
    from ../../src/lib/Grid.h:72,
    from ../../src/lib/Init.cc:44:
    ../../src/include/Grid/communicator/Communicator_base.h:35:17: fatal error: mpi.h: No such file or directory
    #include <mpi.h>
    That is expected, as gcc does not know about MPI. Please fix this so that the build actually uses the CC/CXX specified by the user.

  3. after fixing that, the next error is:
    ../../src/include/Grid/Stencil.h(276): error: a value of type "Grid::iScalar<Grid::iVector<Grid::iVector<Grid::vComplexF, 3>, 2>> *" cannot be used to initialize an entity of type "uint64_t={unsigned long}"
    uint64_t cbase = & comm_buf[0];
    To me this looks like an implicit cast that the Intel compiler does not like. I think an explicit typecast would be healthy here.

  4. after fixing these things, I finally get:
    make[2]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
    make[2]: *** No rule to make target 'simd/Grid_empty.h', needed by 'all-am'. Stop.
    That I don't know how to solve. Please advise.

Best
Thorsten

@coppolachan
Collaborator

coppolachan commented Oct 12, 2016

Hello Thorsten,

  1. Did you run the bootstrap.sh script to generate configure?
  2. What is your configure command line?
    The configure command line should be
    CXX=<your compiler> ./configure ....
  3. We acknowledge this; a fix will be released soon.
  4. I think it is related to 1).

@paboyle
Owner

paboyle commented Oct 12, 2016

  1. Not surprised. Antonin reworked the build system, and I was worried that the complexities
    of modules and CC wrappers on the Crays might cause something to go wrong.

  2. Did you specify CXX=CC? I'm able to override with CXX=mpicxx, for example, on other machines,
    or with CXX=clang++-3.9, and am surprised you weren't able to override.

That said, Shoji seems to be able to compile on an XC40 just fine (except the missing typecast I committed last night). Please try develop again on that.

  3. Please specify the full configure command line, the configure output, and
    the output from "make V=1".

  4. Personally, I don't much like hiding the compile flow details by default,
    e.g.
    CXX Benchmark_comms.o
    CXXLD Benchmark_comms

I would prefer not to hide them by default, since there really is complexity, and pretending it
is all magic just makes things harder to debug. It will be the death of open source.

But others disagree with me, so feedback from many people is welcome to get a feel
for the average opinion.

@paboyle
Owner

paboyle commented Oct 12, 2016

p.s. I committed a patch to Travis for the typecast in Stencil.h

Important: are any of the NERSC Cray systems available for remote login and compile just now?

It would be good if we could try it ourselves, especially since Travis provides neither
the Intel compiler nor the Cray wrappers, so this is very hard to catch in our continuous
integration framework.

aportelli added this to the 0.6.0 milestone Oct 12, 2016
aportelli self-assigned this Oct 12, 2016
@aportelli
Collaborator

Hi Thorsten,

We cannot really help you if we don't have the specifics of the build. Please:

  • confirm that you are using the HEAD of develop
  • give us the configure command line
  • give us the configure summary (at the end of configure output)
  • give us the config.log file
  • give us the output of make V=1

@azrael417
Author

azrael417 commented Oct 19, 2016

Hello, using mpicxx or something related is not a good option, as that would basically disable priority access to the Aries interconnect. To my knowledge, there is no good way of circumventing the Cray wrappers and static linking when one wants good performance at scale on an XC-40.

@azrael417
Author

azrael417 commented Oct 19, 2016

Here are my build details:

commit:
commit 7af9b8731847667eaf3b2e33a2457b977a7254ae
Author: paboyle <paboyle@ph.ed.ac.uk>
Date: Tue Oct 18 09:51:37 2016 +0100

build script:

#!/bin/bash

installpath=$(pwd)/install/grid_dp

mkdir -p build

cd build
../src/configure --prefix=${installpath} \
    --enable-simd=AVX512MIC \
    --enable-precision=double \
    --enable-comms=mpi \
    --host=x86_64-unknown-linux \
    CXX="CC" \
    CXXFLAGS="-mkl -xMIC-AVX512 -std=c++11 -I/project/projectdirs/mpccc/tkurth/NESAP/GRID/src/include" \
    CC="cc" \
    CFLAGS="-mkl -xMIC-AVX512 -std=c99 -I/project/projectdirs/mpccc/tkurth/NESAP/GRID/src/include" \
    LDFLAGS="-mkl -lmemkind"

make -j12

cd ..

configure output:

[tkurth@cori08 src (develop)]$ cat config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Grid configure 1.0, which was
generated by GNU Autoconf 2.63.  Invocation command line was

  $ ./configure 

## --------- ##
## Platform. ##
## --------- ##

hostname = cori12
uname -m = x86_64
uname -r = 3.12.51-52.39-default
uname -s = Linux
uname -v = #1 SMP Fri Jan 15 20:03:12 UTC 2016 (16f5bac)

/usr/bin/uname -p = x86_64
/bin/uname -X     = unknown

/bin/arch              = x86_64
/usr/bin/arch -k       = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo      = unknown
/bin/machine           = unknown
/usr/bin/oslevel       = unknown
/bin/universe          = unknown

PATH: /usr/common/software/darshan/3.0.1/bin
PATH: /usr/common/software/bin
PATH: /usr/common/mss/bin
PATH: /usr/common/nsg/bin
PATH: /global/homes/t/tkurth/MODULES/spack/bin
PATH: /usr/common/software/intel/compilers_and_libraries_2017.0.064/linux/bin/intel64
PATH: /opt/cray/pe/mpt/7.4.0/gni/bin
PATH: /opt/cray/rca/1.0.0-6.21/bin
PATH: /opt/cray/alps/6.1.3-17.12/sbin
PATH: /opt/cray/job/1.5.5-3.58/bin
PATH: /opt/cray/pe/pmi/5.0.10-1.0000.11050.0.0.ari/bin
PATH: /opt/cray/pe/craype/2.5.5/bin
PATH: /opt/cray/pe/modules/3.2.10.4/bin
PATH: /usr/syscom/nsg/sbin
PATH: /usr/syscom/nsg/bin
PATH: /opt/modules/3.2.6.7/bin
PATH: /global/homes/t/tkurth/bin
PATH: /usr/local/bin
PATH: /usr/bin
PATH: /bin
PATH: /usr/bin/X11
PATH: /usr/games
PATH: /usr/lib/mit/bin
PATH: /usr/lib/mit/sbin
PATH: /opt/cray/pe/bin
PATH: /global/homes/t/tkurth/src/xmldiff/bin


## ----------- ##
## Core tests. ##
## ----------- ##


## ---------------- ##
## Cache variables. ##
## ---------------- ##

ac_cv_env_CCC_set=
ac_cv_env_CCC_value=
ac_cv_env_CC_set=
ac_cv_env_CC_value=
ac_cv_env_CFLAGS_set=
ac_cv_env_CFLAGS_value=
ac_cv_env_CPPFLAGS_set=
ac_cv_env_CPPFLAGS_value=
ac_cv_env_CXXCPP_set=
ac_cv_env_CXXCPP_value=
ac_cv_env_CXXFLAGS_set=
ac_cv_env_CXXFLAGS_value=
ac_cv_env_CXX_set=
ac_cv_env_CXX_value=
ac_cv_env_LDFLAGS_set=
ac_cv_env_LDFLAGS_value=
ac_cv_env_LIBS_set=
ac_cv_env_LIBS_value=
ac_cv_env_build_alias_set=
ac_cv_env_build_alias_value=
ac_cv_env_host_alias_set=
ac_cv_env_host_alias_value=
ac_cv_env_target_alias_set=
ac_cv_env_target_alias_value=

## ----------------- ##
## Output variables. ##
## ----------------- ##

ACLOCAL=''
AMDEPBACKSLASH=''
AMDEP_FALSE=''
AMDEP_TRUE=''
AMTAR=''
AUTOCONF=''
AUTOHEADER=''
AUTOMAKE=''
AWK=''
BUILD_CHROMA_REGRESSION_FALSE=''
BUILD_CHROMA_REGRESSION_TRUE=''
BUILD_COMMS_MPI_FALSE=''
BUILD_COMMS_MPI_TRUE=''
BUILD_COMMS_NONE_FALSE=''
BUILD_COMMS_NONE_TRUE=''
BUILD_COMMS_SHMEM_FALSE=''
BUILD_COMMS_SHMEM_TRUE=''
BUILD_ZMM_FALSE=''
BUILD_ZMM_TRUE=''
CC=''
CCDEPMODE=''
CFLAGS=''
CPPFLAGS=''
CXX=''
CXXCPP=''
CXXDEPMODE=''
CXXFLAGS=''
CYGPATH_W=''
DEFS=''
DEPDIR=''
ECHO_C=''
ECHO_N='-n'
ECHO_T=''
EGREP=''
EXEEXT=''
GREP=''
INSTALL_DATA=''
INSTALL_PROGRAM=''
INSTALL_SCRIPT=''
INSTALL_STRIP_PROGRAM=''
LDFLAGS=''
LIBOBJS=''
LIBS=''
LTLIBOBJS=''
MAKEINFO=''
MKDIR_P=''
OBJEXT=''
OPENMP_CXXFLAGS=''
PACKAGE=''
PACKAGE_BUGREPORT='paboyle@ph.ed.ac.uk'
PACKAGE_NAME='Grid'
PACKAGE_STRING='Grid 1.0'
PACKAGE_TARNAME='grid'
PACKAGE_VERSION='1.0'
PATH_SEPARATOR=':'
RANLIB=''
SET_MAKE=''
SHELL='/bin/sh'
SIMD_FLAGS=''
STRIP=''
USE_LAPACK_FALSE=''
USE_LAPACK_LIB_FALSE=''
USE_LAPACK_LIB_TRUE=''
USE_LAPACK_TRUE=''
VERSION=''
ac_ct_CC=''
ac_ct_CXX=''
am__fastdepCC_FALSE=''
am__fastdepCC_TRUE=''
am__fastdepCXX_FALSE=''
am__fastdepCXX_TRUE=''
am__include=''
am__isrc=''
am__leading_dot=''
am__quote=''
am__tar=''
am__untar=''
bindir='${exec_prefix}/bin'
build=''
build_alias=''
build_cpu=''
build_os=''
build_vendor=''
datadir='${datarootdir}'
datarootdir='${prefix}/share'
docdir='${datarootdir}/doc/${PACKAGE_TARNAME}'
dvidir='${docdir}'
exec_prefix='NONE'
host=''
host_alias=''
host_cpu=''
host_os=''
host_vendor=''
htmldir='${docdir}'
includedir='${prefix}/include'
infodir='${datarootdir}/info'
install_sh=''
libdir='${exec_prefix}/lib'
libexecdir='${exec_prefix}/libexec'
localedir='${datarootdir}/locale'
localstatedir='${prefix}/var'
mandir='${datarootdir}/man'
mkdir_p=''
oldincludedir='/usr/include'
pdfdir='${docdir}'
prefix='NONE'
program_transform_name='s,x,x,'
psdir='${docdir}'
sbindir='${exec_prefix}/sbin'
sharedstatedir='${prefix}/com'
sysconfdir='${prefix}/etc'
target=''
target_alias=''
target_cpu=''
target_os=''
target_vendor=''

## ----------- ##
## confdefs.h. ##
## ----------- ##

#define PACKAGE_NAME "Grid"

configure: caught signal 2
configure: exit 1

environment:
[tkurth@cori08 src (develop)]$ module list
Currently Loaded Modulefiles:

  1) modules/3.2.6.7         7) udreg/2.3.2-4.6                   13) job/1.5.5-3.58        19) craype-mic-knl
  2) nsg/1.2.0               8) ugni/6.0.12-2.1                   14) dvs/2.5_0.9.0-2.155   20) cray-shmem/7.4.0
  3) modules/3.2.10.4        9) pmi/5.0.10-1.0000.11050.0.0.ari   15) alps/6.1.3-17.12      21) cray-mpich/7.4.0
  4) craype-network-aries   10) dmapp/7.1.0-12.37                 16) rca/1.0.0-6.21        22) intel/17.0.0.098
  5) craype/2.5.5           11) gni-headers/5.0.7-3.1             17) atp/2.0.2             23) altd/2.0
  6) cray-libsci/16.06.1    12) xpmem/0.1-4.5                     18) PrgEnv-intel/6.0.3    24) cray-memkind

@azrael417
Author

azrael417 commented Oct 19, 2016

That is the makefile output (relevant part)

make  all-am
make[2]: Entering directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
depbase=`echo Init.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
g++ -DHAVE_CONFIG_H -I. -I../../src/lib    -I/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/src/include -mavx512f -mavx512pf -mavx512er -mavx512cd -fopenmp  -O3  -std=c++11 -MT Init.o -MD -MP -MF $depbase.Tpo -c -o Init.o ../../src/lib/Init.cc &&\
mv -f $depbase.Tpo $depbase.Po
g++: error: unrecognized command line option '-mavx512f'
g++: error: unrecognized command line option '-mavx512pf'
g++: error: unrecognized command line option '-mavx512er'
g++: error: unrecognized command line option '-mavx512cd'
Makefile:1059: recipe for target 'Init.o' failed
make[2]: *** [Init.o] Error 1
make[2]: Leaving directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
Makefile:784: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/global/project/projectdirs/mpccc/tkurth/NESAP/GRID/build/lib'
Makefile:369: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

it tries using g++/gcc, not CC/cc
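
A quick way to confirm which compiler a generated Makefile will use, without reading the full make V=1 output, is to look at the CXX assignment that configure substituted into it. The sketch below assumes an Automake-generated Makefile containing a line of the form `CXX = ...`; the helper name `makefile_cxx` is made up for illustration.

```shell
# Print the C++ compiler recorded in an Automake-generated Makefile.
# Assumption: the Makefile contains an assignment of the form "CXX = <compiler>".
makefile_cxx() {
    makefile=${1:-Makefile}
    # Strip the "CXX = " prefix and print only the matching line.
    sed -n 's/^CXX = //p' "$makefile"
}

# Example (path taken from the make output above):
#   makefile_cxx build/lib/Makefile
```

If this prints g++ although CXX=CC was requested, the override did not reach the configure run that produced that Makefile.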

@coppolachan
Collaborator

  1. The config.log would be useful.
  2. Also the final summary of the output of the configure step.
  3. What version is your gcc, and why are you not using the Intel compiler?

@azrael417
Author

I have added the config log.
I want to use the Cray compiler wrappers CC/cc with PrgEnv-intel loaded, so that CC points to icpc and cc points to icc, but the lib build picks the GNU compilers instead. I think this is a bug. It should use the compiler selected by the user.

@coppolachan
Collaborator

coppolachan commented Oct 19, 2016

We need the config.log produced by your configure run. The one posted says that the run command was just
./configure
I do not think this is the one you ran.
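
A quick check on whether a config.log belongs to the configure run in question: Autoconf records the invocation near the top, on a line starting with `  $ `, and the cache variables section records any CXX override as `ac_cv_env_CXX_value`. The sketch below prints both; the helper name `configure_invocation` is hypothetical. Note that the posted build script runs ../src/configure from inside build/, so the matching log would presumably be build/config.log rather than the one in src/.

```shell
# Show how a config.log was produced and which CXX override configure saw.
# Assumption: the log was written by an Autoconf-generated configure script.
configure_invocation() {
    log=${1:-config.log}
    # The invocation is recorded on the first line starting with "  $ ".
    sed -n '/^  \$ /{p;q;}' "$log"
    # An empty value here means no CXX=... reached this configure run.
    grep '^ac_cv_env_CXX_value=' "$log"
}

# Example:
#   configure_invocation build/config.log
```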

@azrael417
Author

Oh, maybe my build script is buggy


@azrael417
Author

azrael417 commented Oct 19, 2016

Here it is (updated, now memkind loaded)

config.txt

@coppolachan
Collaborator

This still looks like a problem in the environment:
/usr/bin/ld: cannot find -lmemkind

configure correctly recognized icpc, but some libraries are missing; maybe they are not in the library search path.
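
Whether the linker can find a library through a given compiler wrapper can be probed directly by linking a trivial program. This is a minimal sketch, assuming a POSIX shell; the helper name `can_link` is hypothetical, and on a Cray the result depends on which modules are loaded, since the wrappers pick up library paths from them.

```shell
# Return 0 if "$1" (a C++ compiler or wrapper) can link a trivial program
# against library "$2", and 1 otherwise.
can_link() {
    cxx=$1
    lib=$2
    dir=$(mktemp -d) || return 1
    # Smallest possible translation unit to link.
    printf 'int main() { return 0; }\n' > "$dir/probe.cc"
    if "$cxx" "$dir/probe.cc" "-l$lib" -o "$dir/probe" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$dir"
    return "$status"
}

# Example:
#   if can_link CC memkind; then echo "memkind found"; else echo "memkind not found"; fi
```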

@azrael417
Author

I did that before but now I switched to intel 2016 and it seems to work. Previously I used a 2017 beta.

@azrael417
Author

azrael417 commented Oct 19, 2016

Ok, we can close the issue, seems to work.

Before we do that: the bin folder only contains the Benchmarks, is that right?
Additionally, shall I run some benchmarks on Cori, Peter? If yes, which ones are you most interested in?

@azrael417
Author

One last thing: I ran the comms benchmark test with:

srun -n 64 -c 68 --cpu_bind=cores numactl -p 1 ./Benchmark_comms --threads 64 --mpi 2.2.4.4 --grid 128.128.128.128

The test ran, but it failed when trying to compute the summary:

Grid : Message        : 24906 ms : 30       4        10368000    1198.7      2397.41
Grid : Message        : 26629 ms : 30       8        20736000    1199.82     2399.64
Grid : Message        : 30097 ms : 30       16       41472000    1208.64     2417.28
Grid : Message        : 30599 ms : 32       1        3145728     710.73      1421.46
Grid : Message        : 31088 ms : 32       2        6291456     1173.55     2347.1
Grid : Message        : 32166 ms : 32       4        12582912    1159.37     2318.73
Grid : Message        : 34211 ms : 32       8        25165824    1231.09     2462.19
Grid : Message        : 38414 ms : 32       16       50331648    1210.31     2420.63
Grid : Message        : 38500 ms : ====================================================================================================
Grid : Message        : 38500 ms : = Benchmarking sequential halo exchange in 4 dimensions
Grid : Message        : 38500 ms : ====================================================================================================
Grid : Message        : 38500 ms :   L           Ls         bytes       MB/s uni        MB/s bidi
srun: error: nid12126: task 23: Floating point exception

Are the parameters chosen wrongly?
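
As a basic sanity check on the parameters (it does not diagnose the floating point exception itself), one can verify that the product of the --mpi decomposition equals the rank count passed to srun -n, and that each --grid dimension divides evenly by the corresponding --mpi dimension. This sketch assumes that this is how Grid interprets the two options; the function names are made up for illustration.

```shell
# Multiply the entries of a dot-separated list, e.g. "2.2.4.4".
dims_product() {
    prod=1
    for n in $(printf '%s' "$1" | tr '.' ' '); do
        prod=$((prod * n))
    done
    printf '%s\n' "$prod"
}

# check_decomposition <ranks> <mpi dims> <grid dims>
# Assumption: ranks must equal the product of the MPI dims, and every grid
# dimension must be divisible by the matching MPI dimension.
check_decomposition() {
    ranks=$1
    mpi=$2
    grid=$3
    if [ "$(dims_product "$mpi")" -ne "$ranks" ]; then
        echo "rank count $ranks does not match --mpi $mpi"
        return 1
    fi
    # Walk the grid dimensions as positional parameters, one per MPI dimension.
    set -- $(printf '%s' "$grid" | tr '.' ' ')
    for m in $(printf '%s' "$mpi" | tr '.' ' '); do
        if [ "$#" -eq 0 ]; then
            echo "--grid $grid has fewer dimensions than --mpi $mpi"
            return 1
        fi
        if [ $(( $1 % m )) -ne 0 ]; then
            echo "grid dimension $1 is not divisible by $m"
            return 1
        fi
        shift
    done
    echo "ok"
}

# Example, with the values from the srun line above:
#   check_decomposition 64 2.2.4.4 128.128.128.128
```

Under these assumptions the posted parameters are self-consistent, so the check alone would not explain the crash.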


@coppolachan
Collaborator

Can I ask you a couple of things?
Since the compilation issue seems solved, can you summarize the solution in a few lines, including your environment and the configure command?
It would be a good reference for other people in the same situation.

Could you open a new thread for the last request?

@azrael417
Author

What worked was running bootstrap.sh with the following modules loaded:

[tkurth@cori08 ~]$ module list
Currently Loaded Modulefiles:

  1) modules/3.2.6.7         9) pmi/5.0.10-1.0000.11050.0.0.ari   17) atp/2.0.2
  2) nsg/1.2.0              10) dmapp/7.1.0-12.37                 18) PrgEnv-intel/6.0.3
  3) modules/3.2.10.4       11) gni-headers/5.0.7-3.1             19) craype-mic-knl
  4) craype-network-aries   12) xpmem/0.1-4.5                     20) cray-shmem/7.4.0
  5) craype/2.5.5           13) job/1.5.5-3.58                    21) cray-mpich/7.4.0
  6) cray-libsci/16.06.1    14) dvs/2.5_0.9.0-2.155               22) intel/17.0.0.098
  7) udreg/2.3.2-4.6        15) alps/6.1.3-17.12                  23) altd/2.0
  8) ugni/6.0.12-2.1        16) rca/1.0.0-6.21                    24) cray-memkind

Then I ran the following configure command:

../src/configure --prefix=${installpath} \
    --enable-simd=AVX512MIC \
    --enable-precision=double \
    --enable-comms=mpi \
    --host=x86_64-unknown-linux \
    CXX="CC" \
    CXXFLAGS="-mkl -xMIC-AVX512 -std=c++11 -I/project/projectdirs/mpccc/tkurth/NESAP/GRID/src/include" \
    CC="cc" \
    CFLAGS="-mkl -xMIC-AVX512 -std=c99 -I/project/projectdirs/mpccc/tkurth/NESAP/GRID/src/include" \
    LDFLAGS="-mkl -lmemkind"

The -lmemkind linking can be required because in Intel 2016 and 2017, MKL makes use of hbw_alloc calls but does not link against libmemkind by default. So if you don't link it and hit the "wrong routine", you will see segfaults.
In MKL 2017, this problem was solved for most of the standard BLAS/LAPACK routines but not for the newly introduced deep-learning-optimized routines (such as convolution routines, pooling routines, etc.). I think GRID does not use any of those, but it is good to make sure and link properly from the beginning.

coppolachan changed the title from "Compilation on cray machines" to "Compilation on cray machines [SOLVED]" Oct 20, 2016
@paboyle
Owner

paboyle commented Oct 26, 2016

g++ is not yet known good on AVX512 intrinsics for us. Can you try current develop with ICPC?
Peter

