Skip to content

3rd EasyBuild hackathon meeting minutes day 1

boegel edited this page Mar 21, 2013 · 11 revisions
Clone this wiki locally

(Monday Mar. 11th 2013, 10am-6pm)

The first day of the 3rd EasyBuild hackathon consisted of presentations, discussions and initial hands-on experience with EasyBuild for attendees new to the tool. These notes were mainly taken by Kenneth and Jens, with contributions by Fotis.

Attendees

  • Kenneth Hoste (HPC-UGent, EasyBuild developer and release manager)
  • Jens Timmerman (HPC-UGent, EasyBuild developer)
  • Fotis Georgatos (University of Luxembourg, HPC sysadmin and active contributor)
  • Jens Wiegand (The Cyprus Institute, LinkSCEEM project manager)
  • Thekla Loizou (The Cyprus Institute, HPC user support)
  • George Tsouloupas (The Cyprus Institute, HPC sysadmin and user support)
  • Stelios Erotokritou (The Cyprus Institute, HPC user support/PRACE)
  • Mohamed Gafaar (Bibliotheca Alexandrina, HPC sysadmin/user support)
  • Dina Mahmoud Ibrahim (Cairo University, HPC sysadmin/user support)
  • Alan O'Cais (Jülich Supercomputing Centre, HPC user support & LinkSCEEM)
  • Alexander Schnurpfeil (Jülich Supercomputing Center, HPC user suppot)
  • Nicolas Kanaris (The Cyprus Institute, HPC user (OpenFOAM))
  • George Fanourgakis (The Cyprus Institute, HPC user, molecular dynamics)
  • Demetris Charalambous (Cyprus Meteorological Servicei, HPC user support?, weather forcecasting (WRF, ...))
  • Ioanna Kalvari (University of Cyprus (bioinformatics), HPC user)
  • Ioannis Kirmitzoglou (University of Cyprus (bioinformatics), HPC user)
  • Adam DeConinck (NVIDIA Corporation, HPC sysadmin) [remote via Skype]

Program

  • [10am-10.10am] presentation on LinkSCEEM project
  • [10.10am-10.15am] introduction round: who's who?
  • [10.15am-12am] presentation on EasyBuild: Building Software With Ease
  • [1.30pm - 1.45pm] presentation by Jülich Supercomputing Cente (JSC) on current activities and plans with EasyBuild
  • [1.45pm - 2.30pm] presentation by Cyprus Institute on current activities and plans with EasyBuild
  • [2.30pm - 5pm] discussions + initial hands-on experience with EasyBuild
  • [5pm - 6pm] presentation by NVIDIA: Introduction to the CUDA Toolkit for Building Applications
  • [6pm - 8pm] aftermath: discussions w.r.t. CUDA support in EasyBuild

Presentation notes

LinkSCEEM presentation (Jens W.)

  • goal of LinkSCEEM project: establish HPC ecosystem in Eastern Middle-Terranean
    • resources, training, expertise, connectivity, ...
    • online training content is very important!
    • with support from NCSA, Jülich, ..
  • www.isgtw.org
  • CSC2013 PRACE conference in Cyprus
  • Alan's subject: performance analysis and optimisation of community codes

EasyBuild presentation

slides (PDF)

questions/remarks by George T.

  • --download-only command line option is missing (see framework#538)
    • can be done indirectly now:
    • using --stop fetch, but will fail after first download failed
    • using --regtest, and 'breaking' --job so no jobs are submitted for the builds
  • EasyBuild bootstrap script [question]
    • why not include fixed Python 2.7 version in bootstrap procedure, so we have full control of Python version being used
    • add Python module as a dependency for EasyBuild
  • why does something have to be part of a toolchain, and not just a dependency? [question]
    • can toolchain be extended dynamically?
    • e.g. include dependencies in toolchain as well?
    • ictce => ictcee (e for extended)?
    • creating yet another toolchain means that whole stack of dependencies needs to be rebuilt, which is a pain [question]
    • and results in further explosion of the set of available modules
    • create big fat toolchains and filter stuff out in toolchainopts with filter option?
  • are existing modules reused if they're needed? [answer: yes]

questions/remarks by Fotis

  • supporting alternative module naming schemes (see framework#173)
    • basically just provide multiple alternative views on the existing modules - let's escape from the "one size show" concept
    • flat (cray optimal) or hierarchical (lmod optimal), all can be valid
    • hierarchical can be top-down (compiler->libraries->apps) or vice versa (software on top, that is what the users care about)
  • setting up mirrors for sources
    • initial mirror prototype already in development/use at Uni.Lu; ~36GBs of software (open source & closed)
    • The split between redistributable and non-, is unavoidable for a public service; zsync could help with either
    • --try-amend source URL should be supported via EasyBuild configuration file (see framework#462)
    • currently already supported via EASYBUILD_TRY_AMEND env var
  • Trilinos and other such multi-dep libs, should be in bold on slide with supported software
  • can build_in_install_dir be specified in easyconfig file? It might reduce needs for easyblocks in bioinfo packages (see framework#539)
  • ictce/3.2.2.u3 toolchain sources are no longer available, so use other toolchain in examples (WRF)
    • also there are bugs in icc/11.1.07* & flexlm, which are time-consuming to debug (EB processes are affected)
  • chroot/jail into installation prefix, as a proper containment solution when building software: (see framework#540)
    • it makes it much safer to build 1000s of packages coming from pkgsrc (at least, at build time!)
    • it will allow to catch osdependencies that now escape unnoticed
    • it seems to be the correct thing to do also in relation to hashdist
    • Fotis: no need to build safety features inside easyconfigs, IMHO, that has no clear benefit
  • possible directions on how to modify the current namespace (eg. think of lower-case modules):
    • customize modulefiles with sed and change what is needed (can be tricky or risky to be correct)
    • manually cultivate a symlink farm (only good for the first load, deps will be like before)

questions/remarks by Mohamed

  • can EasyBuild cope with existing modules (e.g., OpenMPI), or do they need to be rebuilt?
    • will not work out of the box, because those modules will be missing things like EBROOT, EBVERSION
    • rebuilding them is the best idea, and will enable you to roll out software again after reinstall of system with different OS

questions/remarks by Alan

  • adding support for BlueGene Q system
    • perfect timeframe since it's quite new
    • specific characteristic: crosscompilation, IBM XLC compiler, ...
  • on BlueGene systems, running of tests will need to be skipped or done differently (remotely)
    • skipping can be done with e.g. --try-amend=skipsteps=test,test_cases
  • OpenFOAM is a pain because of large difference in system characteristics
  • error/warning log parser now spits out lots of false positives (see framework#541)
    • regular expression used needs to be documented well
    • need to enhance regex to reduce amount of false positives
  • bbcp can never work if the required ports are not open
    • add a test case for this?
    • support a way of spitting out a warning about this at the end of the installation (see framework#542)

notes by Jens T.

  • configuration file (and .eb files) will not be executed anymore => needs to be followed up (w/ Stelios)
  • more stuff needs to be shoved into toolchains: Python (George T.), zlib, ...

general notes

Presentation by JSC (Alan)

  • supercomputing training portal: http://linksceem.eu/ls2/component/content/article/198
    • big fat training cluster via VMs w/ terminal emulation in web browser
    • rely on EasyBuild to get software installed on this
  • should function well in a low-bandwidth environment (important for LinkSCEEM)
  • documentation portal w.r.t. supercomputers, just collect links to useful documentation around the web
  • try and get a toolchain working for BlueGene Q systems
  • categories of software used in PRACE

Presentation by Cyprus Institute (George T./Fotis G.)

N.B. This presentation incorporates also the 3rd-party needs of Uni.Lu

  • strong commitment from Cyprus Institute to EasyBuild
  • was really useful to quickly set up a software stack
    • GPU software, even though not all apps were there
  • useful for conformity across LinkSCEEM institutions - very important
  • also for setting up post-processing nodes (w/ different OS)
  • robot should not depend on filename, but on contents of easyconfig
  • need support for multiple source paths (and dependencies? => Jens T.)
  • --download-only (from mirrors), needed in relation to jumpstart procedures
  • document jail tool (and add it to bootstrap)
  • goolf: OpenMPI >=1.5, OpenBLAS, FFTW w/ --enable-avx
  • custom variables in module files (see modextravars)
  • CUDA support: in toolchain or not?
    • different CUDA versions, dependency chain, ...
  • PGI toolchain is also important for CyI
  • OFED vs no-OFED: easy way to get rid of -no-OFED version suffix? via EB config file?
  • user environment: multiple source paths, custom version suffix for 'tagging' your own builds
  • FFTW single/double precision
    • separate module (and thus separate toolchains) vs 'fat' build
    • support both single/double in a single FFTW module requires running configure/make/make install twice; doable? Yes.
  • local climate group requirements: Ferret, ... (see George F.)
  • Python as a part of the toolchain?
  • managing multiple EB versions
    • need a guidelines and best practices for EasyBuild
    • what should sites customize? what should be left untouched (e.g. becaue it'll break in the future with new EB versions)
  • how to override default use of /tmp (set $TMPDIR, which will be picked up by Python)
  • explosion of available modules
    • we need a good way to handle that
    • in combination with flexible module naming scheme?
  • figure out exact build options that were used
    • document querying log for e.g. configure options
    • or provide tools for it (via eb)
    • will be useful in the future as part of any kind of EasyDoc activity, to post information on a website
  • issue of OS dependencies: portable way of specifying them
    • making sure we catch all dependencies (jail tool provided by HashDist)
  • goalf/ictce versioning schemes need to be documented
  • OpenMPI version: let's bump to v1.5, or later
    • ABI is guaranteed to be compatible at version onwards (ie. change on the fly the OpenMPI module version, without need to rebuild code)
    • part of new goalf (v2.x?)
  • GROMACS on top of new goalf (goolf)
  • (Fotis took over for the remainder)
  • PRACE production environment
    • largely done; mainly a set of environment variables is missing (JT: could use modextravars for that)
    • EasyBuild enables you to set up a PRACE environment in user space (huge selling point: you only need GCC/EnvMod/Py)
  • support for custom site-specific environment variables
    • fi. *_LICENSE_FILE, DEBUGGERS environment and so on;
    • see modextravars stated above
  • HPCBIOS pitch: ie. a standardization effort, collection of policies, see http://hpcbios.readthedocs.org/en/latest/
    • re-usable high-level documentation (intention is to attach it as .pdf to Users' Welcome Letters)
    • provide "standard" working environment for e.g. climate science

NVIDIA presentation (Adam)

slides (PDF)

  • [Alan] documentation for building CUDA applications provided by NVIDIA is very useful and hard to come by!
  • NVIDIA CUDA with OpenMPI: K20 + Mellanox IB
  • 'drop-in' libraries: cuBLAS
    • actually a misnomer, since it provides different function names
  • CUDA (C)
    • compilers + tools
    • nvcc should always be used, unlike mpicc which is optional (can handle linking with MPI libs yourself)
    • runtime API (libcudart.so)
    • IDE, visual profiler
    • collection of libraries (CUBLAS, CUFFT, Thrust, ...)
    • quite similar to e.g. MPI (?)
  • usually there's a configure option like --enable-cuda, but no standard
    • similar for installation prefix for CUDA, e.g. CUDA_HOME (quite similar across apps)
  • some apps ship their own runtime
    • be careful with setting LD_LIBRARY_PATH in a CUDA context
  • nvcc treats C code like C++ (!)
  • -use-fast-math mostly targets single-precision stuff
    • can be broken up into parts
  • -Xptxas=iv can always be set for having debug info
  • most MPI implementations support CUDA (except for Intel MPI)
    • things may break for older versions
    • significant performance gain if communication layer supports DMA to GPU memory (GPU Direct, requires IB QDR/DFR and Mellanox)
    • FG sidenote: ie. MVAPICH + CUDA is the likely the preferred testing ground
  • test examples: matrixMul, simpleMPI
  • CMake 2.8.7+ for good CUDA support
  • object code for correct device architecture should be used for best performance
    • make sure PTX is not being used, which is JIT-compiled and thus leads to startup performance loss
    • default is to target oldest PTX/object code: --gencode default to compute_10,code=sm_10
    • usually something is set in the application Makefile
    • OPENCCFLAGS=gencode...
    • PTXARGS=--arch,...
  • can CUDA toolkit be queried for what options are optimal for current device architecture?
  • possible options for gencode depend on CUDA compiler version
  • build for "everything under the sun"
    • will fail on older nvcc systems
    • larger binary
  • OpenACC: very similar to OpenMP (only PGI, Cray, CAPS compilers for now)
    • for PGI:
    • -acc -> use OpenACC
    • -Minfo=accel -> compile for accelerator (not CPU, which is also possible)
    • -ta=nvidia (vs AMD or Intel accelerators)
    • default architectures targetted are current GPU + major versions (1.0, 2.0, 3.0), but can be tuned via command line options
  • CUDA compiler commands depend on compiler being used, e.g. pgfortran (PGI) for CUDA Fortran
  • nvcc is actually an LLVM frontend
  • follow-up conf. call
    • packaging of CUDA toolkit (redhat, fedora, ubuntu, ...)
    • reason to not have a monolitic install is OS-specific stuff like paths for files, driver, etc.
    • "let user provide CUDA toolkit installation instead of through EasyBuild" (not a good idea)
    • FG sidenote: optimal practice @LU & @CY converges in the direction that:
      • CUDA toolkit & software installation is user space => this has to be done with easybuild, multiple versions OK.
      • driver installation must be done in root space => non-easybuild business, configuration management instead
    • use -silent for only toolkit (not driver)
    • OS-independent install package is being looked into
    • can notes for NAMD be provided as well?
    • CUDA vs Xeon Phi build process
    • Xeon Phi is via magic options in Intel compilers (cfr. PGI)
    • different modes: native mode (all on Xeon Phi), offload mode (host + Phi as accelerator), OpenACC (via pragmas in code), x86-only (run x86 binary on Phi)
  • open questions
    • standard variables for CUDA compilers (e.g. CUDA_CC)
    • Intel compilers + CUDA?
    • which gencode options should be set for CUDA-enabled toolchain?
    • performance issues with 'fat' binaries (multiple device architectures) due to instruction cache bottleneck?
    • can compiled binary be queried for which options were used? (George T.)

Hackathon notes

  • environment modules may be Tcl-only version, which only provides modulecmd.tcl
    • EasyBuild needs to be able to handle that
  • modulecmd may not be in path, but hardcoded in module
  • make bootstrap script work offline too, i.e. add option to supply it the required source tarballs
  • goolf v1.5.10
    • GCC 4.7.2
    • OpenMPI >=1.6.3 (1.7rc8 not production ready + requires GCC > v4.8!)
    • OpenBLAS 0.2.6
    • LAPACK 3.4.2
    • FFTW 3.3.3 (single/double)
    • ScaLAPACK 2.0.2
  • problems with EasyBuild bootstrap script during training exercises
    • use modulecmd help instead of -H (latter doesn't work with Tcl environment modules?)
    • warn about installing with root
    • lib vs lib64
    • offline mode for bootstrap script required, e.g. login nodes can not go online (Mohamed)
    • but workernodes can :)
Something went wrong with that request. Please try again.