stabilize Cray support #1390

Closed
15 of 17 tasks
boegel opened this issue Sep 15, 2015 · 37 comments

@boegel
Member

boegel commented Sep 15, 2015

optional:

  • generate .pc (pkgconfig) files via Cray tool
    cc @pforai, @gppezzi, @bavier
  • add check to make sure the environment is in the 'starting state', if a Cray toolchain is used

not (for now):

test systems

  • CSCS
    • XC30: Daint
    • XC40: Dora
    • XC30: Santis (TDS Daint)
    • XC40: Brisi (TDS Dora)
  • CSC.fi
    • XC40: Sisu
  • Cray
    • XC40: Swan
@boegel boegel added this to the v2.4.0 milestone Sep 16, 2015
@fgeorgatos
Collaborator

ref. <year>.<month>-<variant> :

FYI: the YYYYMMDD tag seen elsewhere is effectively trying to meet this. For instance, the DD is practically just a variant, and choosing it has no purpose other than differentiating cases, with the occasional benefit of sorting by what is latest (which is more of a side effect). An advantage of this technique is that it would align well with (potential) future daily CI processes. With automation, builds are cheap, so you may as well let the silicon do the grunt work.

@boegel
Member Author

boegel commented Sep 16, 2015

The motivation for the <year>.<month> bit is that Cray apparently updates their programming environment (PE) more-or-less monthly (according to Tim R.).

The -<variant> part should be descriptive, to discriminate between different flavors like GPGPU, Phi (or both), or even haswell vs sandybridge (once we include a toolchain option to specify which craype-* module should be loaded).

I'm sure we can come up with a scheme that benefits from the sorting aspect, yet has a mapping to real world stuff like monthly updating of the PE by Cray.
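
A minimal sketch of how such a <year>.<month>-<variant> label could be parsed so that it sorts chronologically. The helper name parse_version and the variant labels ("gpu", "haswell") are made up for illustration; this is not an agreed-upon scheme, just one way to get both the sorting and the mapping to Cray's monthly PE releases.

```python
import re
from typing import NamedTuple


class CrayToolchainVersion(NamedTuple):
    year: int
    month: int
    variant: str  # empty string when no -<variant> suffix is present


# <year>.<month> with an optional -<variant> suffix, e.g. "2015.11-gpu"
_VERSION_RE = re.compile(r"^(\d{4})\.(\d{2})(?:-(.+))?$")


def parse_version(text: str) -> CrayToolchainVersion:
    """Split a toolchain version label into (year, month, variant)."""
    match = _VERSION_RE.match(text)
    if match is None:
        raise ValueError(f"not a <year>.<month>[-<variant>] label: {text!r}")
    year, month, variant = match.groups()
    if not 1 <= int(month) <= 12:
        raise ValueError(f"month out of range in {text!r}")
    return CrayToolchainVersion(int(year), int(month), variant or "")


# Hypothetical labels; sorting on the parsed tuple gives release order,
# with variants of the same PE release grouped together.
labels = ["2015.11", "2015.06-gpu", "2016.04", "2015.11-haswell"]
ordered = sorted(labels, key=parse_version)
print(ordered)
```

Sorting on the parsed tuple rather than on the raw string keeps the ordering correct even if the label format is later extended, as long as the parser is updated along with it.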

@boegel boegel modified the milestones: v2.4.0, v2.5.0 Oct 28, 2015
@boegel
Member Author

boegel commented Nov 18, 2015

feedback from Cray guy at Cray booth at SC15:

  • CrayGNU could swap a particular version of gcc/libsci after loading PrgEnv module
  • Cray-provided CUDA via cray-accel-nvidia35 (which loads craype-libsci_acc and cudatoolkit)
  • cray-mpich should be part of CrayGNU
  • RPM database of Cray modules => generate external modules metadata file
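
For the last point, a rough sketch of what generating an external modules metadata file could look like. It assumes the INI-style layout EasyBuild uses for that file (one section per module, with name, version and prefix keys); the input dictionary is hand-written sample data, since how to query the RPM database is not covered here, and render_metadata is a hypothetical helper.

```python
import configparser
import io


def render_metadata(entries):
    """Render {module: info} as INI text, one section per external module."""
    parser = configparser.ConfigParser()
    for module, info in entries.items():
        parser[module] = {
            # software names/versions provided by this module (comma-separated)
            "name": ", ".join(info["names"]),
            "version": ", ".join(info["versions"]),
            # environment variable that points to the installation prefix
            "prefix": info["prefix"],
        }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Illustrative sample data, standing in for what an RPM query might return.
sample = {
    "cray-netcdf/4.3.2": {
        "names": ["netCDF", "netCDF-Fortran"],
        "versions": ["4.3.2", "4.3.2"],
        "prefix": "NETCDF_DIR",
    },
}
metadata_text = render_metadata(sample)
print(metadata_text)
```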

@pforai
Contributor

pforai commented Nov 18, 2015

Kewl stuff!

How do we know which version of cray-mpich or libsci we want? Also consider that even GCC is loaded through defaults! So first figure out the version of gcc (for CrayGNU), and then libsci and cray-mpich. I think you also should (as in all caps SHOULD) swap libsci instead of unload/load, due to the internal machinery of the Cray modulefiles.

@boegel
Member Author

boegel commented Nov 18, 2015

@pforai: picking a version of GCC, libsci and cray-mpich isn't any different on Cray than on other systems, as it turns out, other than Cray providing the modules for it.

Basically, we should go with the latest recommended version available at that time, based on the Cray documentation.

@pforai
Contributor

pforai commented Nov 19, 2015

After speaking to Heidi Poxon from Cray Inc., we have more valuable pointers:

  • Cray either provides or could provide a "blessed" list of software that should be used with a specific PE version on the system, made available in a machine-readable format after PE upgrades have been installed on the system /cc @gppezzi

@gppezzi
Contributor

gppezzi commented Nov 24, 2015

I discussed this with someone from Cray today (as we upgraded Piz Daint this week) and I was told that the version after PrgEnv-* is just a hint about the CLE version installed. It has absolutely no relation to the programming environment, and they recommend keeping the latest as default when we upgrade, even if we didn't change any of our other defaults (gcc/cuda/libsci).

This scheme completely breaks our current easyconfig files, because at some point we will switch the default for cuda from 6.5 to 7.0 while keeping the same PrgEnv version. So we will probably need a solution for that in the short term.

I think we can either implement it by doing swaps after loading the PrgEnv, as you mentioned above, or create our own 'blessed version list'. The swap solution is flexible, but I don't know how much trouble it is to implement, and a static list should work for most cases (at least for us, maintaining that is not a problem).

This is actually what we did for CS-STORM: we created a copy of the PrgEnv module and pinned the dependency versions for each specific PE. We could do the same for the XC series (while waiting for a solution from Cray) and assign a version based on the PE release: "15.11" for Nov/2015 (or use whatever is the standard for versioning on EB). My question is how to ship such a modulefile with EB?

Any thoughts?

@pforai
Contributor

pforai commented Nov 24, 2015

@gppezzi I think I can issue you a PR some time this evening, as I'm jetlagged anyway and can't sleep properly. Regarding the implementation, we MUST swap after the PE module is swapped, as this is the only Cray-supported way to get a working environment. That means we first figure out which PE is loaded, then swap that to the desired PE, and then swap each of the other components (compiler, MPI, libsci, etc.) as well. This is required by the internal structure of the modules, since they contain the logic for how to swap to, e.g., the MPI that was compiled with a different compiler than the previously loaded one.
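
The ordering described above could be sketched roughly as follows. plan_swaps is a hypothetical helper that only builds the list of module commands (it does not run them), and the component versions in the example are made up for illustration.

```python
def plan_swaps(loaded, target_pe, components):
    """Return the 'module' commands to run, in order.

    loaded     -- names of currently loaded modules, e.g. "PrgEnv-cray/5.2.82"
    target_pe  -- versionless PrgEnv module to end up with, e.g. "PrgEnv-gnu"
    components -- ordered (module name, version) pairs to pin after the PE swap
    """
    commands = []
    # Step 1: figure out which PrgEnv-* module is loaded (if any).
    current_pe = next(
        (mod.split("/")[0] for mod in loaded if mod.startswith("PrgEnv-")), None
    )
    # Step 2: get to the desired PE first.
    if current_pe is None:
        commands.append(f"module load {target_pe}")
    elif current_pe != target_pe:
        commands.append(f"module swap {current_pe} {target_pe}")
    # Step 3: only then swap each component to its pinned version.
    for name, version in components:
        commands.append(f"module swap {name} {name}/{version}")
    return commands


# Made-up example state and versions.
loaded_now = ["PrgEnv-cray/5.2.82", "cray-mpich/7.2.5"]
pinned = [("gcc", "4.9.3"), ("cray-libsci", "13.2.0"), ("cray-mpich", "7.2.6")]
for command in plan_swaps(loaded_now, "PrgEnv-gnu", pinned):
    print(command)
```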

No comments from Heidi et al. from Cray at this time. Heidi Poxon from Cray was telling us that they keep an internal list of blessed versions for each PE release that they use for testing of builds. The version list differs from the PE release notes in that the release notes only cover components that are new or modified within that PE release, whereas the blessed version list covers the full stack of components and their versions. That is roughly the gist of what she was telling me and @boegel.

@boegel we should ping Brett Bode about access to Blue Waters btw.

@boegel
Member Author

boegel commented Nov 24, 2015

We should definitely pin down the versions of all components involved in our Cray* toolchains; this should go together with a different versioning scheme for Cray* toolchains (see above).

For the PrgEnv module, we should just unload all known PrgEnv modules (versionless) in the Cray* toolchain module itself (cfr. the easyblock for Cray toolchains we were working on during the hackathon).

For other components, where we typically only need to change the version, we should swap in the module we want, for example: module swap fftw fftw/3.3.4.2. This should probably also make it into the Cray* toolchain module we generate, since you want to make sure that loading that module works when done by users too.
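
To make that concrete, here is a sketch of generating such a toolchain modulefile body: unload every known PrgEnv module (versionless), load the one we want, then swap in the pinned component versions. toolchain_modulefile and the KNOWN_PRGENVS list are assumptions for illustration, not what EasyBuild actually emits.

```python
# PrgEnv flavours to unload unconditionally; this list is an assumption and
# would need to match what is actually installed at a given site.
KNOWN_PRGENVS = ("PrgEnv-cray", "PrgEnv-gnu", "PrgEnv-intel", "PrgEnv-pgi")


def toolchain_modulefile(prgenv, pinned):
    """Return Tcl modulefile text for a Cray* toolchain module.

    prgenv -- versionless PrgEnv module to load, e.g. "PrgEnv-gnu"
    pinned -- mapping of component module name to the version to swap in
    """
    if prgenv not in KNOWN_PRGENVS:
        raise ValueError(f"unknown PrgEnv module: {prgenv}")
    lines = ["#%Module"]
    # Unloading a module that is not loaded is harmless, so unload them all.
    lines += [f"module unload {name}" for name in KNOWN_PRGENVS]
    lines.append(f"module load {prgenv}")
    # Pin component versions so users loading the module get the same
    # environment as the one the build was done in.
    lines += [f"module swap {name} {name}/{ver}" for name, ver in pinned.items()]
    return "\n".join(lines) + "\n"


modulefile_text = toolchain_modulefile("PrgEnv-gnu", {"fftw": "3.3.4.2"})
print(modulefile_text)
```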

@pforai
Contributor

pforai commented Nov 24, 2015

Why unload? We should swap PE, and the components.

OK, if a user loaded the EB generated module the environment should be the same as EB prepares it. This makes sense.

@boegel
Member Author

boegel commented Nov 24, 2015

The thing is that this should be done by the Cray* module, not by EB, otherwise loading the Cray* module will not work for users...

Unloading all PrgEnv-* modules, and then loading the one we want, is basically equivalent to a swap (unloading something that is not loaded always works).

@pforai
Contributor

pforai commented Nov 24, 2015

I wouldn't trust Cmod to treat module swap X/1.2.3 to Y/1.2.3 the same as unload X/1.2.3, load Y/1.2.3. I will need to play with this, but I agree that once you manually load the EB module, it should work as you expect and not rely on TC internal machinery to change the env implicitly behind closed curtains (i.e. not running without debug and knowing internals).

@boegel
Member Author

boegel commented Nov 24, 2015

@pforai: as far as I know, a swap is literally an unload followed by a load; we can ask our expert ;-)

@pforai
Contributor

pforai commented Nov 24, 2015

I was just thinking of looping in Robert here, but my assumption was that the answer is 'it depends' and that stuff isn't as simple as it might seemingly appear ;)

@boegel
Member Author

boegel commented Nov 25, 2015

@rtmclay: Is a module swap A B always equivalent to module unload A; module load B (both with tmod and Lmod) or are things more complex than that?

@rtmclay

rtmclay commented Nov 26, 2015

Yes it is exactly the same. I just checked Lmod and the pure TCL version of Tmod.

@gppezzi
Contributor

gppezzi commented Dec 14, 2015

At CSCS most of our builds work only with dynamic = true (see craype.py). I don't know if that is the case for other sites, but, if nobody sees a problem, maybe it would be better to change the default at toolchain level to True. What do you think?
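
For reference, opting in per easyconfig (rather than via the toolchain default) would look roughly like the fragment below. Easyconfig files use Python syntax; the toolchain name and version are just examples taken from this thread, and the fragment assumes the option is exposed under the name dynamic, as in craype.py.

```python
# Fragment of an easyconfig file, not a complete one.
toolchain = {'name': 'CrayGNU', 'version': '2015.11'}

# Ask the Cray compiler wrappers for dynamically linked executables,
# instead of relying on the platform default (static).
toolchainopts = {'dynamic': True}
```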

@boegel boegel modified the milestones: v2.5.0, v2.6.0 Dec 14, 2015
@boegel
Member Author

boegel commented Dec 15, 2015

@gppezzi: makes sense to me... @pforai?

@pforai
Contributor

pforai commented Dec 15, 2015

Yeah, we should set this as a default but give a heads up to the user that this is the default. Old school Cray users might expect that platform defaults (static) are also used by EB.

@pforai
Contributor

pforai commented Mar 25, 2016

After the 11th Hackathon here at CSCS we should revisit the functionality to 'unwrap' the wrapper and use the contained compiler directly. There seems to be a case to do this for VTK on Cray.
/cc @gppezzi

@gppezzi
Contributor

gppezzi commented Mar 25, 2016

Please close this issue, it makes me cry. thanks. ;)

@boegel boegel modified the milestones: v2.7.0, v2.8.0 Apr 1, 2016
@boegel boegel modified the milestones: v2.8.0, v2.9.0 May 10, 2016
@robertdfrench

Hi! We are interested in evaluating EasyBuild on Titan (Cray XK7). Is there anything I can do to contribute to this issue?

@pforai
Contributor

pforai commented May 17, 2016

Just bootstrap and start building ;)

Ping @boegel, me or @gppezzi in case you get stuck! Also Brett Bode (@brettbode) should have some experience building software with EasyBuild on BlueWaters.

@boegel
Member Author

boegel commented May 17, 2016

@robertdfrench a pending update of the documentation w.r.t. Cray support is at http://gppezzi-eb.readthedocs.io/en/cray/Cray_support.html .

Like @pforai mentioned: give it a shot, let us know how it turns out.
See http://easybuild.readthedocs.io/en/latest/Installation.html#bootstrapping-easybuild for documentation on the bootstrap installation procedure (easiest way to get started quickly).

The EasyBuild mailing list (https://lists.ugent.be/wws/info/easybuild) is a good place to share experiences or discuss problems.

Maybe it's also interesting to set up a short conf call together with @gppezzi.

To contribute back beyond feedback: try new builds on top of the Cray* toolchains, and issue pull requests for their easyconfigs if they work out.

@robertdfrench

Cool, this seems to make sense. What is the canonical way to determine which versions of fftw, cray-netcdf, etc. belong to each version of a PrgEnv? For example, I have PrgEnv-gnu/5.2.82, but the packages I have tried (CP2K, Gromacs, WRF as suggested on the Cray support page) seem to expect different (some newer, some older) versions of Cray-provided libraries than what we have.

I reckon I could resolve this by producing new easyconfigs for each of these packages which specify the versions of the libraries we happen to have installed, but I'm sure I am reaching for that because I don't know a better route. Is there a more "EasyBuild-like" way to capture a differing set of cray libraries for, say, WRF in particular?

(Also, should I move this question to a different venue?)

Thanks so much for your time, this is a very interesting tool!

@boegel
Member Author

boegel commented May 17, 2016

@robertdfrench typically, they are the latest version on the system that the easyconfig file was created on...

Unfortunately, the versions that are available are very different between Cray sites, despite that in theory everyone should be able to provide the latest versions of those modules.

I think the intention is to work towards a more community-oriented way of picking the versions of the dependent Cray modules.

Maybe @gppezzi can shed some light on this? He's been at the forefront of the Cray support in EasyBuild at CSCS (and likely the person you saw talking at CUG'16?).

@gppezzi
Contributor

gppezzi commented May 17, 2016

@robertdfrench PrgEnv-gnu/5.2.82 is only saying which version of CLE you are running, so it won't help you to discover which versions of the PE dependencies you have available.

To find that out, you need to either ask your sysadmins or compare the module avail list with the PE release notes.

BTW, as of 2016.04 Cray provides a module called cdt that can be used to set proper defaults for each PE release (but it won't help you for older versions).

@uhaehner has already built a couple of packages on Titan using CrayGNU/2015.11 (and probably also 2015.06), so I'm curious to know which module versions are missing when installing CrayGNU-2015.11.eb and CrayGNU-2015.06.eb (could you please report in #1769?).
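
A small sketch for finding out which module versions are missing: compare the modules a toolchain needs against a terse module listing. It assumes the output of module avail -t has been captured into a string, with directory header lines ending in ":" and an optional "(default)" suffix on module names; the module names and versions below are made up, and both helpers are hypothetical.

```python
def parse_terse_avail(text):
    """Extract the set of module names from terse 'module avail -t' output."""
    modules = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and directory headers such as "/opt/modulefiles:".
        if not line or line.endswith(":"):
            continue
        modules.add(line.replace("(default)", ""))
    return modules


def missing_modules(required, available):
    """Return the required modules that are not available, in input order."""
    return [mod for mod in required if mod not in available]


# Made-up listing and requirements, for illustration only.
avail_output = """\
/opt/cray/modulefiles:
cray-mpich/7.2.5
cray-mpich/7.2.6(default)
fftw/3.3.4.2
/opt/modulefiles:
gcc/4.9.3(default)
"""
required = ["gcc/4.9.3", "cray-mpich/7.2.6", "cray-libsci/13.2.0", "fftw/3.3.4.5"]
missing = missing_modules(required, parse_terse_avail(avail_output))
print(missing)
```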

If you really don't have any of those PEs installed, a solution is to

  • (1) create a new "CrayGNU-2015.XX.eb" with dependencies that match your system
  • (2) ask EasyBuild to build your application using the new toolchain:
$ eb WRF-3.6.1-CrayGNU-2015.11-dmpar.eb --try-toolchain-version=2015.XX

@robertdfrench

Ah, okay, the machine I was on is using PE 15.09. Thank you for clarifying, I did not understand that PE and CLE were versioned separately.

Will EasyBuild try to override WRF's dependencies with those specified in the CrayGNU-2015.09.eb toolchain? Can you help me find the relevant section in the documentation so I can educate myself?

Thank you again

@gppezzi
Contributor

gppezzi commented May 18, 2016

The --try-toolchain-version mechanism will actually create new .eb files for WRF and its dependencies, replacing 2015.11 with 2015.09, and then eb will trigger the build on top of those new files.

$ eb WRF-3.6.1-CrayGNU-2015.11-dmpar.eb --try-toolchain-version=2015.09 -r

If the build is successful, you can find the .eb recipes inside the installation directory (under a folder named easybuild). In case of failure, EasyBuild will point you to the temporary location of those recipes. You can then tweak them and retry (without adding the --try-toolchain-version flag).

Before issuing the command above make sure that you have CrayGNU-2015.09.eb on your robot search path or that you have the module already installed. You can install it by running:

$ eb CrayGNU-2015.09.eb

@pforai
Contributor

pforai commented May 18, 2016

We've asked ORNL for an account on Titan before and due to all the legal setup bs were unable to sign required documents, so maybe we all should have a short conf call to get you going? I think this would be the easiest.

There's some small setup involved like configuring EasyBuild for your setup, tweaking the external modules configuration file etc. I think we could easily help you out with that and shorten the time you need to get into building meaningful apps.

Apart from that, did you skim our paper on EasyBuild on Cray? You can check out the slides from @gppezzi's presentation and other material from the CSCS hackathons.

@boegel boegel modified the milestones: v2.9.0, v3.0 Sep 16, 2016
@boegel boegel modified the milestones: v3.0, v3.1 Nov 10, 2016
@boegel
Member Author

boegel commented Dec 16, 2016

docs are now available at http://easybuild.readthedocs.io/en/latest/Cray-support.html, so I'm closing this

@boegel boegel closed this as completed Dec 16, 2016

6 participants