Issues building CP2K #27
You are certainly giving MPItrampoline a good workout. Thank you for your patience. |
I tried again this morning with the updated PR but it is still failing. I realised my snippet above is not really showing all the issues that the compilation was running into, so I've uploaded a complete gist of the build. Sorry for barraging you over the last week, but I would really like to do some performance verification checks for MPItrampoline with real applications (and ultimately support it as part of a toolchain in EasyBuild). |
I'm now using Spack (sorry!) to build CP2K against MPItrampoline. I can reproduce your errors. These errors are reported because the MPI standard technically violates the Fortran standard, and newer GNU Fortran compilers report these errors. There are, of course, command line flags or function attributes that one can use to circumvent these errors. I am now looking into ways to automate this, so that people using MPItrampoline don't have to manually specify these flags. |
Here is a patch to make CP2K build with MPItrampoline. In some cases CP2K violates the MPI standard (in a way that is harmless for other MPI implementations); other changes are only necessary for MPItrampoline's Fortran interface. I have also released a new version of MPItrampoline that adds some missing features. |
Confirmed that worked for me. I had to make a tiny change to the patch for version 8.2. I also had to enable
(this might already be fixed with the most recent release 9.1) |
Yes, I forgot about |
I can add it automatically for EasyBuild, it is already triggered in some scenarios (GCC 10+, CP2K < 7.1). |
Unfortunately the test suite is segfaulting on every execution, an example:
|
The test suite is somewhat notorious but I'd expect we should be able to match the results given with OpenMPI. |
I will have a look. How do you run the test suite? |
TBH I'm not sure, it runs automatically with EB. Here are the steps:
and
|
Not all of the tests are failing straight away, I see some are running for quite some time. I'll need to do a comparison build with OpenMPI to fully check. Unfortunately the test suite takes ages, so I won't be able to report back for quite a while. |
The regression tests with OpenMPI took 3 hours (and had 10 failures). The regression tests with MPItrampoline are still running (over 5 hours) and look to have more than 1000 failures. Here's a backtrace on one of the errors:
I might bump everything to later compilers and OpenMPI next week to see if the problems still exist with our latest toolchains. |
I would expect that the regression test failures are either errors in the MPItrampoline Fortran bindings, or errors in my changes to CP2K. So far, I have built the OpenMPI CP2K tests in a Docker container; my next steps would be to convert these to MPItrampoline tests so that I can run these tests locally. If you have a reproducible setup that I can use, then that could save me some time. |
What I have is reproducible but tedious for you, I suspect (it would involve building everything down to the compiler and requires customisations for MPItrampoline that are not yet merged in an EasyBuild release). My job with the MPItrampoline tests was killed after 13 hours of testing :( |
You can find the docs on the tests at https://www.cp2k.org/dev:regtesting . If you have an existing build you should be able to run the tests using that build (after getting the sources and then starting from Step 2 using the There is a section there also about the directory structure you need. |
I tried a
Apparently there is a makefile that needs to be somewhere. I also tried the Docker container I mentioned earlier (with OpenMPI, no changes, straight from the checkout). This led to many failures (55 out of 60). Building everything locally wouldn't be a problem, e.g. the Docker containers started by building GCC. But I am looking for instructions (someone "holding my hand") to reproduce the issue. Either a Dockerfile or a shell script for macOS or Linux would work. |
Assuming that you have the sources and a build of CP2K already, here were my steps
I could then run the tests on the cluster with
|
Ok, I think this may be simpler than needing to run the full test suite. I downloaded a
but with the MPItrampoline version it hangs when gathering statistics at the end:
Given the backtrace above, I wonder if it is a specific problem with MPI_Allreduce? I also found an issue in the CP2K repo about the clash between the Fortran and MPI standards: cp2k/cp2k#1019 |
I'll have a look. What architecture are you using (x86_64?), and what MPI implementation (MPICH?)? |
Yes |
To leave a comment here, I did manage to get a patch that worked in a few cases but it was quite invasive and you would need quite a bit of knowledge (both programming language and use case) to get it right. The core problem is (it seems to me) that CP2K is using MPI constants to do variable initialisations and MPItrampoline can't allow that since it (and therefore the compiler) doesn't know what those constants should be until runtime. This looks like it might be a wider issue since I've seen the same type of problem appear for other (Fortran) applications. @eschnett made the suggestion that perhaps (for Fortran at least) MPItrampoline should set its own constants and then do runtime translation of those constants for the actual MPI runtime used. |
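The runtime-translation suggestion in the comment above can be pictured with a small sketch. This is only an illustration in Python, not how MPItrampoline is implemented: the names `TRAMPOLINE_COMM_WORLD`, `BACKEND_HANDLES`, and `HandleTranslator` are hypothetical, and the backend handle values are placeholders rather than values taken from any real MPI library. The application sees constants that are fixed at build time (so they can appear in initialisers), and a table selected once the real MPI library is known maps them to that library's handles.

```python
# Fixed, build-time constants exposed to the application (hypothetical names).
TRAMPOLINE_COMM_NULL = 0
TRAMPOLINE_COMM_WORLD = 1
TRAMPOLINE_COMM_SELF = 2

# Placeholder handle values for two imaginary backends.
# These are NOT the real values used by OpenMPI or MPICH.
BACKEND_HANDLES = {
    "backend_a": {
        TRAMPOLINE_COMM_NULL: 100,
        TRAMPOLINE_COMM_WORLD: 101,
        TRAMPOLINE_COMM_SELF: 102,
    },
    "backend_b": {
        TRAMPOLINE_COMM_NULL: 7000,
        TRAMPOLINE_COMM_WORLD: 7001,
        TRAMPOLINE_COMM_SELF: 7002,
    },
}


class HandleTranslator:
    """Maps fixed wrapper constants to the handles of the backend chosen at runtime."""

    def __init__(self, backend_name):
        # The table is only selected at runtime, once the backend is known.
        self._to_backend = BACKEND_HANDLES[backend_name]
        self._from_backend = {v: k for k, v in self._to_backend.items()}

    def to_backend(self, wrapper_handle):
        """Translate a fixed wrapper constant into the backend's handle."""
        return self._to_backend[wrapper_handle]

    def from_backend(self, backend_handle):
        """Translate a backend handle back into the fixed wrapper constant."""
        return self._from_backend[backend_handle]


# An application-side "initialiser" that is valid before any backend is loaded.
default_comm = TRAMPOLINE_COMM_WORLD

# Later, at runtime, the backend is known and the constant can be translated.
translator = HandleTranslator("backend_b")
print(translator.to_backend(default_comm))
```

The cost of this design is that every call crossing the wrapper boundary has to translate predefined handles in both directions, which is why the sketch keeps a reverse table as well.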
@eschnett I was trying to consider a way to get around this. Much as I want to, for our case it is very hard to consider using MPItrampoline as part of a toolchain if there will be key Fortran applications that won't work. As a compromise, I was wondering if there would be a way to allow us to fix the MPI constant values when using the MPItrampoline compiler wrappers. There are only two key variants that I can think of, OpenMPI and MPICH, so perhaps an option to the compiler wrappers that allows us to use a specific set of values? That would allow me to create two variants of problem applications like CP2K, one for an OpenMPI compatibility use case and one for an MPICH compatibility use case. This would cover every scenario that I can currently think of (and would be extensible for ones I can't). |
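As a rough picture of the compromise proposed above, the sketch below models a build that bakes in one of two constant sets and then checks at startup that the runtime MPI matches. Everything here is an assumption made for illustration: no such wrapper option is known to exist, the functions `build_application` and `check_at_startup` are hypothetical, and the numeric values are placeholders, not the real OpenMPI or MPICH constants.

```python
# Placeholder constant sets for the two variants named in the comment.
# The numbers are illustrative only, not the real OpenMPI or MPICH values.
CONSTANT_SETS = {
    "openmpi": {"MPI_COMM_WORLD": 10, "MPI_COMM_SELF": 11},
    "mpich": {"MPI_COMM_WORLD": 20, "MPI_COMM_SELF": 21},
}


def build_application(variant):
    """Simulate a build where the chosen variant's constants are baked in."""
    if variant not in CONSTANT_SETS:
        raise ValueError(f"unknown variant: {variant!r}")
    # Copy so the "binary" keeps its own frozen view of the constants.
    return {"variant": variant, "constants": dict(CONSTANT_SETS[variant])}


def check_at_startup(application, runtime_variant):
    """Refuse to run if the runtime MPI does not match the baked-in constants."""
    if application["variant"] != runtime_variant:
        raise RuntimeError(
            f"built for {application['variant']}, "
            f"but runtime MPI is {runtime_variant}"
        )
    return application["constants"]


# Build the hypothetical MPICH-compatible variant and run it against MPICH.
cp2k_mpich = build_application("mpich")
constants = check_at_startup(cp2k_mpich, "mpich")
print(constants["MPI_COMM_WORLD"])
```

With this approach the constants are ordinary compile-time values again, at the price of one build per variant and a startup check to catch a mismatched runtime.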
@ocaisa This would be a good compromise. Let me think about this. |
I thought I would give this a full test with Fortran, and CP2K is a good benchmark for that. The build (v8.2) is failing with: