Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding sysimage generation to mpi tests #48

Merged
merged 15 commits into from
Oct 22, 2021

Conversation

amartinhuertas
Copy link
Member

@amartinhuertas amartinhuertas commented Oct 21, 2021

Solve Medium Priority Task in #39

IMPORTANT NOTE: This PR MUST BE reviewed after #47

@codecov-commenter
Copy link

codecov-commenter commented Oct 21, 2021

Codecov Report

Merging #48 (120e0a4) into release-0.2 (73de176) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           release-0.2     #48   +/-   ##
===========================================
  Coverage         0.00%   0.00%           
===========================================
  Files               33      33           
  Lines             3205    3205           
===========================================
  Misses            3205    3205           
Impacted Files Coverage Δ
src/Visualization.jl 0.00% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 73de176...120e0a4. Read the comment docs.

Copy link
Member

@fverdugo fverdugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this. Introduces considerable complexity and it has not a positive impact in the test times. In fact, by looking the gh actions it seems to even lead to slower times.

Perhaps separating the sequential and mpi tests is enough for now.

@amartinhuertas
Copy link
Member Author

I am not sure about this. Introduces considerable complexity and it has not a positive impact in the test times. In fact, by looking the gh actions it seems to even lead to slower times.

Yes, by now it does not improve much. If we want to perform tests with different values of the number of processors (apart from 1 and 4), this will be very helfpul.

@amartinhuertas
Copy link
Member Author

I am not sure about this. Introduces considerable complexity and it has not a positive impact in the test times. In fact, by looking the gh actions it seems to even lead to slower times.

Besides, we will be evaluating the ability to precompile GridapDistributed.jl. Such action for clusters is a must.

…into adding_sysimage_generation_to_mpi_tests
@fverdugo
Copy link
Member

Besides, we will be evaluating the ability to precompile GridapDistributed.jl. Such action for clusters is a must.

yes, but precompilation needs to be done in the final app, because we cannot anticipate which other packages is going to call the user.

With PackageCompiler generating a system image is straight forward once you have the the final app. I think, it is not our job to provide a precompilaiton mechanism having PackageCompiler at our disposal.

In any case, perhaps it is a good idea to create a new repo GridapBenchmarks or whatever, where we can add both serial and parallel benchs. And we definitively want precompilation scripts in that repo.

@amartinhuertas
Copy link
Member Author

I think it makes a lot of sense to test the workflow that you are going to face when dealing with a cluster. The repo of apps is going to be abandoned, and not tested frequently (we have already many other experiences of similar repos).

@fverdugo
Copy link
Member

I think it makes a lot of sense to test the workflow that you are going to face when dealing with a cluster. The repo of apps is going to be abandoned, and not tested frequently (we have already many other experiences of similar repos).

This is a very good point! but as the sysimage is created now, it is not testing the actual workflow that one needs to follow in a cluster. To check the workflow, you need an app, i.e., you need a (or some) "main" function(s) inside a Package and then create a sys image for those functions.

We have several options:

  1. Move the "main" functions of the tests inside the src dir and then create a system image by calling them on a single MPI task
  2. Create a dummy "test app" in the test dir and move the "main" test functions to the src dir of the "test app" and then precompile the "test app".

I would go for 2 definitively, even though it has some more boilerplate code, since it actually mimics what a user needs to do in a cluster.

What do you think?

@amartinhuertas
Copy link
Member Author

yes, but precompilation needs to be done in the final app, because we cannot anticipate which other packages is going to call the user.

I agree the solution that we currently have in this branch is not definitive, in the sense that it is not testing exactly what will happen with the actual app, with all dependencies etc.. But it may happen that even with the current solution we detect a failure of precompilation after a commit. This can give us some early useful information, that would be otherwise very hard to obtain if the issue happens when you deploy the software in the cluster.

@fverdugo
Copy link
Member

and BTW the precompilation scrips should be in the "test app" (i.e. the final app) as usual

@amartinhuertas
Copy link
Member Author

What do you think?

Ok. Let us go for 2. Can we use the sysimage of the test app to accelerate the mpi tests or would you avoid that?

@amartinhuertas
Copy link
Member Author

@fverdugo ... all tests are now passing. PR ready to review/merge.

The CI script generates the TestApp.so sysimage, and then leverages it to run the MPI tests with 1x and 4x MPI tasks. The generation of the image takes 21 min according to github CI. On the other hand, the parallel MPI tests runtime is missleading in Gitlub CI: https://github.com/gridap/GridapDistributed.jl/runs/3970159570?check_suite_focus=true I think it only measures the time spent in the runtests.jl script, not the time spent in the mpirun processes forked out of it.

@fverdugo fverdugo merged commit d7ccea9 into release-0.2 Oct 22, 2021
@fverdugo fverdugo deleted the adding_sysimage_generation_to_mpi_tests branch October 22, 2021 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants