Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Charm 6.9 Support and Charm 6.8 Removal #1190

Closed
wants to merge 2 commits into from

Conversation

nilsdeppe
Copy link
Member

@nilsdeppe nilsdeppe commented Nov 20, 2018

Proposed changes

  • Add support for Charm++ v6.9 and drop support for v6.8 (v6.8 support will be available by checking out old commits). With Charm++ v6.9 many more STL containers are serializable directly by Charm++ so the PupStlCpp11 header has been removed. There is no longer a need to patch Charm++ because all of our patches have been included upstream. The only thing being patched currently is array indices that are class templates, for which there appears to be some support in Charm++ but I have not yet had any luck with it. Nevertheless, we can now just run charmc on our .ci interface files and only patch the array ones slightly.
    The docker image currently contains both v6.8 and v6.9. I'm doing this so that current PRs are not broken. Once we have merged this PR I will wait a week or two to allow for PRs to be rebased and then remove v6.8 from the Docker image. Hopefully this will provide a fairly smooth transition.
  • Restore support for Charm++ projections. With v6.9 projections correctly distinguishes between entry method templates so we no longer need to work around the Charm++ tracing infrastructure to get a clear representation of what the code is doing. closes Profiling does not work #925

Breaks:

  • Anything that includes PupStlCpp11.hpp will no longer need that
  • Anyone using the containers will need to (temporarily) point their build to /work/charm_69/multicore-...
    This directory will be renamed in the future to charm (probably in about a month or two) to make the transition between versions easier.

Types of changes:

  • Bugfix
  • New feature

Component:

  • Code
  • Documentation
  • Build system
  • Continuous integration

Code review checklist

  • The PR passes all checks, including unit tests, clang-tidy and IWYU. For
    instructions on how to perform the CI checks locally refer to the Dev guide
    on the Travis CI
    .
  • The code is documented and the documentation renders correctly. Run make doc
    to generate the documentation locally into BUILD_DIR/docs/html. Then open
    index.html.
  • The code follows the stylistic and code quality guidelines listed in the
    code review guide.

Further comments

@nilsdeppe nilsdeppe added the in progress Don't review, used for sharing code and getting feedback label Nov 20, 2018
@nilsdeppe nilsdeppe removed the in progress Don't review, used for sharing code and getting feedback label Nov 20, 2018
Copy link
Contributor

@fmahebert fmahebert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a real review, but some questions I had while reading through this :)

-D SPECTRE_CHARM_PROJECTIONS \
-D SPECTRE_CHARM_NON_ACTION_WALLTIME_EVENT_ID=1000 \
-D SPECTRE_CHARM_RECEIVE_MAP_DATA_EVENT_ID=1001"
)
if (PROJECTIONS_PAPI_COUNTERS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to keep PAPI stuff around if it's no longer documented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence. It might be easier to revive it if we want to, but I'm not entirely sure we'll need/want to, in which case it's just baggage... Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's your reasoning? Have you used PAPI counters and found them helpful? (Not objecting, just asking for more info :) )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not used them before. So if we leave it in I could play around with it.
Also, as you said if we were to revive it, it will be easier if we already have this in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that sounds good. My vote is to leave it in then :) I can add some comments as to what the code should look like that uses this. Would that be helpful?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, why not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I added some words to the CMake file. Since it's not a fully-implemented feature I don't think it should go in the actual documentation.

namespace Parallel {
namespace detail {
// Allow 64 inline entry method calls before we fall back to Charm++. This is
// done to avoid blowing the stack.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for my education: is it the inlining that avoids blowing the stack, or is it having a cap on the inlining? If I understand this correctly it should be the inlining, because the inlined function shouldn't go onto the stack. But if that is correct, what is the role of capping at 64 inlines?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an amazingly mind blowing question :D In the Charm++ context "inline" means "don't go through the Charm++ runtime system but go straight to the local object if it exists." Other than here, any suggestions on where to best document this? Maybe the parallelization group?

Copy link
Contributor

@fmahebert fmahebert Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally missed the relevant text in the PR description, sorry! Okay this makes more sense now. But I'm still not sure what the cap at 64 is for. Is the idea to allow 64 nested local object calls (roughly 64 deep on stack), but then give control back to the runtime system which will be making calls one-by-one and therefore avoiding putting too much on the stack?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find a comment like this one more useful at the call site (so around the if/else block in receive_data below), because then I can better see how it relates to the charm++. But this does mean duplicating the comment at different call sites... perhaps a compromise is to give a slightly more detailed explanation in the Doxygen group, and a brief summary at each call site? What do you think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think know I agree :) I'll add the comments and you can tell me how to make the clearer :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, added comments :)

Copy link
Contributor

@fmahebert fmahebert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New comments are good 👍. Here's a few tiny wording suggestions and also a new question...

@@ -1025,6 +1025,16 @@ The `receive_data` function always takes a `ReceiveTag`, which is set in the
actions `inbox_tags` type alias as described above. The first argument is the
temporal identifier, and the second is the data to be sent.

Normally when remote functions are invoked they go through the Charm++ runtime
system, which adds some overhead. The `receive_data` function elides the call to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it fair to say "elides" -> "tries to elide" ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

system, which adds some overhead. The `receive_data` function elides the call to
the Charm++ RTS for calls into array components. Charm++ refers to
these types of remote calls as "inline entry methods". With the Charm++ method
of eliding the RTS the code becomes susceptible to stack overflows because
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comma after RTS

docs/DevGuide/ProfilingWithProjections.md Show resolved Hide resolved
@nilsdeppe
Copy link
Member Author

@fmahebert I pushed fixups :) Thanks for the feedback! :)

@fmahebert
Copy link
Contributor

Looks good to me! I am happy for you to squash. I'm hesitant to give a green checkmark because I only partially understand the Charm++ interfacing, but I approve of I do understand 👍

@nilsdeppe
Copy link
Member Author

Alright, done. Thanks for the feedback @fmahebert :D

@nilsdeppe nilsdeppe force-pushed the charm_69 branch 2 times, most recently from c2aa087 to bb4a0c6 Compare December 5, 2018 01:53
@nilsdeppe
Copy link
Member Author

Rebased on develop after #1216 was merged

@nilsdeppe nilsdeppe force-pushed the charm_69 branch 3 times, most recently from fdc234a to 48fb895 Compare December 8, 2018 01:23
@kidder
Copy link
Member

kidder commented Dec 8, 2018

@wthrowe please look at

@kidder
Copy link
Member

kidder commented Dec 8, 2018

did you check installing on wheeler, blue waters, ... ?

@kidder
Copy link
Member

kidder commented Dec 8, 2018

one of the builds consistently times out

@nilsdeppe
Copy link
Member Author

Charm++ v6.9 fails to compile on BlueWaters because it looks for some python things or something. I've filed an issue upstream about them remove the Python requirement since it'll generally make it much more difficult to run anywhere.

Copy link
Member

@wthrowe wthrowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried building 6.9 yet, so I haven't done any actual tests that things work for me, but I had a few minor comments from looking at the code. (Always nice to see a net -250 diff.)

@@ -89,13 +90,14 @@ void print_info() {

// clang-tidy: google-runtime-references
PeGroupReporter::PeGroupReporter(
CkCallback& cb_start_node_group_check) { // NOLINT
const CkCallback cb_start_node_group_check) { // NOLINT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more NOLINT.

print_info();
this->contribute(cb_start_node_group_check);
}

// clang-tidy: google-runtime-references
NodeGroupReporter::NodeGroupReporter(CkCallback& cb_end_report) { // NOLINT
NodeGroupReporter::NodeGroupReporter(
const CkCallback cb_end_report) { // NOLINT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more NOLINT.

@@ -45,12 +46,12 @@ class ParallelInfo : public CBase_ParallelInfo {
class PeGroupReporter : public Group {
public:
// clang-tidy: non-const reference, Charm++ interface
explicit PeGroupReporter(CkCallback& cb_start_node_group_check); // NOLINT
explicit PeGroupReporter(CkCallback cb_start_node_group_check); // NOLINT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again.

};

class NodeGroupReporter : public NodeGroup {
public:
// clang-tidy: non-const reference, Charm++ interface
explicit NodeGroupReporter(CkCallback& cb_end_report); // NOLINT
explicit NodeGroupReporter(CkCallback cb_end_report); // NOLINT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again.

#include <string>
#include <tuple>
#include <unordered_map>
#include <unordered_set>
#include <vector>

#include "Parallel/CharmPupable.hpp"
#include "Parallel/PupStlCpp11.hpp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the file is gone, do we still want to keep the tests for it even though the code has moved upstream?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think we should not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests already found a bug in Charm++'s code a while back, so I think we should test for it.

@kidder kidder added the in progress Don't review, used for sharing code and getting feedback label Jan 28, 2019
@nilsdeppe nilsdeppe removed the in progress Don't review, used for sharing code and getting feedback label Feb 22, 2019
Copy link
Member

@kidder kidder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

joint review with @wthrowe

@@ -15,7 +15,7 @@ installation_on_clusters "Installation on clusters" page.
* [GCC](https://gcc.gnu.org/) 5.4 or later,
[Clang](https://clang.llvm.org/) 3.6 or later, or AppleClang 6.0 or later
* [CMake](https://cmake.org/) 3.3.2 or later
* [Charm++](http://charm.cs.illinois.edu/) 6.8 or newer (must be compiled from source)
* [Charm++](http://charm.cs.illinois.edu/) 6.9 or newer (must be compiled from source)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove (must...)

@@ -298,7 +295,7 @@ Follow these steps:
* For more details on building Charm++, see the directions
[here](http://charm.cs.illinois.edu/manuals/html/charm++/A.html)
The correct target is `charm++` and, for a personal machine, the
correct target architecture is likely to be `multicore-linux64`
correct target architecture is likely to be `multicore-linux-x86_64`
(or `multicore-darwin-x86_64` on macOS).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kidder see if this works on a mac

kidder
kidder previously approved these changes Feb 22, 2019
kidder
kidder previously approved these changes Feb 22, 2019
@kidder kidder requested a review from wthrowe February 23, 2019 06:09
@fmahebert
Copy link
Contributor

fmahebert commented Feb 23, 2019

What is the status of compiling Charm++ v6.9 BlueWaters?

Edit: I guess it doesn't matter with BW going away soon...

@kidder
Copy link
Member

kidder commented Feb 23, 2019

we don't care about BWs for spectre anymore...

@nilsdeppe
Copy link
Member Author

Fixed the clang-tidy issue

kidder
kidder previously approved these changes Feb 23, 2019
@nilsdeppe
Copy link
Member Author

@wthrowe this is ready for you now :)

@kidder
Copy link
Member

kidder commented Feb 28, 2019

@nilsdeppe this needs a rebase @wthrowe please look at

Most of the things we were previously patching are now directly supported by
Charm++ so a lot of code is removed.
Removes documentation for using PAPI with projections. We need to revisit using
PAPI later, both in terms of how to best do it and its merits.
@wthrowe
Copy link
Member

wthrowe commented Mar 1, 2019

I am seeing a significant slowdown in the test suite from this change. Run times in seconds with clang-6.0 builds:

X 6.8.2 6.9.0 ratio difference
Debug -j1 91.5 113.2 1.2 21.7
Debug -j30 8.2 17.1 2.1 8.9
Opt -j1 40.8 61.1 1.5 20.3
Opt -j30 5.0 13.0 2.6 9.0

It looks consistent with a constant overhead independent of SpECTRE's configuration, so probably something in the charm libraries. (I assume the differences based on parallelization are caused by some sort of resource contention on my machine or ctest overhead or something.)

I have not tried a long-running test, so I don't know if this is a startup cost or something that will accumulate during a run.

@kidder
Copy link
Member

kidder commented Mar 11, 2019

Current status:

@kidder kidder added the outdated Don't review, code needs significant changes label Oct 21, 2019
@kidder
Copy link
Member

kidder commented Oct 28, 2019

closing this as we will just update to 6.10 which is currently on release candidate 2

@kidder kidder closed this Oct 28, 2019
@nilsdeppe nilsdeppe deleted the charm_69 branch December 16, 2020 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Changes in the PR may break other PRs outdated Don't review, code needs significant changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Profiling does not work
5 participants