Charm 6.9 Support and Charm 6.8 Removal #1190

nilsdeppe · 2018-11-20T03:16:37Z

Proposed changes

Add support for Charm++ v6.9 and drop support for v6.8 (v6.8 support will be available by checking out old commits). With Charm++ v6.9 many more STL containers are serializable directly by Charm++ so the PupStlCpp11 header has been removed. There is no longer a need to patch Charm++ because all of our patches have been included upstream. The only thing being patched currently is array indices that are class templates, for which there appears to be some support in Charm++ but I have not yet had any luck with it. Nevertheless, we can now just run charmc on our .ci interface files and only patch the array ones slightly.
The docker image currently contains both v6.8 and v6.9. I'm doing this so that current PRs are not broken. Once we have merged this PR I will wait a week or two to allow for PRs to be rebased and then remove v6.8 from the Docker image. Hopefully this will provide a fairly smooth transition.
Restore support for Charm++ Projections. With v6.9, Projections correctly distinguishes between entry method templates, so we no longer need to work around the Charm++ tracing infrastructure to get a clear representation of what the code is doing. Closes #925 (Profiling does not work).

Breaks:

Anything that includes PupStlCpp11.hpp must drop that include, since the header has been removed and is no longer needed
Anyone using the containers will need to (temporarily) point their build to /work/charm_69/multicore-...
This directory will be renamed in the future to charm (probably in about a month or two) to make the transition between versions easier.

Types of changes:

Bugfix
New feature

Component:

Code
Documentation
Build system
Continuous integration

Code review checklist

The PR passes all checks, including unit tests, clang-tidy and IWYU. For
instructions on how to perform the CI checks locally refer to the Dev guide
on the Travis CI.
The code is documented and the documentation renders correctly. Run make doc
to generate the documentation locally into BUILD_DIR/docs/html. Then open
index.html.
The code follows the stylistic and code quality guidelines listed in the
code review guide.

Further comments

fmahebert

Not a real review, but some questions I had while reading through this :)

fmahebert · 2018-11-29T01:47:51Z

cmake/SetupCharmProjections.cmake

- -D SPECTRE_CHARM_PROJECTIONS \
- -D SPECTRE_CHARM_NON_ACTION_WALLTIME_EVENT_ID=1000 \
- -D SPECTRE_CHARM_RECEIVE_MAP_DATA_EVENT_ID=1001"
-  )
  if (PROJECTIONS_PAPI_COUNTERS)


Does it make sense to keep PAPI stuff around if it's no longer documented?

I'm on the fence. It might be easier to revive it if we want to, but I'm not entirely sure we'll need/want to, in which case it's just baggage... Thoughts?

I think we should keep it.

What's your reasoning? Have you used PAPI counters and found them helpful? (Not objecting, just asking for more info :) )

I have not used them before. So if we leave it in I could play around with it.
Also, as you said if we were to revive it, it will be easier if we already have this in.

Okay, that sounds good. My vote is to leave it in then :) I can add some comments as to what the code should look like that uses this. Would that be helpful?

Sure, why not.

Okay, I added some words to the CMake file. Since it's not a fully-implemented feature I don't think it should go in the actual documentation.

fmahebert · 2018-11-29T01:53:24Z

src/Parallel/Invoke.hpp

 namespace Parallel {
+namespace detail {
+// Allow 64 inline entry method calls before we fall back to Charm++. This is
+// done to avoid blowing the stack.


Question for my education: is it the inlining that avoids blowing the stack, or is it having a cap on the inlining? If I understand this correctly it should be the inlining, because the inlined function shouldn't go onto the stack. But if that is correct, what is the role of capping at 64 inlines?

This was an amazingly mind blowing question :D In the Charm++ context "inline" means "don't go through the Charm++ runtime system but go straight to the local object if it exists." Other than here, any suggestions on where to best document this? Maybe the parallelization group?

I totally missed the relevant text in the PR description, sorry! Okay this makes more sense now. But I'm still not sure what the cap at 64 is for. Is the idea to allow 64 nested local object calls (roughly 64 deep on stack), but then give control back to the runtime system which will be making calls one-by-one and therefore avoiding putting too much on the stack?

I would find a comment like this one more useful at the call site (so around the if/else block in receive_data below), because then I can better see how it relates to the charm++. But this does mean duplicating the comment at different call sites... perhaps a compromise is to give a slightly more detailed explanation in the Doxygen group, and a brief summary at each call site? What do you think?

I ~~think~~ know I agree :) I'll add the comments and you can tell me how to make them clearer :)

Okay, added comments :)
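
To make the role of the cap concrete, here is a minimal, self-contained sketch of the idea discussed in this thread. It is not the code in `src/Parallel/Invoke.hpp` and it does not use Charm++ at all: `ToyRuntime`, `invoke`, `run`, and `chain` are hypothetical names invented for illustration, and the queue is only a stand-in for the Charm++ RTS. Calls are made directly ("inline" in the Charm++ sense) while fewer than 64 are nested, and anything beyond that is handed to the queue so the stack can unwind before the call runs.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for the runtime system: a queue of pending calls
// that a flat scheduler loop executes one at a time.
class ToyRuntime {
 public:
  // Cap taken from the comment under review: 64 nested inline calls.
  static constexpr std::size_t max_inline_depth = 64;

  // Call `f` directly while fewer than `max_inline_depth` inline calls are
  // active; otherwise enqueue it so that the stack unwinds before it runs.
  void invoke(std::function<void()> f) {
    if (inline_depth_ < max_inline_depth) {
      ++inline_depth_;
      if (inline_depth_ > deepest_) { deepest_ = inline_depth_; }
      f();
      --inline_depth_;
    } else {
      queue_.push_back(std::move(f));
    }
  }

  // Drain the queue from the top level, the way a scheduler loop would.
  void run() {
    assert(inline_depth_ == 0);  // only call this with no inline calls active
    while (!queue_.empty()) {
      std::function<void()> f = std::move(queue_.front());
      queue_.pop_front();
      f();
    }
  }

  // Largest number of simultaneously nested inline calls observed so far.
  std::size_t deepest() const { return deepest_; }

 private:
  std::deque<std::function<void()>> queue_;
  std::size_t inline_depth_ = 0;
  std::size_t deepest_ = 0;
};

// A chain of `n` calls in which every call triggers the next one, mimicking
// components that keep sending data to local objects.
inline void chain(ToyRuntime& rts, const std::size_t n,
                  std::size_t& completed) {
  if (n == 0) { return; }
  rts.invoke([&rts, n, &completed] {
    ++completed;
    chain(rts, n - 1, completed);
  });
}
```

Under these assumptions, calling `chain` with a long chain and then `run()` completes every call, while the nesting recorded by `deepest()` never exceeds the cap. Without the `else` branch the same chain would nest once per call, which is the stack-overflow risk the comment describes.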

fmahebert

New comments are good 👍. Here are a few tiny wording suggestions and also a new question...

fmahebert · 2018-12-02T19:43:46Z

docs/GroupDefs.hpp

@@ -1025,6 +1025,16 @@ The `receive_data` function always takes a `ReceiveTag`, which is set in the
 actions `inbox_tags` type alias as described above.  The first argument is the
 temporal identifier, and the second is the data to be sent.

+Normally when remote functions are invoked they go through the Charm++ runtime
+system, which adds some overhead. The `receive_data` function elides the call to


is it fair to say "elides" -> "tries to elide" ?

fmahebert · 2018-12-02T19:45:08Z

docs/GroupDefs.hpp

+system, which adds some overhead. The `receive_data` function elides the call to
+the Charm++ RTS for calls into array components. Charm++ refers to
+these types of remote calls as "inline entry methods". With the Charm++ method
+of eliding the RTS the code becomes susceptible to stack overflows because


add comma after RTS

docs/DevGuide/ProfilingWithProjections.md

nilsdeppe · 2018-12-02T21:29:05Z

@fmahebert I pushed fixups :) Thanks for the feedback! :)

fmahebert · 2018-12-02T23:03:25Z

Looks good to me! I am happy for you to squash. I'm hesitant to give a green checkmark because I only partially understand the Charm++ interfacing, but I approve of what I do understand 👍

nilsdeppe · 2018-12-03T00:35:16Z

Alright, done. Thanks for the feedback @fmahebert :D

nilsdeppe · 2018-12-05T01:54:13Z

Rebased on develop after #1216 was merged

cmake/SetupCharmProjections.cmake

kidder · 2018-12-08T03:38:41Z

@wthrowe please look at

kidder · 2018-12-08T04:38:17Z

did you check installing on wheeler, blue waters, ... ?

kidder · 2018-12-08T19:11:36Z

one of the builds consistently times out

nilsdeppe · 2018-12-08T20:39:20Z

Charm++ v6.9 fails to compile on BlueWaters because it looks for some Python things or something. I've filed an issue upstream asking them to remove the Python requirement, since it'll generally make it much more difficult to run anywhere.

wthrowe

I haven't tried building 6.9 yet, so I haven't done any actual tests that things work for me, but I had a few minor comments from looking at the code. (Always nice to see a net -250 diff.)

wthrowe · 2018-12-17T04:00:37Z

src/Executables/ParallelInfo/ParallelInfo.cpp

@@ -89,13 +90,14 @@ void print_info() {

 // clang-tidy: google-runtime-references
 PeGroupReporter::PeGroupReporter(
-    CkCallback& cb_start_node_group_check) {  // NOLINT
+    const CkCallback cb_start_node_group_check) {  // NOLINT


No more NOLINT.

wthrowe · 2018-12-17T04:00:53Z

src/Executables/ParallelInfo/ParallelInfo.cpp

  print_info();
  this->contribute(cb_start_node_group_check);
 }

 // clang-tidy: google-runtime-references
-NodeGroupReporter::NodeGroupReporter(CkCallback& cb_end_report) {  // NOLINT
+NodeGroupReporter::NodeGroupReporter(
+    const CkCallback cb_end_report) {  // NOLINT


No more NOLINT.

wthrowe · 2018-12-17T04:01:08Z

src/Executables/ParallelInfo/ParallelInfo.hpp

@@ -45,12 +46,12 @@ class ParallelInfo : public CBase_ParallelInfo {
 class PeGroupReporter : public Group {
 public:
  // clang-tidy: non-const reference, Charm++ interface
-  explicit PeGroupReporter(CkCallback& cb_start_node_group_check);  // NOLINT
+  explicit PeGroupReporter(CkCallback cb_start_node_group_check);  // NOLINT


wthrowe · 2018-12-17T04:01:18Z

src/Executables/ParallelInfo/ParallelInfo.hpp

 };

 class NodeGroupReporter : public NodeGroup {
 public:
  // clang-tidy: non-const reference, Charm++ interface
-  explicit NodeGroupReporter(CkCallback& cb_end_report);  // NOLINT
+  explicit NodeGroupReporter(CkCallback cb_end_report);  // NOLINT


wthrowe · 2018-12-17T04:07:08Z

tests/Unit/Parallel/Test_PupStlCpp11.cpp

 #include <string>
 #include <tuple>
 #include <unordered_map>
 #include <unordered_set>
 #include <vector>

 #include "Parallel/CharmPupable.hpp"
-#include "Parallel/PupStlCpp11.hpp"


Since the file is gone, do we still want to keep the tests for it even though the code has moved upstream?

Good point. I think we should not.

These tests already found a bug in Charm++'s code a while back, so I think we should test for it.

kidder

joint review with @wthrowe

kidder · 2019-02-22T22:42:29Z

docs/Installation/Installation.md

@@ -15,7 +15,7 @@ installation_on_clusters "Installation on clusters" page.
 * [GCC](https://gcc.gnu.org/) 5.4 or later,
 [Clang](https://clang.llvm.org/) 3.6 or later, or AppleClang 6.0 or later
 * [CMake](https://cmake.org/) 3.3.2 or later
-* [Charm++](http://charm.cs.illinois.edu/) 6.8 or newer (must be compiled from source)
+* [Charm++](http://charm.cs.illinois.edu/) 6.9 or newer (must be compiled from source)


remove (must...)

kidder · 2019-02-22T22:43:11Z

docs/Installation/Installation.md

@@ -298,7 +295,7 @@ Follow these steps:
 * For more details on building Charm++, see the directions
  [here](http://charm.cs.illinois.edu/manuals/html/charm++/A.html)
  The correct target is `charm++` and, for a personal machine, the
-  correct target architecture is likely to be `multicore-linux64`
+  correct target architecture is likely to be `multicore-linux-x86_64`
  (or `multicore-darwin-x86_64` on macOS).


@kidder see if this works on a mac

fmahebert · 2019-02-23T07:27:50Z

What is the status of compiling Charm++ v6.9 on BlueWaters?

Edit: I guess it doesn't matter with BW going away soon...

kidder · 2019-02-23T14:28:05Z

we don't care about BWs for spectre anymore...

nilsdeppe · 2019-02-23T16:01:09Z

Fixed the clang-tidy issue

nilsdeppe · 2019-02-23T17:19:02Z

@wthrowe this is ready for you now :)

kidder · 2019-02-28T05:26:56Z

@nilsdeppe this needs a rebase @wthrowe please look at

wthrowe · 2019-03-01T04:26:57Z

I am seeing a significant slowdown in the test suite from this change. Run times in seconds with clang-6.0 builds:

| Build | 6.8.2 (s) | 6.9.0 (s) | ratio | difference (s) |
|---|---|---|---|---|
| Debug -j1 | 91.5 | 113.2 | 1.2 | 21.7 |
| Debug -j30 | 8.2 | 17.1 | 2.1 | 8.9 |
| Opt -j1 | 40.8 | 61.1 | 1.5 | 20.3 |
| Opt -j30 | 5.0 | 13.0 | 2.6 | 8.0 |

It looks consistent with a constant overhead independent of SpECTRE's configuration, so probably something in the charm libraries. (I assume the differences based on parallelization are caused by some sort of resource contention on my machine or ctest overhead or something.)

I have not tried a long-running test, so I don't know if this is a startup cost or something that will accumulate during a run.
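
The "constant overhead" reading can be checked directly from the reported times. The sketch below only re-derives the ratio and difference from the two timing columns; the struct and function names are made up for illustration, and no new measurements are involved.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One row of the reported timings; all times are in seconds.
struct Timing {
  const char* build;
  double v68;  // run time with Charm++ 6.8.2
  double v69;  // run time with Charm++ 6.9.0
};

// The four rows exactly as reported in the comment above.
inline std::vector<Timing> reported_timings() {
  return {{"Debug -j1", 91.5, 113.2},
          {"Debug -j30", 8.2, 17.1},
          {"Opt -j1", 40.8, 61.1},
          {"Opt -j30", 5.0, 13.0}};
}

// Relative slowdown of 6.9.0 with respect to 6.8.2.
inline double ratio(const Timing& t) {
  assert(t.v68 > 0.0);
  return t.v69 / t.v68;
}

// Absolute slowdown in seconds.
inline double difference(const Timing& t) { return t.v69 - t.v68; }

// True if two absolute slowdowns agree to within `tolerance` seconds.
inline bool same_within(const double a, const double b,
                        const double tolerance) {
  return std::fabs(a - b) <= tolerance;
}
```

For the two `-j1` rows the absolute slowdowns are 21.7 s and 20.3 s even though the baselines differ by more than a factor of two, so `same_within` holds with a tolerance of 2 s while the ratios differ (about 1.2 versus 1.5). That is the pattern an additive, configuration-independent cost would produce.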

kidder · 2019-03-11T21:35:29Z

Current status:

@nilsdeppe is going to investigate the slowdown

kidder · 2019-10-28T17:02:19Z

closing this as we will just update to 6.10 which is currently on release candidate 2

nilsdeppe added the "in progress" label (Don't review, used for sharing code and getting feedback) Nov 20, 2018

nilsdeppe force-pushed the charm_69 branch from f06ebc2 to 59311d2 Compare November 20, 2018 18:50

nilsdeppe removed the "in progress" label (Don't review, used for sharing code and getting feedback) Nov 20, 2018

nilsdeppe force-pushed the charm_69 branch from 59311d2 to 4408b4a Compare November 20, 2018 20:38

fmahebert reviewed Nov 29, 2018

nilsdeppe force-pushed the charm_69 branch from 4408b4a to 931d593 Compare December 2, 2018 19:05

fmahebert reviewed Dec 2, 2018

nilsdeppe force-pushed the charm_69 branch from 59f3f10 to c5c1a7a Compare December 3, 2018 00:33

nilsdeppe force-pushed the charm_69 branch 2 times, most recently from c2aa087 to bb4a0c6 Compare December 5, 2018 01:53

nilsdeppe force-pushed the charm_69 branch 3 times, most recently from fdc234a to 48fb895 Compare December 8, 2018 01:23

kidder reviewed Dec 8, 2018

nilsdeppe force-pushed the charm_69 branch from 48fb895 to 091f9bc Compare December 8, 2018 20:37

wthrowe requested changes Dec 17, 2018

nilsdeppe force-pushed the charm_69 branch from 091f9bc to 9d5fa58 Compare December 21, 2018 19:46

kidder added the "in progress" label (Don't review, used for sharing code and getting feedback) Jan 28, 2019

nilsdeppe force-pushed the charm_69 branch from 9d5fa58 to b54848d Compare February 15, 2019 14:49

nilsdeppe force-pushed the charm_69 branch from b54848d to 3b87bf9 Compare February 22, 2019 22:36

nilsdeppe removed the "in progress" label (Don't review, used for sharing code and getting feedback) Feb 22, 2019

kidder requested changes Feb 22, 2019

kidder previously approved these changes Feb 22, 2019

nilsdeppe dismissed kidder’s stale review via 257188e February 22, 2019 22:54

nilsdeppe force-pushed the charm_69 branch from 7d21342 to 257188e Compare February 22, 2019 22:54

kidder previously approved these changes Feb 22, 2019

kidder requested a review from wthrowe February 23, 2019 06:09

nilsdeppe added the "breaking change" label (Changes in the PR may break other PRs) Feb 23, 2019

nilsdeppe dismissed kidder’s stale review via e2cb869 February 23, 2019 16:00

nilsdeppe force-pushed the charm_69 branch from 257188e to e2cb869 Compare February 23, 2019 16:00

kidder previously approved these changes Feb 23, 2019

nilsdeppe added 2 commits February 28, 2019 09:14

Add support Charm++ v6.9

0c723bc

Most of the things we were previously patching are now directly supported by Charm++ so a lot of code is removed.

Enable Charm++ Projections support

e666cbd

Removes documentation for using PAPI with projections. We need to revisit using PAPI later, both in terms of how to best do it and its merits.

nilsdeppe dismissed kidder’s stale review via e666cbd February 28, 2019 14:14

nilsdeppe force-pushed the charm_69 branch from e2cb869 to e666cbd Compare February 28, 2019 14:14

kidder added the "outdated" label (Don't review, code needs significant changes) Oct 21, 2019

kidder closed this Oct 28, 2019

nilsdeppe deleted the charm_69 branch December 16, 2020 01:04
