add progress messages to drm #270

Merged: 30 commits, May 13, 2019
Conversation

dsikich commented May 1, 2019

Use start, update, and complete functions to add progress messages to drm.

Signed-off-by: Danielle Sikich <sikich1@llnl.gov>
dsikich commented May 1, 2019

@adammoody This includes the latest updates, with new condition checks for when the ireduce completes.

adammoody commented May 1, 2019

Thanks @dsikich. Let's look at rank 0 now for a bit. In the update function, we have this code:

if (*req1 == MPI_REQUEST_NULL) {
  MPI_Ibcast(keep_going, 1, MPI_INT, 0, dupcomm, req1);
  MPI_Ireduce(values, global_vals, 2, MPI_INT, MPI_SUM, 0, dupcomm, req2);
} else {
  MPI_Test(req1, &done1, MPI_STATUS_IGNORE);
  MPI_Test(req2, &done2, MPI_STATUS_IGNORE);
  if (done2) {
    printf("items removed: %d\n", global_vals[0]);
    fflush(stdout);
    *current = time(NULL);
  }
}

So if there is no outstanding bcast, it will start a bcast and a reduce. If there is an outstanding bcast, it will test both the bcast and the reduce. Assuming we also have an outstanding reduce when we get to that part of the code, four things can happen:

  1. neither the bcast nor the reduce completes, so both tests fail, and we exit the call to come back later -- I think we're ok here

  2. both the bcast and the reduce complete, so both tests succeed, and we'll print a count and capture a new current time -- I think this is good too

  3. the bcast completes, but the reduce does not -- here we'll hit a problem if we call update again. Since there is no longer an outstanding bcast, we'll kick off a new bcast and a new reduce without having waited on our previous reduce to finish

  4. the reduce completes, but the bcast does not -- here we would print the message and grab the current time. Future calls to update would continue to test the bcast until it finishes. I think that case is ok as written: the bcast would eventually complete, then the next call to update would start a new bcast/reduce round, which is ok

We need to fix case three (bcast completes, but reduce does not). It may be enough to check that both requests are NULL in the if condition.

EDIT: Oh, case 4 also has a minor problem. Since we are updating current, future calls to update on rank 0 will return immediately until our timeout expires again. Only after the timeout expires will we start testing the bcast again, which means we'd have to call update twice before rank 0 kicks off a new bcast/reduce pair. We should think about how to fix that too.
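
A minimal sketch of the case-3 guard suggested above (only start a new round once both requests are NULL), reusing the names from the snippet earlier in this comment (req1, req2, keep_going, values, global_vals, dupcomm, current). This is just an illustration of the proposed check, not the merged code:

#include <mpi.h>
#include <stdio.h>
#include <time.h>

/* Sketch: only start a new bcast/reduce round once BOTH previous
 * requests have completed, which closes the hole in case 3. */
static void progress_update_rank0(int* keep_going, int* values, int* global_vals,
                                  MPI_Comm dupcomm, MPI_Request* req1,
                                  MPI_Request* req2, time_t* current)
{
  if (*req1 == MPI_REQUEST_NULL && *req2 == MPI_REQUEST_NULL) {
    /* nothing outstanding: safe to kick off the next round */
    MPI_Ibcast(keep_going, 1, MPI_INT, 0, dupcomm, req1);
    MPI_Ireduce(values, global_vals, 2, MPI_INT, MPI_SUM, 0, dupcomm, req2);
  } else {
    /* note whether the reduce was still active before testing, since
     * MPI_Test on MPI_REQUEST_NULL always returns with flag = true */
    int reduce_was_active = (*req2 != MPI_REQUEST_NULL);
    int done1, done2;
    MPI_Test(req1, &done1, MPI_STATUS_IGNORE);
    MPI_Test(req2, &done2, MPI_STATUS_IGNORE);
    if (reduce_was_active && done2) {
      /* the reduce just completed on this call: report and reset the timer */
      printf("items removed: %d\n", global_vals[0]);
      fflush(stdout);
      *current = time(NULL);
    }
  }
}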

adammoody commented May 1, 2019

After that, take a close look at the complete code for rank 0 and think through rank 0 calling complete in all four cases:

  1. outstanding bcast and outstanding reduce
  2. nothing outstanding
  3. outstanding bcast, but no reduce
  4. outstanding reduce, but no bcast

I haven't done this yet, so I'm unsure whether it needs to be changed.
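
One hedged way to collapse those four cases in complete on rank 0 is to drain whatever is still outstanding before posting the final round. A fragment just to illustrate the idea (again using req1/req2 from earlier, not the merged code):

/* MPI_Wait on MPI_REQUEST_NULL returns immediately, so rank 0 can
 * simply wait on both handles: cases 1-4 all collapse to "nothing
 * outstanding" before complete posts its final bcast/reduce. */
MPI_Wait(req1, MPI_STATUS_IGNORE);   /* finishes an outstanding bcast, if any  */
MPI_Wait(req2, MPI_STATUS_IGNORE);   /* finishes an outstanding reduce, if any */
/* now post the final round, e.g. broadcasting keep_going = 0 and
 * reducing the final counts, then wait on those as well */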

@adammoody

Found this in case it's useful:

https://www.mcs.anl.gov/research/projects/mpi/mpi-standard/mpi-report-1.1/node47.htm

One is allowed to call MPI_WAIT with a null or inactive request argument. In this case the operation returns immediately with empty status.

One is allowed to call MPI_TEST with a null or inactive request argument. In such a case the operation returns with flag = true and empty status.
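
For reference, a minimal standalone program illustrating that behavior (a sketch, not from the MPI report):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  MPI_Request req = MPI_REQUEST_NULL;
  int flag = 0;
  MPI_Test(&req, &flag, MPI_STATUS_IGNORE);  /* returns with flag == 1 and empty status */
  MPI_Wait(&req, MPI_STATUS_IGNORE);         /* also returns immediately */
  printf("flag = %d\n", flag);
  MPI_Finalize();
  return 0;
}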

dsikich commented May 1, 2019

@adammoody Thanks, that is good to know. In this case, though, we were talking about how it might be better to keep the checks in so that it is obvious what state the requests are in?

@adammoody

Yep, whichever way you'd like to do it is fine. If you do want to leave out some of the if checks, we can add comments to remind people reading the code that things work even if the requests are NULL.

@adammoody

This text says that it's ok to test/wait on collectives out of order; they only need to be started in order:

https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node126.htm

All completion calls (e.g., MPI_WAIT) described in Section Communication Completion are supported for nonblocking collective operations. Similarly to the blocking case, nonblocking collective operations are considered to be complete when the local part of the operation is finished, i.e., for the caller, the semantics of the operation are guaranteed and all buffers can be safely accessed and modified. Completion does not indicate that other processes have completed or even started the operation (unless otherwise implied by the description of the operation). Completion of a particular nonblocking collective operation also does not indicate completion of any other posted nonblocking collective (or send-receive) operations, whether they are posted before or after the completed operation.

Unlike point-to-point operations, nonblocking collective operations do not match with blocking collective operations, and collective operations do not have a tag argument. All processes must call collective operations (blocking and nonblocking) in the same order per communicator. In particular, once a process calls a collective operation, all other processes in the communicator must eventually call the same collective operation, and no other collective operation with the same communicator in between. This is consistent with the ordering rules for blocking collective operations in threaded environments.
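
In other words, the pattern in this PR is legal as long as every rank starts the collectives in the same order; the completion calls can then happen in any order. A small illustration with placeholder names:

/* both collectives are STARTED in the same order on every rank ... */
MPI_Ibcast(&keep_going, 1, MPI_INT, 0, comm, &req_bcast);
MPI_Ireduce(vals, sums, 2, MPI_INT, MPI_SUM, 0, comm, &req_reduce);

/* ... but it is fine to complete them out of order */
MPI_Wait(&req_reduce, MPI_STATUS_IGNORE);
MPI_Wait(&req_bcast, MPI_STATUS_IGNORE);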

dsikich commented May 1, 2019

@adammoody sounds like you were right about that then!

@adammoody

Yeah, just wanted to double check. I helped write that text, so good that I remembered it :-)

dsikich commented May 1, 2019

@adammoody Looks like Travis is failing on MPI_Ibcast and MPI_Ireduce. It is using Open MPI from what I can see... does Open MPI support these non-blocking collectives? Maybe we need to update the version we are installing in Travis.

@adammoody

Open MPI should have those. Yes, we likely need to bump the Open MPI version that Travis is using.

Danielle Sikich added 2 commits May 1, 2019 18:53
Signed-off-by: Danielle Sikich <sikich1@llnl.gov>
Signed-off-by: Danielle Sikich <sikich1@llnl.gov>
adammoody commented May 9, 2019

Great work on the cleanup! The logic looks good to me after my first pass.

Instead of passing in the comm_rank and comm_size, let's just make calls to MPI_Comm_rank and MPI_Comm_size from within the functions. That will simplify the interface.

Also, let's pass in the final count value as input to the complete call, like you have in the update call. This will ensure our total adds up to the full final count.
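
A sketch of what that simplified interface might look like; all names here are illustrative placeholders, not the actual functions in this PR:

#include <mpi.h>
#include <time.h>

typedef struct {
  MPI_Comm comm;      /* duplicated communicator used for progress messages */
  MPI_Request req1;   /* outstanding ibcast, if any */
  MPI_Request req2;   /* outstanding ireduce, if any */
  time_t last;        /* time of the last progress message */
} progress_t;

/* rank and size are looked up inside the call rather than passed in */
static void progress_update(progress_t* prg, int count)
{
  int rank, ranks;
  MPI_Comm_rank(prg->comm, &rank);
  MPI_Comm_size(prg->comm, &ranks);
  if (rank == 0) {
    /* rank 0 drives the ibcast/ireduce round described earlier */
  } else {
    /* other ranks contribute their local count to the reduce */
  }
}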

dsikich commented May 9, 2019

@adammoody Ok, did one final pass with those updates. I would just do more testing. I've tested so far with two processes, but it should be tested more extensively before production use.

@adammoody

Working this into the different functions now. Here's a sample of the output from dcp:

[2019-05-13T10:59:08] Copying data.
[2019-05-13T10:59:30] Copied 30.501 GB in 21.593206 secs (1.413 GB/s) ...
[2019-05-13T10:59:42] Copied 54.019 GB in 33.571640 secs (1.609 GB/s) ...
[2019-05-13T10:59:55] Copied 73.710 GB in 47.459370 secs (1.553 GB/s) ...
[2019-05-13T11:00:08] Copied 100.707 GB in 59.480167 secs (1.693 GB/s) ...
[2019-05-13T11:00:20] Copied 118.359 GB in 72.232474 secs (1.639 GB/s) ...
[2019-05-13T11:00:28] Copied 137.463 GB in 80.298619 secs (1.712 GB/s) ...
[2019-05-13T11:00:28] Copied 141.541 GB in 80.298805 secs (1.763 GB/s) done
[2019-05-13T11:00:28] Copied 141.541 GB in 80.298859 secs (1.763 GB/s) done
[2019-05-13T11:00:28] Copy data: 141.541 GB (151978344724 bytes)
[2019-05-13T11:00:28] Copy rate: 1.763 GB/s (151978344724 bytes in 80.298898 seconds)

We can also print percent complete and estimated time left if we pre-compute the total amount of work to be done. That will take a few extra steps.
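
For reference, the extra math is small once the total is known. A sketch with illustrative variable names (total items, items done so far, elapsed seconds), not the merged code:

double percent = 100.0 * (double)done / (double)total;
double rate    = (double)done / elapsed;                     /* items per second */
double remain  = (rate > 0.0) ? (double)(total - done) / rate : 0.0;
printf("%.2f%% complete %.0f secs remaining\n", percent, remain);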

adammoody commented May 13, 2019

Sample messages from the remove operation. This one has percent complete and estimated time remaining:

[2019-05-13T11:57:45] Walked 100001 items in 0.234298 seconds (426811.171786 files/sec)
[2019-05-13T11:57:45] Removing 100001 items
[2019-05-13T11:57:55] Removed 11351 of 100001 items in 10.005774 secs (1134.445023 items/sec) 11.35% complete 78 secs remaining...
[2019-05-13T11:58:05] Removed 23173 of 100001 items in 20.007553 secs (1158.212623 items/sec) 23.17% complete 66 secs remaining...
[2019-05-13T11:58:15] Removed 35128 of 100001 items in 30.009201 secs (1170.574326 items/sec) 35.13% complete 55 secs remaining...
[2019-05-13T11:58:25] Removed 46959 of 100001 items in 40.009633 secs (1173.692358 items/sec) 46.96% complete 45 secs remaining...
[2019-05-13T11:58:35] Removed 59668 of 100001 items in 50.010395 secs (1193.111951 items/sec) 59.67% complete 33 secs remaining...
[2019-05-13T11:58:45] Removed 72914 of 100001 items in 60.011518 secs (1215.000099 items/sec) 72.91% complete 22 secs remaining...
[2019-05-13T11:58:55] Removed 84876 of 100001 items in 70.012076 secs (1212.305139 items/sec) 84.88% complete 12 secs remaining...
[2019-05-13T11:59:05] Removed 96655 of 100001 items in 80.013420 secs (1207.984855 items/sec) 96.65% complete 2 secs remaining...
[2019-05-13T11:59:08] level=5 min=50000 max=50000 sum=100000 rate=1197.965291 secs=83.474873
[2019-05-13T11:59:08] level=4 min=0 max=1 sum=1 rate=68.652165 secs=0.014566
[2019-05-13T11:59:08] Removed 100001 of 100001 items in 83.489614 secs (1197.765748 items/sec) 100.00% complete
[2019-05-13T11:59:08] Removed 100001 of 100001 items in 83.489650 secs (1197.765225 items/sec) 100.00% complete
[2019-05-13T11:59:08] Removed 100001 items in 83.526148 seconds (1197.241856 items/sec)

adammoody left a comment

Thanks @dsikich! Found a couple of things to clean up when testing with longer timeouts that we didn't think about when whiteboarding the algorithm. Then did some code refactoring.
