
Add DistributedMemlet node and scheduling function #120

Open
wants to merge 12 commits into master
Conversation

@orausch orausch (Collaborator) commented Aug 17, 2022

This change adds the DistributedMemlet library node and the scheduling
function for distributed computation.

This allows you to distribute the work in the top-level map of the SDFG
by specifying block sizes. The lowering function will analyze the SDFG
and try to find MPI nodes that implement the required communication.
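
A hypothetical usage sketch of the workflow this enables (the function name distribute_top_level_map and the block_sizes parameter are illustrative assumptions, not the API added by this PR): build an SDFG whose outermost computation is a map, pick a block size per map dimension, and let the lowering pass insert DistributedMemlet nodes and replace them with matching MPI communication.

# Hypothetical sketch only: distribute_top_level_map and block_sizes are
# placeholder names, not necessarily the API introduced by this PR.
import dace

@dace.program
def add_one(x: dace.float32[64, 64], y: dace.float32[64, 64]):
    y[:] = x + 1  # becomes an elementwise top-level map after simplification

sdfg = add_one.to_sdfg()

# Scheduling: assign a block size per dimension of the top-level map.
# Lowering then analyzes the resulting DistributedMemlet nodes and tries
# to find MPI library nodes implementing the required communication.
# distribute_top_level_map(sdfg, block_sizes=[2, 2])  # placeholder call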

codecov bot commented Aug 17, 2022

Codecov Report

Merging #120 (b87a9bd) into master (3c799c5) will increase coverage by 1.17%.
The diff coverage is 95.14%.

@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
+ Coverage   69.70%   70.87%   +1.17%     
==========================================
  Files          65       70       +5     
  Lines        7232     7621     +389     
==========================================
+ Hits         5041     5401     +360     
- Misses       2191     2220      +29     
Impacted Files Coverage Δ
daceml/distributed/utils.py 62.96% <62.96%> (ø)
daceml/util/utils.py 74.84% <95.00%> (+2.99%) ⬆️
daceml/distributed/communication/subarrays.py 97.06% <97.06%> (ø)
daceml/distributed/schedule.py 97.79% <97.79%> (ø)
daceml/distributed/__init__.py 100.00% <100.00%> (ø)
daceml/distributed/communication/node.py 100.00% <100.00%> (ø)
...ml/onnx/op_implementations/pure_implementations.py 72.25% <0.00%> (-1.71%) ⬇️
daceml/onnx/op_implementations/utils.py 100.00% <0.00%> (ø)
daceml/autodiff/analysis.py 95.24% <0.00%> (+2.38%) ⬆️


@orausch orausch added the no-ci label Aug 17, 2022
@orausch orausch removed the no-ci label Aug 17, 2022
@orausch orausch requested a review from tbennun August 17, 2022 18:23
@tbennun tbennun (Contributor) left a comment

Minor comments only :) I'm a bit worried about size_exact being used a lot, but it's fine for now.

Makefile (outdated; resolved)
daceml/distributed/communication/node.py (resolved)
daceml/distributed/utils.py (outdated; resolved)
tests/distributed/mpi_mute.py (resolved)
commworld = MPI.COMM_WORLD
rank = commworld.Get_rank()
size = commworld.Get_size()
if size < utils.prod(sizes):
tbennun (Contributor):
did you know that the pytest dist plugin supports giving a number of ranks as a marker?

orausch (Collaborator, Author):
Yes, but here I'd rather fail than skip. Also depends on the schedule sizes.
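
For context, a minimal sketch of the fail-rather-than-skip guard discussed here, assuming mpi4py; math.prod stands in for the PR's utils.prod, and the helper name is illustrative.

# Minimal sketch (assumes mpi4py; math.prod replaces utils.prod from the PR).
import math
from mpi4py import MPI

def require_world_size(sizes):
    size = MPI.COMM_WORLD.Get_size()
    required = math.prod(sizes)
    if size < required:
        # Fail loudly rather than skip, so an under-provisioned MPI run
        # does not silently look green.
        raise RuntimeError(
            f"Schedule {sizes} needs at least {required} MPI ranks, got {size}")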

daceml/distributed/communication/subarrays.py (resolved)
daceml/distributed/schedule.py (resolved)
for is_input, result in zip([True, False], results):

    # gather internal memlets by the out array they write to
    internal_memlets: Dict[
tbennun (Contributor):
Maybe look in scope_subgraph

orausch (Collaborator, Author):
What do you mean? In case there is a global write in the subgraph? Is that allowed?
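
For reference, an illustrative sketch of the kind of grouping shown in the excerpt above, assuming the memlets of interest are the edges just inside the map scope; the function and parameter names are assumptions, not the PR's code.

# Illustrative sketch: group the memlets at the border of a map scope by the
# array they access. Not the PR's actual implementation.
from collections import defaultdict
from typing import Dict, List

import dace
from dace.sdfg import nodes

def group_memlets_by_array(state: dace.SDFGState,
                           map_entry: nodes.MapEntry,
                           is_input: bool) -> Dict[str, List[dace.Memlet]]:
    grouped: Dict[str, List[dace.Memlet]] = defaultdict(list)
    map_exit = state.exit_node(map_entry)
    # Inputs enter the scope through the MapEntry; outputs leave through
    # the MapExit.
    edges = (state.out_edges(map_entry) if is_input
             else state.in_edges(map_exit))
    for edge in edges:
        if edge.data.data is not None:  # skip empty memlets
            grouped[edge.data.data].append(edge.data)
    return grouped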

daceml/distributed/schedule.py (resolved)
daceml/distributed/schedule.py (outdated; resolved)
@orausch orausch closed this Aug 20, 2022
@orausch orausch reopened this Aug 20, 2022
orausch added a commit that referenced this pull request Aug 28, 2022
This change adds the DistributedMemlet library node and the scheduling
function for distributed computation.

This allows you to distribute the work in the top-level map of the SDFG
by specifying block sizes. The lowering function will analyze the SDFG
and try to find MPI nodes that implement the required communication.

Pull Request: #120
2 participants