
Add DistributedMemlet node and scheduling function #120

Open
wants to merge 12 commits into master
Conversation

@orausch orausch (Collaborator) commented Aug 17, 2022

This change adds the DistributedMemlet library node and the scheduling
function for distributed computation.

This allows you to distribute the work in the top-level map of the SDFG
by specifying block sizes. The lowering function will analyze the SDFG
and try to find MPI nodes that implement the required communication.
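
A hypothetical usage sketch of the workflow this enables (the function name distribute_top_level_map and the block_sizes parameter are illustrative assumptions, not the API added by this PR): build an SDFG whose outermost computation is a map, pick a block size per map dimension, and let the lowering pass insert DistributedMemlet nodes and replace them with matching MPI communication.

# Hypothetical sketch only: distribute_top_level_map and block_sizes are
# placeholder names, not necessarily the API introduced by this PR.
import dace

@dace.program
def add_one(x: dace.float32[64, 64], y: dace.float32[64, 64]):
    y[:] = x + 1  # becomes an elementwise top-level map after simplification

sdfg = add_one.to_sdfg()

# Scheduling: assign a block size per dimension of the top-level map.
# Lowering then analyzes the resulting DistributedMemlet nodes and tries
# to find MPI library nodes implementing the required communication.
# distribute_top_level_map(sdfg, block_sizes=[2, 2])  # placeholder call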

codecov bot commented Aug 17, 2022

Codecov Report

Merging #120 (b87a9bd) into master (3c799c5) will increase coverage by 1.17%.
The diff coverage is 95.14%.

@@            Coverage Diff             @@
##           master     #120      +/-   ##
==========================================
+ Coverage   69.70%   70.87%   +1.17%     
==========================================
  Files          65       70       +5     
  Lines        7232     7621     +389     
==========================================
+ Hits         5041     5401     +360     
- Misses       2191     2220      +29     
Impacted Files Coverage Δ
daceml/distributed/utils.py 62.96% <62.96%> (ø)
daceml/util/utils.py 74.84% <95.00%> (+2.99%) ⬆️
daceml/distributed/communication/subarrays.py 97.06% <97.06%> (ø)
daceml/distributed/schedule.py 97.79% <97.79%> (ø)
daceml/distributed/__init__.py 100.00% <100.00%> (ø)
daceml/distributed/communication/node.py 100.00% <100.00%> (ø)
...ml/onnx/op_implementations/pure_implementations.py 72.25% <0.00%> (-1.71%) ⬇️
daceml/onnx/op_implementations/utils.py 100.00% <0.00%> (ø)
daceml/autodiff/analysis.py 95.24% <0.00%> (+2.38%) ⬆️


@orausch orausch added the no-ci label Aug 17, 2022
@orausch orausch removed the no-ci label Aug 17, 2022
@orausch orausch requested a review from tbennun August 17, 2022 18:23
@tbennun tbennun (Contributor) left a comment

Minor comments only :) I'm a bit worried about size_exact being used a lot, but it's fine for now.

Makefile (outdated; resolved)
daceml/distributed/communication/node.py (resolved)
daceml/distributed/utils.py (outdated; resolved)
tests/distributed/mpi_mute.py (resolved)
commworld = MPI.COMM_WORLD
rank = commworld.Get_rank()
size = commworld.Get_size()
if size < utils.prod(sizes):
tbennun (Contributor):
did you know that the pytest dist plugin supports giving a number of ranks as a marker?

orausch (Collaborator, Author):
Yes, but here I'd rather fail than skip. Also depends on the schedule sizes.
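
For context, a minimal sketch of the fail-rather-than-skip guard discussed here, assuming mpi4py; math.prod stands in for the PR's utils.prod, and the helper name is illustrative.

# Minimal sketch (assumes mpi4py; math.prod replaces utils.prod from the PR).
import math
from mpi4py import MPI

def require_world_size(sizes):
    size = MPI.COMM_WORLD.Get_size()
    required = math.prod(sizes)
    if size < required:
        # Fail loudly rather than skip, so an under-provisioned MPI run
        # does not silently look green.
        raise RuntimeError(
            f"Schedule {sizes} needs at least {required} MPI ranks, got {size}")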

daceml/distributed/communication/subarrays.py (resolved)
daceml/distributed/schedule.py (resolved)
for is_input, result in zip([True, False], results):

    # gather internal memlets by the out array they write to
    internal_memlets: Dict[
tbennun (Contributor):
Maybe look in scope_subgraph

orausch (Collaborator, Author):
What do you mean? In case there is a global write in the subgraph? Is that allowed?
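
For reference, an illustrative sketch of the kind of grouping shown in the excerpt above, assuming the memlets of interest are the edges just inside the map scope; the function and parameter names are assumptions, not the PR's code.

# Illustrative sketch: group the memlets at the border of a map scope by the
# array they access. Not the PR's actual implementation.
from collections import defaultdict
from typing import Dict, List

import dace
from dace.sdfg import nodes

def group_memlets_by_array(state: dace.SDFGState,
                           map_entry: nodes.MapEntry,
                           is_input: bool) -> Dict[str, List[dace.Memlet]]:
    grouped: Dict[str, List[dace.Memlet]] = defaultdict(list)
    map_exit = state.exit_node(map_entry)
    # Inputs enter the scope through the MapEntry; outputs leave through
    # the MapExit.
    edges = (state.out_edges(map_entry) if is_input
             else state.in_edges(map_exit))
    for edge in edges:
        if edge.data.data is not None:  # skip empty memlets
            grouped[edge.data.data].append(edge.data)
    return grouped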

daceml/distributed/schedule.py (resolved)
daceml/distributed/schedule.py (outdated; resolved)
@orausch orausch closed this Aug 20, 2022
@orausch orausch reopened this Aug 20, 2022
orausch added a commit that referenced this pull request Aug 28, 2022
This change adds the DistributedMemlet library node and the scheduling
function for distributed computation.

This allows you to distribute the work in the top-level map of the SDFG
by specifying block sizes. The lowering function will analyze the SDFG
and try to find MPI nodes that implement the required communication.

Pull Request: #120
2 participants