Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple Request Completion 2 #122

Merged
merged 2 commits into from Aug 3, 2022
Merged

Conversation

jtronge
Copy link
Collaborator

@jtronge jtronge commented Jul 28, 2022

Hello, I've been working with @hppritcha to attempt adding multiple request completion support. This is partly based on @AndrewGaspar's original PR #36 but is slightly different. Based on some of the discussion in #29, I experimented with attaching references to each request as a way to allow the user to immediately perform some computation on the data as soon as the request has been waited on.

Using the modified request object as a basis I added a RequestCollection object, similar to the former PR, but including support for attached references. Requests can be added to the collection and then waited on using the wait_() or test_() methods. To ensure that requests have completed, I included a new multiple_scope() function which passes in both a reference to a scope object and a request collection.

* Attach data references to Request object
* Add RequestCollection object for interfacing with MPI wait_*() and test_*() functions
* Add three new examples/tests for this functionality
@hppritcha
Copy link

could someone kick off the CI for this PR @bsteinb @jedbrown thanks.

Copy link
Contributor

@jedbrown jedbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. I see some cargo fmt noise and have some questions about how to expose the completions.

src/request.rs Outdated Show resolved Hide resolved
examples/immediate_multiple_requests.rs Outdated Show resolved Hide resolved
@jtronge
Copy link
Collaborator Author

jtronge commented Jul 30, 2022

Thanks for taking a look at this. I'll fix the lints and take a closer look at the wait_some/test_some code at the start of next week.

@jedbrown
Copy link
Contributor

jedbrown commented Aug 2, 2022

I'd like to release version 0.6 soon so other crates can publish. Do you think this should be part of 0.6 and if so, how long do you anticipate needing?

@jedbrown
Copy link
Contributor

jedbrown commented Aug 2, 2022

Heads up that there are some new clippy warnings. This is clear on main.

$ cargo clippy -- --allow clippy::missing_safety_doc

@jtronge jtronge force-pushed the multi-completion2 branch 2 times, most recently from a510882 to 980a537 Compare August 2, 2022 18:12
@jtronge
Copy link
Collaborator Author

jtronge commented Aug 2, 2022

Ok, thanks. I think I managed to fix all of the warnings. I've updated the code and examples to use &mut Vec.

Copy link
Contributor

@jedbrown jedbrown left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just a couple comments to consider. And I'll run a pipeline.

examples/immediate_multiple_requests.rs Outdated Show resolved Hide resolved
examples/immediate_multiple_requests.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
@jedbrown
Copy link
Contributor

jedbrown commented Aug 3, 2022

I don't think this is from your branch (it passed on re-running), but do you understand the intermittent failure here? It looks like perhaps std::panic::catch_unwind not working correctly.

https://github.com/rsmpi/rsmpi/runs/7658512360?check_suite_focus=true#step:8:120

@jedbrown
Copy link
Contributor

jedbrown commented Aug 3, 2022

Many thanks for this important contribution. I'll merge now.

@jedbrown jedbrown merged commit e6ce575 into rsmpi:main Aug 3, 2022
@jtronge
Copy link
Collaborator Author

jtronge commented Aug 3, 2022

I don't think this is from your branch (it passed on re-running), but do you understand the intermittent failure here? It looks like perhaps std::panic::catch_unwind not working correctly.

https://github.com/rsmpi/rsmpi/runs/7658512360?check_suite_focus=true#step:8:120

Hmmn, I'm not sure. I've run into a couple strange errors like this when Rust panics, or the c code that rust is calling aborts. It looks like openmpi is failing here and I'm not sure where the Rust code is handling this, or if it even is.

@jedbrown
Copy link
Contributor

jedbrown commented Aug 3, 2022

There is output from where Rust panics (that's supposed to be caught), but the unlink error is quite strange. It'll be something to watch, especially if we find a way to reproduce it.

@jtronge
Copy link
Collaborator Author

jtronge commented Aug 3, 2022

Talking with @hppritcha, he pointed me to this issue: open-mpi/ompi#7393. Seems like we might want to set TMPDIR for macOS at least. He also suggested adding a sleep between tests.

@jedbrown
Copy link
Contributor

jedbrown commented Aug 4, 2022

Thanks, I haven't seen this failure since #125.

I've been seeing non-deterministic MSMPI failures recently, but I don't think they're related to recent changes to rsmpi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants