Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DynamicView::resize_serial #5220

Merged
merged 2 commits into from
Jul 19, 2022

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented Jul 14, 2022

I was running into a case where the call operator for DynamicView would deadlock in

// Allocation of this chunk is in progress
// so wait for allocation to complete.
while (nullptr == *ch)
;
. As discussed on slack, the code there was only useful before the removal of resize_parallel in 768f657.
In particular, it's just an error if the chunk pointer we need to access is a nullptr. I took the freedom to remove all that logic in 6b7a1a0 to only check if for out-of-bounds errors and not checking the pointer to be a nullptr at all.

The origin of the issue was seeing was that the deep copy in resize_serial hadn't finished before we were trying to access the View. Conceivably, this can happen if the access happens on a different execution space instance than the one used for deep copy. In this case, it was the same execution space instance but the deep copy wasn't properly enqueued (also see #5065). Since we are supposed to fence after calling DeepCopy internally anyway, I didn't bother with adding the workaround there as well.
In the end, it was enough to just fence the deep copy operation in resize_serial, and to do that I had to provide ChunkManager::deep_copy_to with an execution space instance argument. This also allowed (and enforced) using an execution space instance in a couple of other places using deep_copy_to.

@masterleinad masterleinad marked this pull request as ready for review July 14, 2022 17:15
@masterleinad
Copy link
Contributor Author

OPENMPTARGET-Clang failing with

4: [ RUN      ] openmptarget.reducers_struct
4: /var/jenkins/workspace/Kokkos/core/unit_test/TestReducers.hpp:367: Failure
4: Expected equality of these values:
4:   sum_view_scalar
4:     Which is: 32-byte object <00-B6 51-47 00-B6 51-47 00-60 D7-46 00-C0 58-46 00-B6 51-47 00-B6 51-47 00-B6 51-47 00-B6 51-47>
4:   reference_sum
4:     Which is: 32-byte object <00-B6 51-47 00-B6 51-47 00-B6 51-47 00-B6 51-47 00-B6 51-47 00-B6 51-47 00-B6 51-47 00-B6 51-47>
4: [  FAILED  ] openmptarget.reducers_struct (4 ms)

in openmptarget.reducers_struct looks unrelated/spurious.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this integral constant thing is the approved dummy object ...

@crtrott crtrott merged commit be70180 into kokkos:develop Jul 19, 2022
@masterleinad masterleinad deleted the fix_dynamicview_resize_serial branch July 19, 2022 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants