Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#4517: DynamicView: Fix deallocation extent #4533

Merged
merged 1 commit into from
Nov 18, 2021

Conversation

lifflander
Copy link
Contributor

Fixes #4517

@lifflander
Copy link
Contributor Author

This change fixes the performance issues that were observed in EMPIRE

@PhilMiller
Copy link
Contributor

Was the code previously just deallocating some huge number of garbage entries that it shouldn't have been?

@rppawlo
Copy link
Contributor

rppawlo commented Nov 16, 2021

Thanks @lifflander for finding this! We have verified that this fixes the empire issues.

@ndellingwood - when this merges into kokkos, can you snapshot into trilinos. This shoudl get the empire sync of trilinos going again!

@lifflander
Copy link
Contributor Author

Was the code previously just deallocating some huge number of garbage entries that it shouldn't have been?

Yes, they are nullptr... so just wasting time. Before I changed the code, the deallocator was going over all the entries also but it tested for nullptr before actually calling the deallocator.

PhilMiller
PhilMiller previously approved these changes Nov 16, 2021
@masterleinad
Copy link
Contributor

Looks like this makes the unit tests segfault.

@@ -173,7 +173,8 @@ struct ChunkedArrayManager {
void execute() {
// Destroy the array of chunk pointers.
// Two entries beyond the max chunks are allocation counters.
for (unsigned i = 0; i < m_chunk_max; i++) {
auto const len = *reinterpret_cast<size_t*>(m_chunks + m_chunk_max + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels so sketchy. Why is it not something like

Suggested change
auto const len = *reinterpret_cast<size_t*>(m_chunks + m_chunk_max + 1);
size_t const len = *(m_chunks + m_chunk_max + 1);

Also what was wrong with unsigned?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_chunks is a pointer type that is abused to store a numerical value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_t size() const noexcept {
size_t extent_0 =
*reinterpret_cast<const size_t*>(m_chunks_host + m_chunk_max + 1);
return extent_0;
}

@PhilMiller PhilMiller dismissed their stale review November 16, 2021 22:47

Something's clearly not right/working here

@lifflander
Copy link
Contributor Author

I accidentally picked the wrong entry for deallocation:

  • m_chunks + m_chunk_max -> number of chunks used
  • m_chunks + m_chunk_max + 1 -> extent (number of chunks * chunk_size)

@PhilMiller
Copy link
Contributor

PhilMiller commented Nov 17, 2021

The CI failure from Jenkins is spurious relative to this PR -

1: [ RUN      ] openmp.reducers_half_t
1: /var/jenkins/workspace/Kokkos/core/unit_test/TestReducers.hpp:398: Failure
1: Expected equality of these values:
1:   prod_scalar
1:     Which is: 13120.000000
1:   reference_prod
1:     Which is: 13128.000000
1: [  FAILED  ] openmp.reducers_half_t (0 ms)

It probably represents an independent issue to be investigated, though.

https://cloud.cees.ornl.gov/jenkins-ci/blue/rest/organizations/jenkins/pipelines/Kokkos/runs/7162/nodes/47/steps/207/log/?start=0

@@ -173,7 +173,8 @@ struct ChunkedArrayManager {
void execute() {
// Destroy the array of chunk pointers.
// Two entries beyond the max chunks are allocation counters.
for (unsigned i = 0; i < m_chunk_max; i++) {
size_t const len = *reinterpret_cast<size_t*>(m_chunks + m_chunk_max);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
size_t const len = *reinterpret_cast<size_t*>(m_chunks + m_chunk_max);
uintptr_t const len = *reinterpret_cast<uintptr_t*>(m_chunks + m_chunk_max);

just to be conforming with access in other locations. Also, I would very much prefer if the allocation counters are stored in separate variables (inside ChunkedArrayManager). The need to reinterpret_cast here feels pretty sketchy and is confusing regarding readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've made the change to uintptr_t

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that allocating them separately would be increase code readability, but this PR is not making this worse (i.e., DynamicView has been this way since the beginning). Due to how critical this is for EMPIRE right now, I would like to get this moved forward as is and then push for refactoring later to improve code readability.

@PhilMiller
Copy link
Contributor

Retest this please

@crtrott crtrott merged commit 55aeec4 into kokkos:develop Nov 18, 2021
ndellingwood pushed a commit to ndellingwood/kokkos that referenced this pull request Nov 18, 2021
…extent

kokkos#4517: DynamicView: Fix deallocation extent

(cherry picked from commit 55aeec4)
@ndellingwood
Copy link
Contributor

Cherry-picked to 3.5 with #4538

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants