Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error message in view memory access violations #4950

Merged

Conversation

dalg24
Copy link
Member

@dalg24 dalg24 commented Apr 8, 2022

Include view label in error message when possible

@dalg24 dalg24 force-pushed the view_memory_access_violation_improve_err_msg branch 3 times, most recently from 3453fa6 to d0de588 Compare April 12, 2022 21:56
@dalg24 dalg24 force-pushed the view_memory_access_violation_improve_err_msg branch 3 times, most recently from 5adb02a to fc7d02e Compare April 14, 2022 20:14
char const unmanaged[] = "**UNMANAGED**";
char const unavailable[] = "**UNAVAILABLE**";
Map const&) {
char err[1024] = "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the UB? .c_str() should be stable unless the source string is modified/destroyed.

Also, I prefer strncat over strcat, because if the source is too long, it gets truncated instead of UB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the UB? .c_str() should be stable unless the source string is modified/destroyed.

Exactly

Also, I prefer strncat over strcat, because if the source is too long, it gets truncated instead of UB.

Ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit iffy, you essentially increase the stack frame size for any bounds checking by 1kB, this may crash codes in Debug mode now which didn't before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would need to increase the stack size for the GPU to make it work again, if they were close before.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider

#include <Kokkos_Core.hpp>

template <class Space>
struct Foo {
  Kokkos::View<int, Space> m_v {"v"};
  Kokkos::View<float, Space> m_w {"w"};
  Kokkos::View<double*, Space> m_u {reinterpret_cast<double*>(0xDEADBEEF), 1};
  KOKKOS_FUNCTION void operator()(int) const {
    ++m_v();
    ++m_w();
    ++m_u(0);
  }
};

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    Kokkos::parallel_for(1, Foo<Kokkos::DefaultExecutionSpace>()); // OK
    Kokkos::parallel_for(1, Foo<Kokkos::DefaultHostExecutionSpace>()); // memory access violation occurs

  }
  Kokkos::finalize();
}

Passing --resource-usage to NVCC yields

ptxas info    : 32 bytes gmem, 32768 bytes cmem[3]
ptxas info    : Compiling entry function '_ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForI3FooINS_6SerialEENS_11RangePolicyIJNS_4CudaEEEES7_EEEEvT_' for 'sm_70'
ptxas info    : Function properties for _ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForI3FooINS_6SerialEENS_11RangePolicyIJNS_4CudaEEEES7_EEEEvT_
    1040 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 24 registers, 456 bytes cmem[0]
ptxas info    : Compiling entry function '_ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForI3FooINS_4CudaEENS_11RangePolicyIJS4_EEES4_EEEEvT_' for 'sm_70'
ptxas info    : Function properties for _ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForI3FooINS_4CudaEENS_11RangePolicyIJS4_EEES4_EEEEvT_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 18 registers, 456 bytes cmem[0]
ptxas info    : Compiling entry function '_ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForINS0_16ViewValueFunctorINS_6DeviceINS_4CudaENS_9CudaSpaceEEEfLb1EEENS_11RangePolicyIJS5_NS_9IndexTypeIlEEEEES5_EEEEvT_' for 'sm_70'
ptxas info    : Function properties for _ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForINS0_16ViewValueFunctorINS_6DeviceINS_4CudaENS_9CudaSpaceEEEfLb1EEENS_11RangePolicyIJS5_NS_9IndexTypeIlEEEEES5_EEEEvT_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 12 registers, 472 bytes cmem[0]
ptxas info    : Compiling entry function '_ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForINS0_16ViewValueFunctorINS_6DeviceINS_4CudaENS_9CudaSpaceEEEiLb1EEENS_11RangePolicyIJS5_NS_9IndexTypeIlEEEEES5_EEEEvT_' for 'sm_70'
ptxas info    : Function properties for _ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForINS0_16ViewValueFunctorINS_6DeviceINS_4CudaENS_9CudaSpaceEEEiLb1EEENS_11RangePolicyIJS5_NS_9IndexTypeIlEEEEES5_EEEEvT_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 12 registers, 472 bytes cmem[0]
ptxas info    : Compiling entry function '_ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForINS0_16ViewValueFunctorINS_6DeviceINS_4CudaENS_9CudaSpaceEEEjLb1EEENS_11RangePolicyIJS5_NS_9IndexTypeIlEEEEES5_EEEEvT_' for 'sm_70'
ptxas info    : Function properties for _ZN6Kokkos4Impl33cuda_parallel_launch_local_memoryINS0_11ParallelForINS0_16ViewValueFunctorINS_6DeviceINS_4CudaENS_9CudaSpaceEEEjLb1EEENS_11RangePolicyIJS5_NS_9IndexTypeIlEEEEES5_EEEEvT_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 12 registers, 472 bytes cmem[0]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stack frame size is only affected when there is illegal memory accesses in the code.
That said I agree 1 kilobyte is excessive. I will reduce.

@dalg24 dalg24 force-pushed the view_memory_access_violation_improve_err_msg branch from fc7d02e to f14fd60 Compare April 15, 2022 02:55
@dalg24
Copy link
Member Author

dalg24 commented Apr 15, 2022

Retest this please.
Could not reproduce OpenMPTarget failure with a fresh install of LLVM+Clang on Ampere

@dalg24
Copy link
Member Author

dalg24 commented Apr 15, 2022

4: [ RUN      ] openmptarget.unique_token_instance
4: /var/jenkins/workspace/Kokkos/core/unit_test/TestUniqueToken.hpp:151: Failure
4: Expected equality of these values:
4:   sum
4:     Which is: 9999990
4:   int64_t(N) * R
4:     Which is: 10000000
4: [  FAILED  ] openmptarget.unique_token_instance (191 ms)

Retest this please...

@rgayatri23 there is an insane amount of

`num_teams` clause was not respected.
`num_teams` clause was not respected.
[...]

being printed to the point the log gets really big. We need to handle these. Would you please look into it?

@dalg24 dalg24 marked this pull request as ready for review April 15, 2022 17:16
@dalg24 dalg24 requested review from nliber and janciesko April 15, 2022 17:16
@rgayatri23
Copy link
Contributor

4: [ RUN      ] openmptarget.unique_token_instance
4: /var/jenkins/workspace/Kokkos/core/unit_test/TestUniqueToken.hpp:151: Failure
4: Expected equality of these values:
4:   sum
4:     Which is: 9999990
4:   int64_t(N) * R
4:     Which is: 10000000
4: [  FAILED  ] openmptarget.unique_token_instance (191 ms)

Retest this please...

@rgayatri23 there is an insane amount of

`num_teams` clause was not respected.
`num_teams` clause was not respected.
[...]

being printed to the point the log gets really big. We need to handle these. Would you please look into it?

I will look into it more deeply but if that message is printed then the correctness will definitely fail as the kernel wont be executed.


template <std::size_t... Is>
KOKKOS_FUNCTION decltype(auto) bad_access(std::index_sequence<Is...>) const {
return v((Is * 0)...);
Copy link
Contributor

@janciesko janciesko Apr 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is Is multiplied by zero?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing v(0, 0, 0, ...)

template <class View, class LblOrPtr>
auto make_view(LblOrPtr x) {
return make_view_impl<View>(std::move(x),
std::make_index_sequence<View::rank>());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this.

char const unmanaged[] = "**UNMANAGED**";
char const unavailable[] = "**UNAVAILABLE**";
Map const&) {
char err[1024] = "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit iffy, you essentially increase the stack frame size for any bounds checking by 1kB, this may crash codes in Debug mode now which didn't before.

char const unmanaged[] = "**UNMANAGED**";
char const unavailable[] = "**UNAVAILABLE**";
Map const&) {
char err[1024] = "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would need to increase the stack size for the GPU to make it work again, if they were close before.

@crtrott crtrott merged commit e5cbef5 into kokkos:develop Apr 26, 2022
@dalg24 dalg24 deleted the view_memory_access_violation_improve_err_msg branch April 26, 2022 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants