Get rid of UB when copying to and from empty unmanaged views #1968

Merged on Jan 24, 2019 (3 commits)

dalg24
Member

@dalg24 dalg24 commented Jan 18, 2019

fix #1967

Note: I copy/pasted the code that throws if the dimensions of the destination and source views do not match, which is not ideal.
Let me know if you want me to change that.

@dalg24 dalg24 changed the title from "[WIP] Get rid of UB when copying to and from empty unmanaged views" to "Get rid of UB when copying to and from empty unmanaged views" on Jan 18, 2019
@dalg24
Member Author

dalg24 commented Jan 19, 2019

For the record

View<int*, MemoryTraits<Unmanaged>> v1( nullptr, 0 );
View<int*, MemoryTraits<Unmanaged>> v2( reinterpret_cast<int*>(-1), 0 );
View<int*> v3("v3", 0);
View<int*> v4("v4", 10);
#ifdef KOKKOS_ENABLE_DEPRECATED_CODE
deep_copy(v1, v2); // fixed
deep_copy(v2, v1); // fixed
deep_copy(v1, v3); // ok
deep_copy(v3, v1); // ok
deep_copy(v2, v3); // fixed
deep_copy(v3, v2); // fixed
ASSERT_NO_THROW( deep_copy(v1, v4) );
ASSERT_NO_THROW( deep_copy(v4, v1) );
ASSERT_NO_THROW( deep_copy(v2, v4) );
ASSERT_NO_THROW( deep_copy(v4, v2) );
ASSERT_NO_THROW( deep_copy(v3, v4) );
ASSERT_NO_THROW( deep_copy(v4, v3) );
#else
deep_copy(v1, v2); // fixed
deep_copy(v2, v1); // fixed
deep_copy(v1, v3); // fixed
deep_copy(v3, v1); // fixed
deep_copy(v2, v3); // ok
deep_copy(v3, v2); // ok
using DimensionMismatchError = std::runtime_error;
ASSERT_THROW( deep_copy(v1, v4), DimensionMismatchError );
ASSERT_THROW( deep_copy(v4, v1), DimensionMismatchError );
ASSERT_THROW( deep_copy(v2, v4), DimensionMismatchError );
ASSERT_THROW( deep_copy(v4, v2), DimensionMismatchError );
ASSERT_THROW( deep_copy(v3, v4), DimensionMismatchError );
ASSERT_THROW( deep_copy(v4, v3), DimensionMismatchError );
#endif

@dalg24
Member Author

dalg24 commented Jan 19, 2019

#include <cstdlib>   // std::abort
#include <iostream>  // std::cerr
#define ASSERT_NO_THROW(expression) { \
    try { \
        expression; \
    } \
    catch (...) { \
        std::cerr<<__FILE__<<":"<<__LINE__<<" "<<#expression<<" did throw an error\n"; \
        std::abort(); \
    } \
}
#define ASSERT_THROW(expression, exception_type) { \
    try { \
        expression; \
        std::cerr<<__FILE__<<":"<<__LINE__<<" "<<#expression<<" did not throw "<<#exception_type<<"\n"; \
        std::abort(); \
    } \
    catch (exception_type const &) { \
        /*success*/ \
    } \
    catch (...) { \
        std::cerr<<__FILE__<<":"<<__LINE__<<" "<<#expression<<" did throw but exception type was not "<<#exception_type<<"\n"; \
        std::abort(); \
    } \
}
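
For readers without a Kokkos build at hand, the non-deprecated behavior recorded above (mismatched extents throw, zero-length copies are harmless even with a null or bogus pointer) can be mimicked with a stand-in type. This is only a sketch under stated assumptions: `MockView`, `mock_deep_copy`, and `throws_on_mismatch` are hypothetical names, not Kokkos API, and the real `deep_copy` deals with ranks, layouts, and memory spaces that are ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for a rank-1 unmanaged view: a raw pointer plus an extent.
struct MockView {
  int* ptr;
  std::size_t n;
  int* data() const { return ptr; }
  std::size_t extent(int /*dim*/) const { return n; }
};

// Assumed behavior, modeled on the non-deprecated branch of the snippet above:
// check extents first, then return early for empty views so that the pointers
// are never dereferenced or handed to memcpy.
void mock_deep_copy(MockView const& dst, MockView const& src) {
  if (dst.extent(0) != src.extent(0)) {
    throw std::runtime_error("deep_copy extents of views don't match");
  }
  if (dst.extent(0) == 0) {
    return;  // nothing to copy; data() may be null or an invalid address
  }
  assert(dst.data() != nullptr && src.data() != nullptr);
  std::memcpy(dst.data(), src.data(), dst.extent(0) * sizeof(int));
}

// Returns true if the copy threw std::runtime_error, false if it completed.
bool throws_on_mismatch(MockView const& dst, MockView const& src) {
  try {
    mock_deep_copy(dst, src);
  } catch (std::runtime_error const&) {
    return true;
  }
  return false;
}
```

With this stand-in, copying between two empty views returns without touching either pointer, while copying between an empty view and a view of extent 10 reports a mismatch.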

@crtrott
Member

crtrott commented Jan 23, 2019

This is ok. We need to run the spot-check, then get it in.

(src.extent(7) != dst.extent(7))
) {
std::string message("Deprecation Error: Kokkos::deep_copy extents of views don't match: ");
message += dst.label(); message += "(";
Contributor

I could see this "print all the extents" feature being useful elsewhere. In a future refactor, consider making this a separate function.
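
One possible shape for such a helper, as a rough sketch: `describe_extents` and `MockView8` are hypothetical names, and the only assumption about the view type is that it offers `label()` and `extent(i)`, as the surrounding diff uses them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal view exposing only what the helper needs.
struct MockView8 {
  std::string name;
  std::size_t ext[8];
  std::string label() const { return name; }
  std::size_t extent(int i) const { return ext[i]; }
};

// Builds "label(e0,e1,...)" for the first `rank` extents of any type that
// provides label() and extent(int).
template <class ViewType>
std::string describe_extents(ViewType const& v, int rank = 8) {
  assert(rank >= 1 && rank <= 8);
  std::string out = v.label();
  out += "(";
  for (int i = 0; i < rank; ++i) {
    if (i > 0) out += ",";
    out += std::to_string(v.extent(i));
  }
  out += ")";
  return out;
}
```

An error message could then be assembled from `describe_extents(dst)` and `describe_extents(src)` instead of repeating the formatting code at each throw site.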

@@ -1401,7 +1401,33 @@ void deep_copy
typedef typename src_type::memory_space src_memory_space ;
typedef typename dst_type::value_type dst_value_type ;
typedef typename src_type::value_type src_value_type ;
- if(dst.data() == NULL && src.data() == NULL) {
+ if(dst.data() == NULL || src.data() == NULL) {
Contributor

A question not relevant to this fix: Why can't this specialization of deep_copy just call the one that takes an execution space instance as its first argument?

Member Author

Good point, Mark. I just went over the source code, and I believe calling the other overload with the execution space of the destination would do the trick.

Member Author

This would remove ~200 lines of essentially duplicated code. I am trying it locally and building the tests at the moment. I haven't looked at the other 3 pairs of deep_copy(dst, src) and deep_copy(exec_space, dst, src) in detail to see whether there is more opportunity to reduce the amount of code to maintain.

Contributor

This could cause a merge conflict with #1919, but in general it is in the spirit of making Kokkos respect execution space instances throughout.

Member Author

@mhoemmen Some tests are failing now. It is not obvious to me what went wrong: either I overlooked some subtle difference between the two functions (most likely) or I made a mistake when implementing the change you suggested.

Contributor

@dalg24 I remember trying this once before and having the same issue. Thanks for trying, though! It's fine to duplicate that code for now.
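
The delegation idea discussed in this thread can be pictured as below. It is a sketch only, and per the comments above the real change made some Kokkos tests fail, so it is not what was merged. `MockExecSpace`, `MockView`, and `mock_deep_copy` are hypothetical stand-ins; the integer return value exists purely so the example can show which execution space instance ended up being used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical execution space instance, identified by an id.
struct MockExecSpace {
  int id;
};

// Hypothetical view that owns its data and remembers its execution space.
struct MockView {
  std::vector<int> values;
  MockExecSpace space;
};

// Overload taking an explicit execution space instance: does the actual work.
// Returns the id of the instance used (for illustration only).
int mock_deep_copy(MockExecSpace const& exec, MockView& dst, MockView const& src) {
  assert(dst.values.size() == src.values.size());
  dst.values = src.values;
  return exec.id;
}

// Two-argument overload: forwards to the overload above using the execution
// space of the destination, so the copy logic lives in one place.
int mock_deep_copy(MockView& dst, MockView const& src) {
  return mock_deep_copy(dst.space, dst, src);
}
```

The sketch does not model fencing or memory spaces, either of which could differ between the two real overloads.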

@crtrott
Member

crtrott commented Jan 24, 2019

We also need to add a unit test which tests the behaviour described in #1967 so it doesn't break again by accident in the future.

@ndellingwood
Contributor

I ran the spot-check on kokkos-dev, all tests pass:

[ndellin@kokkos-dev TestAllSandia]$ tail nohup.out 
  Starting job cuda-8.0.44-Cuda_OpenMP-release
  PASSED cuda-8.0.44-Cuda_OpenMP-release
#######################################################
PASSED TESTS
#######################################################
clang-4.0.1-Pthread_Serial-release build_time=191 run_time=470
cuda-8.0.44-Cuda_OpenMP-release build_time=557 run_time=611
gcc-5.3.0-OpenMP-release build_time=180 run_time=103
gcc-7.3.0-Serial-release build_time=165 run_time=206
intel-17.0.1-OpenMP-release build_time=507 run_time=160

@crtrott
Member

crtrott commented Jan 24, 2019

Nathan: please post the whole output so it includes the SHA.

@dalg24
Member Author

dalg24 commented Jan 24, 2019

> We also need to add a unit test which tests the behaviour described in #1967 so it doesn't break again by accident in the future.

I am not so familiar with your automated testing setup. Do any of the builds use Clang sanitizers? They would definitely expose the bug with the code I posted.

@ndellingwood
Contributor

@crtrott Here's the full output

Running on machine: sems
Repository Status:  79f3584a1c3c898a8fd91a98482c8dee50647650 Fix undefined behavior when copying from and to empty unmanaged views


Going to test compilers:  gcc/5.3.0 gcc/7.3.0 intel/17.0.1 clang/4.0.1 cuda/8.0.44
Testing compiler gcc/5.3.0
Testing compiler gcc/7.3.0
  Starting job gcc-5.3.0-OpenMP-release
  PASSED gcc-5.3.0-OpenMP-release
Testing compiler intel/17.0.1
  Starting job gcc-7.3.0-Serial-release
  PASSED gcc-7.3.0-Serial-release
Testing compiler clang/4.0.1
  Starting job intel-17.0.1-OpenMP-release
  PASSED intel-17.0.1-OpenMP-release
Testing compiler cuda/8.0.44
  Starting job clang-4.0.1-Pthread_Serial-release
  PASSED clang-4.0.1-Pthread_Serial-release
  Starting job cuda-8.0.44-Cuda_OpenMP-release
  PASSED cuda-8.0.44-Cuda_OpenMP-release
#######################################################
PASSED TESTS
#######################################################
clang-4.0.1-Pthread_Serial-release build_time=191 run_time=470
cuda-8.0.44-Cuda_OpenMP-release build_time=557 run_time=611
gcc-5.3.0-OpenMP-release build_time=180 run_time=103
gcc-7.3.0-Serial-release build_time=165 run_time=206
intel-17.0.1-OpenMP-release build_time=507 run_time=160

@crtrott
Member

crtrott commented Jan 24, 2019

Adding UnitTest and rerunning spot-check right now.

Kokkos::deep_copy(v_m_1, v_um_def_2);
Kokkos::deep_copy(v_m_1, v_um_2);
Kokkos::deep_copy(v_m_1, v_m_def_2);
Kokkos::deep_copy(v_m_1, v_m_2);
Member Author

@crtrott You might want to include the part where I make sure that an exception is thrown if attempting to copy from or to empty views where the other operand is some view with extents that do not match.

Member

Yeah not a bad idea. I think we need to make this a part of a larger push to test for asserts where asserts are expected. We have precious little of this right now.

@crtrott
Member

crtrott commented Jan 24, 2019

Running on machine: apollo
WARNING!! THE FOLLOWING CHANGES ARE UNCOMMITTED!! :
?? patch_teamvectorrange

Repository Status:  fdcbb00af8e06eca975b23b71b2861ec3581761c Adding unit test for degenerated view copy


Going to test compilers:  gcc/4.8.4 gcc/5.3.0 intel/16.0.1 clang/3.9.0 clang/6.0 cuda/9.1
Testing compiler gcc/4.8.4
  Starting job gcc-4.8.4-OpenMP-release
  PASSED gcc-4.8.4-OpenMP-release
Testing compiler gcc/5.3.0
  Starting job gcc-4.8.4-Pthread-release
  PASSED gcc-4.8.4-Pthread-release
Testing compiler intel/16.0.1
  Starting job gcc-5.3.0-Serial-release
  PASSED gcc-5.3.0-Serial-release
Testing compiler clang/3.9.0
  Starting job intel-16.0.1-OpenMP-release
  PASSED intel-16.0.1-OpenMP-release
Testing compiler clang/6.0
  Starting job clang-3.9.0-Pthread_Serial-release
  PASSED clang-3.9.0-Pthread_Serial-release
  Starting job clang-6.0-Cuda_Pthread-release
  PASSED clang-6.0-Cuda_Pthread-release
Testing compiler cuda/9.1
  Starting job clang-6.0-OpenMP-release
  PASSED clang-6.0-OpenMP-release
  Starting job cuda-9.1-Cuda_OpenMP-release
  PASSED cuda-9.1-Cuda_OpenMP-release
#######################################################
PASSED TESTS
#######################################################
clang-3.9.0-Pthread_Serial-release build_time=127 run_time=418
clang-6.0-Cuda_Pthread-release build_time=221 run_time=283
clang-6.0-OpenMP-release build_time=112 run_time=80
cuda-9.1-Cuda_OpenMP-release build_time=305 run_time=161
gcc-4.8.4-OpenMP-release build_time=103 run_time=103
gcc-4.8.4-Pthread-release build_time=97 run_time=308
gcc-5.3.0-Serial-release build_time=112 run_time=161
intel-16.0.1-OpenMP-release build_time=330 run_time=140

@crtrott
Member

crtrott commented Jan 24, 2019

I had accidentally amended a commit, but this has now been tested. Just waiting for Travis to go through, then we can merge this in.

@crtrott crtrott merged commit 7af7f80 into kokkos:develop Jan 24, 2019
@dalg24 dalg24 deleted the ub_deep_copy branch May 17, 2019 16:08
4 participants