Skip to content

Commit

Permalink
Tpetra::CrsMatrix::localSolve: Fix CUDA debug-mode run-time errors
Browse files Browse the repository at this point in the history
@trilinos/tpetra @trilinos/ifpack2

Ifpack2's unit tests were failing for me in a debug build
(Kokkos_ENABLE_DEBUG:BOOL=ON), because Kokkos::DualView was
complaining about "concurrent modification" on device and host.  The
issue was that Tpetra::CrsMatrix::localSolve was sync'ing to device,
then running KokkosSparse::trsv.  The latter function currently runs
on host only, however.  This commit fixes the test failure by changing
localSolve, first to sync to host before the local solve, then to sync
back to device after the local solve.

General Tpetra preference is to move away from implementing things
like sparse triangular solve in Tpetra.  Ifpack2 has already started
supporting this.  See also the relevant kokkos-kernels issue,
kokkos/kokkos-kernels#48.
  • Loading branch information
Mark Hoemmen committed Aug 21, 2017
1 parent 7015333 commit ba53e48
Showing 1 changed file with 22 additions and 6 deletions.
28 changes: 22 additions & 6 deletions packages/tpetra/core/src/Tpetra_CrsMatrix_decl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -3114,6 +3114,10 @@ namespace Tpetra {
using Teuchos::CONJ_TRANS;
using Teuchos::NO_TRANS;
using Teuchos::TRANS;
typedef MultiVector<RangeScalar, LocalOrdinal,
GlobalOrdinal, Node, classic> RMV;
typedef Kokkos::HostSpace host_memory_space;
typedef typename device_type::memory_space dev_memory_space;
const char tfecfFuncName[] = "localSolve: ";

TEUCHOS_TEST_FOR_EXCEPTION_CLASS_FUNC
Expand Down Expand Up @@ -3163,25 +3167,37 @@ namespace Tpetra {
(getNodeNumDiags () < getNodeNumRows ()) ? "U" : "N";

local_matrix_type A_lcl = this->getLocalMatrix ();
X.template modify<device_type> (); // we will write to X

// NOTE (mfh 20 Aug 2017): KokkosSparse::trsv currently is a
// sequential, host-only code. See
// https://github.com/kokkos/kokkos-kernels/issues/48. This
// means that we need to sync to host, then sync back to device
// when done.
X.template sync<host_memory_space> ();
const_cast<RMV&> (Y).template sync<host_memory_space> ();
X.template modify<host_memory_space> (); // we will write to X

if (X.isConstantStride () && Y.isConstantStride ()) {
auto X_lcl = X.template getLocalView<device_type> ();
auto Y_lcl = Y.template getLocalView<device_type> ();
auto X_lcl = X.template getLocalView<host_memory_space> ();
auto Y_lcl = Y.template getLocalView<host_memory_space> ();
KokkosSparse::trsv (uplo.c_str (), trans.c_str (), diag.c_str (),
A_lcl, Y_lcl, X_lcl);
}
else {
const size_t numVecs = std::min (X.getNumVectors (), Y.getNumVectors ());
const size_t numVecs =
std::min (X.getNumVectors (), Y.getNumVectors ());
for (size_t j = 0; j < numVecs; ++j) {
auto X_j = X.getVector (j);
auto Y_j = X.getVector (j);
auto X_lcl = X_j->template getLocalView<device_type> ();
auto Y_lcl = Y_j->template getLocalView<device_type> ();
auto X_lcl = X_j->template getLocalView<host_memory_space> ();
auto Y_lcl = Y_j->template getLocalView<host_memory_space> ();
KokkosSparse::trsv (uplo.c_str (), trans.c_str (),
diag.c_str (), A_lcl, Y_lcl, X_lcl);
}
}

X.template sync<dev_memory_space> ();
const_cast<RMV&> (Y).template sync<dev_memory_space> ();
}

/// \brief Return another CrsMatrix with the same entries, but
Expand Down

0 comments on commit ba53e48

Please sign in to comment.