Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kokkos::StaticCrsGraph does not propagate memory traits (e.g., Unmanaged) #1581

Closed
mhoemmen opened this issue Apr 23, 2018 · 14 comments
Closed
Assignees
Labels
Feature Request Create new capability; will potentially require voting
Milestone

Comments

@mhoemmen
Copy link
Contributor

@rppawlo has been working on a "View of StaticCrsGraph" pattern. It's been crashing unexpectedly. I investigated and noticed that Kokkos::StaticCrsGraph does not propagate memory traits (e.g., Unmanaged, which is what matters here) to its Views. The row_map_type, entries_type, and row_block_type typedefs only use the layout and device type.

We could fix this by using the traits typedef in StaticCrsGraph. The traits typedef takes into account the Arg1Type and Arg2Type template parameters of StaticCrsGraph.

@crtrott
Copy link
Member

crtrott commented Apr 23, 2018

Uhm, why do you need Unmanaged at compile time for this? The StaticCrsGraph could just have runtime unmanaged views couldn't it? From the top of my head I can't think of a reason why this should have any effect like crashing.

@ndellingwood
Copy link
Contributor

@mhoemmen An extra defaulted template parameter like Arg3=void may be necessary to inject and propogate MemoryTraits info into the StaticCrsGraph, it looks like Arg1Type is reserved for either a Device type or Layout type, and if a Layout type is given then Arg2Type is reserved for the Device type (if given). I can add this, is this trivial enough of a change to not effect compile times with an extra template parameter? Adding @crtrott

@mhoemmen
Copy link
Contributor Author

@crtrott wrote:

Uhm, why do you need Unmanaged at compile time for this?

I think what's happening is that @rppawlo is doing something like this:

using local_matrix_type = KokkosSparse::CrsMatrix<double, int, inner_device_type, Kokkos::MemoryUnmanaged, size_type>;
Kokkos::View<local_matrix_type**, outer_device_type> localMatrices ("matrices", M, N);
// ...
for (int i = 0; i < M; ++i) {
  for (int j = 0; j < N; ++j) {
    new (localMatrices(i,j)) local_matrix_type (getTpetraGlobalMatrix(i,j)->getLocalMatrix ());
  }
}

This invokes the KokkosSparse::CrsMatrix copy constructor, but since local_matrix_type "loses" the unmanaged property, the result still has ownership.

@mhoemmen
Copy link
Contributor Author

Roger could still work around by creating unowned Views manually, but the above behavior is surprising.

@rppawlo
Copy link
Contributor

rppawlo commented Apr 24, 2018

It also seems like quite a bit of extra work to do this. Mark's example is exactly what I am doing. Maybe I am wrong, but doesn't rebuilding the runtime unmanaged CrsMatrix require rebuilding all the views used in the graph, then rebuild the graph, then rebuild all views in the CrsMatrix to make sure everything is unmanaged? The code above jumps from a few lines to quite a lot.

@mhoemmen
Copy link
Contributor Author

@rppawlo That's correct, the above example would expand quite a bit. The sensible thing to do for now would be to write a "getUnmanagedLocalMatrix" function that encapsulates all that code.

@rppawlo
Copy link
Contributor

rppawlo commented Apr 24, 2018

Just to follow up, here's the difference in the code. If the unmanaged template parameter worked through all objects, the inner loop looks like this:

const auto crsMatrix = rcp_dynamic_cast<Tpetra::CrsMatrix<double,LO,GO,NodeT>>(thyraTpetraOperator->getTpetraOperator(),true);                                                                                                                         
new (&hostJacTpetraBlocks(row,col)) KokkosSparse::CrsMatrix<double,LO,PHX::Device,Kokkos::MemoryTraits<Kokkos::Unmanaged>> (crsMatrix->getLocalMatrix());                                                                                              

Instead, with the runtime interface the inner loop looks like:

// Grab the local managed matrix and graph                                                                                                                                                                                                              
const auto tpetraCrsMatrix = rcp_dynamic_cast<Tpetra::CrsMatrix<double,LO,GO,NodeT>>(thyraTpetraOperator->getTpetraOperator(),true);
const auto managedMatrix = tpetraCrsMatrix->getLocalMatrix();
const auto managedGraph = managedMatrix.graph;

// Create runtime unmanaged versions                                                                                                                                                                                                                    
using StaticCrsGraphType = typename LocalMatrixType::StaticCrsGraphType;
StaticCrsGraphType unmanagedGraph;
unmanagedGraph.entries = typename StaticCrsGraphType::entries_type(managedGraph.entries.data(),managedGraph.entries.extent(0));
unmanagedGraph.row_map = typename StaticCrsGraphType::row_map_type(managedGraph.row_map.data(),managedGraph.row_map.extent(0));
unmanagedGraph.row_block_offsets = typename StaticCrsGraphType::row_block_type(managedGraph.row_block_offsets.data(),managedGraph.row_block_offsets.extent(0));

typename LocalMatrixType::values_type unmanagedValues(managedMatrix.values.data(),managedMatrix.values.extent(0));
LocalMatrixType unmanagedMatrix(managedMatrix.values.label(), managedMatrix.numCols(), unmanagedValues, unmanagedGraph);
new (&hostJacTpetraBlocks(row,col)) LocalMatrixType(unmanagedMatrix);

@crtrott
Copy link
Member

crtrott commented Apr 24, 2018

I see. Nathan if you want to take this, give it a shot. It shouldn't be all that hard (I hope ;-) ).

@ndellingwood
Copy link
Contributor

I'll work on it, the KokkosKernels changes will need to be synced/follow the Kokkos changes though. It looks like the the StaticCrsGraph needs an extra template parameter for the memory traits, then the KokkosSparse::CrsMatrix can pass the info along properly.

@ndellingwood ndellingwood self-assigned this Apr 24, 2018
@ndellingwood ndellingwood added the Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) label Apr 24, 2018
@ndellingwood ndellingwood added this to the 2018 April milestone Apr 24, 2018
@crtrott crtrott added Feature Request Create new capability; will potentially require voting and removed Bug Broken / incorrect code; it could be Kokkos' responsibility, or others’ (e.g., Trilinos) labels Apr 24, 2018
rppawlo added a commit to rppawlo/kokkos that referenced this issue Apr 25, 2018
@rppawlo
Copy link
Contributor

rppawlo commented Apr 25, 2018

The use case above didn't work on cuda. Had to go through and mark a bunch of functions in the CrsMatrix and CrsGraph and sub-structs with KOKKOS_INLINE_FUNCTION.

PRs for kokkos:

#1587

PR for kokkos-kernels:

kokkos/kokkos-kernels#220

The kokkos-kernels PR will probably require the kokkos PR to be accepted first. If these are both accepted in the kokkos repos, can I push the corresponding changes to the current Trilinos snapshot?

crtrott added a commit that referenced this issue Apr 25, 2018
Changes required to build a view of CrsMatrices #1581
@ndellingwood
Copy link
Contributor

ndellingwood commented Apr 26, 2018

I put in PR #1595 with first go at adding support for unmanaged memory traits, KokkosKernels CrsMatrix and BlockCrsMatrix changes will follow once this issue is resolved.
@rppawlo, @mhoemmen can you take a look and see if this is on track for what you need?

Edit: Fixed the PR link.

@mhoemmen
Copy link
Contributor Author

@ndellingwood I reviewed #1595. Short answer: this will break backwards compatibility; I'm OK with that, but if it's a problem, we could use a variadic type alias (see review for details).

@ndellingwood
Copy link
Contributor

@rppawlo Kokkos PR #1595 to address this issue was merged, I'm adding this comment to point out that the order of the memory traits template parameter Arg3Type follows the SizeType for builds that enable deprecated code (currently the default in Kokkos until the next promotion) to maintain backward compatibility, but the order will change to match the View and KokkosKernels CrsMatrix API when deprecated code is disabled.

@rppawlo
Copy link
Contributor

rppawlo commented May 4, 2018

Thanks @ndellingwood ! I will change over on the next snapshot into Trilinos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Request Create new capability; will potentially require voting
Projects
None yet
Development

No branches or pull requests

4 participants