DualView.template view() better matches for Devices in UVMSpace cases #3857

Merged

Contributor

@DavidPoliakoff DavidPoliakoff commented Mar 17, 2021

Attempt to resolve #3850. My description basically mimics that issue; I recommend reading the bug there (h/t @brian-kelley).

In the past, we looked at whether a passed Device's memory space matched the memory space of t_dev (the device View), and returned something of the t_dev type if it did. This breaks in the case of

my_dual_view.view<Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace>>()

The device's memory space matches t_dev's memory space, so we merrily return a View whose device is <Cuda,UVMSpace>. My PR adds checks for exact Kokkos::Device matches, and returns the appropriate View if a Device is passed. More info later if you need it, but that's what this does.
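
To make the difference concrete, here is a minimal sketch of the two matching rules using stand-in types. Nothing below is Kokkos code: `Serial`, `Cuda`, `UVMSpace`, `Device`, `MockView`, `view_by_memory_space` and `view_by_exact_device` are all hypothetical names for illustration, and falling back to the host view when the Device is not an exact match for `t_dev` is a simplifying assumption.

```cpp
#include <type_traits>

// Stand-in tags; these are NOT the real Kokkos types.
struct Serial {};
struct Cuda {};
struct UVMSpace {};

template <class Exec, class Mem>
struct Device {
  using execution_space = Exec;
  using memory_space    = Mem;
};

// Hypothetical stand-in for a View that only records its device type.
template <class Dev>
struct MockView {
  using device_type = Dev;
};

// Scenario from the description: both views live in the same memory space.
using t_dev  = MockView<Device<Cuda, UVMSpace>>;
using t_host = MockView<Device<Serial, UVMSpace>>;

// Old rule: compare only memory spaces, so any UVMSpace device picks t_dev.
template <class Requested>
using view_by_memory_space = typename std::conditional<
    std::is_same<typename Requested::memory_space,
                 typename t_dev::device_type::memory_space>::value,
    t_dev, t_host>::type;

// New rule (simplified): require the whole Device type to match t_dev's.
template <class Requested>
using view_by_exact_device = typename std::conditional<
    std::is_same<Requested, typename t_dev::device_type>::value,
    t_dev, t_host>::type;

using Requested = Device<Serial, UVMSpace>;

static_assert(std::is_same<view_by_memory_space<Requested>, t_dev>::value,
              "memory-space matching hands back the Cuda view");
static_assert(std::is_same<view_by_exact_device<Requested>, t_host>::value,
              "exact Device matching hands back the Serial view");
```

Under the old rule a request for `Device<Serial, UVMSpace>` resolves to the `<Cuda, UVMSpace>` view, which is the surprise described above.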

Emphasizing this comment as loudly as I can: a Kokkos-side reviewer should think about whether this is an acceptable change. I think every change we've made here makes things more correct. But they are changes, and if somebody relied on our broken behavior, they might see differences and not be happy about them

edit: additional fix (necessary for tests to pass): modify functions now do nothing when there is only one device (note: not only one view)

@brian-kelley
Contributor

@DavidPoliakoff Can you try building my #3850 reproducer/example with this PR? I am getting a segfault with it at the dv.view<h_device>() that doesn't happen if I do dv.view_host(). It doesn't make sense because the backtrace looks like this:

#0  0x0000000000409dc3 in SharedAllocationTracker (enable_tracking=<optimized out>, rhs=..., this=0x7fffffffd6e0)
    at /home/bmkelle/KK_Clean/build/kokkos-install/include/impl/Kokkos_SharedAlloc.hpp:524
#1  Kokkos::Impl::ViewTracker<Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace> > >::ViewTracker (
    this=0x7fffffffd6e0, vt=...) at /home/bmkelle/KK_Clean/build/kokkos-install/include/impl/Kokkos_ViewTracker.hpp:77
#2  0x0000000000409581 in Kokkos::View<double*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::CudaUVMSpace> >::View (this=0x7fffffffd6e0)
    at /home/bmkelle/KK_Clean/build/kokkos-install/include/Kokkos_View.hpp:1426
#3  0x0000000000406fde in main () at /ascldap/users/bmkelle/KK_Clean/kokkos-kernels/perf_test/sparse/bmk.cpp:52

bmk.cpp:52 is auto v_h = dv.template view<h_device>(); where h_device is Device<Serial, CudaUVMSpace>. Previously, this is the line that had the wrong behavior, but it didn't segfault.

I'm also wondering if it's possible to make this work with execution spaces, where the execution space isn't an exact device match. For this example, dv.template modify<Exec> and dv.template sync<Exec> work for Exec = Cuda or Serial. In this PR, both of those are returning the 'host' view:

Type of dv.view<Serial>: N6Kokkos4ViewIPdJNS_10LayoutLeftENS_6DeviceINS_6SerialENS_12CudaUVMSpaceEEEEEE
Type of dv.view<Cuda>: N6Kokkos4ViewIPdJNS_10LayoutLeftENS_6DeviceINS_6SerialENS_12CudaUVMSpaceEEEEEE

@DavidPoliakoff
Contributor Author

@brian-kelley : oof. Back to the drawing board, I think. I think the actual logic we need is

if this thing is a Kokkos::Device
  if it matches the device
    return the device's view
if it's an exec space
  if the exec space's memory space matches
    return the memory space's view
if it's a memory space
  return that memory space's view
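
The dispatch above could be sketched roughly as follows. This is an illustration with stand-in types, not the Kokkos implementation: every name (`select_view`, `view_for_memory_space`, `MockView`, the mock spaces) is hypothetical, the scenario is a non-UVM one so each branch is unambiguous, and falling back to the host view for an unmatched Device is my assumption.

```cpp
#include <type_traits>

// Stand-in spaces. As in the discussion, execution spaces and memory spaces
// both expose a memory_space trait (a memory space names itself).
struct HostSpace { using memory_space = HostSpace; };
struct CudaSpace { using memory_space = CudaSpace; };
struct Serial    { using memory_space = HostSpace; };
struct Cuda      { using memory_space = CudaSpace; };

template <class Exec, class Mem>
struct Device {
  using execution_space = Exec;
  using memory_space    = Mem;
};

template <class Dev>
struct MockView {
  using device_type  = Dev;
  using memory_space = typename Dev::memory_space;
};

using t_dev  = MockView<Device<Cuda, CudaSpace>>;
using t_host = MockView<Device<Serial, HostSpace>>;

// Shared rule for the "exec space" and "memory space" branches.
template <class Mem>
using view_for_memory_space = typename std::conditional<
    std::is_same<Mem, typename t_dev::memory_space>::value,
    t_dev, t_host>::type;

// Execution space or memory space: go through its memory_space trait.
template <class T>
struct select_view {
  using type = view_for_memory_space<typename T::memory_space>;
};

// Kokkos::Device-like argument: only an exact match selects the device view.
template <class Exec, class Mem>
struct select_view<Device<Exec, Mem>> {
  using type = typename std::conditional<
      std::is_same<Device<Exec, Mem>, typename t_dev::device_type>::value,
      t_dev, t_host>::type;
};

static_assert(std::is_same<select_view<Cuda>::type, t_dev>::value,
              "Cuda execution space maps to the device view");
static_assert(std::is_same<select_view<Serial>::type, t_host>::value,
              "Serial execution space maps to the host view");
```

If view, modify and sync all consult one selector like this, they cannot disagree about whether a given template argument means host or device.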

Thanks for letting me know, I'll integrate your example

@DavidPoliakoff
Contributor Author

Basically there's a lot of squishiness, where memory space and execution space both define a "device" type trait, and devices and memory spaces both define a "memory space" type trait. I'll keep at it though

@brian-kelley
Contributor

@DavidPoliakoff I agree, some of these choices will be kind of arbitrary. I don't care too much which view actually gets returned in these cases, I just want the three templated functions to agree about each type (whether it means device or host).

@DavidPoliakoff
Contributor Author

@brian-kelley I figured out the segfault. For some reason the view template function returns a reference, and that reference is actually to a local (due to a weirdness of if_c?), so it's completely broken. Making the template view return the View by value fixes this. I'll work on the other bugs.
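
For readers unfamiliar with this failure mode, here is one way a reference to a local can sneak in. It is a guess at the general mechanism, not a diagnosis of the actual Kokkos code: `DevView`, `HostView`, `pass_through` and `MockDual` are hypothetical, and the implicit conversion between view types is an assumption.

```cpp
// Two hypothetical view types; HostView is implicitly constructible from
// DevView, loosely like Views over the same memory space.
struct DevView {
  int id;
};

struct HostView {
  int id;
  constexpr explicit HostView(int i) : id(i) {}
  constexpr HostView(const DevView& d) : id(d.id) {}  // implicit conversion
};

// A select-style helper that hands back the reference it was given.
constexpr const HostView& pass_through(const HostView& v) { return v; }

struct MockDual {
  DevView  d_view{7};
  HostView h_view{3};

  // Broken shape (left as a comment on purpose): passing d_view converts it
  // into a temporary HostView, and the returned reference outlives it.
  //   const HostView& view_as_host() const { return pass_through(d_view); }

  // Fixed shape: return by value, so the temporary is copied before it dies.
  constexpr HostView view_as_host() const { return pass_through(d_view); }
};

static_assert(MockDual{}.view_as_host().id == 7,
              "the by-value return carries the converted view");
```

Returning by value costs a copy of the view handle, which for reference-counted views is typically cheap compared with chasing a dangling reference.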

@DavidPoliakoff
Contributor Author

@brian-kelley : okay, I have a candidate fix started. Note that I turned your repro into a test, and put it in TestDefaultDeviceDevelop.cpp. I'm game to talk over fixes, but you can also just push breaking tests into my test, if this doesn't do what you need. I'll polish it off for HIP and other backends later, but if we can get these tests passing with CUDA, I'll add it to our actual set of tests

@brian-kelley
Contributor

@DavidPoliakoff The test looks good and complete, there are just 2 small patches I would add:

  • checks that need_sync_host() and need_sync_device() are both 0 after the "modify-device, sync-host" and "modify-host, sync-device" patterns.
  • Define UVM_ENABLED_BUILD when Cuda is enabled, not just when UVM is Cuda's default memspace. Device<Cuda,CudaUVMSpace> should work with DualView the same way whether force_uvm is set or not.
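
The first check can be pictured with a toy model of the modify/sync bookkeeping. This is a sketch under assumptions, not DualView itself: `MockDualFlags` and the two helper functions are hypothetical, and the counter scheme is only meant to mimic "modify marks one side newer, sync clears both".

```cpp
// Hypothetical stand-in for a DualView's modification counters.
struct MockDualFlags {
  unsigned modified_host   = 0;
  unsigned modified_device = 0;

  constexpr unsigned newest() const {
    return modified_host > modified_device ? modified_host : modified_device;
  }

  // Marking one side modified makes it strictly newer than the other.
  constexpr void modify_host()   { modified_host   = newest() + 1; }
  constexpr void modify_device() { modified_device = newest() + 1; }

  constexpr bool need_sync_host() const   { return modified_device > modified_host; }
  constexpr bool need_sync_device() const { return modified_host > modified_device; }

  // A real sync would deep_copy here; the model only resets the counters.
  constexpr void sync_host()   { if (need_sync_host())   { modified_host = 0; modified_device = 0; } }
  constexpr void sync_device() { if (need_sync_device()) { modified_host = 0; modified_device = 0; } }
};

// "modify-device, sync-host" should leave nothing pending on either side.
constexpr bool clean_after_modify_device_sync_host() {
  MockDualFlags f{};
  f.modify_device();
  f.sync_host();
  return !f.need_sync_host() && !f.need_sync_device();
}

// "modify-host, sync-device" should do the same.
constexpr bool clean_after_modify_host_sync_device() {
  MockDualFlags f{};
  f.modify_host();
  f.sync_device();
  return !f.need_sync_host() && !f.need_sync_device();
}

static_assert(clean_after_modify_device_sync_host(), "no sync pending");
static_assert(clean_after_modify_host_sync_device(), "no sync pending");
```

A test against the real class would follow the same two patterns and assert on need_sync_host() and need_sync_device() afterwards.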

I emailed you the two patches since I can't push to your fork. With those, the test builds and passes for me in Cuda+Serial+force_uvm, Cuda+OpenMP+force_uvm, and Cuda+Serial.

@DavidPoliakoff
Contributor Author

@brian-kelley , oh, I misunderstood what ENABLE_UVM does. I thought you couldn't use UVM without it. Good catch

@DavidPoliakoff
Contributor Author

Also, a Kokkos-side reviewer should think about whether this is an acceptable change. I think every change we've made here makes things more correct. But they are changes, and if somebody relied on our broken behavior, they might see differences and not be happy about them

@brian-kelley
Contributor

> @brian-kelley , oh, I misunderstood what ENABLE_UVM does. I thought you couldn't use UVM without it. Good catch

We had the same confusion about that in tpetra.

@DavidPoliakoff DavidPoliakoff added the "Blocks Promotion" label (Overview issue for release-blocking bugs) Mar 18, 2021
@DavidPoliakoff
Contributor Author

Retest this please

@crtrott crtrott merged commit d025999 into kokkos:develop Mar 22, 2021
kddevin added a commit to trilinos/Trilinos that referenced this pull request Apr 16, 2021
kddevin added a commit to trilinos/Trilinos that referenced this pull request Apr 27, 2021
…ement (#8821)

* Tpetra: add new user-friendly MV view access

Also add new "owningView_" DualView member that refers to
the actual original DV (not a subview of anything else). This
is the DualView to sync in order to maintain consistency regardless
of how MultiVectors alias each other.

4 new view accessor functions: getLocalView[Host|Device][Non]Const()

- Respect constness
- Manage syncs and modifies for the user
- Prevent taking out a view in one space while any view in the other
space is live.
- Existing getLocalView()/getLocalViewHost()/getLocalViewDevice() just
have the reference count checking added (no sync/modify). This has no
effect for HostSpace or CudaUVMSpace since those host mirrors match the
device views.

* Tpetra - fix MV test 14.

* Tpetra - fix item 17

* Tpetra - fix item 20

* Tpetra - fix item 23

* Tpetra - fix item 28

* Tpetra - fix item 29

* Tpetra - fix item 35

* Tpetra - workaround for item 30

* Tpetra: Modifying Bug7758 test to use the new getLocalViewHostConst (which will make sure things are actually sync'd)

* Tpetra: fix MV [un]pack to respect host/device refcounts

* fix nonconst in Bug7745

* Tpetra: stashing

* Tpetra - issue 354 fix

* Tpetra: refactor sameObject so it doesn't simultaneously ask for host and device views

* Tpetra: remove static_assert, fix getLocalView() ret type

Remove bad static_assert that tripped for Cuda/CudaUVMSpace build.
Correct MultiVector::getLocalView() return type to be exactly consistent with
DualView::view().

* tpetra:  fixed error in MultiVector pack that caused failures with UVM=ON

* tpetra:  Fix for FEMultivector -- rather than take the subview of a
DualView and create a new vector with it, use the MultiVector
constructor that gets "offset" views of a vector (in which
@brian-kelley has the owningView_ working correctly).
While I was at it, I added a swap of the owningView_ to the MultiVector
swap() function.

* Tpetra: Fixing ImportExport/Issue3968:  The test uses sync_to* without changing the modify flags, which mucks up our internal tracking

* tpetra:  fix to work without UVM

* tpetra:  changed getLocalViewHost/Device to new Const/NonConst versions
as appropriate. #8591
Did not change getLocalView as the Const/NonConst versions of
getLocalView do not exist yet
Did not change MV_reduce_strided to avoid creating conflicts for
@brian-kelley

* tpetra: change getLocalViewHost to appropriate Const/NonConst version #8591

* Tpetra: Modifying MultiVector to remove all references to old getLocalViewX functions

* Tpetra: More getLocalView mods

* Tpetra: Lots and lots of fixes to tests to use the new getLocalView<thing>Const/NonConst functions

* Tpetra: Fixing scaleBlockDiagonal signature as per Brian

* Tpetra: Fixes to the BlockView test to work correctly with UVM=OFF

* Tpetra: Fixing MultiVector print outs for help with non-unified memory debugging

* Tpetra - missing getlocal view "device"

* Tpetra: public Access:: ReadOnly/ReadWrite/WriteOnly

Make WithLocalAccess use these tags instead of internal Details:: ones.
These will also be used for the new MultiVector view access interface.

* moving from getLocalView... to getLocalView...(Tpetra::Accesspattern)

* Tpetra - get1dview logic change

* Tpetra, WIP: using new tagged view access

* Tpetra: use new interface for all MV getLocalView

* tpetra:  removed unneeded include file

* Tpetra: Tags!

* Tpetra: Tags!

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* tpetra: copied implementation of getLocalViewHost and getLocalViewDevice
from templated getLocalView, as the getLocalView version does not work.
This commit may be temporary, but it allows us to make progress on other
bugs while someone figures out the template-fu.
Sorry for the debugging statements; we'll get rid of those eventually.

* adding localview tests

* tpetra:  getLocalView<template> now works.
cleaned up my obnoxious print statements
kept Host and Device implementations that do NOT use getLocalView.

* tpetra:  added Tpetra::Access to many getLocalView<> instances
Tests still pass with UVM=ON.

* Tpetra: Removing the dreaded parentheses from the Access tags

* Manually intercept UVM allocations, throw exception

Effectively makes it impossible for any UVM allocations to
exist (except for Stokhos, which calls cudaMallocManaged directly)

* Tpetra: Deprecate old getLocalView functions

* Allow UVM allocations when Kokkos_ENABLE_CUDA_UVM=ON

* tpetra:  changed getLocalView to use access tags and getLocalViewDevice

* tpetra:  added access tags to getLocalView(); fixed scope of some pointers

* xpetra:  fixes to allow compilation

* WIP: deprecate getLocalBlock and start adding tagged overloads

* Tpetra: rewrite allReduceView to work with non-UVM

allReduceView had one bug and one sub-optimal thing:
- Tried to make a view copy with both layout and device different -
  Kokkos can't do that in a single deep_copy
- If a LayoutStride -> contiguous copy needed to be made, it always used
  LayoutLeft. If one of the input/output views was LayoutStride and the
  other was LayoutRight, they would both be copied to LayoutLeft. Now, use
  LayoutRight in this case.

Some utilities to help manage layouts and MPI + Kokkos views in general
are in the new file temporaryViewUtils.hpp: layout unification,
making a contiguous view, and making an MPI-safe view.
In the future these can be used to clean up idot and
iallreduce without losing efficiency.

* Tpetra:  Block MultiVector correctly uses getLocalView; removed stored pointer

* fix host device type for const_little_host_vec_type

* tpetra:  clean up of BlockMultiVector fixes

* Tpetra:  deprecated held pointer mvData_

* tpetra:  removed modifies without syncs; fixed MueLu tests

* Tpetra - removing sync in ScaleAndAssign test

* Tpetra - unit test is okay without modify and sync flags

* Tpetra - test passes without modify and sync operations

* Tpetra - remove unnecessary sync modify clear state flags

* Tpetra - remove multi vector sync/modify/ things

* Tpetra - remove sync modify things in other places

* Tpetra: remove withLocalAccess, for_each, transform

The new MV::getLocalView interface is a simpler substitute for these.

* Issue 8391. Switched to C++17 standard for GCC 8.3 build.

* FROSch: Convert enum NullSpaceType to scoped enum

By converting the enum to an enum class NullSpaceType, one is forced to
use the enum class and cannot replace it with integers anymore. This
guarantees, that the expressive enum class is used in implementations
rather than the implicitly encoded integers.
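
As a small illustration of the scoped-enum point: an unscoped enum converts to int silently, while an enum class does not. Only the name NullSpaceType comes from the commit message; the enumerator names and `LegacyNullSpaceType` below are placeholders, not FROSch's actual definitions.

```cpp
#include <type_traits>

// Placeholder enumerators; the real ones are not shown in this message.
enum LegacyNullSpaceType { LegacyOptionA, LegacyOptionB };
enum class NullSpaceType { OptionA, OptionB };

// Unscoped enums decay to integers without a cast.
static_assert(std::is_convertible<LegacyNullSpaceType, int>::value,
              "unscoped enum converts to int implicitly");

// Scoped enums need an explicit static_cast in either direction.
static_assert(!std::is_convertible<NullSpaceType, int>::value,
              "enum class does not convert to int implicitly");
static_assert(!std::is_convertible<int, NullSpaceType>::value,
              "int does not convert to enum class implicitly");
```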

* Patch in KokkosKernels #872

(fix #8727, TeamPolicy team size too large in sort_crs_*)
Adds the KokkosKernels unit test that replicated this issue.

* MueLu: Adding Aggregate size percentiles to AggregateQuality

* Moved Tpetra CRS GS into Ifpack2 Relaxation

* Moved BlockCrs GS functionality into Relaxation

* Enabled new local GS code for CRS

* Reduce redundant code in CRS (GS/SGS use same fn)

* Using refactored block CRS local apply, unify GS/SGS

* More refactoring to get rid of redundant functions

* Added required syncs/modifies for vectors

* Removed unneeded !constantStride paths

* Use cached MV to replace getColumnMapMV from CrsMatrix

* Ifpack2: remove unneeded includes

* Ifpack2: undo some find-and-replace in comments

Undoing some "Node" -> "node_type"

* MueLu: undo CMake change, should be its own PR

* MueLu: in configure, print out missing ETI setting

During configure, MueLu prints out the type combinations to ETI.
Add <complex, int, long long> to this, since it was missing.

* tpetra:  treat WriteOnly of subviews as ReadOnly.

* Ifpack2: in RBILUK, use tagged BMV::getLocalBlock

* Tpetra: add comment with caveat

on BMV::getLocalBlock(i, j, WriteOnly)

* tpetra: separated BugTests.cpp into separate test files so that we can
disable them separately (since they exercise different classes).

* Ifpack2: update BMV getLocalBlock calls

to use tagged access, and not use manual sync/modify (which has been
removed). With UVM, all Tpetra,Belos,Ifpack2,MueLu tests pass.

* more test changes

* mv localview tests

* wrapped up 6 tests for new behaviors

* tpetra:  scoping fix for Bug7234.cpp;
more output from getLocalView* when error occurs, as in parallel runs,
throw messages weren't always printed (e.g., from doExport when only
3/4 processors failed)

* Tpetra: add MV::aliases(const MV& other)

This allows a user to see if two MVs overlap, without actually getting
the local views and possibly hitting the reference count checker.

* Ifpack2: const correctness, use new getLocalView

- Throughout Ifpack2, remove manual sync/modify and calls to deprecated
  getLocalView. Use tagged getLocalView instead.
- In BlockRelaxation and the Containers, change interfaces to use const
  on views and multivectors that aren't actually modified

* Tpetra: fix one MV LocalView test, comment out another

We will make sure fix is OK, then uncomment and fix the other

* tpetra:  enable some Tpetra tests without UVM

* tpetra:  fix test for non-Cuda builds

* Ifpack2: fix more constness of apply vectors

* Kokkos: allow CudaUVMSpace::allocate again

Roll back change that made CudaUVMSpace::allocate throw
when UVM was not the default memory space for Cuda.

* tpetra:  changes needed to build with DEPRECATED_CODE=OFF #8821

* fix remaining test

* Tpetra - fix for nox failure

* Thyra: added missing fences to euclidean apply operations used
in MvTimesMatAddMv; the fences resolve test failures with
CUDA_LAUNCH_BLOCKING=0 and cleaner sync/modify in tpetra @rppawlo

Tpetra: the fences above provide a more surgical fix to the test
errors seen in #8821; this commit removes fences from
getLocalView*(ReadOnly).  @kyungjoo-kim

Belos: preventive fence added with @hkthorn's blessing
to mimic those in Thyra.

* tpetra: added fence between device kernels and retrieving blocks on host #8821

* Ifpack2: Minor fix

* DualView: make fencing behavior in sync consistent

sync<Device>() does extra exec space fences if the dev/host memory
spaces are the same. This was missing in sync_host/sync_device, so
this adds it there. Makes all Ifpack2 tests pass for UVM without launch
blocking.

* tpetra:  exercise the Teuchos-based interfaces, too

* changed access control from WriteOnly to OverwriteAll because semantics mean things

* WIP: fixing idot for MV dualview refactor

And some updates to ifpack2 and amesos2 about that.
Working around Kokkos issue #3850 where the templated getLocalView was
used.

* WIP: idot/iallreduce cleanup

* Tpetra: finish idot/iallreduce refactor

* Fixed iallreduce test for non-uvm device

* Belos: use new Tpetra MV view interface

* Cleanup

* Remove extra dualview sync fences

* Ifpack2 passes without launch blocking

except RBILUK.

* Ifpack2: add temporary fence in RBILUK for BlockCrs

Later it should be possible to replace this fence with a refactored
DualView interface to BlockCrs.

* Tpetra: add a global reduce to a test so it will fail when only one proc is failing

* Tpetra: fix some typos in a Map unit test

* Tpetra: remove deprecated sync/modify calls from a unit test

* Ifpack2: fix impl_scalar/scalar mismatch

* Tpetra: remove/update remaining mentions of Gauss-Seidel

* Tpetra: fix iallreduce for builds without MPI

* Ifpack2: revert commenting out try/catch

Was causing unused var warning

* Ifpack2: Fixing vector mode mistake

* tpetra, ifpack2:  fixing several access mode errors

* Tpetra: use new MV view interface in Bug8794 test

* Amesos2: revert using tagged Tpetra MV getLocalView

for some reason, using ReadOnly tag to access MV view in
TpetraMultivecAdapter caused solve solution to not get copied back to
the Tpetra multivector. This is surprising because the views were just
used as the source for a Kokkos deep copy, and this caused
BlockRelaxation in Ifpack2 to fail for serial node (in which DualViews
are trivial, and all kernels are synchronous)

* Ifpack2: add back tag clobbered by merge

* kokkos:  patch from kokkos/kokkos#3857

* comment out all the instances of TPETRA_DEPRECATED (#9023)

* MueLu: add fence for recent intrepid2 changes

Fixes MueLu-Intrepid2 unit tests, uvm, no launch blocking.

* Tpetra: restore MV_reduce_strided test.

Key: use the MV (map, dualview, orig_dualview) constructor instead of the
(map, dualview) constructor. If $dualview is noncontiguous, the first one
lets you pass orig_dualview as the contiguous super-view containing
dualview, and orig_dualview can be sync'd without problems.

Also modify TempView::toLayout() to test span_is_contiguous, rather than
assuming that (Layout != LayoutStride) implies contiguous.

* tpetra:  Removed deprecated sync_device calls

* Tpetra: Remove some MultiVector that were checking modification state (#9032)

* Tpetra: Deprecate need_sync* in MultiVector

* Tpetra: for now, we won't deprecate need_sync_host/device

* tpetra:  removed instantiations of removed tests

* Tpetra: don't use CudaSpace in nonblocking collectives

OpenMPI does not support Cuda device buffers for nonblocking collectives
like MPI_Iallreduce, even with a Cuda-aware installation.

* Fix old typo in Ifpack2_UnitTestBlockRelaxation

* Fix access tag: OverwriteAll -> ReadWrite

Tpetra::COPY takes src then dst (opposite order to Kokkos deep_copy) so Y_cur is being read at first and written later.

* Undo bad DualView merge

Co-authored-by: Brian Kelley <bmkelle@sandia.gov>
Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov>
Co-authored-by: Chris Siefert <csiefer@sandia.gov>
Co-authored-by: Geoff Danielson <gcdanie@sandia.gov>
Co-authored-by: Timothy A. Smith <tasmit@sandia.gov>
Co-authored-by: James M. Willenbring <jmwille@sandia.gov>
Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de>
Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com>
kddevin added a commit to trilinos/Trilinos that referenced this pull request May 4, 2021
* Tempus: Remove ParameterList from IntegratorBasic

Remove all the internal uses of ParameterList from IntegratorBasic.
This means moving the variables in the IntegratorBasic ParameterList
to member data. Integrator will not longer inherit from
Teuchos::ParameterListAcceptor. However IntegratorBasic can still
be built from a ParameterList, and will still provide a valid
ParameterList.

 * To break up these changes, created a copy of IntegratorBasic
   (i.e., IntegratorBasicOld) for the sensitivity analysis integrators
    - IntegratorAdjointSensitivity
    - IntegratorForwardSensitivity
    - IntegratorPseudoTransientAdjointSensitivity
    - IntegratorPseudoTransientForwardSensitivity
   so these can be upgraded in another PR.
 * IntegratorBasic is no longer inherited from ParameterListAcceptor.
    - Removed setParameterList.
    - IntegratorBasic constructors using ParameterLists have moved to
      nonmember constructors, e.g.,
      . integratorBasic(pl, model) --> createIntegratorBasic(pl, model)
      . IntegratorBasic(pl, model) --> createIntegratorBasic(pl, model)
    - Member data ParameterLists are removed.
    - Kept getValidParameters(), which now returns a ParameterList
      with the current values. Still matches ParameterListAcceptor
      signature.
 * Ensured that ParameterList names were correctly set, so
   getValidParameters() could be used to create nested ParameterLists,
   e.g., IntegratorBasic->Stepper->Solver.
 * Made getValidParametersBasic() a member function of the Stepper class.
 * Simplified setting the Stepper to just setStepper(stepper).
 * Added method to set model on stepper in IntegratorBasic,
   i.e., setModel(model).
 * The integrator observer is no longer a composite observer.
   It is simply a base class observer.
 * All internal IntegratorBasic references to member ParameterLists
   are changed to member data.
 * Added member data for the integrator name and type.  Name is a
   label that is used for identification, e.g., 'My Integrator Basic'.
   Type defines the derived class being used, e.g., 'Integrator Basic'.
 * Added a shallow copy for the SolutionHistory.

* Tempus: Remove ParameterList from Internals of IntegratorBasic.

 * Changed Piro and ROL to use IntegratorBasicOld.  Will move to
   IntegratorBasic in future PR.
 * Added documentation on StepperName and StepperType to help
   distinguish between them.
 * setStepperType() is now a protected function of the Stepper
   class, which should help distinguish it against StepperName.
 * IMEX_RK and IMEX_RK_Partition now require the stepperType
   in the default constructor to completely build it.  These
   should be changed to have a base class IMEX_RK and derived
   classes for each stepperType (similarly for IMEX_RK_Partition).
 * Fixed several misuses of stepperName and stepperType in source
   code and in unit tests.
 * Fixed some usage of Stepper aliases.

* Tpetra: add backup of scripts for perf testing

eclipse, vortex and stria env/build scripts.
These are used on SRN Jenkins and Watchr.

* cherry pick Kokkos-kernels PR #921
 Two-stage GS: add damping factors #921

* expose new options for two-stage GS from Ifpack2

* describe the two-stage parameters more in comments

* MueLu: Enable reuse of Ifpack2 smoothers

* Add openmpi 4.0.5 toolchain for VAN1

User Support Ticket(s) or Story Referenced: SPAR-969

* Add ctest drivers for new toolchains

* Correct an ordering issue and add tests

* ATDM/van1-tx2: Disable build stats

* re-basing muelu gold files with new two-stage gs parameters

* Replace VerifyExecutionCanAccessMemorySpace usage

Teuchos, Tpetra, Sacado, Stokhos: replace usage of deprecated
VerifyExecutionCanAccessMemorySpace with SpaceAccessibility for
compatibility with Kokkos.
See kokkos/kokkos#3813 for relevant changes.

* blake atdm environment: update cmake to 3.19.3

* Ifpack2 Hypre: Fix link errors when multiple node types are used

* Fixes mismatched new/delete and memory leaks in reused solver objects

Integrating the templated Basker solver directly into Xyce to perform
custom linear solves for harmonic balance (HB) analysis, some memory
issues were noticed.  First there is a mismatch in the new and delete
used to create and destroy the pinv object, respectively.  If the same
solver object is reused, the L and U factors are leaked.  Also some
internal workspace is not properly cleaned up.  For clarity, the
internal workspace objects have been refactored and the L and U
factors are deleted if the solver object is called to perform a
numeric factorization when one already exists.

This addresses the memory issues that have been observed through
valgrind.

* casting inner-damping factor to complex only if scalar-type is complex.

* MueLu CreateOperator test: Set verbosity so that default values are ignored

* MueLu: Update gold files

* Revert "re-basing muelu gold files with new two-stage gs parameters"

This reverts commit feea4d8.

* trilinos_couplings: Replace VerifyExecutionCanAccessMemorySpace

replace usage of deprecated VerifyExecutionCanAccessMemorySpace
with SpaceAccessibility for compatibility with Kokkos.
See kokkos/kokkos#3813 for relevant changes.

* SEACAS: Fix warnings and memory leaks

Automatic snapshot commit from seacas at 29acd7f151

Origin repo remote tracking branch: 'origin/master'
Origin repo remote repo URL: 'origin = https://github.com/gsjaardema/seacas'

At commit:

commit 29acd7f1510bf729084274fe0cead3ef5e815dd8
Author:  Greg Sjaardema <gdsjaar@sandia.gov>
Date:    Wed Apr 14 10:29:28 2021 -0600
Summary: sync back sierra-build changes [ci skip]

    EXODIFF: Eliminate edge/face block memory leaks
    PLT: Support for flang/f18
    APREPRO: Better support for array memory leak management
    Fix compilation warnings on nvidia
    IOSS: Eliminate long compile when sanitizer enabled
    IOSS: Eliminate compiler warning
    APREPRO: Eliminate array memory leaks

* Ctest: Fixing emailer for ascicgpu031

* Automatic snapshot commit from tribits at 18aed92

Origin repo remote tracking branch: 'github/master'
Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git'

At commit:

commit 18aed92550ebc8e8e0f7da2a5d38cd4eaa192e1f
Author:  Roscoe A. Bartlett <rabartl@sandia.gov>
Date:    Thu Apr 15 09:45:52 2021 -0600
Summary: Allow TRIBITS_ADD_ADVANCED_TEST_MAX_NUM_TEST_BLOCKS to be changed (#136)

* Tempus: Replace Logical Operators

The ROL team has received requests from Windows users to remove the
alternative logical operators 'and' and 'or' in favor of && and ||.
The C++ standard includes 'and' and 'or', of course. However, the
MS compiler only supports them in the -permissive mode, which
apparently isn't allowed in many companies, due to the quality
control niceties that the non-permissive (standard) mode provides.

Trilinos does not support Windows and does not even have platforms
to test on. This change should be straightforward, but there is
not any mechanism to prevent regression. Microsoft needs to support
the c++ standard!

Not sure all instances were found and there is no compiler flag
to throw on this.

* Galeri Xpetra: Add anisotropic diffusion problem

* MueLu: fix agg export for multiple dofs per node

* muelu:  changes needed to handle Kokkos::complex in Cuda builds

* Turn off 3 Zoltan tests that fail  due to a bug in spectrummpi

See #8798 for the discussion

* Geminga CUDA nightly: Enable complex

* Add short reason for the disablement and note the issue for more details

* Galeri: Fix issue in boundary conditions

* MueLu RefMaxwell: Pass corrected nullspace to coarse (1,1) hierarchy

* Xpetra: Add EpetraInverseOperator

Its 'apply' calls 'ApplyInverse' instead of 'Apply'

* TrilinosCouplings: Fix scaling of CurlCurl in Maxwell example

* tpetra:  adding test of branch Tpetra_UVM_Removal for SAKE

* STK: Snapshot 04-21-21 11:17 (#9039)

* add a few more timers

* make default for GmresSingleReduce as single-reduce MGS and no Newton basis

* make "delayed normalization" default for single-reduce MGS

* Intrepid2: fix 8801 (second attempt) (#9044)

* Intrepid2: resolve uninitialized variable warnings.

* Intrepid2: move lambda implementation into a functor to work around apparent CUDA compiler bug.  PR #9044, fix for #8801 (second attempt).

* When compiling with IBM Clang 11 + Cuda, Zoltan2 MJ captures 'this' inside lambdas

This patch
* Marks 1 function static, which was class const (but used no class variables).
* addresses this-> being used inside a nested team lambda (which clang doesn't like)
  To remove this, I moved sEpsilon as parameter to the function being called and
  was able to mark the function static as well (removing its use of this->sEpsilon)
  This entails creating a function-local copy of sEpsilon so that it may be
  captured by the default [=] capture.

Both changes entail adding a `using` statement within the function so that the
now static class functions can be called (e.g., AlgMJ< ... >::

* Snapshot of kokkos.git from commit 04b8196e0e3bfc4cee4047dbbbb13fc227730fe8

From repository at git@github.com:kokkos/kokkos.git

At commit:
commit 04b8196e0e3bfc4cee4047dbbbb13fc227730fe8
Merge: 1fb0c284 ffc35a82
Author: Nathan Ellingwood <ndellin@sandia.gov>
Date:   Mon Apr 26 00:14:56 2021 -0600

    Merge branch 'release-candidate-3.4.0' for 3.4.00

    Part of Kokkos C++ Performance Portability Programming EcoSystem 3.4

* Snapshot of kokkos-kernels.git from commit 3eb6a9298b58f224b876b6e29cda4491cddc53c5

From repository at git@github.com:kokkos/kokkos-kernels.git

At commit:
commit 3eb6a9298b58f224b876b6e29cda4491cddc53c5
Merge: fe439b21 dd0d4ef8
Author: Nathan Ellingwood <ndellin@sandia.gov>
Date:   Mon Apr 26 00:16:08 2021 -0600

    Merge branch 'release-candidate-3.4.0' for 3.4.00

    Part of Kokkos C++ Performance Portability Programming EcoSystem 3.4

* Turn on Intrepid2 per #8310

* MueLu: Add test for aggregation export

* Ifpack2 Relaxation&Chebyshev: Use offsets in more cases

* Framework: update messaging on issue autocloser bot

* Framework: Update autocloser throttle limit to 70

* MueLu: removed unused lines from agg export test

* Tpetra MultiVector and BlockMultiVector refactor to remove UVM requirement (#8821)

* Tpetra: add new user-friendly MV view access

Also add new "owningView_" DualView member that refers to
the actual original DV (not a subview of anything else). This
is the DualView to sync in order to maintain consistency regardless
of how MultiVectors alias each other.

4 new view accessor functions: getLocalView[Host|Device][Non]Const()

- Respect constness
- Manage syncs and modifies for the user
- Prevent taking out a view in one space while any view in the other
space is live.
- Existing getLocalView()/getLocalViewHost()/getLocalViewDevice() just
have the reference count checking added (no sync/modify). This has no
effect for HostSpace or CudaUVMSpace since those host mirrors match the
device views.

* Tpetra - fix MV test 14.

* Tpetra - fix item 17

* Tpetra - fix item 20

* Tpetra - fix item 23

* Tpetra - fix item 28

* Tpetra - fix item 29

* Tpetra - fix item 35

* Tpetra - workaround for item 30

* Tpetra: Modifying Bug7758 test to use the new getLocalViewHostConst (which will make sure things are actually sync'd)

* Tpetra: fix MV [un]pack to respect host/device refcounts

* fix nonconst in Bug7745

* Tpetra: stashing

* Tpetra - issue 354 fix

* Tpetra: refactor sameObject so it doesn't simultaneously ask for host and device views

* Tpetra: remove static_assert, fix getLocalView() ret type

Remove bad static_assert that tripped for Cuda/CudaUVMSpace build.
Correct MultiVector::getLocalView() return type to be exactly consistent with
DualView::view().

* tpetra:  fixed error in MultiVector pack that caused failures with UVM=ON

* tpetra:  Fix for FEMultivector -- rather than take the subview of a
DualView and create a new vector with it, use the MultiVector
constructor that gets "offset" views of a vector (in which
@brian-kelley has the owningView_ working correctly).
While I was at it, I added a swap of the owningView_ to the MultiVector
swap() function.

* Tpetra: Fixing ImportExport/Issue3968:  The test uses sync_to* without changing the modify flags, which mucks up our internal tracking

* tpetra:  fix to work without UVM

* tpetra:  changed getLocalViewHost/Device to new Const/NonConst versions
as appropriate. #8591
Did not change getLocalView as the Const/NonConst versions of
getLocalView do not exist yet
Did not change MV_reduce_strided to avoid creating conflicts for
@brian-kelley

* tpetra: change getLocalViewHost to appropriate Const/NonConst version #8591

* Tpetra: Modifying MultiVector to remove all references to old getLocalViewX functions

* Tpetra: More getLocalView mods

* Tpetra: Lots and lots of fixes to tests to use the new getLocalView<thing>Const/NonConst functions

* Tpetra: Fixing scaleBlockDiagonal signature as per Brian

* Tpetra: Fixes to the BlockView test to work correctly with UVM=OFF

* Tpetra: Fixing MultiVector print outs for help with non-unified memory debugging

* Tpetra - missing getlocal view "device"

* Tpetra: public Access:: ReadOnly/ReadWrite/WriteOnly

Make WithLocalAccess use these tags instead of internal Details:: ones.
These will also be used for the new MultiVector view access interface.

* moving from getLocalView... to getLocalView...(Tpetra::Accesspattern)

* Tpetra - get1dview logic change

* Tpetra, WIP: using new tagged view access

* Tpetra: use new interface for all MV getLocalView

* tpetra:  removed unneeded include file

* Tpetra: Tags!

* Tpetra: Tags!

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing more tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* Tpetra: Fixing tests

* tpetra: copied implementation of getLocalViewHost and getLocalViewDevice
from templated getLocalView, as the getLocalView version does not work.
This commit may be temporary, but it allows us to make progress on other
bugs while someone figures out the template-fu.
Sorry for the debugging statements; we'll get rid of those eventually.

* adding localview tests

* tpetra:  getLocalView<template> now works.
cleaned up my obnoxious print statements
kept Host and Device implementations that do NOT use getLocalView.

* tpetra:  added Tpetra::Access to many getLocalView<> instances
Tests still pass with UVM=ON.

* Tpetra: Removing the dreaded parentheses from the Access tags

* Manually intercept UVM allocations, throw exception

Effectively makes it impossible for any UVM allocations to
exist (except for Stokhos, which calls cudaMallocManaged directly)

* Tpetra: Deprecate old getLocalView functions

* Allow UVM allocations when Kokkos_ENABLE_CUDA_UVM=ON

* tpetra:  changed getLocalView to use access tags and getLocalViewDevice

* tpetra:  added access tags to getLocalView(); fixed scope of some pointers

* xpetra:  fixes to allow compilation

* WIP: deprecate getLocalBlock and start adding tagged overloads

* Tpetra: rewrite allReduceView to work with non-UVM

allReduceView had one bug and one sub-optimal thing:
- Tried to make a view copy with both layout and device different -
  Kokkos can't do that in a single deep_copy
- If a LayoutStride -> contiguous copy needed to be made, it always used
  LayoutLeft. If one of the input/output views was LayoutStride and the
  other was LayoutRight, they would both be copied to LayoutLeft. Now, use
  LayoutRight in this case.

Some utilities to help manage layouts and MPI + Kokkos views in general
are in the new file temporaryViewUtils.hpp: layout unification,
making a contiguous view, and making an MPI-safe view.
In the future these can be used to clean up idot and
iallreduce without losing efficiency.
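The layout choice described above could look roughly like the following. This is a sketch only: the `Layout` enum and `unifiedLayout` are hypothetical names, not the contents of temporaryViewUtils.hpp. The behavior when both views are strided, or when one is LayoutLeft and the other LayoutRight, is an assumption, since the commit message only specifies the LayoutStride/LayoutRight case.

```cpp
#include <cassert>

// Hypothetical tags standing in for Kokkos::LayoutLeft/LayoutRight/LayoutStride.
enum class Layout { Left, Right, Stride };

// Pick one contiguous layout that both views can be copied to.
constexpr Layout unifiedLayout(Layout a, Layout b) {
  // Assumption: with no contiguous layout to adopt, fall back to Left.
  if (a == Layout::Stride && b == Layout::Stride) return Layout::Left;
  // A strided view adopts the other view's contiguous layout, so a
  // Stride/Right pair unifies to Right instead of always using Left.
  if (a == Layout::Stride) return b;
  if (b == Layout::Stride) return a;
  // Assumption: on a Left/Right mismatch, keep the first view's layout.
  return a;
}

static_assert(unifiedLayout(Layout::Stride, Layout::Right) == Layout::Right,
              "a strided view should adopt LayoutRight from its partner");
```

Once both views share a layout, only the memory space differs between them, so a single deep copy is enough for the remaining step.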

* Tpetra:  Block MultiVector correctly uses getLocalView; removed stored pointer

* fix host device type for const_little_host_vec_type

* tpetra:  clean up of BlockMultiVector fixes

* Tpetra:  deprecated held pointer mvData_

* tpetra:  removed modifies without syncs; fixed MueLu tests

* Tpetra - removing sync in ScaleAndAssign test

* Tpetra - unit test is okay without modify and sync flags

* Tpetra - test passes without modify and sync operations

* Tpetra - remove unnecessary sync modify clear state flags

* Tpetra - remove multi vector sync/modify/ things

* Tpetra - remove sync modify things in other places

* Tpetra: remove withLocalAccess, for_each, transform

The new MV::getLocalView interface is a simpler substitute for these.

* Issue 8391. Switched to C++17 standard for GCC 8.3 build.

* FROSch: Convert enum NullSpaceType to scoped enum

By converting the enum to an enum class NullSpaceType, one is forced to
use the enum class and cannot replace it with integers anymore. This
guarantees, that the expressive enum class is used in implementations
rather than the implicitly encoded integers.
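A small sketch of what the scoped enum buys. The enumerator names and the helper `nullSpaceDimension` are hypothetical and chosen for illustration; they are not taken from the FROSch sources.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical enumerators for illustration.
enum class NullSpaceType { Laplace, Elasticity };

// Dimension of the null space: constants for Laplace, rigid body modes
// (3 in 2D, 6 in 3D) for linear elasticity.
inline int nullSpaceDimension(NullSpaceType type, int spatialDim) {
  switch (type) {
    case NullSpaceType::Laplace:
      return 1;
    case NullSpaceType::Elasticity:
      return spatialDim == 2 ? 3 : 6;
  }
  return 0;  // unreachable for valid enumerators
}

// A scoped enum has no implicit conversion to or from int, so a call such as
// nullSpaceDimension(0, 3) is rejected at compile time.
static_assert(!std::is_convertible<int, NullSpaceType>::value,
              "integers must not silently stand in for NullSpaceType");
static_assert(!std::is_convertible<NullSpaceType, int>::value,
              "NullSpaceType must not silently decay to an integer");
```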

* Patch in KokkosKernels #872

(fix #8727, TeamPolicy team size too large in sort_crs_*)
Adds the KokkosKernels unit test that replicated this issue.

* MueLu: Adding Aggregate size percentiles to AggregateQuality

* Moved Tpetra CRS GS into Ifpack2 Relaxation

* Moved BlockCrs GS functionality into Relaxation

* Enabled new local GS code for CRS

* Reduce redundant code in CRS (GS/SGS use same fn)

* Using refactored block CRS local apply, unify GS/SGS

* More refactoring to get rid of redundant functions

* Added required syncs/modifies for vectors

* Removed unneeded !constantStride paths

* Use cached MV to replace getColumnMapMV from CrsMatrix

* Ifpack2: remove unneeded includes

* Ifpack2: undo some find-and-replace in comments

Undoing some "Node" -> "node_type"

* MueLu: undo CMake change, should be its own PR

* MueLu: in configure, print out missing ETI setting

During configure, MueLu prints out the type combinations to ETI.
Add <complex, int, long long> to this, since it was missing.

* tpetra:  treat WriteOnly of subviews as ReadOnly.

* Ifpack2: in RBILUK, use tagged BMV::getLocalBlock

* Tpetra: add comment with caveat

on BMV::getLocalBlock(i, j, WriteOnly)

* tpetra: separated BugTests.cpp into separate test files so that we can
disable them separately (since they exercise different classes).

* Ifpack2: update BMV getLocalBlock calls

to use tagged access, and not use manual sync/modify (which has been
removed). With UVM, all Tpetra,Belos,Ifpack2,MueLu tests pass.

* more test changes

* mv localview tests

* wrapped up 6 tests for new behaviors

* tpetra:  scoping fix for Bug7234.cpp;
more output from getLocalView* when error occurs, as in parallel runs,
throw messages weren't always printed (e.g., from doExport when only
3/4 processors failed)

* Tpetra: add MV::aliases(const MV& other)

This allows a user to see if two MVs overlap, without actually getting
the local views and possibly hitting the reference count checker.
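One way to picture such an overlap test is an address-range comparison. This is a sketch under assumptions, not the actual `MV::aliases` implementation: `rangesOverlap` is a hypothetical helper, and it treats each vector as one contiguous span, so for strided data it is conservative and can report overlap even when no element is shared.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// True when the half-open ranges [a, a + aLen) and [b, b + bLen) intersect.
inline bool rangesOverlap(const double* a, std::size_t aLen,
                          const double* b, std::size_t bLen) {
  if (aLen == 0 || bLen == 0) return false;  // empty ranges alias nothing
  // std::less gives a total order on pointers, even into unrelated arrays.
  std::less<const double*> before;
  return before(a, b + bLen) && before(b, a + aLen);
}
```

Because the check only compares addresses, it needs no host or device view of the data and so cannot trip a reference count check.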

* Ifpack2: const correctness, use new getLocalView

- Throughout Ifpack2, remove manual sync/modify and calls to deprecated
  getLocalView. Use tagged getLocalView instead.
- In BlockRelaxation and the Containers, change interfaces to use const
  on views and multivectors that aren't actually modified

* Tpetra: fix one MV LocalView test, comment out another

We will make sure fix is OK, then uncomment and fix the other

* tpetra:  enable some Tpetra tests without UVM

* tpetra:  fix test for non-Cuda builds

* Ifpack2: fix more constness of apply vectors

* Kokkos: allow CudaUVMSpace::allocate again

Roll back change that made CudaUVMSpace::allocate throw
when UVM was not the default memory space for Cuda.

* tpetra:  changes needed to build with DEPRECATED_CODE=OFF #8821

* fix remaining test

* Tpetra - fix for nox failure

* Thyra: added missing fences to euclidean apply operations used
in MvTimesMatAddMv; the fences resolve test failures with
CUDA_LAUNCH_BLOCKING=0 and cleaner sync/modify in tpetra @rppawlo

Tpetra: the fences above provide a more surgical fix to the test
errors seen in #8821; this commit removes fences from
getLocalView*(ReadOnly).  @kyungjoo-kim

Belos: preventive fence added with @hkthorn's blessing
to mimic those in Thyra.

* tpetra: added fence between device kernels and retrieving blocks on host #8821

* Ifpack2: Minor fix

* DualView: make fencing behavior in sync consistent

sync<Device>() does extra exec space fences if the dev/host memory
spaces are the same. This was missing in sync_host/sync_device, so
this adds it there. Makes all Ifpack2 tests pass for UVM without launch
blocking.

* tpetra:  exercise the Teuchos-based interfaces, too

* changed access control from WriteOnly to OverwriteAll because semantics mean things

* WIP: fixing idot for MV dualview refactor

And some updates to ifpack2 and amesos2 about that.
Working around Kokkos issue #3850 where the templated getLocalView was
used.

* WIP: idot/iallreduce cleanup

* Tpetra: finish idot/iallreduce refactor

* Fixed iallreduce test for non-uvm device

* Belos: use new Tpetra MV view interface

* Cleanup

* Remove extra dualview sync fences

* Ifpack2 passes without launch blocking

except RBILUK.

* Ifpack2: add temporary fence in RBILUK for BlockCrs

Later it should be possible to replace this fence with a refactored
DualView interface to BlockCrs.

* Tpetra: add a global reduce to a test so it will fail when only one proc is failing

* Tpetra: fix some typos in a Map unit test

* Tpetra: remove deprecated sync/modify calls from a unit test

* Ifpack2: fix impl_scalar/scalar mismatch

* Tpetra: remove/update remaining mentions of Gauss-Seidel

* Tpetra: fix iallreduce for builds without MPI

* Ifpack2: revert commenting out try/catch

Was causing unused var warning

* Ifpack2: Fixing vector mode mistake

* tpetra, ifpack2:  fixing several access mode errors

* Tpetra: use new MV view interface in Bug8794 test

* Amesos2: revert using tagged Tpetra MV getLocalView

For some reason, using the ReadOnly tag to access the MV view in
TpetraMultivecAdapter caused the solve solution not to be copied back to
the Tpetra multivector. This is surprising because the views were just
used as the source for a Kokkos deep copy. It caused
BlockRelaxation in Ifpack2 to fail for the serial node (in which DualViews
are trivial, and all kernels are synchronous).

* Ifpack2: add back tag clobbered by merge

* kokkos:  patch from kokkos/kokkos#3857

* comment out all the instances of TPETRA_DEPRECATED (#9023)

* MueLu: add fence for recent intrepid2 changes

Fixes MueLu-Intrepid2 unit tests, uvm, no launch blocking.

* Tpetra: restore MV_reduce_strided test.

Key: use the MV (map, dualview, orig_dualview) constructor instead of the
(map, dualview) constructor. If $dualview is noncontiguous, the first one
lets you pass orig_dualview as the contiguous super-view containing
dualview, and orig_dualview can be sync'd without problems.

Also modify TempView::toLayout() to test span_is_contiguous, rather than
assuming that (Layout != LayoutStride) implies contiguous.
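Why the layout type alone is not a reliable contiguity test can be seen with a toy shape descriptor. This is a sketch: `Toy2DShape`, `span`, and `spanIsContiguous` are hypothetical stand-ins written in plain C++, loosely mirroring what a view's extents, strides, and span express.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 2-D shape: extents and element strides per dimension.
struct Toy2DShape {
  std::array<std::size_t, 2> extent;
  std::array<std::size_t, 2> stride;
};

// Distance, in elements, from the first element to one past the last.
inline std::size_t span(const Toy2DShape& s) {
  if (s.extent[0] == 0 || s.extent[1] == 0) return 0;
  return (s.extent[0] - 1) * s.stride[0] + (s.extent[1] - 1) * s.stride[1] + 1;
}

// Contiguous means the span holds exactly extent[0] * extent[1] elements.
inline bool spanIsContiguous(const Toy2DShape& s) {
  return span(s) == s.extent[0] * s.extent[1];
}
```

For a column-major 10x4 parent with strides {1, 10}, taking two whole columns gives extents {10, 2} and a span of 20, which is contiguous. Taking rows 0 to 4 of all four columns gives extents {5, 4} with the same strides and a span of 35 for only 20 elements, so the data has gaps even though the strides are still column-major.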

* tpetra:  Removed deprecated sync_device calls

* Tpetra: Remove some MultiVector tests that were checking modification state (#9032)

* Tpetra: Deprecate need_sync* in MultiVector

* Tpetra: for now, we won't deprecate need_sync_host/device

* tpetra:  removed instantiations of removed tests

* Tpetra: don't use CudaSpace in nonblocking collectives

OpenMPI does not support Cuda device buffers for nonblocking collectives
like MPI_Iallreduce, even with a Cuda-aware installation.

* Fix old typo in Ifpack2_UnitTestBlockRelaxation

* Fix access tag: OverwriteAll -> ReadWrite

Tpetra::COPY takes src then dst (opposite order to Kokkos deep_copy) so Y_cur is being read at first and written later.

* Undo bad DualView merge

Co-authored-by: Brian Kelley <bmkelle@sandia.gov>
Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov>
Co-authored-by: Chris Siefert <csiefer@sandia.gov>
Co-authored-by: Geoff Danielson <gcdanie@sandia.gov>
Co-authored-by: Timothy A. Smith <tasmit@sandia.gov>
Co-authored-by: James M. Willenbring <jmwille@sandia.gov>
Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de>
Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com>

* ascicgpu031: Testing updates

* MueLu: agg export does not play well with kokkos_aggregates

* Ctest: Adding Belos to email script

* Ctest: Adding Belos to email script

* Ctest: Adding Belos to email script

* Setting Tpetra Deprecated Code = ON #9067

* Ifpack & Ifpack2: Fix tiny bug in L1 method

* KokkosKernels: Fix bug in Serial specialization of spmv

Will only hit spmvs with a beta of exactly -1 using the Serial backend.

* Ifpack2: Add single kernel for diagonal extraction, L1 and small entry fix

* Disable tests in the UVM Off build

* MueLu: correct for variation due to roundoff in Convex Hulls

* Ifpack2 Relaxation: Add missing typedefs

Co-authored-by: Curtis C. Ober <ccober@sandia.gov>
Co-authored-by: Brian Kelley <bmkelle@sandia.gov>
Co-authored-by: iyamaza <iyamaza@sandia.gov>
Co-authored-by: Christian Glusa <caglusa@sandia.gov>
Co-authored-by: Samuel Browne <sebrown@sandia.gov>
Co-authored-by: Evan Harvey <57234914+e10harvey@users.noreply.github.com>
Co-authored-by: Evan Harvey <eharvey@sandia.gov>
Co-authored-by: Nathan Ellingwood <ndellin@sandia.gov>
Co-authored-by: trilinos-autotester <trilinos@sandia.gov>
Co-authored-by: Heidi K. Thornquist <hkthorn@sandia.gov>
Co-authored-by: Christian Glusa <cgcgcg@users.noreply.github.com>
Co-authored-by: Jonathan Hu <jhu@sandia.gov>
Co-authored-by: gsjaardema <gsjaardema@gmail.com>
Co-authored-by: Chris Siefert <csiefer@sandia.gov>
Co-authored-by: Roscoe A. Bartlett <rabartl@sandia.gov>
Co-authored-by: Peter Ohm <pohm@sandia.gov>
Co-authored-by: Paul Wolfenbarger <prwolfe@sandia.gov>
Co-authored-by: Alan Williams <william@sandia.gov>
Co-authored-by: iyamazaki <ic.yamazaki@gmail.com>
Co-authored-by: Nate Roberts <nvrober@sandia.gov>
Co-authored-by: James J. Elliott <jjellio@sandia.gov>
Co-authored-by: Jennifer Loe <jloe@sandia.gov>
Co-authored-by: Henry Swantner <HRSwant@sandia.gov>
Co-authored-by: Christian Trott <crtrott@sandia.gov>
Co-authored-by: William McLendon <wcmclen@sandia.gov>
Co-authored-by: Kyungjoo Kim <kyukim@sandia.gov>
Co-authored-by: Geoff Danielson <gcdanie@sandia.gov>
Co-authored-by: Timothy A. Smith <tasmit@sandia.gov>
Co-authored-by: James M. Willenbring <jmwille@sandia.gov>
Co-authored-by: Matthias Mayr <matthias.mayr@unibw.de>
Co-authored-by: Timothy Smith <58484958+tasmith4@users.noreply.github.com>
Co-authored-by: James Elliott <jjellio@users.noreply.github.com>
jrobcary pushed a commit to Tech-XCorp/Trilinos that referenced this pull request May 4, 2021
…ement (trilinos#8821)

Labels
Blocks Promotion Overview issue for release-blocking bugs

3 participants