
Adding an alias for a page migrating memory space #5289

Merged (19 commits) on Sep 24, 2022

Conversation

@JBludau (Contributor) commented Jul 28, 2022

This PR picks up on the discussion in #5193 and provides an alias for a page-migrating memory space.

Even though the participants in the discussion preferred the name DefaultSharedMemorySpace, I propose MigratingMemorySpace, but I have no problem changing that name. The reasoning is the following:

  • Drop the Default, as it is too close to the execution space aliases we have (DefaultExecutionSpace and DefaultHostExecutionSpace). I want to prevent the alias from being mistaken for an execution space, so it follows the convention we have for the memory spaces. Furthermore, if we stick with it being migrating, the Default loses its meaning, as there is only one such space per backend.
  • I propose Migrating, as it targets developers who actually want the property of memory that automatically moves to the accessing device and is then accessed locally. I expect they want the same behavior independent of the backend and deliberately choose this (especially given that not all backends support this feature). If they switch to another backend that has no page migration, this should not silently change behavior.
  • An alias for a memory space that is always accessible by host and device and available in (almost) all backends is useful. I propose to create a separate alias for this, maybe UniversalMemorySpace. I think this would need good documentation on its limitations and restrictions, but it would be clearer what the cost of the universal accessibility is.
  • We can actually specify, and thus test, what we expect from this memory and track whether we make any changes that would alter its behavior.

If we introduce this alias, we should reconsider removing Kokkos_ENABLE_CUDA_UVM, as there would then be a better way to specify that the user wants page migration, and we are moving toward a major release which includes HIP moving out of Experimental.

The specification we are testing is (see the usage sketch below):

  • Migrate on first touch in a new execution space
  • Migrate only when switching to a different execution space
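Roughly, a minimal usage sketch of the proposed alias could look as follows (shown with the SharedSpace name the PR eventually settles on; at this point in the discussion the proposed name was still MigratingMemorySpace):

```c++
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    // Accessible from every execution space; pages migrate to whichever
    // execution space touches them next.
    Kokkos::View<double*, Kokkos::SharedSpace> data("data", 1 << 20);

    // First touch in the device execution space: pages end up on the device.
    Kokkos::parallel_for(
        "fill_on_device",
        Kokkos::RangePolicy<Kokkos::DefaultExecutionSpace>(0, data.extent(0)),
        KOKKOS_LAMBDA(const int i) { data(i) = 1.0; });
    Kokkos::fence();

    // Switching to the host execution space: pages migrate once, then
    // subsequent host accesses run at host-local speed.
    Kokkos::parallel_for(
        "scale_on_host",
        Kokkos::RangePolicy<Kokkos::DefaultHostExecutionSpace>(0, data.extent(0)),
        [=](const int i) { data(i) *= 2.0; });
    Kokkos::fence();
  }
  Kokkos::finalize();
}
```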

Missing:

  • write documentation: #157

@JBludau added the Enhancement label (Improve existing capability; will potentially require voting) on Jul 28, 2022
@JBludau (Contributor, Author) commented Jul 28, 2022

Quick note on the amount of memory we allocate and the number of repetitions the test is running:
It turned out that Nvidia's V100 had no problem cycling 10 times through 40% of its memory without reaching 100% of the core clock rate. That is the reason for the high number of (literal) boiler-plate warmup runs.

MI100 (and older) do not support page migration, so the test is disabled for these architectures.

For SYCL: I need to read more documentation on getting device attributes and finding out which devices support page migration.
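For reference, a sketch (not the guard code used in this PR) of how page-migration support can be queried from the CUDA or HIP runtime; MI100 and older report 0 for concurrent managed access, which is why the test has to be disabled there:

```c++
#if defined(KOKKOS_ENABLE_CUDA)
#include <cuda_runtime.h>
// Returns true if the given CUDA device supports concurrent managed access,
// i.e. real page migration between host and device.
bool device_supports_page_migration(int device) {
  int value = 0;
  cudaDeviceGetAttribute(&value, cudaDevAttrConcurrentManagedAccess, device);
  return value != 0;
}
#elif defined(KOKKOS_ENABLE_HIP)
#include <hip/hip_runtime.h>
// Same query through the HIP runtime.
bool device_supports_page_migration(int device) {
  int value = 0;
  hipDeviceGetAttribute(&value, hipDeviceAttributeConcurrentManagedAccess, device);
  return value != 0;
}
#endif
```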

@masterleinad (Contributor) left a comment:

Add support for SYCL using Kokkos::Experimental::SYCLSharedUSMSpace.

@JBludau (Contributor, Author) commented Jul 29, 2022

Add support for SYCL using Kokkos::Experimental::SYCLSharedUSMSpace.

Added. According to the documentation, the initial placement is not specified.
I will look at CUDA and HIP, especially what they do if we overcommit on migrating memory.
It might make sense for us to not expect anything about the initial placement.

@masterleinad (Contributor) commented:

Regarding the name, I'm not sure people would find it intuitive that this is a space they can access from both the host and the device (which is what people would mainly be interested in, IMHO).

@JBludau (Contributor, Author) commented Jul 29, 2022

Looks like our ROCm CI is building for Kokkos_ARCH_VEGA90A, which it does not have the hardware for. As the tests need real page migration support to pass, they are failing on our CI. Maybe we should add a label and specify the architecture. -> will do this in another PR -> this is outdated, as it was a problem with the autodetection of the arch; see below

@JBludau (Contributor, Author) commented Jul 29, 2022

Regarding the name, I'm not sure people would find it intuitive that this is a space they can access from both the host and the device (which is what people would mainly be interested in, IMHO).

But this space is far more than just accessible from both. It actively moves the memory and allows local access afterwards. We can add another alias for a space that is accessible everywhere and does not migrate (as I proposed in the description).

@masterleinad (Contributor) commented:

On Intel GPUs, I see no migration overhead for host->device but only for device->host, i.e., the loop time is constant on the device and the first access on the host is slower than the remaining ones. Also, the first access in the first host loop is slower than the first access in the later host loops. Thus, the test currently fails with

Initial placement on device is 1 we expect true 
Memory migrates on every space access is 0 we expect true 
Memory migrates only once per access 0 we expect true 

and I need

diff --git a/core/unit_test/TestPageMigration.hpp b/core/unit_test/TestPageMigration.hpp
index 9b6e1a051..ab4e0a494 100644
--- a/core/unit_test/TestPageMigration.hpp
+++ b/core/unit_test/TestPageMigration.hpp
@@ -122,7 +122,7 @@ TEST(TEST_CATEGORY, page_migration) {
   const unsigned int numWarmupRepetitions = 100;
   const unsigned int numDeviceHostCycles  = 3;
   double fractionOfDeviceMemory           = 0.4;
-  double threshold                        = 2.0;
+  double threshold                        = 1.5;
   size_t numBytes       = fractionOfDeviceMemory * getDeviceMemorySize();
   unsigned int numPages = numBytes / getBytesPerPage();
 
@@ -190,7 +190,7 @@ TEST(TEST_CATEGORY, page_migration) {
     if (cycle == 0 && indicatedPageMigrationsDevice == 0)
       initialPlacementOnDevice = true;
     else {
-      if (indicatedPageMigrationsDevice != 1) migratesOnlyOncePerAccess = false;
+      if (indicatedPageMigrationsDevice > 1) migratesOnlyOncePerAccess = false;
     }
 
     unsigned int indicatedPageMigrationsHost = std::count_if(

for it to pass.

@dalg24 (Member) commented Aug 2, 2022

Please justify that the unit test added here cannot run faster and achieve the same thing. (I would probably already complain about anything that takes more than a few seconds.)

test 7
      Start  7: KokkosCore_UnitTest_Cuda4

7: Test command: /var/jenkins/workspace/Kokkos/build/core/unit_test/KokkosCore_UnitTest_Cuda4
7: Test timeout computed to be: 1500
7: [==========] Running 1 test from 1 test suite.
7: [----------] Global test environment set-up.
7: [----------] 1 test from cuda_uvm
7: [ RUN      ] cuda_uvm.page_migration
7: [       OK ] cuda_uvm.page_migration (38209 ms)
7: [----------] 1 test from cuda_uvm (38209 ms total)
7: 
7: [----------] Global test environment tear-down
7: [==========] 1 test from 1 test suite ran. (38209 ms total)
7: [  PASSED  ] 1 test.
 7/61 Test  #7: KokkosCore_UnitTest_Cuda4 ....................   Passed   39.66 sec

@JBludau (Contributor, Author) commented Aug 3, 2022

Just some name ideas for the temporally local (migrating) variant and the fixed but universally accessible one (please don't hate me for these):
UniversalLocalMemorySpace and UniversalFixedMemorySpace
GlobalMovingMemorySpace and GlobalPinnedMemorySpace
MoveThenLocalMemorySpace and FixedButGlobalMemorySpace
SharedMovingMemorySpace and SharedFixedMemorySpace

@masterleinad (Contributor) commented:

I like Universal but would try to avoid Local and Global so maybe UniversalMovingMemorySpace and UniversalPinnedMemorySpace?

@PhilMiller (Contributor) commented:

UniversalMoving sounds fine to me.

UniversalPinned is more troublesome, because either or both of host-pinned and device-pinned could exist, with different properties for both performance and non-compute accessibility.

@crtrott (Member) commented Aug 4, 2022

I know it's somewhat confusing, but how about SharedSpace and SharedHostPinnedSpace? I know CUDA shared memory is something totally different; however, ironically, CUDA shared memory is the least shared of all allocations possible in CUDA except for registers ...

@JBludau (Contributor, Author) commented Aug 4, 2022

I know it's somewhat confusing, but how about SharedSpace and SharedHostPinnedSpace? I know CUDA shared memory is something totally different; however, ironically, CUDA shared memory is the least shared of all allocations possible in CUDA except for registers ...

Damien will fight you on this :-)

@JBludau (Contributor, Author) commented Aug 10, 2022

The unit test is now based on clock cycles rather than on wall-clock time, which eliminated the need for warmup runs and thus large memory chunks on AMD and Nvidia GPUs. @masterleinad, could you rerun this for Intel GPUs?

Furthermore, I adopted the name @crtrott suggested, but we can still change it. @crtrott, could you sum up your reasoning here to have it documented?

@JBludau (Contributor, Author) commented Aug 10, 2022

Please justify that the unit test added here cannot run faster and achieve the same thing. (I would probably already complain about anything that takes more than a few seconds.)

7: [       OK ] cuda_uvm.page_migration (38209 ms)

It should be on the order of milliseconds now.

@masterleinad (Contributor) commented:

The unit test is now based on clock cycles rather than on wall-clock time, which eliminated the need for warmup runs and thus large memory chunks on AMD and Nvidia GPUs. @masterleinad, could you rerun this for Intel GPUs?

No, it doesn't pass on Intel GPUs, and the numbers for device access are much higher than for host access even though the runtime measurements before showed the opposite. Note that this calls different functions on the host and on the device, and it is not implemented properly for SYCL+CUDA on the device. We use that function as a seed, so returning 0 is good enough for that purpose but not for the one here. The CI for SYCL+CUDA shows

[ RUN      ] sycl_shared_usm.page_migration
14: Page size as reported by os: 4096 bytes 
14: Allocating 100 pages of memory in pageMigratingMemorySpace.
14: Behavior found: 
14: Initial placement on device is: 1 we expect true 
14: Memory migrates back to GPU is: 0 we expect true 
14: Memory migrates at max once per access: 1 we expect true 
14: 
14: Please look at the following timings. A migration was marked detected if the time was larger than 0 for the device 
14: 
14: device timings of run 0:
14: TimingResult contains 10 results:
14: Duration of loop 0 is 0 clock cycles
14: Duration of loop 1 is 0 clock cycles
14: Duration of loop 2 is 0 clock cycles
14: Duration of loop 3 is 0 clock cycles
14: Duration of loop 4 is 0 clock cycles
14: Duration of loop 5 is 0 clock cycles
14: Duration of loop 6 is 0 clock cycles
14: Duration of loop 7 is 0 clock cycles
14: Duration of loop 8 is 0 clock cycles
14: Duration of loop 9 is 0 clock cycles
14: host timings of run 0:
14: TimingResult contains 10 results:
14: Duration of loop 0 is 20 clock cycles
14: Duration of loop 1 is 15 clock cycles
14: Duration of loop 2 is 15 clock cycles
14: Duration of loop 3 is 15 clock cycles
14: Duration of loop 4 is 15 clock cycles
14: Duration of loop 5 is 15 clock cycles
14: Duration of loop 6 is 14 clock cycles
14: Duration of loop 7 is 15 clock cycles
14: Duration of loop 8 is 14 clock cycles
14: Duration of loop 9 is 14 clock cycles
14: device timings of run 1:
14: TimingResult contains 10 results:
14: Duration of loop 0 is 0 clock cycles
14: Duration of loop 1 is 0 clock cycles
14: Duration of loop 2 is 0 clock cycles
14: Duration of loop 3 is 0 clock cycles
14: Duration of loop 4 is 0 clock cycles
14: Duration of loop 5 is 0 clock cycles
14: Duration of loop 6 is 0 clock cycles
14: Duration of loop 7 is 0 clock cycles
14: Duration of loop 8 is 0 clock cycles
14: Duration of loop 9 is 0 clock cycles
14: host timings of run 1:
14: TimingResult contains 10 results:
14: Duration of loop 0 is 18 clock cycles
14: Duration of loop 1 is 15 clock cycles
14: Duration of loop 2 is 14 clock cycles
14: Duration of loop 3 is 14 clock cycles
14: Duration of loop 4 is 16 clock cycles
14: Duration of loop 5 is 15 clock cycles
14: Duration of loop 6 is 15 clock cycles
14: Duration of loop 7 is 16 clock cycles
14: Duration of loop 8 is 14 clock cycles
14: Duration of loop 9 is 15 clock cycles
14: device timings of run 2:
14: TimingResult contains 10 results:
14: Duration of loop 0 is 0 clock cycles
14: Duration of loop 1 is 0 clock cycles
14: Duration of loop 2 is 0 clock cycles
14: Duration of loop 3 is 0 clock cycles
14: Duration of loop 4 is 0 clock cycles
14: Duration of loop 5 is 0 clock cycles
14: Duration of loop 6 is 0 clock cycles
14: Duration of loop 7 is 0 clock cycles
14: Duration of loop 8 is 0 clock cycles
14: Duration of loop 9 is 0 clock cycles
14: host timings of run 2:
14: TimingResult contains 10 results:
14: Duration of loop 0 is 18 clock cycles
14: Duration of loop 1 is 15 clock cycles
14: Duration of loop 2 is 15 clock cycles
14: Duration of loop 3 is 15 clock cycles
14: Duration of loop 4 is 16 clock cycles
14: Duration of loop 5 is 15 clock cycles
14: Duration of loop 6 is 15 clock cycles
14: Duration of loop 7 is 15 clock cycles
14: Duration of loop 8 is 15 clock cycles
14: Duration of loop 9 is 15 clock cycles
14: /var/jenkins/workspace/Kokkos/core/unit_test/TestPageMigration.hpp:206: Failure
14: Value of: passed
14:   Actual: false
14: Expected: true
14: [  FAILED  ] sycl_shared_usm.page_migration (57 ms)

All that is to say that a timing-based test would be better for testing the SYCL implementation.
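As an illustration, a wall-clock based measurement of the access loops could look roughly like the sketch below (timeInLoop and the view/policy names are made up here to mirror the test's structure; this is not the actual test code):

```c++
#include <Kokkos_Core.hpp>
#include <vector>

// Run `repetitions` increment loops over `data` in ExecSpace and return the
// wall-clock time of each loop; the first entry then contains any migration
// cost, the later ones should be close to local-memory speed.
template <typename ExecSpace, typename ViewType>
std::vector<double> timeInLoop(ViewType data, unsigned repetitions) {
  std::vector<double> times;
  times.reserve(repetitions);
  for (unsigned rep = 0; rep < repetitions; ++rep) {
    Kokkos::Timer timer;  // starts on construction
    Kokkos::parallel_for(
        "increment", Kokkos::RangePolicy<ExecSpace>(0, data.extent(0)),
        KOKKOS_LAMBDA(const int i) { data(i) += 1; });
    Kokkos::fence();  // make sure the kernel finished before reading the time
    times.push_back(timer.seconds());
  }
  return times;
}
```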

@masterleinad (Contributor) commented:

Also, the test fails for HIP in the CI with something like

4: [ RUN      ] hip_managed.page_migration
4: Page size as reported by os: 4096 bytes 
4: Allocating 100 pages of memory in pageMigratingMemorySpace.
4: Behavior found: 
4: Initial placement on device is: 0 we expect true 
4: Memory migrates back to GPU is: 0 we expect true 
4: Memory migrates at max once per access: 0 we expect true 
4: 
4: Please look at the following timings. A migration was marked detected if the time was larger than 13600 for the device 
4: 
4: device timings of run 0:
4: TimingResult contains 10 results:
4: Duration of loop 0 is 28412 clock cycles
4: Duration of loop 1 is 22968 clock cycles
4: Duration of loop 2 is 23048 clock cycles
4: Duration of loop 3 is 22856 clock cycles
4: Duration of loop 4 is 22896 clock cycles
4: Duration of loop 5 is 23355 clock cycles
4: Duration of loop 6 is 23344 clock cycles
4: Duration of loop 7 is 21965 clock cycles
4: Duration of loop 8 is 21477 clock cycles
4: Duration of loop 9 is 24143 clock cycles
4: host timings of run 0:
4: TimingResult contains 10 results:
4: Duration of loop 0 is 33 clock cycles
4: Duration of loop 1 is 33 clock cycles
4: Duration of loop 2 is 33 clock cycles
4: Duration of loop 3 is 33 clock cycles
4: Duration of loop 4 is 33 clock cycles
4: Duration of loop 5 is 32 clock cycles
4: Duration of loop 6 is 33 clock cycles
4: Duration of loop 7 is 33 clock cycles
4: Duration of loop 8 is 33 clock cycles
4: Duration of loop 9 is 32 clock cycles
4: device timings of run 1:
4: TimingResult contains 10 results:
4: Duration of loop 0 is 22179 clock cycles
4: Duration of loop 1 is 22560 clock cycles
4: Duration of loop 2 is 22308 clock cycles
4: Duration of loop 3 is 21959 clock cycles
4: Duration of loop 4 is 22544 clock cycles
4: Duration of loop 5 is 23154 clock cycles
4: Duration of loop 6 is 22839 clock cycles
4: Duration of loop 7 is 21566 clock cycles
4: Duration of loop 8 is 22493 clock cycles
4: Duration of loop 9 is 22335 clock cycles
4: host timings of run 1:
4: TimingResult contains 10 results:
4: Duration of loop 0 is 32 clock cycles
4: Duration of loop 1 is 33 clock cycles
4: Duration of loop 2 is 33 clock cycles
4: Duration of loop 3 is 33 clock cycles
4: Duration of loop 4 is 33 clock cycles
4: Duration of loop 5 is 33 clock cycles
4: Duration of loop 6 is 33 clock cycles
4: Duration of loop 7 is 33 clock cycles
4: Duration of loop 8 is 33 clock cycles
4: Duration of loop 9 is 33 clock cycles
4: device timings of run 2:
4: TimingResult contains 10 results:
4: Duration of loop 0 is 22573 clock cycles
4: Duration of loop 1 is 22335 clock cycles
4: Duration of loop 2 is 21409 clock cycles
4: Duration of loop 3 is 22105 clock cycles
4: Duration of loop 4 is 22678 clock cycles
4: Duration of loop 5 is 22888 clock cycles
4: Duration of loop 6 is 22534 clock cycles
4: Duration of loop 7 is 22149 clock cycles
4: Duration of loop 8 is 22226 clock cycles
4: Duration of loop 9 is 22630 clock cycles
4: host timings of run 2:
4: TimingResult contains 10 results:
4: Duration of loop 0 is 32 clock cycles
4: Duration of loop 1 is 33 clock cycles
4: Duration of loop 2 is 33 clock cycles
4: Duration of loop 3 is 33 clock cycles
4: Duration of loop 4 is 33 clock cycles
4: Duration of loop 5 is 33 clock cycles
4: Duration of loop 6 is 34 clock cycles
4: Duration of loop 7 is 32 clock cycles
4: Duration of loop 8 is 31 clock cycles
4: Duration of loop 9 is 31 clock cycles
4: /var/jenkins/workspace/Kokkos/core/unit_test/TestPageMigration.hpp:206: Failure
4: Value of: passed
4:   Actual: false
4: Expected: true
4: [  FAILED  ] hip_managed.page_migration (16 ms)

which is pretty close to my experience with SYCL.

@JBludau (Contributor, Author) commented Aug 11, 2022

Also, the test fails for HIP in the CI with something like

4: Expected: true
4: [ FAILED ] hip_managed.page_migration (16 ms)


which is pretty close to my experience with SYCL.

This should not even execute for HIP, given that the hardware in our CI has no proper page migration. I will investigate again; I thought the include guard on the test would prevent it.

@JBludau (Contributor, Author) commented Aug 11, 2022

BLOCKED by #5327 as it would break the CI otherwise

@JBludau (Contributor, Author) commented Aug 29, 2022

Documentation issue #149

@JBludau (Contributor, Author) commented Sep 6, 2022

Okay, I changed the following:

Except for OpenMPTarget and OpenACC, the SharedSpace alias is now defined. If there is a device (CUDA, HIP, SYCL), it points to the corresponding page-migrating memory space; in a host-only build, SharedSpace points to HostSpace. There is both a preprocessor define and a constexpr function for checking whether the feature is available. (We do not have a configure-time check, but this would be trivial for users to add if they need it.)
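For illustration, user code could guard on the feature roughly as below; the names KOKKOS_HAS_SHARED_SPACE and Kokkos::has_shared_space are assumptions for the preprocessor define and the constexpr check mentioned above, not quoted from this PR:

```c++
#include <Kokkos_Core.hpp>

// Fall back to HostSpace on backends where the alias is not defined
// (assumed macro name, see lead-in above).
#ifdef KOKKOS_HAS_SHARED_SPACE
using maybe_shared_space = Kokkos::SharedSpace;
#else
using maybe_shared_space = Kokkos::HostSpace;
#endif

void report_shared_space() {
  // Assumed constexpr name for the compile-time check described above.
  if constexpr (Kokkos::has_shared_space) {
    // SharedSpace points to the backend's page-migrating memory space
    // (e.g. CudaUVMSpace or SYCLSharedUSMSpace), or to HostSpace in a
    // host-only build.
  }
}
```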

The unit test now tests for the conditions @crtrott proposed.

Essentially, the semantics of SharedSpace are:
(1) every existing execution space type can access it;
(2) when accessing SharedSpace repeatedly from the same execution space, without accessing it from some other one in between, it performs close to the native memory space of that execution space.

Thus we do not evaluate the first access in a new ExecutionSpace, and we compare the subsequent accesses to the speed of pure local memory. If we detect more than 50% deviation in the memory speed, the test fails (see the sketch below).
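A small sketch of that pass/fail criterion (illustrative only, not the actual test code):

```c++
#include <algorithm>
#include <vector>

// Drop the first access after switching execution spaces (it pays for the
// migration), then require every remaining access to stay within `threshold`
// times the local-memory reference time (threshold = 1.5 for 50% deviation).
bool within_threshold(const std::vector<double>& sharedSpaceTimes,
                      double localMemoryTime, double threshold = 1.5) {
  return std::all_of(sharedSpaceTimes.begin() + 1, sharedSpaceTimes.end(),
                     [=](double t) { return t < threshold * localMemoryTime; });
}
```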

@JBludau JBludau requested a review from crtrott September 6, 2022 19:38
@crtrott (Member) left a comment:

I did not review the tests in detail, but my concerns regarding when this is defined are addressed.

@JBludau (Contributor, Author) commented Sep 9, 2022

Retest this please

Comment on lines 197 to 213
for (unsigned i = 0; i < numDeviceHostCycles; ++i) {
  // WARMUP GPU
  incrementInLoop<Kokkos::DefaultExecutionSpace>(
      deviceData,
      numWarmupRepetitions);  // warming up gpu
  // GET RESULTS DEVICE
  deviceResults.push_back(incrementInLoop<Kokkos::DefaultExecutionSpace>(
      migratableData, numRepetitions));

  // WARMUP HOST
  incrementInLoop<Kokkos::DefaultHostExecutionSpace>(
      hostData,
      numWarmupRepetitions);  // warming up host
  // GET RESULTS HOST
  hostResults.push_back(incrementInLoop<Kokkos::DefaultHostExecutionSpace>(
      migratableData, numRepetitions));
}
A reviewer (Contributor) commented:

Maybe I'm reading things wrong, but shouldn't the warmup calls here both access migratableData - i.e. pull the pages to the space being measured?

Or is the warmup being performed here the clock-speed warmup, to make sure that each core is running at full speed? If so, please elaborate the comments within this loop to clarify.

@JBludau (Contributor, Author) replied:

It is indeed to ensure the core clock is at its maximum when we do the actual measurement, which does include the page migration. Therefore, it should not use the migratableData.

@JBludau (Contributor, Author) replied:

I hope 3feb4d helps.

@JBludau (Contributor, Author) commented Sep 19, 2022

Retest this please

JBludau and others added 2 commits September 21, 2022 10:50
Co-authored-by: Bruno Turcksin <bruno.turcksin@gmail.com>
Co-authored-by: Bruno Turcksin <bruno.turcksin@gmail.com>
Labels: Enhancement (Improve existing capability; will potentially require voting)

7 participants