MemoryEvents
MemoryEvents tracks a timeline of allocation and deallocation events in Kokkos Memory Spaces. For each event it records the time, pointer, size, memory space name, and allocation name. This is particularly useful for debugging, to understand where memory is going.
Additionally, the tool provides a timeline of memory usage for each individual Kokkos Memory Space.
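Conceptually, the recording side amounts to two Kokkos Tools callbacks that append to an in-memory trace. The hook names kokkosp_allocate_data and kokkosp_deallocate_data are the Kokkos Tools allocation hooks. The SpaceHandle layout (a 64-character name), the MemEvent record, the timing scheme, and the make_space_handle helper are assumptions for illustration, not the tool's actual source. A real tool would also implement the library init/finalize hooks to write the output files.

```cpp
#include <chrono>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Assumed layout of the handle Kokkos passes to the hooks: a fixed-size name.
struct SpaceHandle {
  char name[64];
};

// One record per allocation or deallocation (hypothetical layout).
struct MemEvent {
  double time;        // seconds since the tool was loaded
  const void* ptr;    // address of the allocation
  std::int64_t size;  // positive for Allocate, negative for DeAllocate
  std::string space;  // memory space name, e.g. "Cuda"
  std::string name;   // allocation label, e.g. the View label "A"
};

static std::vector<MemEvent> g_events;  // the full trace, kept in memory
static const auto g_start = std::chrono::steady_clock::now();

static double seconds_since_start() {
  return std::chrono::duration<double>(std::chrono::steady_clock::now() - g_start).count();
}

extern "C" void kokkosp_allocate_data(const SpaceHandle space, const char* label,
                                      const void* const ptr, const std::uint64_t size) {
  g_events.push_back({seconds_since_start(), ptr, static_cast<std::int64_t>(size),
                      space.name, label});
}

extern "C" void kokkosp_deallocate_data(const SpaceHandle space, const char* label,
                                        const void* const ptr, const std::uint64_t size) {
  // Store deallocations with a negative size, matching the sign convention of the event list.
  g_events.push_back({seconds_since_start(), ptr, -static_cast<std::int64_t>(size),
                      space.name, label});
}

// Hypothetical helper for demonstrations: builds a zero-terminated handle.
SpaceHandle make_space_handle(const char* name) {
  SpaceHandle h{};
  std::strncpy(h.name, name, sizeof(h.name) - 1);
  return h;
}
```

In a real run Kokkos invokes the hooks itself; calling them directly, as with kokkosp_allocate_data(make_space_handle("Cuda"), "A", ptr, 40000000), is only a way to exercise the sketch.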
The tool is located at: https://github.com/kokkos/kokkos-tools/tree/develop/profiling/memory-events
Simply type make inside the source directory. When compiling for specific platforms, modify the Makefile to use the correct compiler and compiler flags.
One can also use the CMake build system. Create a build directory and change into it. Run ccmake .. to ensure that kp_memory_events is in the list of profilers, then run cmake ..; make -j; sudo make install to build the profiler.
This is a standard tool which does not yet support tool chaining. In Bash do:
export KOKKOS_TOOLS_LIBS={PATH_TO_TOOL_DIRECTORY}/kp_memory_events.so
./application COMMANDS
Because the full trace is kept in memory, this tool can add considerable memory overhead to long runs with frequent allocations.
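Since every event is retained until the end of the run, trace memory grows linearly with the number of allocations and deallocations. A rough estimate, assuming a hypothetical fixed-size record (the actual tool's record layout may differ):

```cpp
#include <cstdint>

// Hypothetical fixed-size trace record; the actual tool's layout may differ.
struct EventRecord {
  double time;        // timestamp in seconds
  const void* ptr;    // allocation address
  std::int64_t size;  // signed size in bytes
  char space[64];     // memory space name
  char name[64];      // allocation label
};

// Memory needed to keep n_events records, ignoring container overhead.
// On a typical 64-bit platform sizeof(EventRecord) is 8 + 8 + 8 + 64 + 64 = 152 bytes.
constexpr std::uint64_t trace_bytes(std::uint64_t n_events) {
  return n_events * sizeof(EventRecord);
}
```

Under this assumed layout, a run with one million allocations and one million deallocations (2,000,000 events) would hold about 2,000,000 × 152 = 304,000,000 bytes, roughly 290 MB, of trace data.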
The MemoryEvents tool generates one file for the event list and one file per active memory space containing that space's utilization timeline. The files are named HOSTNAME-PROCESSID.mem_events and HOSTNAME-PROCESSID-MEMSPACE.memspace_usage, respectively.
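The naming scheme above can be sketched with POSIX calls. The helper names are hypothetical, and how the tool actually obtains the host name and process ID is an assumption; only the resulting name patterns come from this page.

```cpp
#include <string>
#include <unistd.h>  // gethostname (POSIX)

// Returns this machine's host name, or "unknown" if it cannot be read.
std::string host_name() {
  char buf[256] = {};  // zero-filled so the result is always terminated
  if (gethostname(buf, sizeof(buf) - 1) != 0) return "unknown";
  return std::string(buf);
}

// HOSTNAME-PROCESSID.mem_events
std::string events_file_name(const std::string& host, long pid) {
  return host + "-" + std::to_string(pid) + ".mem_events";
}

// HOSTNAME-PROCESSID-MEMSPACE.memspace_usage
std::string usage_file_name(const std::string& host, long pid, const std::string& space) {
  return host + "-" + std::to_string(pid) + "-" + space + ".memspace_usage";
}
```

For the current process one would call, for example, events_file_name(host_name(), static_cast<long>(getpid())).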
Consider the following code:
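#include <Kokkos_Core.hpp>

// One View type per memory space; each allocation shows up in the trace.
using a_type = Kokkos::View<int*, Kokkos::CudaSpace>;            // device memory
using b_type = Kokkos::View<int*, Kokkos::CudaUVMSpace>;         // unified (managed) memory
using c_type = Kokkos::View<int*, Kokkos::CudaHostPinnedSpace>;  // pinned host memory

int main() {
  Kokkos::initialize();
  {
    int N = 10000000;  // 10^7 ints = 40,000,000 bytes per View
    for (int i = 0; i < 2; i++) {
      a_type a("A", N);  // freed at the end of each outer iteration
      {
        b_type b("B", N);
        c_type c("C", N);
        // UVM and host-pinned memory are accessible from the host,
        // so these host loops are valid.
        for (int j = 0; j < N; j++) {
          b(j) = 2 * j;
          c(j) = 3 * j;
        }
      }  // c, then b, are deallocated here (reverse order of construction)
    }
  }  // all Views go out of scope before finalize
  Kokkos::finalize();
}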
This produces output like the following (pointer values and timestamps differ between runs):
HOSTNAME-PROCESSID.mem_events
# Memory Events
# Time Ptr Size MemSpace Op Name
0.311749 0x2048a0080 128 CudaHostPinned Allocate InternalScratchUnified
0.311913 0x2305ca0080 2048 Cuda Allocate InternalScratchFlags
0.312108 0x2305da0080 16384 Cuda Allocate InternalScratchSpace
0.312667 0x23060a0080 40000000 Cuda Allocate A
0.317260 0x23086e0080 40000000 CudaUVM Allocate B
0.335289 0x2049a0080 40000000 CudaHostPinned Allocate C
0.368485 0x2049a0080 -40000000 CudaHostPinned DeAllocate C
0.377285 0x23086e0080 -40000000 CudaUVM DeAllocate B
0.379795 0x23060a0080 -40000000 Cuda DeAllocate A
0.380185 0x23060a0080 40000000 Cuda Allocate A
0.384785 0x23086e0080 40000000 CudaUVM Allocate B
0.400073 0x2049a0080 40000000 CudaHostPinned Allocate C
0.433218 0x2049a0080 -40000000 CudaHostPinned DeAllocate C
0.441988 0x23086e0080 -40000000 CudaUVM DeAllocate B
0.444391 0x23060a0080 -40000000 Cuda DeAllocate A
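The per-space timelines can be derived from the event list alone: each Size entry is signed, so summing it per MemSpace gives the current usage, and its running maximum gives the high-water mark. A minimal reader-side sketch, assuming whitespace-separated columns as in the listing above; it is an illustration, not the tool's own code.

```cpp
#include <algorithm>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Running totals for one memory space.
struct SpaceUsage {
  std::int64_t current = 0;     // bytes currently allocated
  std::int64_t high_water = 0;  // largest value 'current' has reached
};

// Replays a .mem_events stream. Lines starting with '#' are headers; every
// other line is "Time Ptr Size MemSpace Op Name". Size is already negative
// for DeAllocate, so the running total is a plain sum.
std::map<std::string, SpaceUsage> replay_events(std::istream& in) {
  std::map<std::string, SpaceUsage> usage;
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#') continue;
    std::istringstream fields(line);
    double time;
    std::string ptr, space, op;
    std::int64_t size;
    if (!(fields >> time >> ptr >> size >> space >> op)) continue;  // skip malformed lines
    SpaceUsage& u = usage[space];
    u.current += size;
    u.high_water = std::max(u.high_water, u.current);
  }
  return usage;
}
```

Replaying the first iteration of the sample leaves 18,432 bytes (the two internal scratch buffers) in the Cuda space, with a high-water mark of 40,018,432 bytes.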
HOSTNAME-PROCESSID-Cuda.memspace_usage
# Space Cuda
# Time(s) Size(MB) HighWater(MB) HighWater-Process(MB)
0.311913 0.0 0.0 81.8
0.312108 0.0 0.0 81.8
0.312667 38.2 38.2 81.8
0.379795 0.0 38.2 158.1
0.380185 38.2 38.2 158.1
0.444391 0.0 38.2 158.1
HOSTNAME-PROCESSID-CudaUVM.memspace_usage
# Space CudaUVM
# Time(s) Size(MB) HighWater(MB) HighWater-Process(MB)
0.317260 38.1 38.1 81.8
0.377285 0.0 38.1 158.1
0.384785 38.1 38.1 158.1
0.441988 0.0 38.1 158.1
HOSTNAME-PROCESSID-CudaHostPinned.memspace_usage
# Space CudaHostPinned
# Time(s) Size(MB) HighWater(MB) HighWater-Process(MB)
0.311749 0.0 0.0 81.8
0.335289 38.1 38.1 120.0
0.368485 0.0 38.1 158.1
0.400073 38.1 38.1 158.1
0.433218 0.0 38.1 158.1
SAND2017-3786