CUDA memory access recorder and visualizer

This is a simple tool that records every memory access in a CUDA program, together with a timestamp for each access. For each access, it writes the accessed index and the current streaming multiprocessor clock cycle value into global device memory. Depending on the number of memory accesses, this can require a lot of space, which makes the tool usable only on very small datasets.
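The recording idea can be sketched on the host in plain C++. This is only an illustration, not the tool's actual implementation: `AccessRecord`, `RecordingArray`, and `record_strided_reads` are hypothetical names, the log is a `std::vector` rather than global device memory, and a simple counter stands in for the streaming multiprocessor clock (inside a real kernel one would presumably read something like `clock64()`).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One logged access: which element was touched, and when.
struct AccessRecord {
    std::size_t index;        // array index that was accessed
    std::uint64_t timestamp;  // stand-in for the SM clock cycle value
};

// Hypothetical wrapper: forwards operator[] to the data and logs each access.
template <typename T>
class RecordingArray {
public:
    RecordingArray(T* data, std::vector<AccessRecord>& log)
        : data_(data), log_(log) {}

    T& operator[](std::size_t i) {
        log_.push_back({i, tick_++});  // record index and "time" of the access
        return data_[i];
    }

private:
    T* data_;
    std::vector<AccessRecord>& log_;
    std::uint64_t tick_ = 0;  // counter used instead of a hardware clock
};

// Reads every second element through the wrapper and returns the access log.
std::vector<AccessRecord> record_strided_reads(std::vector<float>& values) {
    std::vector<AccessRecord> log;
    RecordingArray<float> a(values.data(), log);
    float sum = 0.0f;
    for (std::size_t i = 0; i < values.size(); i += 2) {
        sum += a[i];
    }
    (void)sum;  // the sum itself is irrelevant here, only the log matters
    return log;
}
```

Because every access appends one record, the log grows linearly with the number of accesses, which is why the approach is practical only for small inputs.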

The examples directory contains screen recordings of access pattern animations for the examples.


To generate access patterns for the examples, build and run the v0 example:

cd examples/v0
make && ./bin/main

Start a local web server for the animation app:

cd ../../web && python3 -m http.server

Open the address that the server prints in a browser and submit the generated examples/v0/access-patterns-v0.json file.

You should now see the access pattern from the first gif.

To use the v1 kernel, open examples/v0/ and define the kernel_v1 macro instead of kernel_v0.

If it does not work

Some things to try:

  • Make sure you run the AccessCounter wrapper first, then PatternRecorder.
  • The wrapper objects support only array indexing. Pointer arithmetic and similar operations are not available.
  • Make sure the wrapper object calls enter_kernel once at the beginning of the kernel, before the first memory access.
  • Make sure you call cudaDeviceSynchronize after the kernel call so that the unified memory pointers are accessible on the host.
  • Define the macro PR_VERBOSITY as 1 before including pattern_recorder.cuh. This enables some asserts.
  • If you get a warning about possibly using too much device memory, try reducing the number of memory accesses by using a smaller data sample.
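To get a feel for why the memory warning appears, the storage needed for a log can be estimated as the number of accesses times the size of one record. The sketch below is a rough back-of-the-envelope estimate under assumptions: `LogEntry` is a hypothetical 16-byte record (one 64-bit index and one 64-bit clock value), and the naive matrix multiply is just an example workload, not one taken from this repository. The tool's real record layout may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record layout: one index and one clock value per access.
struct LogEntry {
    std::uint64_t index;
    std::int64_t clock;
};

// Bytes needed to store one record per memory access.
constexpr std::uint64_t log_bytes(std::uint64_t num_accesses) {
    return num_accesses * sizeof(LogEntry);
}

// Accesses made by a naive n x n matrix multiply: two reads per inner-loop
// iteration (n^3 iterations) plus one write per output element (n^2).
constexpr std::uint64_t naive_matmul_accesses(std::uint64_t n) {
    return 2 * n * n * n + n * n;
}
```

Under these assumptions, a 4 x 4 multiply makes 144 accesses and needs 2304 bytes of log space, while a 1024 x 1024 multiply would need roughly 32 GiB, far more than most devices offer. Shrinking the input is therefore the most direct way to get under the limit.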


Record GPU memory accesses of a CUDA program and visualize the access pattern in a browser







