
VTuneConnector

Vivek Kale edited this page Jan 15, 2024 · 5 revisions

Tool Description

The Kokkos Tools VTuneConnector inserts instrumentation for Intel VTune. Kernels are marked through VTune's domain/frame interface. That is, each kernel is identified with a specific domain, and each individual call to the kernel is a frame of that domain. If the developer provides a string label for the parallel region, that label is used as the domain identifier. Otherwise, the C++ type name of the functor or lambda is used.
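The mapping from kernels to domains and frames can be illustrated with a small toy model. This is only a sketch and not the connector's implementation: FrameLog, domain_name, and launch are hypothetical names introduced here, and a plain std::map stands in for VTune. It shows the naming rule (label if present, otherwise the type name) and that every call adds one frame to the kernel's domain.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for the profiler: counts frames per domain name.
struct FrameLog {
  std::map<std::string, std::size_t> frames;  // domain name -> number of frames
  void record(const std::string& domain) { ++frames[domain]; }
};

// Naming rule: use the label if one was given, else the functor's type name.
template <class Functor>
std::string domain_name(const std::string& label, const Functor&) {
  return label.empty() ? std::string(typeid(Functor).name()) : label;
}

// Hypothetical launch wrapper: each call is one frame of the kernel's domain.
template <class Functor>
void launch(FrameLog& log, const std::string& label, int n, const Functor& f) {
  assert(n >= 0);
  log.record(domain_name(label, f));
  for (int i = 0; i < n; ++i) f(i);  // serial loop in place of a parallel kernel
}
```

Calling launch three times with the label "AXPB" leaves a single "AXPB" domain holding three frames, while an unlabeled lambda ends up in a domain named after its compiler-generated type name.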

The tool is located at: https://github.com/kokkos/kokkos-tools/tree/develop/profiling/vtune-connector

Compilation

The Makefile needs to know where VTune's home directory is. Other than that, simply type make inside the source directory. When compiling for a specific platform, modify the Makefile to use the correct compiler and link flags. Alternatively, you can use CMake to build the VTuneConnector along with the other connectors by creating a new folder and then typing cmake .. inside it.

Usage

This is a standard tool which does not yet support tool chaining. Modify your VTune run environment to include:

KOKKOS_PROFILE_LIBRARY={PATH_TO_TOOL_DIRECTORY}/kp_vtune_connector.so

This tool's additional memory footprint is dwarfed by the memory usage of VTune itself during profiling.

Output

Switch to the domain/frame based view inside VTune to analyze your application in a kernel-focused way.

Example Output

Consider the following code:

#include <Kokkos_Core.hpp>
#include <cstdio>  // printf

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc,argv);
  {
    int N = 100000000;
  
    Kokkos::View<double*> a("A",N);
    Kokkos::View<double*> b("B",N);
    Kokkos::View<double*> c("C",N);
  
    Kokkos::parallel_for(N, KOKKOS_LAMBDA (const int& i) {
      a(i) = 1.0*i;
      b(i) = 1.5*i;
      c(i) = 0.0;
    });
  
    double result = 0.0;
    for(int k = 0; k<50; k++) {
    
      Kokkos::parallel_for("AXPB", N, KOKKOS_LAMBDA (const int& i) {
        c(i) = 1.0*k*a(i) + b(i);
      });
    
      double dot;
      Kokkos::parallel_reduce("Dot", N, KOKKOS_LAMBDA (const int& i, double& lsum) {
        lsum += c(i)*c(i);
      },dot);
      result += dot;
  
    }
    printf("Result: %lf\n",result);
  }
  Kokkos::finalize();
}

Here is a screenshot of the Bottom-up Frame/Domain view in VTune. The kernel names are used for the domains, and individual calls with the same name are frames in that domain. Note how the unlabeled lambda was assigned a compiler-generated type name (Z4mainEUlRKiE_). Demangling can translate this into "main::{lambda(int const&)#1}". These lambda names are compiler dependent.

[Screenshot: VTuneDomainFrame]
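To make such a type name readable outside of VTune, it can be demangled programmatically. The following is a minimal sketch that assumes a GCC or Clang toolchain, where abi::__cxa_demangle from <cxxabi.h> is available; the helper name demangle_type_name is hypothetical. The exact demangled text depends on the runtime library, so the result may differ slightly from the string quoted above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang only (Itanium C++ ABI)
#include <string>

// Hypothetical helper: returns the demangled form of a type name,
// or the input unchanged if demangling fails.
std::string demangle_type_name(const char* mangled) {
  assert(mangled != nullptr);
  int status = 0;
  char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  std::string result = (status == 0 && out != nullptr) ? out : mangled;
  std::free(out);  // the buffer returned by __cxa_demangle is malloc-allocated
  return result;
}
```

Passing "Z4mainEUlRKiE_" to this helper should yield a name that identifies a lambda defined in main. Giving the kernel a string label avoids the issue entirely, since the label is then used as the domain name.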
