# Tutorial 1: Querying configured components and available events

Python is becoming increasingly performance-conscious. Python 3.12 adds support for profiling with the Linux `perf` profiler. The Python community is also keen on removing the Global Interpreter Lock (GIL), which would let threads within a single process run Python code truly in parallel. This can be expected to happen gradually over the next few Python releases.

Python has a rich ecosystem for machine learning that includes libraries like PyTorch, TensorFlow, SciPy, and fastai. Several of these libraries run computations on GPUs natively.

It would benefit the Python community to be able to access cross-platform native hardware counters to perform system-wide performance measurement and analysis. Thus, `CyPAPI` ports the `PAPI` library to meet that need.

`CyPAPI` is written in Cython as an extension module and provides all of PAPI's functionality in a Pythonic, object-oriented way.

## Importing `cypapi` and initializing the library

In [1]:
from cypapi import *

The `PAPI` library has to be initialized before any other call:

In [2]:
pyPAPI_library_init()

To check the level of PAPI initialization (a nonzero value means the library has been initialized):

In [3]:
pyPAPI_is_initialized()

1
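
The returned integer can be decoded into a readable name. The following is a minimal sketch, assuming `pyPAPI_is_initialized()` passes through the value of the C call `PAPI_is_initialized()` unchanged and that the value can be treated as a bitmask of the initialization constants from `papi.h`. The helper `describe_init_level` is hypothetical and is not part of `CyPAPI`.

```python
# Initialization-level constants as defined in PAPI's C header (papi.h).
PAPI_NOT_INITED = 0
PAPI_LOW_LEVEL_INITED = 1
PAPI_HIGH_LEVEL_INITED = 2
PAPI_THREAD_LEVEL_INITED = 4

_LEVEL_NAMES = {
    PAPI_LOW_LEVEL_INITED: "PAPI_LOW_LEVEL_INITED",
    PAPI_HIGH_LEVEL_INITED: "PAPI_HIGH_LEVEL_INITED",
    PAPI_THREAD_LEVEL_INITED: "PAPI_THREAD_LEVEL_INITED",
}


def describe_init_level(level: int) -> list[str]:
    """Hypothetical helper: name every initialization flag set in `level`."""
    if level == PAPI_NOT_INITED:
        return ["PAPI_NOT_INITED"]
    # Assumption: the level is a bitmask, so several flags may be set at once.
    return [name for bit, name in _LEVEL_NAMES.items() if level & bit]


# The value 1 shown above corresponds to low-level initialization.
print(describe_init_level(1))  # ['PAPI_LOW_LEVEL_INITED']
```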

`CyPAPI` supports PAPI version `>=7.0.0`. To check the version of the installed PAPI library:

In [4]:
pyPAPI_get_version_string()

'7.0.1.0'
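
A script can guard against an unsupported PAPI installation by comparing this string with the minimum version. The sketch below assumes the version string is always dot-separated integers, as in the output above. The names `MIN_PAPI_VERSION`, `parse_papi_version` and `is_supported` are illustrative and not part of `CyPAPI`.

```python
# Minimum PAPI version that CyPAPI supports, as stated in this tutorial.
MIN_PAPI_VERSION = (7, 0, 0)


def parse_papi_version(version: str) -> tuple[int, ...]:
    """Turn a string such as '7.0.1.0' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def is_supported(version: str) -> bool:
    """Compare major.minor.revision against the minimum, ignoring any 4th field."""
    return parse_papi_version(version)[:3] >= MIN_PAPI_VERSION


print(is_supported("7.0.1.0"))  # True
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexicographically, where `'10.0.0'` would sort before `'7.0.0'`.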

### Components

PAPI can be installed with any number of components. `CyPAPI` can retrieve information about each of these components.

In [5]:
num_cmp = pyPAPI_num_components()
num_cmp

14

In [3]:
from papi.utils import papi_component_avail, papi_native_avail, papi_avail

In [5]:
papi_component_avail()

Component 0: perf_event
	Description:	Linux perf_event CPU counters
	Num events:	165
Component 1: perf_event_uncore
	Description:	Linux perf_event CPU uncore and northbridge
	Num events:	1126
Component 2: cuda
	Description:	CUDA profiling via NVIDIA CuPTI interfaces
	Num events:	1134672
Component 3: nvml
	Description:	NVML provides the API for monitoring NVIDIA hardware (power usage, temperature, fan speed, etc)
	Num events:	216
Component 4: lmsensors
	Description:	lm-sensors provides tools for monitoring the hardware health
	Num events:	170
Component 5: sde
	Description:	Software Defined Events (SDE) component
	Num events:	0
Component 6: io
	Description:	A component to read /proc/self/io
	Num events:	7
Component 7: coretemp
	Description:	Linux hwmon temperature and other info
	Num events:	42
Component 8: net
	Description:	Linux network driver statistics
	Num events:	160
Component 9: powercap
	Description:	Linux powercap energy measurements
	Num events:	30
Component 10: rapl
	Descripti
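
The component index in this listing is the value that the native-event query takes as its argument, so it can be handy to look components up by name. The sketch below works on a copy of the first few entries of the listing as plain text. It assumes the layout stays exactly as printed above (a `Component N: name` header followed by tab-indented `key:` lines). `parse_components` is a hypothetical helper, not a `CyPAPI` function.

```python
import re

# A copy of the first three entries of the listing above (tabs written as \t).
SAMPLE = (
    "Component 0: perf_event\n"
    "\tDescription:\tLinux perf_event CPU counters\n"
    "\tNum events:\t165\n"
    "Component 1: perf_event_uncore\n"
    "\tDescription:\tLinux perf_event CPU uncore and northbridge\n"
    "\tNum events:\t1126\n"
    "Component 2: cuda\n"
    "\tDescription:\tCUDA profiling via NVIDIA CuPTI interfaces\n"
    "\tNum events:\t1134672\n"
)


def parse_components(text):
    """Hypothetical helper: turn the printed listing into a list of dicts."""
    components = []
    current = None
    for line in text.splitlines():
        header = re.match(r"Component (\d+): (\S+)", line)
        if header:
            # A new component starts; remember its index and name.
            current = {"index": int(header.group(1)), "name": header.group(2)}
            components.append(current)
        elif current is not None and ":" in line:
            # Indented "key:<tab>value" line belonging to the current component.
            key, _, value = line.strip().partition(":")
            current[key] = value.strip()
    return components


by_name = {c["name"]: c for c in parse_components(SAMPLE)}
print(by_name["cuda"]["index"], by_name["cuda"]["Num events"])
```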

## Events in component

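Each of these components has events that can be measured. The argument to `papi_native_avail` is the component index from the listing above, so `2` selects the `cuda` component.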

In [6]:
cuda_evts = papi_native_avail(2)
cuda_evts

['cuda:::dram__bytes.avg.pct_of_peak_burst_active:device=0',
 'cuda:::dram__bytes.avg.pct_of_peak_burst_elapsed:device=0',
 'cuda:::dram__bytes.avg.pct_of_peak_burst_frame:device=0',
 'cuda:::dram__bytes.avg.pct_of_peak_burst_region:device=0',
 'cuda:::dram__bytes.avg.pct_of_peak_sustained_active:device=0',
 'cuda:::dram__bytes.avg.pct_of_peak_sustained_elapsed:device=0',
 'cuda:::dram__bytes.avg.pct_of_peak_sustained_frame:device=0',
 'cuda:::dram__bytes.avg.pct_of_peak_sustained_region:device=0',
 'cuda:::dram__bytes.avg.peak_burst:device=0',
 'cuda:::dram__bytes.avg.peak_burst_active:device=0',
 'cuda:::dram__bytes.avg.peak_burst_elapsed:device=0',
 'cuda:::dram__bytes.avg.peak_burst_frame:device=0',
 'cuda:::dram__bytes.avg.peak_burst_region:device=0',
 'cuda:::dram__bytes.avg.peak_sustained:device=0',
 'cuda:::dram__bytes.avg.peak_sustained_active:device=0',
 'cuda:::dram__bytes.avg.peak_sustained_elapsed:device=0',
 'cuda:::dram__bytes.avg.peak_sustained_frame:device=0',
 'cuda::

We will do some actual measurement in the next tutorial.
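
With more than a million CUDA events, it helps to filter the list before choosing what to measure. The event names above follow the pattern `component:::event:qualifier=value`. The sketch below splits names of that form and filters them by substring. `split_event_name` is a hypothetical helper, and it only handles names shaped like the ones shown above; events from other components may use different separators.

```python
# A few names copied from the listing above.
sample_events = [
    "cuda:::dram__bytes.avg.pct_of_peak_burst_active:device=0",
    "cuda:::dram__bytes.avg.peak_burst:device=0",
    "cuda:::dram__bytes.avg.peak_sustained:device=0",
    "cuda:::dram__bytes.avg.peak_sustained_active:device=0",
]


def split_event_name(full_name):
    """Hypothetical helper: return (component, event, qualifiers) for one name."""
    component, sep, rest = full_name.partition(":::")
    if not sep:
        # No component prefix present; treat the whole string as the event.
        component, rest = "", full_name
    base, *quals = rest.split(":")
    # Each qualifier looks like "key=value"; keep key and value, drop the "=".
    qualifiers = dict(q.partition("=")[::2] for q in quals)
    return component, base, qualifiers


# Keep only the events whose metric name mentions sustained peak throughput.
sustained = [
    name for name in sample_events
    if "peak_sustained" in split_event_name(name)[1]
]

print(split_event_name(sample_events[1]))
print(sustained)
```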

## Preset events

In [8]:
papi_avail()

PAPI_L1_ICM L1I cache misses
PAPI_L2_DCM L2D cache misses
PAPI_L2_ICM L2I cache misses
PAPI_L3_DCM L3D cache misses
PAPI_L3_ICM L3I cache misses
PAPI_L1_TCM L1 cache misses
PAPI_L2_TCM L2 cache misses
PAPI_L3_TCM L3 cache misses
PAPI_CA_SNP Snoop Requests
PAPI_CA_SHR Ex Acces shared CL
PAPI_CA_CLN Ex Access clean CL
PAPI_CA_INV Cache ln invalid
PAPI_CA_ITV Cache ln intervene
PAPI_L3_LDM L3 load misses
PAPI_L3_STM L3 store misses
PAPI_BRU_IDL Branch idle cycles
PAPI_FXU_IDL IU idle cycles
PAPI_FPU_IDL FPU idle cycles
PAPI_LSU_IDL L/SU idle cycles
PAPI_TLB_DM Data TLB misses
PAPI_TLB_IM Instr TLB misses
PAPI_TLB_TL Total TLB misses
PAPI_L1_LDM L1 load misses
PAPI_L1_STM L1 store misses
PAPI_L2_LDM L2 load misses
PAPI_L2_STM L2 store misses
PAPI_BTAC_M Br targt addr miss
PAPI_PRF_DM Data prefetch miss
PAPI_L3_DCH L3D cache hits
PAPI_TLB_SD TLB shootdowns
PAPI_CSR_FAL Failed store cond
PAPI_CSR_SUC Good store cond
PAPI_CSR_TOT Total store cond
PAPI_MEM_SCY Stalled mem cycles
PAPI_MEM_RCY S

## Accurate timing functions

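`CyPAPI` exposes several of PAPI's accurate timing functions. A few examples are shown below: `pyPAPI_get_real_usec()` reads wall-clock time and `pyPAPI_get_virt_usec()` reads virtual time, both in microseconds.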

In [12]:
from time import sleep
start = pyPAPI_get_real_usec()
sleep(1)
stop = pyPAPI_get_real_usec()
print(f'Time elapsed = {stop-start} us')

Time elapsed = 1000356 us


In [13]:
start = pyPAPI_get_virt_usec()
sleep(1)
stop = pyPAPI_get_virt_usec()
print(f'Time elapsed = {stop-start} us')

Time elapsed = 152 us
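
The two results differ because real time is wall-clock time, while virtual time only advances while the process is actually executing, and a sleeping process is not. The same contrast can be reproduced with clocks from the Python standard library. This is only an analogy, not the PAPI implementation: `time.process_time_ns()` counts user plus system CPU time of the whole process, which is not necessarily what PAPI's virtual timer counts. `measure_sleep` is an illustrative helper.

```python
import time


def measure_sleep(seconds):
    """Return (wall-clock, CPU) time in microseconds spent sleeping."""
    wall_start = time.perf_counter_ns()   # monotonic wall-clock timer
    cpu_start = time.process_time_ns()    # CPU time used by this process
    time.sleep(seconds)
    cpu_elapsed_us = (time.process_time_ns() - cpu_start) / 1_000
    wall_elapsed_us = (time.perf_counter_ns() - wall_start) / 1_000
    return wall_elapsed_us, cpu_elapsed_us


wall_us, cpu_us = measure_sleep(0.2)
print(f"Wall-clock time elapsed = {wall_us:.0f} us")
print(f"CPU time elapsed        = {cpu_us:.0f} us")
```

The wall-clock figure should be close to the requested sleep, while the CPU figure stays small, mirroring the 1000356 us versus 152 us seen above.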
