
PAPI HL

Treece-Burgess edited this page Jan 31, 2024 · 35 revisions

High Level API

Note: The legacy high-level API (final release: 5.7.0) has been redesigned, and the new API was first released in version 6.0.0. Detailed information can be found in the White Paper.

The high-level API (Application Programming Interface) provides the ability to record performance events inside instrumented regions of serial, multi-process (MPI, SHMEM), and multi-threaded (OpenMP, Pthreads) parallel applications. It is intended for users who want to perform simple event measurements conveniently: they only have to mark the code sections of interest.

The events to be recorded are selected via an environment variable (PAPI_EVENTS) that holds a comma-separated list of events from any component (see example below). This lets users perform different measurements without recompiling. In addition, users do not need to print the performance events themselves, since output is generated automatically at the end of each measurement.
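To illustrate the list format, here is a minimal sketch of how a comma-separated value like the PAPI_EVENTS example further down can be split into individual event names. The helper split_events is hypothetical and is not PAPI's own parser; because it uses strtok, it modifies the string in place and skips empty entries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PAPI): splits a PAPI_EVENTS-style value
   such as "PAPI_TOT_INS,PAPI_TOT_CYC" into event names.
   The list is modified in place; names[] points into it. */
static size_t split_events(char *list, char *names[], size_t max_names)
{
    size_t count = 0;
    /* strtok treats consecutive commas as one separator,
       so empty entries are skipped */
    for (char *tok = strtok(list, ","); tok != NULL && count < max_names;
         tok = strtok(NULL, ",")) {
        names[count++] = tok;
    }
    return count;
}
```

Component events keep their colons intact, so an entry such as coretemp:::hwmon0:temp3_input=instant comes back as a single name.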

Compared with the low-level API, the high-level API is easier to use and requires less setup. For instance, selecting events at run time via the environment variable and detecting components automatically make the high-level API very simple to use.

The high-level API can also be used together with the low-level API; in fact, it calls the low-level API internally.


High-Level Functions

The high-level API functions allow users to record and print specific performance events from both C and Fortran. Below are the function calls for the four high-level API functions:

C:

const char *region = "computation";
int retval = PAPI_hl_region_begin(region);

Arguments for PAPI_hl_region_begin:

  • region -- a unique region name, such as "computation".

const char *region = "computation";
int retval = PAPI_hl_read(region);

Arguments for PAPI_hl_read:

  • region -- the region name passed to the corresponding PAPI_hl_region_begin call, such as "computation".

const char *region = "computation";
int retval = PAPI_hl_region_end(region);

Arguments for PAPI_hl_region_end:

  • region -- the region name passed to the corresponding PAPI_hl_region_begin call, such as "computation".

int retval = PAPI_hl_stop();

PAPI_hl_stop takes no arguments.

Fortran:

use iso_c_binding
character(PAPI_MAX_STR_LEN, c_char) region
integer(c_int) check
call PAPIF_hl_region_begin(region, check)

Fortran arguments for PAPIF_hl_region_begin:

  • region -- a unique region name, such as "computation".
  • check -- the return code, set to PAPI_OK on success.

use iso_c_binding
character(PAPI_MAX_STR_LEN, c_char) region
integer(c_int) check
call PAPIF_hl_read(region, check)

Fortran arguments for PAPIF_hl_read:

  • region -- the region name passed to the corresponding PAPIF_hl_region_begin call, such as "computation".
  • check -- the return code, set to PAPI_OK on success.

use iso_c_binding
character(PAPI_MAX_STR_LEN, c_char) region
integer(c_int) check
call PAPIF_hl_region_end(region, check)

Fortran arguments for PAPIF_hl_region_end:

  • region -- the region name passed to the corresponding PAPIF_hl_region_begin call, such as "computation".
  • check -- the return code, set to PAPI_OK on success.

use iso_c_binding
integer(c_int) check
call PAPIF_hl_stop(check)

Fortran arguments for PAPIF_hl_stop:

  • check -- the return code, set to PAPI_OK on success.

PAPI_hl_region_begin reads performance events at the beginning of a region (the first call also starts counting the events).

PAPI_hl_read reads performance events inside a region and stores the difference relative to the beginning of that region.

PAPI_hl_region_end reads performance events at the end of a region and stores the difference relative to the beginning of that region.

PAPI_hl_stop stops the high-level event set. Calling it is optional and only necessary if the programmer also wants to use the low-level API.
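The difference semantics of PAPI_hl_read and PAPI_hl_region_end can be illustrated without PAPI. The sketch below simulates a single region, with a plain long long standing in for a hardware counter; sim_region, sim_begin and sim_read_or_end are hypothetical names, not PAPI functions. It shows that every read, like the end of the region, is measured against the value captured at the beginning of the region, not against the previous read.

```c
#include <string.h>

/* Simulation only: a long long stands in for a hardware counter.
   None of these names are part of PAPI. */
typedef struct {
    const char *name;      /* region name given at begin */
    long long begin_value; /* counter value captured at begin */
} sim_region;

/* Like PAPI_hl_region_begin: remember the counter value at region entry. */
static void sim_begin(sim_region *r, const char *name, long long counter)
{
    r->name = name;
    r->begin_value = counter;
}

/* Like PAPI_hl_read or PAPI_hl_region_end: compute the difference to the
   beginning of the region. Returns 0 on success, -1 if the name does not
   match the region that was begun. */
static int sim_read_or_end(const sim_region *r, const char *name,
                           long long counter, long long *delta)
{
    if (strcmp(r->name, name) != 0)
        return -1;
    *delta = counter - r->begin_value;
    return 0;
}
```

For example, with a begin value of 100, reads at counter values 150 and 230 yield 50 and 130; the second value is not 80, because both are relative to the begin value.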


Recording Performance Events

The following code example shows the use of the high-level API by marking a code section.

C:

#include <papi.h>
#include <stdio.h>
#include <stdlib.h>

void handle_error(int retval)
{
    fprintf(stderr, "PAPI error %d: %s\n", retval, PAPI_strerror(retval));
    exit(1);
}

int main(void)
{
    int retval;

    retval = PAPI_hl_region_begin("computation");
    if ( retval != PAPI_OK )
        handle_error(retval);

    /* Do some computation here */

    retval = PAPI_hl_region_end("computation");
    if ( retval != PAPI_OK )
        handle_error(retval);

    /* Reached only if all PAPI calls returned PAPI_OK */
    printf("\033[0;32mPASSED\n\033[0m");
    exit(0);
}

Output

PASSED

On success, all PAPI functions return PAPI_OK and the program prints the output above. On error, a non-zero (negative) error code is returned.

Fortran:

#include "fpapi.h"
      program main
          integer retval

          call PAPIf_hl_region_begin("computation", retval)
          if ( retval .NE. PAPI_OK ) then
              write (*,*) "PAPIf_hl_region_begin failed!"
              stop 1
          end if

          !do some computation here

          call PAPIf_hl_region_end("computation", retval)
          if ( retval .NE. PAPI_OK ) then
              write (*,*) "PAPIf_hl_region_end failed!"
              stop 1
          end if

          !Reached only if all PAPI calls returned PAPI_OK
          write (*,*) "PASSED"
      end program main

Output

PASSED

On success, the PAPI routines set retval to PAPI_OK and the program prints the output above. On error, retval holds a non-zero error code.

Note: For a more detailed evaluation, PAPI_hl_read can be called several times inside a region; its name argument must match the name of the corresponding region. A marked region is thread-local, so it must begin and end in the same thread. If high-level and low-level API calls are mixed, PAPI_hl_stop() must be called before any low-level calls that follow a marked region.

Measurement Run:

If no events are specified via the environment variable PAPI_EVENTS, default events are recorded and reported after the run. If supported by the respective machine, the following default events are recorded:

  • perf::TASK-CLOCK
  • PAPI_TOT_INS
  • PAPI_TOT_CYC
  • PAPI_FP_INS (if not available PAPI tries to use PAPI_VEC_SP or PAPI_VEC_DP)
  • PAPI_FP_OPS (if not available PAPI tries to use PAPI_SP_OPS or PAPI_DP_OPS)

Note: Default events that are not available on the current machine (e.g., PAPI_FP_OPS) are automatically skipped. If PAPI_EVENTS is set, the default events are not recorded unless they are listed in PAPI_EVENTS. If some of the specified events cannot be interpreted, only the valid ones are used for the measurement.
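The default-event rule can be sketched as follows. This is an illustration under assumptions, not PAPI's implementation: availability is looked up in a caller-supplied list instead of being queried from the hardware, and when PAPI_FP_INS or PAPI_FP_OPS is missing, the sketch records every available alternative (for example both PAPI_VEC_SP and PAPI_VEC_DP) rather than only the first one. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical availability check: PAPI would query the hardware;
   here the event is looked up in a caller-supplied list. */
static int is_available(const char *event, const char *const avail[], size_t n_avail)
{
    for (size_t i = 0; i < n_avail; i++)
        if (strcmp(avail[i], event) == 0)
            return 1;
    return 0;
}

/* Appends event to out[] if it is available; returns the new count. */
static size_t add_if_available(const char *event, const char *const avail[],
                               size_t n_avail, const char *out[], size_t count)
{
    if (is_available(event, avail, n_avail))
        out[count++] = event;
    return count;
}

/* Builds the default event list; out[] must have room for 7 entries.
   Assumption: every available alternative is recorded when a
   floating-point default is missing. */
static size_t default_events(const char *const avail[], size_t n_avail, const char *out[])
{
    static const char *const defaults[] = {
        "perf::TASK-CLOCK", "PAPI_TOT_INS", "PAPI_TOT_CYC",
        "PAPI_FP_INS", "PAPI_FP_OPS"
    };
    size_t count = 0;
    for (size_t i = 0; i < sizeof defaults / sizeof defaults[0]; i++) {
        if (is_available(defaults[i], avail, n_avail)) {
            out[count++] = defaults[i];
        } else if (strcmp(defaults[i], "PAPI_FP_INS") == 0) {
            count = add_if_available("PAPI_VEC_SP", avail, n_avail, out, count);
            count = add_if_available("PAPI_VEC_DP", avail, n_avail, out, count);
        } else if (strcmp(defaults[i], "PAPI_FP_OPS") == 0) {
            count = add_if_available("PAPI_SP_OPS", avail, n_avail, out, count);
            count = add_if_available("PAPI_DP_OPS", avail, n_avail, out, count);
        }
        /* any other unavailable default is simply skipped */
    }
    return count;
}
```

On a hypothetical machine that offers only PAPI_TOT_INS, PAPI_TOT_CYC, PAPI_VEC_DP and PAPI_DP_OPS, the sketch skips perf::TASK-CLOCK and returns those four events in that order.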

By default, the output is written to the current directory. For larger measurements, especially of MPI applications, it is recommended to specify an output directory via the environment variable PAPI_OUTPUT_DIRECTORY.

Example for setting performance events and output directory:

export PAPI_EVENTS="PAPI_TOT_INS,PAPI_TOT_CYC"
export PAPI_OUTPUT_DIRECTORY="scratch/measurement"

This creates a directory called papi_hl_output in scratch/measurement that contains the output file or, for an MPI application, one output file per rank.
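The resulting file location can be sketched with a small helper. hl_output_path is hypothetical, and the numeric suffix of the file name is simply taken as a parameter here (720050 in the serial example below). The sketch falls back to the current directory when no output directory is given, matching the default described above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of PAPI): builds
   "<dir>/papi_hl_output/rank_<id>.json".
   dir would typically be getenv("PAPI_OUTPUT_DIRECTORY"), which may be NULL.
   Returns 0 on success, -1 if the buffer is too small. */
static int hl_output_path(char *buf, size_t size, const char *dir, long id)
{
    if (dir == NULL || strcmp(dir, "") == 0)
        dir = "."; /* default: current directory */
    int n = snprintf(buf, size, "%s/papi_hl_output/rank_%ld.json", dir, id);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With dir set to "scratch/measurement" and id 720050, the helper produces scratch/measurement/papi_hl_output/rank_720050.json.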

Note: Performance events are stored as delta values, i.e., the difference between the values read at the end-of-region call and at the begin-of-region call. Some events, such as temperature or power, are instead marked as instantaneous values (see example below). In this case, only the value read at the end-of-region call is stored.

Example for setting instantaneous events:

export PAPI_EVENTS="coretemp:::hwmon0:temp3_input=instant"
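A minimal sketch of the two storage rules, assuming =instant is the only suffix of interest: parse_event_spec splits an entry such as the one above into the event name and an instant flag, and stored_value returns either the end-minus-begin delta or, for instantaneous events, the end value alone. Both helpers and the hl_event_spec type are hypothetical; any suffix other than instant is treated as a delta event here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of one PAPI_EVENTS entry. */
typedef struct {
    char name[128]; /* event name without the "=..." suffix */
    bool instant;   /* true if the entry ends in "=instant" */
} hl_event_spec;

/* Splits "name" or "name=instant" into name and flag.
   Returns 0 on success, -1 if the name is empty or too long. */
static int parse_event_spec(const char *entry, hl_event_spec *out)
{
    const char *eq = strchr(entry, '=');
    size_t len = eq ? (size_t)(eq - entry) : strlen(entry);
    if (len == 0 || len >= sizeof out->name)
        return -1;
    memcpy(out->name, entry, len);
    out->name[len] = '\0';
    out->instant = (eq != NULL && strcmp(eq + 1, "instant") == 0);
    return 0;
}

/* Value stored for a region: end - begin for delta events,
   the end value alone for instantaneous events. */
static long long stored_value(const hl_event_spec *ev, long long begin, long long end)
{
    return ev->instant ? end : end - begin;
}
```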

Possible Output File:

Example of an output file for a serial application:

cat papi_hl_output/rank_720050.json
{
  "papi_version":"6.0.0.1",
  "cpu_info":"Intel(R) Xeon(R) CPU X7550 @ 2.00GHz",
  "max_cpu_rate_mhz":"1995",
  "min_cpu_rate_mhz":"1995",
  "event_definitions":{
    "perf::TASK-CLOCK":{
      "component":"perf_event",
      "type":"delta"
    },
    "PAPI_TOT_INS":{
      "component":"perf_event",
      "type":"delta"
    },
    "PAPI_TOT_CYC":{
      "component":"perf_event",
      "type":"delta"
    },
    "PAPI_FP_INS":{
      "component":"perf_event",
      "type":"delta"
    },
    "PAPI_FP_OPS":{
      "component":"perf_event",
      "type":"delta"
    }
  },
  "threads":{
    "0":{
      "regions":{
        "0":{
          "name":"computation",
          "parent_region_id":"-1",
          "cycles":"17729530032",
          "real_time_nsec":"8887417521",
          "perf::TASK-CLOCK":"8886388468",
          "PAPI_TOT_INS":"33007026164",
          "PAPI_TOT_CYC":"17624197693",
          "PAPI_FP_INS":"2003166805",
          "PAPI_FP_OPS":"2003166841"
        }
      }
    }
  }
}

Note: The output example above shows performance events for the region "computation" in JSON format. Since it is a serial application, there is only one thread containing performance events; for a multi-threaded application, there is one JSON object per thread. MPI applications produce multiple files, one per MPI rank. If further measurements are performed, the high-level library does not overwrite or delete old measurement directories; instead, a timestamp is added to the name of the old directory. For convenience, the output can also be printed to stdout by setting PAPI_REPORT=1. This is not recommended for MPI applications, since all MPI ranks would try to print their output concurrently.

Enhanced Output:

The generated measurement output (see example above) can be converted into a more readable form. The Python script papi_hl_output_writer.py, located in src/high-level/scripts, enhances the output by computing derived metrics such as IPC, MFLOPS/s, and MFLIPS/s, as well as real and processor time, provided the corresponding PAPI events have been recorded.

Example to generate an enhanced output using Python 3:

python3 papi_hl_output_writer.py --notation=derived --type=summary

Output:

{
    "computation": {
        "Region count": 1,
        "Real time in s": 7.62,
        "CPU time in s": 7.62,
        "IPC": 2.18,
        "MFLIPS/s": 263.0,
        "MFLOPS/s": 263.0
    }
}

Note: The output example above has been generated with the type option "summary" which summarizes performance events over all threads and MPI ranks in case of a parallel application. Use python3 papi_hl_output_writer.py --help to see all available options.
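The derived metrics can be approximated by hand from the raw JSON values. The sketch below assumes that IPC is PAPI_TOT_INS divided by PAPI_TOT_CYC, and that MFLOPS/s (or MFLIPS/s) is PAPI_FP_OPS (or PAPI_FP_INS) per second of the region's real_time_nsec, in millions; papi_hl_output_writer.py may define them differently, so treat the result as an approximation. Applied to the serial JSON example above, it gives an IPC of about 1.87 and about 225.4 MFLOPS/s. These numbers differ from the enhanced output shown, which reports a different real time (7.62 s versus about 8.89 s) and so was not generated from that file.

```c
/* Instructions per cycle. Returns 0.0 if no cycles were recorded. */
static double ipc(long long tot_ins, long long tot_cyc)
{
    return tot_cyc > 0 ? (double)tot_ins / (double)tot_cyc : 0.0;
}

/* Millions of events per second of real time:
   count / (nsec * 1e-9) / 1e6 simplifies to count * 1e3 / nsec.
   Returns 0.0 if no time was recorded. */
static double mega_per_sec(long long count, long long real_time_nsec)
{
    return real_time_nsec > 0 ? (double)count * 1e3 / (double)real_time_nsec : 0.0;
}
```

For example, mega_per_sec(2003166841, 8887417521) uses the PAPI_FP_OPS and real_time_nsec values from the JSON example.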

Multiplexing Support:

The high-level API also supports multiplexing of CPU core events via the environment variable PAPI_MULTIPLEX.

Enable multiplexing support:

export PAPI_MULTIPLEX=1

Overview of Environment Variables

The following environment variables are only used by the high-level API:

Environment Variable     Description                                    Type
PAPI_EVENTS              PAPI events to measure                         String
PAPI_MULTIPLEX           Enable multiplexing                            -
PAPI_REPORT              Print output to stdout                         -
PAPI_OUTPUT_DIRECTORY    Path of the measurement directory              Path
PAPI_HL_VERBOSE          Enable warnings and info                       -
PAPI_DEBUG=HIGHLEVEL     Enable debugging of high-level routines        String
PAPI_HL_THREAD_MULTIPLE  Set to "0" to disable multi-thread monitoring  String

Note: Environment variables without a type are enabled when they are set to any value; the value itself is not interpreted. To disable such a variable, remove it with the unset command (e.g., unset PAPI_REPORT).
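A minimal sketch of this rule, assuming only that a flag counts as set whenever getenv finds it. hl_flag_enabled is a hypothetical helper, not a PAPI function. By the rule above, PAPI_REPORT=0 would still enable reporting, because the value is never inspected.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv/unsetenv in the usage example */
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not part of PAPI): a flag-style variable such as
   PAPI_REPORT or PAPI_MULTIPLEX is enabled if it is set at all;
   its value is ignored. */
static bool hl_flag_enabled(const char *name)
{
    return getenv(name) != NULL;
}
```

For example, after setenv("PAPI_REPORT", "0", 1) the helper returns true, and after unsetenv("PAPI_REPORT") it returns false.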