# Distributed profiling and energy measurements with perun

How to locate and fix performance issues in your distributed application, in three steps:

1. Find the problematic/slow function in your code.
2. Gather statistics and data about the slow function.
3. Fix it!

---

<div style="float: left; padding-right: 2em; padding-top: 2em;">
    <img src="https://raw.githubusercontent.com/Helmholtz-AI-Energy/perun/refs/heads/main/docs/images/full_logo.svg"></img>
</div>

If you want more information on perun, have questions, or find any issues, leave us a message on [GitHub](https://github.com/Helmholtz-AI-Energy/perun) or check the [documentation](https://perun.readthedocs.io/en/latest/?badge=latest).

## Installation

Perun can be installed with ```pip```:

```shell
pip install perun
```

Through pip, you can also install optional dependencies that target different hardware accelerators, as well as optional MPI support.


```shell
pip install perun[mpi,nvidia]
# or
pip install perun[mpi,rocm]
```

Running the cell below will install perun.

In [None]:
%%bash
pip install perun[mpi,nvidia]
perun --version

Collecting nvidia-ml-py>=12.535.77 (from perun[mpi,nvidia])
  Using cached nvidia_ml_py-12.575.51-py3-none-any.whl.metadata (9.3 kB)
Using cached nvidia_ml_py-12.575.51-py3-none-any.whl (47 kB)
Installing collected packages: nvidia-ml-py
Successfully installed nvidia-ml-py-12.575.51

perun 0.9.0


## Basic command line usage

Perun is primarily a command line tool. The complete functionality can be accessed through the ```perun``` command. On a terminal, simply type ```perun``` and press Enter to get a help dialog with the available subcommands.

In [None]:
!perun

usage: perun [-h] [-c CONFIGURATION] [-l {DEBUG,INFO,WARN,ERROR,CRITICAL}]
             [--log_file LOG_FILE] [--version]
             {showconf,sensors,metadata,export,monitor} ...

Distributed performance and energy monitoring tool

positional arguments:
  {showconf,sensors,metadata,export,monitor}
    showconf            Print perun configuration in INI format.
    sensors             Print available sensors by host and rank.
    metadata            Print available metadata.
    export              Export existing output file to another format.
    monitor             Gather power consumption from hardware devices while
                        SCRIPT [SCRIPT_ARGS] is running. SCRIPT is a path to
                        the python script to monitor, run with arguments
                        SCRIPT_ARGS.

options:
  -h, --help            show this help message and exit
  -c CONFIGURATION, --configuration CONFIGURATION
                        Path to perun configuration file.
  -l {DE

**perun** can be used right away, without any further configuration or modification of your code. It can monitor scripts and other programs launched from the command line. Try running ```perun monitor -b sleep 10``` on a terminal, or run the cell below, which launches the same command on four MPI ranks.

In [None]:
%%bash
pwd
mpirun -n 4 perun monitor -b sleep 10

/home/juanpedroghm/code/heat/doc/source/tutorials/notebooks
[2025-05-20 16:59:39,969][perun.core][backends][ERROR] - R3/4:Unknown error loading dependecy NVMLBackend
[2025-05-20 16:59:39,969][perun.core][backends][ERROR] - R3/4:NVML Shared Library Not Found
[2025-05-20 16:59:39,969][perun.core][backends][ERROR] - R1/4:Unknown error loading dependecy NVMLBackend
[2025-05-20 16:59:39,970][perun.core][backends][ERROR] - R1/4:NVML Shared Library Not Found
[2025-05-20 16:59:39,970][perun.core][backends][ERROR] - R0/4:Unknown error loading dependecy NVMLBackend
[2025-05-20 16:59:39,970][perun.core][backends][ERROR] - R0/4:NVML Shared Library Not Found
[2025-05-20 16:59:39,976][perun.core][backends][ERROR

In the directory reported by ```pwd```, you should see a new directory called ```perun_results``` (it might be named ```bench_data``` if the current directory is the heat root directory) with two files, **sleep.hdf5** and **sleep_<date_and_time>.txt**.
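Since each run writes a new time-stamped text report, it can be handy to pick up the most recent one programmatically. The following is a minimal sketch that relies only on the directory and file naming described above; the helper name `latest_report` is hypothetical and not part of perun.

```python
from pathlib import Path


def latest_report(results_dir, app_name):
    """Return the newest '<app_name>_*.txt' file in results_dir, or None."""
    reports = sorted(
        Path(results_dir).glob(f"{app_name}_*.txt"),
        key=lambda p: p.stat().st_mtime,  # order by modification time
    )
    return reports[-1] if reports else None


if __name__ == "__main__":
    # Assumes the default output directory mentioned in the text.
    report = latest_report("perun_results", "sleep")
    if report is None:
        print("No report found; run perun monitor first.")
    else:
        print(report.read_text())
```

If your results end up in ```bench_data``` instead, pass that directory name as the first argument.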

The file **sleep_<date_and_time>.txt** contains a summary of what was measured during the run, with the average power draw of different hardware components, memory usage, and the total energy. The available information depends on the *sensors* that perun finds. You can see a list of the available sensors by running the sensors subcommand:

In [None]:
!perun sensors

[2025-05-20 16:55:39,740][perun.core][backends][ERROR] - R0/1:Unknown error loading dependecy NVMLBackend
[2025-05-20 16:55:39,740][perun.core][backends][ERROR] - R0/1:NVML Shared Library Not Found
|           Sensor |        Source |          Device |   Unit |
|-----------------:|--------------:|----------------:|-------:|
|  cpu_0_package-0 | powercap_rapl |  DeviceType.CPU |      J |
|       CPU_FREQ_0 |        psutil |  DeviceType.CPU |     Hz |
|       CPU_FREQ_1 |        psutil |  DeviceType.CPU |     Hz |
|       CPU_FREQ_2 |        psutil |  DeviceType.CPU |     Hz |
|       CPU_FREQ_3 |        psutil |  DeviceType.CPU |     Hz |
|       CPU_FREQ_4 |        psutil |  DeviceType.CPU |     Hz |
|       CPU_FREQ_5 |        psutil |  DeviceType.CPU |     Hz |
|       CPU_FREQ_6 |        psutil |  DeviceType.CPU |     Hz |
|       CPU_FREQ_7 |        psutil |  DeviceType.CPU |     Hz |
|        C
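The `cpu_0_package-0` sensor in the table reports energy in joules, while the text report talks about average power. As a rough illustration of how the two relate, the sketch below assumes the sensor delivers a cumulative energy reading and divides the energy difference by the elapsed time. The function name and the sample readings are hypothetical, and this is not necessarily how perun computes its numbers (for instance, counter wrap-around is ignored here).

```python
def average_power(samples):
    """Average power in watts from (time_s, energy_j) samples of a cumulative counter."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t_start, e_start), (t_end, e_end) = samples[0], samples[-1]
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("timestamps must increase")
    # Power is energy per unit time: P = (E_end - E_start) / (t_end - t_start)
    return (e_end - e_start) / elapsed


# Hypothetical readings, not measured data: (seconds, joules).
readings = [(0.0, 100.0), (1.0, 112.0), (2.0, 125.0), (4.0, 148.0)]
print(f"average power: {average_power(readings):.1f} W")
```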

The other file, **sleep.hdf5**, contains all the raw data that perun collects, which can be used for later processing. To get an interactive view of the data, navigate to [myhdf5](https://myhdf5.hdfgroup.org) and upload the file there.

This will let you explore the data tree that perun uses to store the hardware information. More info on the data tree can be found in the [data documentation](https://perun.readthedocs.io/en/latest/data.html).

The data stored in the hdf5 file can be exported to other formats. Supported formats are text (same as the text report), csv, json and bench. Run the cell below to export the last run of the sleep program to csv.

In [None]:
%%bash
perun export perun_results/sleep.hdf5 csv
cat perun_results/sleep_*.csv

,run id,hostname,device_group,sensor,unit,magnitude,timestep,value
0,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,0.0,2021.14599609375
1,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,1.0068829,964.1939697265625
2,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,2.0126529,400.12799072265625
3,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,3.0183434,2600.0
4,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,4.024712,2800.0
5,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,5.0291414,2384.971923828125
6,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,6.033699,1418.0760498046875
7,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,7.0397954,2297.81298828125
8,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,8.047083,2893.419921875
9,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,9.0511675,2456.3759765625
10,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,10.060614,1828.7459716796875
11,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,10.068606,3012.5791015625
12,0,juan-20w000p2ge,cpu,CPU_FREQ_1,Hz,1000000.0,0.0,121
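Once the data is in CSV form, it can be processed with standard tools. As a small sketch, the code below uses only the Python standard library to average the `value` column per sensor. It embeds the first three rows of the output above so that it runs on its own; the function name `mean_by_sensor` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# First rows of the export shown above; in practice, open the CSV file instead.
SAMPLE = """\
,run id,hostname,device_group,sensor,unit,magnitude,timestep,value
0,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,0.0,2021.14599609375
1,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,1.0068829,964.1939697265625
2,0,juan-20w000p2ge,cpu,CPU_FREQ_0,Hz,1000000.0,2.0126529,400.12799072265625
"""


def mean_by_sensor(csv_file):
    """Return {sensor: mean of the 'value' column} for an exported CSV."""
    values = defaultdict(list)
    for row in csv.DictReader(csv_file):
        values[row["sensor"]].append(float(row["value"]))
    return {sensor: sum(v) / len(v) for sensor, v in values.items()}


print(mean_by_sensor(io.StringIO(SAMPLE)))
```

To run it on a real export, pass an open file object instead of the `io.StringIO` sample, for example one of the files matching ```perun_results/sleep_*.csv```.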

Let's move on to a slightly more interesting example, which we are going to profile in parallel inside our notebook using **ipyparallel**.

## Setup for a notebook

In [None]:
from ipyparallel import Client
rc = Client(profile="default")
rc.ids

if len(rc.ids) == 0:
    print("No engines found")
else:
    print(f"{len(rc.ids)} engines found")

4 engines found


## Using the perun decorators

perun offers an alternative way to start monitoring your code by using function decorators. The idea is to isolate the region of the code that you want to monitor inside a function, and decorate it with the ```@perun``` decorator. Your code can then be started using the normal python command, and perun will start gathering data only when that function is reached.

**Careful**: Each time a function decorated with ```@perun``` is called, it creates a new output file and a new run, which could slow down your code significantly. If the function that you want to monitor will be run more than once, it is better to use the ```@monitor``` decorator.

Let's look at the example below.

In [None]:
%%px
import sklearn.datasets  # import the submodule explicitly so load_digits is available
import heat as ht
from perun import perun, monitor

@monitor()
def data_loading():
    X,_ = sklearn.datasets.load_digits(return_X_y=True)
    return ht.array(X, split=0)

@monitor()
def fitting(X):
    k = 10
    kmeans = ht.cluster.KMeans(n_clusters=k, init="kmeans++")
    kmeans.fit(X)

@perun(log_lvl="WARNING", data_out="perun_data", format="text", sampling_period=0.1)
def main():
    data = data_loading()
    fitting(data)


The example has three functions: ```main``` with the ```@perun``` decorator, and ```fitting``` and ```data_loading``` with the ```@monitor``` decorator. **perun** will start monitoring whenever we run the ```main``` function, and will record the entry and exit time of the other two functions marked with ```@monitor```.

In [None]:
%%px
main()

The text report will have an extra table with all the monitored functions, outlining the average runtime and the power draw measured while the application was running, together with other metrics. The data can also be found in the hdf5 file, where the start and stop events of the functions are stored under the regions node of the individual runs.
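To make the idea of recording start and stop events per function more concrete, here is a small standard-library sketch of a decorator that logs both timestamps for every call. It only illustrates the concept: `record_region` and `REGION_EVENTS` are hypothetical names, and perun's own `@monitor` decorator is implemented and stores its data differently.

```python
import functools
import time

# Hypothetical in-memory event log; perun stores its own data differently.
REGION_EVENTS = {}


def record_region(func):
    """Record (start, stop) timestamps for every call of func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so every entry has a matching exit.
            stop = time.perf_counter()
            REGION_EVENTS.setdefault(func.__name__, []).append((start, stop))
    return wrapper


@record_region
def work(n):
    return sum(i * i for i in range(n))


for _ in range(3):
    work(10_000)

durations = [stop - start for start, stop in REGION_EVENTS["work"]]
print(f"work: {len(durations)} calls, mean {sum(durations) / len(durations):.6f} s")
```

Because the events are appended per call, a function that runs many times yields one list of intervals rather than one output file per call, which mirrors the reason given above for preferring ```@monitor``` on repeated functions.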

If you want more information on perun, check the [documentation](https://perun.readthedocs.io/en/latest/?badge=latest) or the code on [GitHub](https://github.com/Helmholtz-AI-Energy/perun). Thanks!