In [25]:
# ============================================================
# Notebook setup: run this before everything
# ============================================================

%load_ext autoreload
%autoreload 2

# Control figure size
interactive_figures = False
if interactive_figures:
    # Normal behavior
    %matplotlib widget
    figsize=(9, 3)
else:
    # PDF export behavior
    figsize=(14, 5)

from matplotlib import pyplot as plt
from util import util
from core import *
#from scipy.integrate import odeint
import numpy as np
import pandas as pd
import os
#from skopt.space import Space
#from eml.net.reader import keras_reader
from codecarbon import EmissionsTracker
from sklearn.tree import DecisionTreeRegressor

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


ModuleNotFoundError: No module named 'matplotlib'

# Case study: Sustainable Hardware Dimensioning

With widely recognised power-hungry and expensive training algorithms, deep learning has begun to address its carbon footprint. Machine learning (ML) models have grown exponentially in size over the past few years, with some algorithms training for thousands of core-hours, and the associated energy consumption and cost have become a growing concern [Green AI paper]. 

Previous studies have made advances in estimating the GHG emissions of computation, and have attempted to provide general, easy-to-use methodologies for estimating the carbon footprint of any computational task [Green Algorithms paper].

In this work, we explore the problem of finding the best hardware (HW) architecture and its dimensioning for AI algorithms, while respecting constraints on carbon emissions. Previous work [HADA paper] has focused on HW dimensioning for AI algorithms with constraints on budget, time and solution quality. This problem is called Hardware Dimensioning. We aim to extend this approach by also considering constraints on the carbon emissions of the computations, and we name the resulting problem Sustainable Hardware Dimensioning.

The HADA approach is based on the Empirical Model Learning paradigm [EML paper], which integrates Machine Learning (ML) models into an optimisation problem. The key idea is to combine the domain knowledge held by experts with data-driven models that learn the relationships between HW requirements and AI algorithm performance, relationships that would be very complex to express formally. The approach starts by benchmarking multiple AI algorithms on different HW resources, generating the data used to train the ML models; optimisation is then used to find the best [HW configuration](https://www.sciencedirect.com/topics/computer-science/hardware-configuration) that respects user-defined constraints.

# Methodology

At the basis of our approach is the Empirical Model Learning (EML) paradigm. Broadly speaking, EML deals with solving declarative optimisation models that contain a complex component $h$, which represents the relation between the variables that can be acted upon, $x$ (the decision variables), and the observables $y$ of the system under consideration; the function $h(x) = y$ describes this relationship. Because $h$ is complex, we cannot optimise over it directly. Hence, we exploit empirical knowledge to build a surrogate model $h_\theta(x)$ learned from data, where $\theta$ is the parameter vector.
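As a rough sketch of the kind of model this paragraph describes, the optimisation problem with the embedded surrogate can be written as follows. The objective $f$, the constraint functions $g_j$ and the domain $D_x$ are notation assumed here for illustration; they do not appear in the text above.

$$
\begin{aligned}
\min_{x,\, y} \quad & f(x, y) \\
\text{s.t.} \quad & y = h_\theta(x) \\
& g_j(x, y) \le 0, \quad j = 1, \dots, m \\
& x \in D_x
\end{aligned}
$$

The constraint $y = h_\theta(x)$ is where the learned model enters: it replaces the intractable relation $h(x) = y$ with its data-driven approximation.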

HADA (HArdware Dimensioning of AI Algorithms) consists of three main phases:

1. data set collection (benchmarking phase) - an initial phase that collects the data set by running the target algorithms multiple times under different configurations;
2. surrogate model creation - once a training set is available, a set of ML models is trained on it, and these models are then encoded as a set of variables and constraints following the EML paradigm;
3. optimisation - the user-defined constraints and objective function are posted on top of the combinatorial structure formed by the encoded ML models and the domain-knowledge constraints, and the optimisation model is solved (either until an optimal solution is found or a time limit is reached).
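The three phases above can be illustrated with a deliberately tiny, self-contained sketch. Everything in it is an assumption made for illustration: `run_benchmark` is a hypothetical stand-in for a real benchmark run, the surrogate is a plain least-squares line rather than the ML models HADA uses, and the optimisation step is a simple enumeration instead of a declarative model with embedded constraints.

```python
import random


def run_benchmark(n_traces, rng):
    """Hypothetical stand-in for one benchmark run: returns a noisy runtime (s)."""
    return 5.0 + 1.2 * n_traces + rng.gauss(0.0, 0.1)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Phase 1: data set collection (benchmarking)
rng = random.Random(0)
n_traces_values = list(range(1, 21))
runtimes = [run_benchmark(n, rng) for n in n_traces_values]

# Phase 2: surrogate model creation
slope, intercept = fit_line(n_traces_values, runtimes)


def predicted_runtime(n_traces):
    """Surrogate prediction of the runtime for a given number of traces."""
    return slope * n_traces + intercept


# Phase 3: optimisation (largest configuration within a user-defined time budget)
time_budget = 20.0
feasible = [n for n in n_traces_values if predicted_runtime(n) <= time_budget]
best_n_traces = max(feasible)
print(f"Largest number of traces within {time_budget:.0f}s: {best_n_traces}")
```

The point of the sketch is only the shape of the pipeline: the optimiser never calls the expensive benchmark, it queries the cheap surrogate learned from benchmark data.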

## Dataset Collection

The training set was built by grounding the approach on two stochastic algorithms, ANTICIPATE and CONTINGENCY [32, 4], from the energy management system domain. The two algorithms calculate the amount of energy that the energy system must produce to meet the required load, minimising the total energy cost over the daily time horizon while taking uncertainty into account. Both algorithms divide the daily time horizon into 96 15-minute intervals.

### Input data

The input data is a set of 30 different instance realisations, each one representing one daily time horizon. For each day, we have **Load**, which is a 96-valued vector of the load observations sampled at each interval (every 15 minutes over the course of a day), and **PV**, a 96-valued vector representing the observations of available Photovoltaic energy production. Here we can see an example of the first two instances:

Since the objective of the two algorithms is to minimise the total energy cost, the input also includes a file with the grid electricity price at each hour of the day:

In [12]:
data = util.display_prices_data()
display(data)

Unnamed: 0,Time,Price
0,0,46.99
1,1,42.81
2,2,39.83
3,3,38.02
4,4,37.0


The base version of HADA involves measuring the solution cost, the runtime and the average memory usage for the target algorithm. In the next section we will see the additional metrics that were added.

### Measuring Carbon Emissions

To extend HADA to take sustainability into account, we need to measure the carbon emissions produced by running the algorithms during the benchmarking phase. A simple tool for this is [codecarbon](https://mlco2.github.io/codecarbon/index.html), a Python package for tracking the emissions that result from executing code.

The CO2e emissions tracker offered by codecarbon can be used in [different modalities](https://mlco2.github.io/codecarbon/usage.html): as an Explicit Object (instantiating an `EmissionsTracker` object and starting and stopping it explicitly around the section of code to track), as a Context Manager (recommended for monitoring a specific code block) or as a Decorator (recommended for monitoring training functions). For example, let's track the emissions of running the ANTICIPATE algorithm. Let's say we would like to solve instance 5 with 4 scenarios:

In [4]:
scenarios = 4
instance = 5
project_name = f"anticipate-ins-{instance}-ns-{scenarios}"
output_dir = '../data/'

# Codecarbon emission tracker
tracker = EmissionsTracker(project_name=project_name, 
                           log_level='ERROR', 
                           output_dir=output_dir)

with tracker as t:
    sol_cost, run_final, mem_final = util.online_ant(scenarios=scenarios, instance=instance, file='InstancesTest.csv')

print(f"The solution cost (in keuro) is: {sol_cost:.2f}")
print(f"The runtime (in sec) is: {run_final:.2f}")
print(f"Avg memory used (in MB) is: {mem_final:.2f}")

Set parameter WLSAccessID
Set parameter WLSSecret
Set parameter LicenseID to value 2512647
Academic license 2512647 - for non-commercial use only - registered to en___@studio.unibo.it
The solution cost (in keuro) is: 370.29
The runtime (in sec) is: 8.24
Avg memory used (in MB) is: 191.15


This tracks the emissions of the `online_ant` function by using the codecarbon `EmissionsTracker` as a context manager. By default, codecarbon saves the tracking data to a CSV file named `emissions.csv`. Let's take a look:

In [11]:
emissions = util.display_emissions_data()
display(emissions)

Unnamed: 0,timestamp,project_name,duration,emissions,emissions_rate,cpu_power,ram_power,cpu_energy,ram_energy,energy_consumed,...,region,os,python_version,codecarbon_version,cpu_count,cpu_model,longitude,latitude,ram_total_size,tracking_mode
0,2024-11-06T10:10:38,anticipate-ins-5-ns-4,14.329743,6e-05,4e-06,42.5,1.433251,0.000169,6e-06,0.000175,...,emilia-romagna,Linux-6.10.4-linuxkit-x86_64-with-glibc2.31,3.9.20,2.3.2,8,Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz,11.3752,44.488,3.822002,machine
1,2024-11-06T12:47:21,anticipate-ins-5-ns-4,13.39602,5.6e-05,4e-06,42.5,1.433252,0.000158,5e-06,0.000163,...,emilia-romagna,Linux-6.10.4-linuxkit-x86_64-with-glibc2.31,3.9.20,2.3.2,8,Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz,11.3752,44.488,3.822006,machine


(**NOTE:** maybe a table is not the most suitable representation) As we can see, codecarbon keeps track of a series of metrics. For this project, we decided to include the following metrics in the training set:

* `emissions`: the total emissions of CO2eq (kg) (**NOTE:** add a brief explanation about how codecarbon computes the emissions);
* `emissions_rate`: the amount of CO2eq emissions per second (kg/s);
* `cpu_energy`: the energy consumed by the CPU (kWh);
* `ram_energy`: the energy consumed by the RAM (kWh);
* `tot_energy`: the total energy consumed (kWh);
* `country`, `region`: the country and region where the computation took place;
* `cpu_count`: the number of cores.
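To make the relationship between these metrics more concrete, the sketch below recomputes some of them from the first row of the `emissions.csv` output shown earlier. It assumes the simplified relations energy = power x duration and emissions = energy x carbon intensity; codecarbon samples power at intervals during the run, so this is an approximation rather than its exact procedure, and any GPU contribution is ignored because it does not appear in the displayed columns.

```python
# Values copied from the first row of the emissions.csv output shown earlier
duration_s = 14.329743
cpu_power_w = 42.5
ram_power_w = 1.433251
reported_emissions_kg = 6e-05
reported_energy_kwh = 0.000175

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def energy_kwh(power_w, seconds):
    """Energy in kWh drawn by a constant power (W) over a duration (s)."""
    return power_w * seconds / JOULES_PER_KWH


cpu_energy_kwh = energy_kwh(cpu_power_w, duration_s)
ram_energy_kwh = energy_kwh(ram_power_w, duration_s)
total_energy_kwh = cpu_energy_kwh + ram_energy_kwh

# Carbon intensity implied by the reported values (kg CO2eq per kWh)
implied_intensity = reported_emissions_kg / reported_energy_kwh

# Average emission rate over the run (kg CO2eq per second)
emissions_rate = reported_emissions_kg / duration_s

print(f"cpu_energy   ~ {cpu_energy_kwh:.6f} kWh")
print(f"ram_energy   ~ {ram_energy_kwh:.6f} kWh")
print(f"total energy ~ {total_energy_kwh:.6f} kWh")
print(f"implied carbon intensity ~ {implied_intensity:.3f} kg/kWh")
print(f"emissions rate ~ {emissions_rate:.2e} kg/s")
```

The recomputed energies agree with the reported `cpu_energy`, `ram_energy` and `energy_consumed` columns up to the rounding of the displayed values.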

### Benchmarking

For the benchmarking phase, the ANTICIPATE and CONTINGENCY algorithms were run on each instance 100 times, each time with a different value of the configurable parameter (from 1 to 100 traces/scenarios). This value is taken directly from the HADA paper, according to which running the algorithms on each instance 100 times sufficiently explores the parameter space [Hada paper, 4]. The training set of each algorithm therefore contains 3,000 records (100 runs x 30 instances).
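The size of the benchmark grid described above can be sketched in a few lines. `run_once` is a hypothetical placeholder: in the real benchmark it would run the target algorithm under an emissions tracker and return the measured metrics.

```python
from itertools import product

N_INSTANCES = 30   # daily instances
MAX_TRACES = 100   # configurable parameter ranges from 1 to 100


def run_once(instance, n_traces):
    """Hypothetical placeholder for one tracked run of the target algorithm."""
    return {"instance": instance, "nTraces": n_traces}


# One record per (instance, number of traces) pair
records = [
    run_once(instance, n_traces)
    for instance, n_traces in product(range(N_INSTANCES), range(1, MAX_TRACES + 1))
]
print(len(records))
```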

Since the HADA approach recommends collecting data on different hardware configurations, I ran the benchmarking phase on my personal laptop [Insert specifications] and on [Leonardo](https://leonardo-supercomputer.cineca.eu/), an HPC system hosted by CINECA. After the benchmarking phase, the collected dataset looks something like this:

In [10]:
filename = 'contingency_mbp19.csv'
benchmark_data = util.display_benchmark_data(filename)
display(benchmark_data)

Unnamed: 0,nTraces,sol(keuro),time(sec),memAvg(MB),memPeak(MB),CO2e(kg),CO2eRate(kg/s),cpuEnergy(kW),ramEnergy(kW),totEnergy(kW),country,region,cpuCount
0,1,369.19,7.75,162.51,170.4,7.9e-06,4.31e-06,2.17e-05,1.53e-06,2.32e-05,Italy,emilia-romagna,8
1,2,390.2,10.93,165.13,173.15,3.65e-06,4.31e-06,9.99e-06,7.05e-07,1.07e-05,Italy,emilia-romagna,8
2,3,374.38,9.4,168.86,177.57,2.66e-06,4.31e-06,7.3e-06,5.15e-07,7.82e-06,Italy,emilia-romagna,8
3,4,332.0,10.24,172.66,181.44,2.78e-06,4.31e-06,7.63e-06,5.39e-07,8.17e-06,Italy,emilia-romagna,8
4,5,333.78,11.16,176.31,185.48,2.97e-06,4.31e-06,8.13e-06,5.74e-07,8.71e-06,Italy,emilia-romagna,8
5,6,333.63,12.13,180.02,189.12,3.14e-06,4.31e-06,8.6e-06,6.07e-07,9.21e-06,Italy,emilia-romagna,8
6,7,341.48,13.51,184.0,193.0,3.49e-06,4.31e-06,9.57e-06,6.76e-07,1.02e-05,Italy,emilia-romagna,8
7,8,335.11,15.79,188.27,197.49,4.11e-06,4.31e-06,1.13e-05,7.94e-07,1.21e-05,Italy,emilia-romagna,8
8,9,326.77,15.35,192.16,201.72,3.73e-06,4.31e-06,1.02e-05,7.22e-07,1.1e-05,Italy,emilia-romagna,8
9,10,326.04,16.25,196.11,205.87,4.07e-06,4.31e-06,1.12e-05,7.88e-07,1.19e-05,Italy,emilia-romagna,8


## Data exploration

Let's have a look at the data produced during the benchmarking phase to gather some insights.

### Load and combine data

We first load the data and add identifying columns to make it easy to filter and compare the data.

In [18]:
# load data for each combination of algorithm and platform
files = {
    "anticipate_mbp19": "anticipate_mbp19.csv",
    "anticipate_leonardo": "anticipate_leonardo.csv",
    "contingency_mbp19": "contingency_mbp19.csv",
    "contingency_leonardo": "contingency_leonardo.csv"
}

# load each file and add identifiers
dataframes = []
for key, file in files.items():
    algorithm, platform = key.split('_')
    df = util.read_benchmark_file(file)
    df['algorithm'] = algorithm
    df['platform'] = platform
    dataframes.append(df)

# concatenate all dataframes into one for analysis
data = pd.concat(dataframes, ignore_index=True)

### Basic data overview

To get a high-level summary of each metric, we run `.describe()` for statistical insights and `.info()` to check data types

In [23]:
# Let's drop some of the columns
columns = ['nScenarios','nTraces','cpuCount']
data_overview = data.drop(columns=columns)

print(data_overview.describe())
print("="*50)
data_overview.info()  # info() prints its report directly and returns None

         sol(keuro)     time(sec)    memAvg(MB)   memPeak(MB)      CO2e(kg)  \
count  10400.000000  10400.000000  10400.000000  10400.000000  1.040000e+04   
mean     358.864333   1303.906542   9767.433452    288.067621  8.593945e-05   
std       78.470274   2456.130215  14250.336679    278.932854  1.553400e-04   
min      233.130000      1.260000     81.930000      0.120000  9.640000e-09   
25%      313.777500     47.642500    358.072500     28.472500  7.830000e-06   
50%      345.160000    102.170000    626.460000    232.750000  3.130000e-05   
75%      372.060000    789.780000  17519.650000    506.260000  4.830000e-05   
max      813.710000   9984.340000  51772.230000   1202.740000  1.380000e-03   

       CO2eRate(kg/s)  cpuEnergy(kW)  ramEnergy(kW)  totEnergy(kW)  
count    1.040000e+04   10400.000000   1.040000e+04   10400.000000  
mean     2.085635e-06       0.001990   2.354033e-03       0.006746  
std      2.013126e-06       0.002189   2.898436e-03       0.007949  
min      3.0

Notice that the minimum of `memPeak(MB)` is 0.12. The peak memory values recorded on Leonardo are strangely low, which also drags the overall mean of the column down. They are also inconsistent: in the run with a peak of 0.12, the average memory is higher than the peak, which shouldn’t be possible. The benchmark runs on Leonardo should have been repeated; unfortunately, the recent weather emergency in Emilia-Romagna interrupted the HPC services offered by CINECA.
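A quick way to quantify this inconsistency is to select the rows where the recorded peak is lower than the average. The sketch below uses a small frame with made-up values so that it runs on its own; on the real data the same filter could be applied to the combined `data` frame.

```python
import pandas as pd

# Small illustrative frame; the values are made up for this example
sample = pd.DataFrame({
    "platform": ["mbp19", "leonardo", "leonardo"],
    "memAvg(MB)": [162.51, 358.07, 17519.65],
    "memPeak(MB)": [170.40, 0.12, 28.47],
})

# A peak below the average is physically impossible, so flag those rows
suspicious = sample[sample["memPeak(MB)"] < sample["memAvg(MB)"]]

# Count the flagged rows per platform to see where the problem comes from
suspicious_by_platform = suspicious["platform"].value_counts()
print(suspicious_by_platform)
```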

Also, notice that "canada" and "quebec" appear in many entries for the country and region. That's not because I travelled, even though I'd like to. These values are taken from codecarbon, so there is probably a problem in how it determines the location when tracking emissions.

### Categorical analysis

Analyze categorical columns like `algorithm`, `platform`, `country` and `region` to understand the distribution.

In [24]:
# Count distribution for categorical columns
print(data['algorithm'].value_counts())
print(data['platform'].value_counts())
print(data['country'].value_counts())
print(data['region'].value_counts())

contingency    6000
anticipate     4400
Name: algorithm, dtype: int64
mbp19       6000
leonardo    4400
Name: platform, dtype: int64
Canada    5724
Italy     4676
Name: country, dtype: int64
quebec            5724
emilia-romagna    4676
Name: region, dtype: int64


### Exploring performance metrics

Since `sol(keuro)`, `time(sec)`, `memAvg(MB)` and `memPeak(MB)` represent performance metrics, visualize and compare them across algorithms and platforms.