In [1]:
# ============================================================
# Notebook setup: run this before everything
# ============================================================

%load_ext autoreload
%autoreload 2

# Control figure size
interactive_figures = False
if interactive_figures:
    # Normal behavior
    %matplotlib widget
    figsize=(9, 3)
else:
    # PDF export behavior
    figsize=(14, 5)

#from matplotlib import pyplot as plt
from util import util
#from scipy.integrate import odeint
import numpy as np
import pandas as pd
#from skopt.space import Space
#from eml.net.reader import keras_reader

# Case study: Sustainable Hardware Dimensioning

Training deep learning models is widely recognised as power-hungry and expensive, and the field has begun to address its carbon footprint. Machine learning (ML) models have grown exponentially in size over the past few years, with some algorithms training for thousands of core-hours, and the associated energy consumption and cost have become a growing concern [Green AI paper].

Previous studies have made advances in estimating the greenhouse gas (GHG) emissions of computation, and have attempted to provide general, easy-to-use methodologies for estimating the carbon footprint of any computational task [Green Algorithms paper].

In this work, we address the problem of finding the best hardware (HW) architecture, and its dimensioning, for AI algorithms while respecting constraints on carbon emissions. Previous work [HADA paper] has focused on HW dimensioning for AI algorithms with constraints on budget, time and solution quality; this problem is called Hardware Dimensioning. We extend this approach by also considering constraints on the carbon emissions of the computations, and we name the resulting problem Sustainable Hardware Dimensioning.

The HADA approach is based on the Empirical Model Learning paradigm [EML paper], which integrates Machine Learning (ML) models into an optimisation problem. The key idea is to combine domain knowledge held by experts with data-driven models that learn the relationships between HW requirements and AI algorithm performance, relationships that would be very hard to express formally in a suitable model. The approach starts by benchmarking multiple AI algorithms on different HW resources, generating data used to train ML models; then, optimisation is used to find the best [HW configuration](https://www.sciencedirect.com/topics/computer-science/hardware-configuration) that respects user-defined constraints.

# Methodology

At the basis of our approach is the Empirical Model Learning (EML) paradigm. Broadly speaking, EML deals with solving declarative optimisation models that contain a complex component $h$, representing the relationship between the variables that can be acted upon, $x$ (the decision variables), and the observables $y$ of the system considered, so that $h(x) = y$. Since $h$ is too complex to optimise over directly, we exploit empirical knowledge to build a surrogate model $h_\theta(x)$ learned from data, where $\theta$ is the parameter vector.
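Written out, the resulting problem can take the following generic form. The objective $f$, the constraints $g_j$, the loss $\mathcal{L}$ and the $N$ benchmark observations $(x_i, y_i)$ are notation introduced here for illustration, not taken from the HADA formulation:

$$
\begin{aligned}
\min_{x} \quad & f(x, y) \\
\text{s.t.} \quad & g_j(x, y) \le 0, \qquad j = 1, \dots, m \\
& y = h_\theta(x), \qquad \theta = \arg\min_{\theta'} \sum_{i=1}^{N} \mathcal{L}\big(h_{\theta'}(x_i),\, y_i\big)
\end{aligned}
$$

The surrogate is trained once on the benchmark data; its parameters $\theta$ are then fixed, and $h_\theta$ enters the optimisation model as a set of variables and constraints.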

HADA (HArdware Dimensioning of AI algorithms) consists of three main phases:

1. data set collection (benchmarking phase) - an initial phase that collects the data set by running the target algorithms multiple times under different configurations;
2. surrogate model creation - once a training set is available, a set of ML models is trained on it, and these models are then encoded as a set of variables and constraints following the EML paradigm;
3. optimisation - the user-defined constraints and objective function are posted on top of the combinatorial structure formed by the encoded ML models and the domain-knowledge constraints, and the optimisation model is solved (until either an optimal solution is found or a time limit is reached).
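The three phases can be illustrated end to end on a toy problem. Everything in this sketch is an assumption: the "algorithm" is a made-up function whose runtime grows with the number of scenarios, the surrogate is a quadratic least-squares fit rather than the ML models used by HADA, and the optimisation phase simply enumerates candidate configurations instead of encoding the surrogate into a solver as EML does:

```python
import numpy as np

def run_algorithm(scenarios, rng):
    """Hypothetical system: noisy runtime (s) as a function of the number of scenarios.
    In HADA this relation is unknown and only observed through benchmarking."""
    return 0.5 + 0.12 * scenarios + 0.002 * scenarios**2 + rng.normal(0, 0.05)

def benchmark(configs, rng):
    """Phase 1: run the target algorithm under each configuration."""
    return np.array([run_algorithm(c, rng) for c in configs])

def fit_surrogate(x, y, degree=2):
    """Phase 2: learn h_theta from data (here a polynomial least-squares fit)."""
    return np.poly1d(np.polyfit(x, y, degree))

def optimise(surrogate, candidates, time_limit):
    """Phase 3: pick the largest scenario count whose predicted runtime meets the limit.
    'More scenarios is better' is a hypothetical objective used only for illustration."""
    feasible = [c for c in candidates if surrogate(c) <= time_limit]
    return max(feasible) if feasible else None

rng = np.random.default_rng(0)
configs = np.arange(1, 101)          # 1 to 100 scenarios
runtimes = benchmark(configs, rng)
h_theta = fit_surrogate(configs, runtimes)
best = optimise(h_theta, configs, time_limit=10.0)
```

Here, more scenarios stand in for better solution quality, so the optimiser returns the largest scenario count whose predicted runtime stays within a hypothetical 10-second limit (around 45 for this toy function).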

## Dataset Collection

The training set was built by running two stochastic algorithms from the energy management system domain, namely anticipate and contingency [32, 4]. Both algorithms calculate the amount of energy that the energy system must produce to meet the required load, minimising the total energy cost over the daily time horizon while taking uncertainty into account. Both algorithms divide the daily time horizon into 96 intervals of 15 minutes each.
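The 96 intervals follow from 24 h × 60 min / 15 min. A small sketch mapping each interval index to its start time and hour of the day, assuming (for illustration) that interval 0 starts at midnight:

```python
INTERVAL_MIN = 15                        # length of one interval in minutes
n_intervals = 24 * 60 // INTERVAL_MIN    # 96 intervals per day

# Clock time at which each interval starts, e.g. interval 0 -> "00:00"
start_times = [
    f"{(k * INTERVAL_MIN) // 60:02d}:{(k * INTERVAL_MIN) % 60:02d}"
    for k in range(n_intervals)
]

# Hour of the day each interval belongs to (4 intervals per hour)
hour_of_interval = [k * INTERVAL_MIN // 60 for k in range(n_intervals)]
```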

### Input data

The input data consists of 30 different instance realisations, each one representing one daily time horizon. For each day we have **Load**, a 96-valued vector of the load observations sampled at each interval (every 15 minutes over the course of a day), and **PV**, a 96-valued vector of the observed available photovoltaic (PV) energy production. The first two instances are shown below:

In [2]:
# Data exploration: look at the input data
data = util.load_instances_data()
display(data)

Unnamed: 0,PV(kW),Load(kW)
0,[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.  0. 0. 0. 10.4 20.4 29.8 25.9 37.6 54.9 58.6 114.2 122.2  241.2 288.2 265. 293.7 387.4 563. 560.9 447.7 569.6 555.8 307.2 693.7  718.5 767.9 664.5 724. 651.2 692. 650.5 673.5 378.8 720.2 654.4 417.8  546. 403.8 715. 517.1 581.9 669.2 629.5 672.7 640.5 653.5 655.6 655.1  416.2 318.9 569.6 523.5 511. 485.5 314.1 265.8 153.8 93. 110.5 148.  275.5 223.1 88.2 71. 58.8 31.8 32.8 12.9 21.3 23.9 20.4 0.  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ],[106.9 130.3 140.2 98.7 85. 127.4 132. 115.3 130.3 89.4 84.3 90.9  104.7 97. 97.2 95.3 81.6 107.3 82. 102.6 108.9 81.2 108. 83.7  149.4 172.7 181.9 202.5 259.5 330.1 375.8 365.2 307.7 442.7 299.7 239.9  287.7 367.2 305.9 256. 322.4 399.1 276.6 258.9 281.1 266.7 306.2 289.1  296.5 346.8 235.7 236.3 240.1 228.2 277.4 251.5 274.8 291.5 325.6 363.3  317. 331.7 293. 346.5 385.5 384.1 351.7 395.3 596.6 443.5 585.6 561.8  491.7 616.5 562.3 438.4 432.3 563.6 628.8 448.1 581.8 531.6 551.6 433.3  701.6 635.7 580.8 544.6 561.5 656.2 408.2 414.3 387.8 353.7 311.2 230.4]
1,[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.  0. 0. 0. 14.4 20.4 29.8 25.9 42. 56.7 60.7 114.2 133.5  253.6 288.2 265. 293.7 387.4 519.1 560.9 457.3 532.3 532.6 269.8 723.3  718.5 767.9 664.5 637.1 651.2 655.9 657.4 770.6 378.8 698.6 654.4 417.8  546. 403.8 647.2 517.1 581.9 669.2 629.5 672.7 640.5 639.1 659.8 642.2  416.2 318.9 569.6 509.2 491.1 461. 314.1 246.7 151.2 102.4 110.5 148.  275.5 233.9 88.2 71. 64.1 31.8 31.6 12.9 25.3 20. 20.4 0.  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ],[106.9 130.3 128.8 98.7 85. 126.2 109.2 144.6 130.3 92.9 84.3 96.9  121.3 73.9 109. 95.3 90.1 95.8 82. 112.4 108.9 103.1 108. 96.5  149.4 174.5 181.9 202.5 242. 330.1 375.8 399.2 428.6 325.8 256.3 334.8  306.5 367.2 286. 264.8 322.4 399.1 276.6 258.9 281.1 337.4 329.7 289.1  341.8 300.4 235.7 236.3 240.1 228.2 262.2 251.5 274.8 291.5 325.6 363.3  397.9 331.7 305.6 346.5 385.5 458.4 351.7 395.3 456.4 443.5 553.9 561.8  491.7 542.3 562.3 438.4 432.3 508.6 536. 448.1 435.7 538.2 436.5 433.3  701.6 682.1 580.8 544.6 561.5 655.9 533.5 414.3 399.7 353.7 311.2 230.4]
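
In the output above, the vectors appear as NumPy-style strings (e.g. `[ 0. 0. 10.4 ...]`). If they are stored this way in the underlying file, which is an assumption since `util.load_instances_data` is not shown, they could be turned back into arrays as follows. The residual load, i.e. the part of the load that PV cannot cover, is computed as an example; the helper `parse_vector` is hypothetical:

```python
import numpy as np

def parse_vector(text):
    """Parse a NumPy-style string such as '[ 0. 10.4 20.4 ]' into a float array."""
    return np.array(text.strip().strip("[]").split(), dtype=float)

# Shortened example strings (the real vectors have 96 entries)
pv = parse_vector("[ 0. 0. 10.4 20.4 563. ]")
load = parse_vector("[106.9 130.3 140.2 98.7 330.1]")

# Load not covered by PV in each interval, clipped at zero when PV exceeds the load
residual = np.maximum(load - pv, 0.0)
```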


Since the objective of the two algorithms is to minimise the total energy cost, the input also includes the grid electricity price for each hour of the day:

In [3]:
data = util.load_prices_data()
display(data)

Unnamed: 0,Time,Price
0,0,46.99
1,1,42.81
2,2,39.83
3,3,38.02
4,4,37.0
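
The prices are hourly, while the algorithms work on 15-minute intervals. A minimal sketch of aligning the two, assuming each hourly price applies to the four intervals of that hour. The cost function is a deliberately simplified, hypothetical grid-only model (the residual load is bought from the grid) and assumes prices are per MWh, a unit the data does not state:

```python
import numpy as np

INTERVALS_PER_HOUR = 4
DT_H = 0.25  # length of one interval in hours

def hourly_to_intervals(hourly_prices):
    """Repeat each hourly price for the four 15-minute intervals of that hour."""
    return np.repeat(np.asarray(hourly_prices, dtype=float), INTERVALS_PER_HOUR)

def grid_cost(residual_kw, interval_prices_per_mwh):
    """Simplified cost of buying the residual load from the grid (prices assumed per MWh)."""
    energy_mwh = np.asarray(residual_kw) * DT_H / 1000.0  # kW * h = kWh, then kWh -> MWh
    return float(np.sum(energy_mwh * interval_prices_per_mwh))

# Hypothetical flat day: 24 identical hourly prices and a constant 100 kW residual load
prices = hourly_to_intervals([40.0] * 24)
cost = grid_cost(np.full(96, 100.0), prices)
```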


Each algorithm was then run 100 times on each instance, each time with a different value of its configurable parameter, i.e., the number of traces/scenarios (from 1 to 100). This value is taken directly from the HADA paper, according to which running the algorithms 100 times on each instance explores the parameter space sufficiently [Hada paper, 4]. Hence, the training set of each algorithm consists of 3,000 records (100 runs × 30 instances).
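The size of the training set follows directly from the benchmarking grid. A minimal sketch enumerating the runs for each algorithm (numbering instances from 0 is an assumption made here for illustration):

```python
from itertools import product

N_INSTANCES = 30
SCENARIOS = range(1, 101)  # 1 to 100 traces/scenarios

# One benchmark run per (instance, number of scenarios) pair
runs = list(product(range(N_INSTANCES), SCENARIOS))

# The same grid is run for each of the two algorithms
plan = {alg: runs for alg in ("anticipate", "contingency")}
```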

### Measuring Carbon Emissions

To extend HADA so that it accounts for sustainability, we need to measure the carbon emissions of running the algorithms during the benchmarking phase. A simple tool for this is [codecarbon](https://mlco2.github.io/codecarbon/index.html), a Python package for tracking the emissions resulting from code execution.
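Trackers such as codecarbon estimate emissions as the energy consumed by the hardware multiplied by the carbon intensity of the electricity used. A back-of-the-envelope sketch of that relation, with hypothetical power and carbon-intensity values (codecarbon itself estimates power from the hardware and uses regional carbon-intensity data):

```python
def energy_kwh(avg_power_w, runtime_s):
    """Energy in kWh from average power (W) and runtime (s)."""
    return avg_power_w * runtime_s / 3600.0 / 1000.0

def emissions_kg(energy, carbon_intensity_kg_per_kwh):
    """CO2-equivalent emissions (kg) = energy (kWh) x carbon intensity (kg CO2e/kWh)."""
    return energy * carbon_intensity_kg_per_kwh

# Hypothetical values: a 1-hour run drawing 100 W on a grid emitting 0.3 kg CO2e/kWh
e = energy_kwh(100.0, 3600.0)   # 0.1 kWh
co2 = emissions_kg(e, 0.3)      # 0.03 kg CO2e
```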

The CO2e emission tracker offered by codecarbon can be used in [different modalities](https://mlco2.github.io/codecarbon/usage.html): as an explicit object (instantiating an `EmissionsTracker` object and calling its `start()` and `stop()` methods around the compute section), as a context manager (recommended for monitoring a specific code block), or as a decorator (recommended for monitoring training functions). As an example, let's track the emissions of running the ANTICIPATE algorithm on instance 5 with 4 scenarios:
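How one tracker object can support all three modalities is shown below with a hypothetical `ToyTracker` that only measures wall-clock time. It mirrors the usage patterns described above but is not codecarbon's `EmissionsTracker` (whose real interface also includes a `track_emissions` decorator) and measures no energy or emissions:

```python
import functools
import time

class ToyTracker:
    """Hypothetical stand-in for an emissions tracker: measures wall-clock time only."""

    def __init__(self):
        self.elapsed_s = None
        self._t0 = None

    # Explicit-object modality
    def start(self):
        self._t0 = time.perf_counter()

    def stop(self):
        self.elapsed_s = time.perf_counter() - self._t0
        return self.elapsed_s

    # Context-manager modality
    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.stop()
        return False  # do not suppress exceptions

def track(func):
    """Decorator modality: time every call of func with a fresh tracker."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with ToyTracker() as tracker:
            result = func(*args, **kwargs)
        wrapper.last_elapsed_s = tracker.elapsed_s
        return result
    return wrapper

# Explicit object
tracker = ToyTracker()
tracker.start()
total = sum(range(1_000))
tracker.stop()

# Context manager
with ToyTracker() as block_tracker:
    total = sum(range(1_000))

# Decorator
@track
def solve(n):
    return sum(range(n))

answer = solve(10)
```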

In [4]:
# Track the emissions of one ANTICIPATE run (instance 5, 4 scenarios)
util.online_ant(scenarios=4, instance=5, file='InstancesTest.csv')

Set parameter WLSAccessID
Set parameter WLSSecret
Set parameter LicenseID to value 2512647
Academic license 2512647 - for non-commercial use only - registered to en___@studio.unibo.it
The solution cost (in keuro) is: 370.29
The runtime (in sec) is: 12.59
Avg memory used (in MB) is: 139.02
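
The run reports cost, runtime and average memory. How `util.online_ant` collects these figures is not shown; as a sketch under that caveat, runtime and peak Python memory allocation of any function call can be measured with the standard library. Note that `tracemalloc` reports the peak of traced Python allocations, not the average memory printed above, and the helper `measure` is hypothetical:

```python
import time
import tracemalloc

def measure(func, *args, **kwargs):
    """Run func and return (result, runtime in seconds, peak traced memory in MB)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        runtime = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
    return result, runtime, peak / 1e6

# Example: measure a toy workload instead of the real solver
result, runtime_s, peak_mb = measure(lambda n: sum(i * i for i in range(n)), 100_000)
```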
