**Data Science and AI for Energy Systems** 

Karlsruhe Institute of Technology

Institute of Automation and Applied Informatics

Summer Term 2024

---

# Exercise I: Introduction to Data Science and AI for Energy Systems

**Imports**

In [9]:
import pandas as pd
import numpy as np
import scipy as sc
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import os

## PROBLEM I.2 (PROGRAMMING) – INTRODUCTION TO DATA ANALYSIS FOR FREQUENCY DYNAMICS

A central element of this lecture is the analysis of empirical data. In this exercise, we apply some analysis methods to an empirical time series of frequency measurement data. In particular, we focus on some important indicators of frequency stability, namely the largest frequency deviation (Nadir), the aggregated deviation (integral), the mean-square-displacement (MSD) and the largest change of frequency (RoCoF = rate of change of frequency). <br>
Given a time series of frequency deviation data (i.e. a time series of frequency data minus the reference frequency), we calculate the indicators for certain time intervals $$I_i = \{t_i,t_i+\tau,\ldots,t_i+\tau \gamma\},$$ where $\tau$ is the time resolution, $\gamma$ represents the size of the interval, and $t_i$ is the starting point of the $i\text{th}$ interval.<br>
In our example, we consider hourly intervals with $\gamma = 3600$ and a time resolution $\tau = 1\text{s}$, and $t_i$ denotes the start of the $i\text{th}$ hour of the data.
Given the frequency time series $f(t)$, the indicators are then given as 
\begin{align*}
    \text{Nadir}(t_i) &= f(\text{argmax}_{t\in I_i}|f(t)|),\\
    \text{Integral}(t_i) &= \tau \sum_{t\in I_i}f(t),\\
    \text{MSD}(t_i) &= \tau \sum_{t\in I_i}f^2(t), \text{ and}\\
    \text{RoCoF}(t_i) &= \frac{df}{dt}\left(\text{argmax}_{t \in W_i}\left|\frac{df}{dt}\right|\right), 
\end{align*}
where $W_i = [t_i - T, t_i + T]$ represents a time window around $t_i$ for a fixed window size $T$.
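
The first three definitions can be sketched for a single interval as follows. This is a minimal illustration, not a reference solution: the function name `interval_indicators` and the toy values are assumptions made for this example and do not come from the data set.

```python
import numpy as np

def interval_indicators(f, tau=1.0):
    """Return (nadir, integral, msd) for one interval of frequency deviations."""
    f = np.asarray(f, dtype=float)
    # Nadir: the signed deviation at the point of largest absolute deviation.
    nadir = f[np.argmax(np.abs(f))]
    # Integral: time resolution times the sum of the deviations.
    integral = tau * f.sum()
    # MSD: time resolution times the sum of the squared deviations.
    msd = tau * np.sum(f ** 2)
    return nadir, integral, msd

# Toy interval with made-up deviations in Hz (hypothetical values).
f_toy = np.array([0.01, -0.05, 0.02, 0.03])
nadir, integral, msd = interval_indicators(f_toy, tau=1.0)
print(nadir, integral, msd)  # nadir is -0.05; integral is about 0.01; msd is about 0.0039
```

Note that the Nadir keeps its sign: the largest absolute deviation here is 0.05, and it is negative.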

#### (a) Load the file *frequency_CE_ex1.csv* which contains a time series of frequency deviation data from the Continental European power grid and plot the entire time series as well as a histogram to get an impression of the distribution of the time series.

In [13]:
'''We assume that the file is located in a subfolder called 'data': '''
df = pd.read_csv('data/frequency_CE_ex1.csv', index_col='Datetime', parse_dates=True) 
'''Continue with plotting the data and the frequency distribution: '''


#### (b) Compute and plot the daily profile of the time series. The daily profile is defined as the average, for each time step of the day, over all the days in the data set. Can you observe a pattern? What could be a possible reason for a pattern?
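
One possible way to compute such a profile is to group by the time of day. The sketch below uses a synthetic series instead of the exercise file, so the column name `frequency`, the variable `demo` and the generated values are all assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the data: 3 days at 1 s resolution (made-up values).
idx = pd.date_range("2024-01-01", periods=3 * 86400,
                    freq=pd.Timedelta(seconds=1), name="Datetime")
rng = np.random.default_rng(0)
seconds_of_day = np.arange(len(idx)) % 86400  # the index starts at midnight
values = (0.02 * np.sin(2 * np.pi * seconds_of_day / 3600)
          + 0.005 * rng.standard_normal(len(idx)))
demo = pd.DataFrame({"frequency": values}, index=idx)

# Daily profile: mean over all days for each time of day (one value per second).
daily_profile = demo["frequency"].groupby(demo.index.time).mean()
print(daily_profile.head())
```

The result has one entry per second of the day, indexed by `datetime.time` objects. On the real data, the same call can be applied to whichever column holds the frequency deviation.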

#### (c) Calculate the Nadir, the integral, and the MSD of the frequency deviations for each hour of the dataset and plot the distributions in a histogram. Further, compute the mean of the indicators for each hour of the day (for example the time range 6 am - 7 am) and compare the results. <br>
 Hint: Use the method *pandas.DataFrame.groupby*.
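
A minimal sketch of the grouping step, again on synthetic data: the names `f`, `hourly` and `by_hour_of_day` are hypothetical. Each group contains the 3600 samples from the start of an hour up to, but not including, the start of the next hour, which is an assumption about how to treat the interval boundary.

```python
import numpy as np
import pandas as pd

tau = 1.0  # time resolution in seconds

# Synthetic stand-in for the data: 2 days at 1 s resolution (made-up values).
idx = pd.date_range("2024-01-01", periods=2 * 86400,
                    freq=pd.Timedelta(seconds=1), name="Datetime")
rng = np.random.default_rng(1)
f = pd.Series(0.02 * rng.standard_normal(len(idx)), index=idx, name="frequency")

# Label every sample with the start of its hour and group by that label.
hour_start = f.index.floor(pd.Timedelta(hours=1))
grouped = f.groupby(hour_start)

hourly = pd.DataFrame({
    # Signed deviation at the largest absolute deviation within the hour.
    "nadir": grouped.agg(lambda x: x.iloc[np.argmax(np.abs(x.to_numpy()))]),
    "integral": tau * grouped.sum(),
    "msd": tau * grouped.agg(lambda x: (x ** 2).sum()),
})

# Mean of each indicator per hour of the day (0 to 23), averaged over all days.
by_hour_of_day = hourly.groupby(hourly.index.hour).mean()
print(by_hour_of_day.head())
```

Histograms of the columns of `hourly` then show the distributions, for example via `hourly.hist(bins=50)`.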

#### (d) Analogously to (c), calculate the RoCoF for all hours of the time series using $T=60\text{s}$, and plot the distribution in a histogram. Further, compute the mean of the RoCoF for each hour of the day (for example the time range 6 am - 7 am) and compare the results. <br> To smooth the signal, you can estimate the derivative $\frac{df}{dt}(t)$ using a low-pass filter on the frequency increments $\Delta f(t) = f(t)-f(t-\tau)$ with a rectangular rolling window of length $L=60\text{s}$, i.e. for each time step, take the average over an interval of length $L$ centered on that time step. Compare your results using the original and the smoothed derivative.
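
The following sketch shows one way to estimate the derivative and evaluate the RoCoF in the window $[t_i - T, t_i + T]$ around each full hour. It runs on a synthetic random walk, and the helper `rocof_per_hour` and all variable names are assumptions for illustration. Using `min_periods=1` at the edges of the series is one possible choice, not something prescribed by the exercise.

```python
import numpy as np
import pandas as pd

tau, T, L = 1.0, 60, 60  # resolution, half window size and filter length in seconds

# Synthetic stand-in for the data: 6 hours of a random walk at 1 s resolution.
idx = pd.date_range("2024-01-01", periods=6 * 3600,
                    freq=pd.Timedelta(seconds=1), name="Datetime")
rng = np.random.default_rng(2)
f = pd.Series(np.cumsum(0.001 * rng.standard_normal(len(idx))), index=idx)

# Raw derivative estimate from the increments f(t) - f(t - tau).
dfdt_raw = f.diff() / tau
# Smoothed estimate: centered rectangular rolling mean over L samples.
dfdt_smooth = dfdt_raw.rolling(window=L, center=True, min_periods=1).mean()

def rocof_per_hour(dfdt, T=60):
    """Signed derivative at the largest absolute slope within +-T of each full hour."""
    out = {}
    for t_i in dfdt.index.floor(pd.Timedelta(hours=1)).unique():
        lo, hi = t_i - pd.Timedelta(seconds=T), t_i + pd.Timedelta(seconds=T)
        w = dfdt.loc[lo:hi].dropna()
        if len(w):
            out[t_i] = w.iloc[np.argmax(np.abs(w.to_numpy()))]
    return pd.Series(out)

rocof_raw = rocof_per_hour(dfdt_raw, T)
rocof_smooth = rocof_per_hour(dfdt_smooth, T)

# Mean RoCoF per hour of the day, for the raw and the smoothed derivative.
print(rocof_raw.groupby(rocof_raw.index.hour).mean())
print(rocof_smooth.groupby(rocof_smooth.index.hour).mean())
```

Because averaging damps fast fluctuations, the smoothed derivative never exceeds the raw one in its overall maximum absolute value.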