#  Time Series Analysis

Energy Systems 

TU Berlin
***


# Imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('bmh')
%matplotlib inline

***
# Introductory Comments

## Getting Help

Execute a cell with `Shift-Enter`; press `h` for help.

Help on an object is available via tab completion (`.<TAB>`). For a call such as `load.sort_values()`, place the cursor between the brackets and press `Shift-<TAB>`.

## Using one-dimensional arrays (Numpy and Pandas)

**Numpy**

In [2]:
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]:
a[1:3]

array([1, 2])

**Pandas**

In [4]:
s = pd.Series(np.random.random(3), index=['foo', 'bar', 'baz'])
s

foo    0.903773
bar    0.549852
baz    0.160605
dtype: float64

In [5]:
s["foo":"bar"]

foo    0.903773
bar    0.549852
dtype: float64

## Using two-dimensional arrays (Numpy and Pandas)

**Numpy** 

In [6]:
np.random.random((3,5))

array([[0.51830195, 0.69753092, 0.44532721, 0.23668576, 0.02572424],
       [0.08049763, 0.67027866, 0.05185742, 0.55377779, 0.58807674],
       [0.41263932, 0.10634426, 0.5192817 , 0.89140937, 0.77065472]])

**Pandas**

In [7]:
s = pd.DataFrame(np.random.random((3,5)),
                 index=['foo', 'bar', 'baz'],
                 columns=['colA', 'colB', 'colC', 'colD', 'colE'])
s

         colA      colB      colC      colD      colE
foo  0.647720  0.991518  0.173263  0.766106  0.292996
bar  0.522912  0.709316  0.706054  0.085725  0.956925
baz  0.924356  0.167063  0.175879  0.089804  0.875475


In [8]:
s.mean()

colA    0.698329
colB    0.622632
colC    0.351732
colD    0.313878
colE    0.708465
dtype: float64

In [9]:
s.mean(axis=1)

foo    0.574320
bar    0.596186
baz    0.446515
dtype: float64

***
# Problem I.1

The following data are made available to you on the repository in the `./data` directory:

`de_data.csv`, `gb_data.csv`, `eu_data.csv`
and alternatively
`wind.csv`, `solar.csv`, `load.csv`

They describe (quasi-real) time series for wind power generation $W(t)$, solar power generation $S(t)$ and load $L(t)$ in Great Britain (GB), Germany (DE) and Europe (EU). The time step is 1 h and the time series are several years long.

> Remark: In this example notebook, we only look at Germany and the EU; Great Britain works in exactly the same way.

***
**Read Data**

In [10]:
de = pd.read_csv('data/de_data.csv', parse_dates=True, index_col=0)
eu = pd.read_csv('data/eu_data.csv', parse_dates=True, index_col=0)
gb = pd.read_csv('data/gb_data.csv', parse_dates=True, index_col=0)

In [11]:
wind = pd.read_csv('data/wind.csv', parse_dates=True, index_col=0)
solar = pd.read_csv('data/solar.csv', parse_dates=True, index_col=0)
load = pd.read_csv('data/load.csv', parse_dates=True, index_col=0)

Extra: Show the first and last 5 rows (head and tail) of the German data:

In [12]:
de.head()

                         wind  solar     load
time
2011-01-01 00:00:00  0.535144    0.0  46209.0
2011-01-01 01:00:00  0.580456    0.0  44236.0
2011-01-01 02:00:00  0.603605    0.0  42502.0
2011-01-01 03:00:00  0.614114    0.0  41479.0
2011-01-01 04:00:00  0.627257    0.0  39923.0


In [13]:
de.tail()

                         wind  solar     load
time
2014-12-31 19:00:00  0.191347    0.0  50365.0
2014-12-31 20:00:00  0.220209    0.0  48725.0
2014-12-31 21:00:00  0.247598    0.0  49074.0
2014-12-31 22:00:00  0.273812    0.0  47667.0
2014-12-31 23:00:00  0.295076    0.0  47667.0


Extra: Check that the wind, solar and load files contain the same data, just organized differently:

In [14]:
(wind['DE'] == de['wind']).all()

True

Extra: How many years does the dataset comprise?

In [15]:
de.index

DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 01:00:00',
               '2011-01-01 02:00:00', '2011-01-01 03:00:00',
               '2011-01-01 04:00:00', '2011-01-01 05:00:00',
               '2011-01-01 06:00:00', '2011-01-01 07:00:00',
               '2011-01-01 08:00:00', '2011-01-01 09:00:00',
               ...
               '2014-12-31 14:00:00', '2014-12-31 15:00:00',
               '2014-12-31 16:00:00', '2014-12-31 17:00:00',
               '2014-12-31 18:00:00', '2014-12-31 19:00:00',
               '2014-12-31 20:00:00', '2014-12-31 21:00:00',
               '2014-12-31 22:00:00', '2014-12-31 23:00:00'],
              dtype='datetime64[ns]', name='time', length=35064, freq=None)

The dataset covers four years, from `2011-01-01` to `2014-12-31`.

***
**(a) Check that the wind and solar time series are normalized to 'per-unit of installed capacity',
and that the load time series is given in MW.**
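
One possible way to approach this check is sketched below on a synthetic stand-in frame called `demo` (made-up values, not the course data; with the real data you would use `de`, `eu` or `gb` instead). The idea is that per-unit series must lie in the interval [0, 1], while a load in MW should show values that are orders of magnitude larger.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `de` (made-up values, not the course data).
rng = np.random.default_rng(0)
idx = pd.date_range("2011-01-01", periods=48, freq="h", name="time")
demo = pd.DataFrame({
    "wind": rng.uniform(0.0, 1.0, len(idx)),
    "solar": rng.uniform(0.0, 0.8, len(idx)),
    "load": rng.uniform(35_000.0, 75_000.0, len(idx)),
}, index=idx)

# Per-unit series must stay within [0, 1] at every time step.
per_unit_ok = demo[["wind", "solar"]].apply(lambda col: col.between(0.0, 1.0).all())

# For the load, the value range reveals the order of magnitude.
load_range = demo["load"].agg(["min", "max"])

print(per_unit_ok)
print(load_range)
```

`demo.describe()` gives a similar overview of all columns in a single call.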

***
**(b) For all three regions, calculate the maximum, mean, and variance of the time series.**
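
A minimal sketch of how the three statistics could be collected for several regions at once. The helper `make_demo` and the frames it returns are hypothetical placeholders with made-up values; with the real data the dictionary would hold `de`, `eu` and `gb`.

```python
import numpy as np
import pandas as pd

def make_demo(seed):
    """Hypothetical helper: build a small synthetic frame shaped like `de`."""
    rng = np.random.default_rng(seed)
    idx = pd.date_range("2011-01-01", periods=72, freq="h", name="time")
    return pd.DataFrame({
        "wind": rng.uniform(0.0, 1.0, len(idx)),
        "solar": rng.uniform(0.0, 0.8, len(idx)),
        "load": rng.uniform(35_000.0, 75_000.0, len(idx)),
    }, index=idx)

# Placeholder regions; replace with {"DE": de, "EU": eu, "GB": gb}.
regions = {"DE": make_demo(0), "EU": make_demo(1)}

# One row per (region, statistic), one column per time series.
stats = pd.concat({name: df.agg(["max", "mean", "var"]) for name, df in regions.items()})
print(stats)
```

Note that pandas computes the sample variance (`ddof=1`) by default, whereas `numpy.var` uses `ddof=0`.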

***
**(c) For all three regions, plot the time series $W (t)$, $S(t)$, $L(t)$ for a winter month (January) and a summer month (July).**

Extra: Also compare the wind time series between the different regions.
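
Selecting a single month is straightforward with partial string indexing on a `DatetimeIndex`. The sketch below uses a synthetic stand-in `demo` for one year (made-up values); the choice of the year 2011 is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `de` covering the year 2011 (made-up values).
rng = np.random.default_rng(2)
idx = pd.date_range("2011-01-01", "2011-12-31 23:00", freq="h", name="time")
demo = pd.DataFrame({
    "wind": rng.uniform(0.0, 1.0, len(idx)),
    "solar": rng.uniform(0.0, 0.8, len(idx)),
    "load": rng.uniform(35_000.0, 75_000.0, len(idx)),
}, index=idx)

# A "YYYY-MM" string passed to .loc selects every row of that month.
january = demo.loc["2011-01"]
july = demo.loc["2011-07"]

print(january.shape, july.shape)
```

Calling `january.plot(subplots=True)` then draws each column in its own panel, which helps because the load in MW and the per-unit series live on very different scales. For the regional comparison, the same selection applied to the `wind` frame (one column per region) puts all regions in one plot.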

***
**(d) Resample the time series to daily, weekly and monthly data points and visualise them in plots. Can you identify some recurring patterns?**

> **Hint:** Use the function [`.resample`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html) with `.mean`.
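
The hint can be sketched as follows on a synthetic load series with a purely diurnal cycle (made-up values, one year long). The frequency aliases `"D"`, `"W"` and `"MS"` (month start) are one possible choice.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: constant level plus a 24-hour sine cycle (made-up values).
idx = pd.date_range("2011-01-01", "2011-12-31 23:00", freq="h", name="time")
hours = np.arange(len(idx))
demo = pd.DataFrame(
    {"load": 50_000.0 + 10_000.0 * np.sin(2 * np.pi * hours / 24)},
    index=idx,
)

daily = demo.resample("D").mean()     # one value per day
weekly = demo.resample("W").mean()    # one value per week
monthly = demo.resample("MS").mean()  # one value per month, labelled by month start

print(len(daily), len(weekly), len(monthly))
```

In this toy series the daily means are all equal, because averaging over full days removes the diurnal cycle. With real data, whatever variation remains after daily, weekly or monthly averaging points to slower patterns such as weekday effects or seasons.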

Wind:

Solar:

Load:

***
**(e) For all three regions, plot the duration curve for $W(t)$, $S(t)$, $L(t)$.** 
> **Hint:** You might want to make use of the functions [`.sort_values`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html) and [`.reset_index`](https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.reset_index.html)

> **Tip:** Go through the line `de['wind'].sort_values(ascending=False).reset_index(drop=True).plot()` dot by dot and note what happens to the output.
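
The chain from the tip can be taken apart step by step. The sketch below does so on a short synthetic wind series (made-up values) so that each intermediate result is small enough to inspect.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for de['wind'] (made-up values).
rng = np.random.default_rng(3)
idx = pd.date_range("2011-01-01", periods=24, freq="h", name="time")
wind_demo = pd.Series(rng.uniform(0.0, 1.0, len(idx)), index=idx, name="wind")

# Step 1: sort from highest to lowest; the timestamps travel with the values.
sorted_wind = wind_demo.sort_values(ascending=False)

# Step 2: drop the timestamps and number the values 0, 1, 2, ...
# The new index counts the hours during which the value is met or exceeded.
duration = sorted_wind.reset_index(drop=True)

print(duration.head())
```

Calling `duration.plot()` then draws the duration curve.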

***
**(f) For all three regions, plot the probability density function of $W(t)$, $S(t)$, $L(t)$.**

There are two different methods:
1. [Histograms](https://en.wikipedia.org/wiki/Histogram) and 
2. [Kernel density estimation (KDE)](https://en.wikipedia.org/wiki/Kernel_density_estimation).

This [image](https://en.wikipedia.org/wiki/Kernel_density_estimation#/media/File:Comparison_of_1D_histogram_and_KDE.png) on the KDE page provides a good summary of the differences. You can do both with `pandas`!
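
To make the difference between the two methods concrete, here is a minimal NumPy sketch that computes both estimates by hand for synthetic samples (made-up values). The function `gaussian_kde_sketch` and the fixed bandwidth of 0.05 are illustrative assumptions, not what pandas uses internally.

```python
import numpy as np

# Synthetic samples in [0, 1] as a stand-in for wind capacity factors.
rng = np.random.default_rng(4)
samples = rng.beta(2.0, 5.0, size=2000)

# Histogram: count samples per bin, scaled so the bar areas sum to one.
density, edges = np.histogram(samples, bins=25, density=True)
hist_area = (density * np.diff(edges)).sum()

def gaussian_kde_sketch(samples, grid, bandwidth):
    """Illustrative KDE: average of Gaussian bumps centred on each sample."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)

grid = np.linspace(-0.5, 1.5, 801)
kde = gaussian_kde_sketch(samples, grid, bandwidth=0.05)
kde_area = (kde * (grid[1] - grid[0])).sum()

print(hist_area, kde_area)
```

In practice, `de['wind'].plot.hist(bins=..., density=True)` and `de['wind'].plot.kde()` produce the corresponding plots directly; the KDE variant requires SciPy to be installed.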

First, let's look at the wind data:

Now, let's look at the solar data:

The solar data might be hard to see. Look at this in detail by limiting the y-axis to (0,2):

Finally, let's look at the load profile:

***
**(g) Apply a [(Fast) Fourier Transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform) to the three time series $X \in \{W(t), S(t), L(t)\}$:**

$$\tilde{X}(\omega) = \int_0^T X(t) \;e^{i\omega t} \;\mathrm{d}t.$$

**For all three regions, plot the energy spectrum $\|\tilde{X}(\omega)\|^2$ as a function of $\omega$. Discuss the relationship of these results with the findings obtained in (b)-(f).**

> **Remark:** Use the function [`numpy.fft.rfft`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.fft.rfft.html) and subtract the mean first. Otherwise the zero-frequency component, which is proportional to the mean, dominates the spectrum and obscures the periodic components.

> **Remark:** To determine the frequencies, use [`numpy.fft.rfftfreq`](https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.fft.rfftfreq.html). The argument `d` is the spacing between two data points, here 1 hour, which we specify as $\frac{1}{8760} a$ so that the frequencies come out in units of $\frac{1}{a}$.
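
The two remarks can be sketched together on a synthetic signal with a known daily and annual cycle (made-up amplitudes, one year of hourly values). If the approach works, the spectrum should peak at 365 cycles per year, the stronger of the two cycles in this toy signal.

```python
import numpy as np
import pandas as pd

# Synthetic signal: offset + daily cycle + weaker annual cycle (made-up values).
n = 8760  # hours in a non-leap year
t = np.arange(n)
x = 1.0 + 0.5 * np.sin(2 * np.pi * t / 24) + 0.3 * np.sin(2 * np.pi * t / n)

# Subtract the mean so the zero-frequency component does not dominate.
transformed = np.fft.rfft(x - x.mean())
energy = np.abs(transformed) ** 2

# d is the sample spacing in years, so frequencies are in cycles per year.
freqs = np.fft.rfftfreq(n, d=1.0 / 8760)

spectrum = pd.Series(energy, index=freqs)
peak_frequency = spectrum.idxmax()
print(peak_frequency)
```

For plotting, `spectrum.plot()` with logarithmic axes usually makes the daily, weekly and annual peaks easier to see.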

***
**(h) Normalize each time series by its mean, so that $\langle{W}\rangle = \langle{S}\rangle = \langle{L}\rangle = 1$.**

**Now, for all three regions, plot the mismatch time series**
  
  $$\Delta(t) = \gamma \alpha W(t) + \gamma (1 - \alpha) S(t) - L(t) $$
  
**for the same winter and summer months as in (c). Choose** 
1. $\alpha \in \{0.0, 0.5, 0.75, 1.0\}$ with $\gamma = 1$, and 
2. $\gamma \in \{0.5, 0.75, 1.0, 1.25, 1.5\}$ with $\alpha = 0.75$.

**What is the interpretation of $\gamma$ and $\alpha$?**

**Which configuration entails the lowest mismatch on average and in extremes?**

Choose the country and the values of `alpha` and `gamma`, then re-run:

In [16]:
d = de
gamma = 1.0
alpha = 0

Normalize the time series and calculate mismatch time series:
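
A minimal sketch of this step, assuming a frame `d` with the columns `wind`, `solar` and `load`. Here `d` is a synthetic stand-in with made-up values, and the values of `gamma` and `alpha` are just one of the configurations from the task.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the selected region `d` (made-up values).
rng = np.random.default_rng(5)
idx = pd.date_range("2011-01-01", periods=24 * 14, freq="h", name="time")
d = pd.DataFrame({
    "wind": rng.uniform(0.0, 1.0, len(idx)),
    "solar": rng.uniform(0.0, 0.8, len(idx)),
    "load": rng.uniform(35_000.0, 75_000.0, len(idx)),
}, index=idx)
gamma = 1.0
alpha = 0.75

# Divide each column by its own mean so that <W> = <S> = <L> = 1.
norm = d / d.mean()

# Mismatch: scaled wind and solar generation minus load.
mismatch = (
    gamma * alpha * norm["wind"]
    + gamma * (1 - alpha) * norm["solar"]
    - norm["load"]
)

print(mismatch.mean())
```

Because all three normalized series have mean one, the mean of the mismatch equals $\gamma - 1$ regardless of $\alpha$, which is a useful sanity check.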

Plot the mismatch time series for the winter and summer months:

***
**(i) For all three regions, repeat (b)-(g) for the mismatch time series. What changed?**

**Statistics**

**Time series plot**

**Duration curve**

**Probability density function**

**Fast Fourier Transform**