<img style="float: right;" src="https://www.gakhov.com/static/gakhov_logo_text.svg">

<dl>
  <dt>CS-GH301/2018: Introduction to Time Series Forecasting with Python</dt>
  <dd>Dr. Andrii Gakhov</dd>
</dl>



------------------


# Lecture 1.4: Motivation Example

In this Jupyter Notebook we perform a brief preliminary analysis of the time series to check the claims of the [ADAC report](https://www.adac.de/der-adac/verein/reifenbreite/spritpreise-schwankungen/) about how petrol prices vary during the day in Germany.

## Analysis of the "Benzin Price" dataset

> The dataset contains hourly prices for Super 95 petrol in Berlin, Germany (in EUR per 100 liters)
> from 12:00 19 September 2018 to 12:00 November 2018. Source: [gakhov.com](https://www.gakhov.com/datasets/benzin-price-2018.html)

In [None]:
import warnings
warnings.filterwarnings('ignore')

import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set(style="ticks")

#### Step 1. Load the dataset

In [None]:
filepath = "data/benzin-price.csv"

In [None]:
import pandas as pd

df = pd.read_csv(filepath, header=None, skiprows=0, parse_dates=[0], names=['timestamp', 'price'], index_col=0)
df['price'] = df['price'].astype(float)  # astype() returns a new Series, so assign it back
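The loading step can be tried without the data file by passing `read_csv` an in-memory CSV. This is a minimal sketch that assumes the file has no header row and two columns (timestamp, price), which is what `header=None` and `names=[...]` above imply; the two rows are made-up values for illustration, not taken from the dataset. It also shows that the result of `astype` has to be assigned, because `astype` returns a new object instead of changing the column in place.

```python
import io

import pandas as pd

# Two made-up rows in the assumed layout: no header, timestamp then price.
# Illustrative values only, not taken from the real dataset.
sample_csv = io.StringIO(
    "2018-09-19 12:00:00,149.9\n"
    "2018-09-19 13:00:00,147.9\n"
)

sample = pd.read_csv(sample_csv, header=None, parse_dates=[0],
                     names=['timestamp', 'price'], index_col=0)

# astype() returns a converted copy; without the assignment it would be discarded
sample['price'] = sample['price'].astype(float)

print(type(sample.index).__name__)  # DatetimeIndex
print(sample['price'].dtype)        # float64
```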

In [None]:
df.head(5)

#### Step 2. Plot the dataset

In [None]:
fig, ax = plt.subplots(figsize=(18,6))

df.plot(ax=ax)
plt.legend(loc='upper left');

#### Step 3. Zoom in on individual days

In [None]:
# Aggregate to one row per day, then pick 5 random days (reproducible via random_state)
days = df.resample('D').mean().sample(n=5, random_state=73).index.tolist()
print(days)
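Depending on the pandas version, calling `.sample()` directly on the `Resampler` returned by `resample('D')` either relied on an implicit, deprecated aggregation or fails, so it is safer to aggregate explicitly (here with `.mean()`) before sampling. The sketch below applies the same idea to a hypothetical hourly series of random numbers standing in for `df`. Each sampled index value is a midnight timestamp, which is what the plotting loop relies on when it uses `day + pd.Timedelta('1 days')` as the end of the window.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series standing in for df (random values, not real prices),
# starting at 12:00 like the real dataset
index = pd.date_range('2018-09-19 12:00', periods=24 * 10, freq='h')
toy = pd.DataFrame({'price': np.random.default_rng(0).normal(145, 2, len(index))},
                   index=index)

# One row per calendar day (labelled by its midnight), then draw 5 of those rows
daily = toy.resample('D').mean()
toy_days = daily.sample(n=5, random_state=73).index.tolist()

print(toy_days)
```

Because the series starts at 12:00, the first and last calendar days are only partly covered, so a sampled edge day would show only half a day of prices.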

In [None]:
fig, axes = plt.subplots(nrows=5, ncols=1, figsize=(18,18))

for index, day in enumerate(days):
    start_datetime = day
    end_datetime = day + pd.Timedelta('1 days')
    # half-open window [day, next day): the following midnight is excluded
    zoom_range = (df.index >= start_datetime) & (df.index < end_datetime)
    df[zoom_range].plot(
        y='price',
        kind='line',
        ax=axes[index],
        title=day.strftime('%Y-%m-%d'))
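The boolean mask in the loop selects the half-open interval `[day, day + 1 day)`. For a `DatetimeIndex`, the same rows can be selected with pandas' partial string indexing by passing the date as a string to `.loc`. A small check on a hypothetical three-day hourly series whose prices are simply `0, 1, 2, ...`:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data: the "price" equals the hour offset from the start
index = pd.date_range('2018-09-20', periods=72, freq='h')
toy = pd.DataFrame({'price': np.arange(72, dtype=float)}, index=index)

day = pd.Timestamp('2018-09-21')

# Mask-based selection, as in the loop above
mask = (toy.index >= day) & (toy.index < day + pd.Timedelta('1 days'))
by_mask = toy[mask]

# Label-based selection: a date string selects every row on that calendar day
by_label = toy.loc[day.strftime('%Y-%m-%d')]

print(len(by_mask), by_mask.equals(by_label))  # 24 True
```

Both forms return 24 rows for a complete day; the mask version is more flexible when the window does not align with calendar days.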


#### Step 4. Improve the visualization

In [None]:
fig, axes = plt.subplots(nrows=5, ncols=1, figsize=(18,18))

adac_periods = {
    'low': [(15, 17), (19, 22)],
    'high': [(6, 9), (12, 15), (17, 19)]
}

for index, day in enumerate(days):
    start_datetime = day
    end_datetime = day + pd.Timedelta('1 days')
    zoom_range = (df.index >= start_datetime) & (df.index < end_datetime)
        
    df[zoom_range].plot(
        y='price',
        kind='line',
        drawstyle='steps-post',
        ax=axes[index],
        title=day.strftime('%Y-%m-%d'))

    day_avg = df[zoom_range].price.mean()  # average price for the current day
    axes[index].axhline(y=day_avg, color='grey', linestyle='--')
    
    # shade the ADAC low-price periods in green
    for (hour_start, hour_end) in adac_periods['low']:
        start = day.replace(hour=hour_start)
        end = day.replace(hour=hour_end)
        axes[index].axvspan(start, end, color='green', alpha=0.1)

    # shade the ADAC high-price periods in red
    for (hour_start, hour_end) in adac_periods['high']:
        start = day.replace(hour=hour_start)
        end = day.replace(hour=hour_end)
        axes[index].axvspan(start, end, color='red', alpha=0.1)
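The shaded plots give a visual impression for five days. To summarize all days at once, one option (not part of the original analysis) is to subtract each day's mean price, average the remaining deviation by hour of day, and compare the ADAC "low" and "high" hours. The sketch below does this on a hypothetical flat price series with an artificial +3 bump during the "high" hours, so the expected answer is known in advance; `mean_over` is a helper introduced here for illustration. The spans are treated as half-open hour ranges `[start, end)`, matching how `axvspan` shades them above. On the real data, `toy` would be replaced by `df`.

```python
import numpy as np
import pandas as pd

# Same structure as adac_periods above: (start_hour, end_hour) pairs
periods = {
    'low': [(15, 17), (19, 22)],
    'high': [(6, 9), (12, 15), (17, 19)],
}

# Hypothetical series: a flat price of 140 plus a +3 bump during 'high' hours
index = pd.date_range('2018-09-20', periods=24 * 7, freq='h')
hours = index.hour
bump = np.zeros(len(index))
for start, end in periods['high']:
    bump[(hours >= start) & (hours < end)] = 3.0
toy = pd.DataFrame({'price': 140.0 + bump}, index=index)

# Remove the day-to-day level by subtracting each calendar day's mean price
daily_mean = toy['price'].groupby(toy.index.normalize()).transform('mean')
deviation = toy['price'] - daily_mean

# Average deviation for each hour of the day (0..23)
hourly_profile = deviation.groupby(deviation.index.hour).mean()

def mean_over(profile, spans):
    """Hypothetical helper: average the hourly profile over [start, end) spans."""
    selected = [h for start, end in spans for h in range(start, end)]
    return profile.loc[selected].mean()

low_avg = mean_over(hourly_profile, periods['low'])
high_avg = mean_over(hourly_profile, periods['high'])
print(f"low: {low_avg:+.2f}, high: {high_avg:+.2f}")  # low: -1.00, high: +2.00
```

A negative value for the "low" hours and a positive one for the "high" hours would be consistent with the ADAC pattern; on real prices the size of the gap is what matters.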
