# main

# Exercise 1

- How can you select a column in a pandas dataframe?
- How can you select a row in a pandas dataframe?
- What does the method `.groupby()` do in combination with an aggregation? Give an example!
- What does the method `.unstack()` do to a dataframe? Give an example!
- What does it mean to fit a model?
- What does it mean to predict with a model?

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

# Run code before exercises

In [None]:
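# download the data (skipped if the file already exists locally)
def download_attached_files():
    import urllib.request  # `import urllib` alone does not guarantee that `urllib.request` is loaded
    import os.path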
    fnames = {
              'entsoe-demand-shortened.pickle': 'https://bokubox.boku.ac.at/index.php/get/4bbdac7c2872bd8cefd4fb6a4267a879/entsoe-demand-shortened.pickle'
    }
    for fname, url in fnames.items():
        if not os.path.exists(fname):
            urllib.request.urlretrieve(url, filename=fname)

download_attached_files()

In [None]:
import glob
import os.path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


from sklearn import datasets, linear_model, ensemble, neural_network
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

from pathlib import Path

import holidays

def plot_prediction(Y, prediction, alpha=1):
    plt.plot(Y, label="Observation")
    plt.plot(prediction, label="Prediction", alpha=alpha)
    plt.xlabel("Time")
    plt.ylabel("Load (MWh)")
    plt.legend()

In [None]:
def get_hourly_country_data(data, country):
    ret_data = data[data["AreaName"] == country].interpolate() # data may contain NAs, therefore interpolate
    ret_data = ret_data.resample("1h").mean().interpolate() # not all hours may be complete
                                                            # (i.e. some last 15 minutes are lacking, therefore another interpolation here)

    # note: the holiday calendar is always the Austrian one ('AT'), independent of `country`
    holidays_only = pd.to_datetime(list(holidays.CountryHoliday('AT', years=range(2015,2021)).keys())).sort_values()

    idx_period = ret_data.index

    idx_period.freq = None
    
    all_days_holidays = np.isin(idx_period.date,holidays_only.date)

    ret_data["holidays"] = all_days_holidays
    
    return ret_data

power_demand = pd.read_pickle("entsoe-demand-shortened.pickle")

power_demand_at_hourly = get_hourly_country_data(power_demand, "Austria")["2015-01-01":"2021-04-20"]


# Exercise 2 - model selection

We want to predict Austrian load in 2019 with a model trained on data from other years. In this exercise you should decide which year to use as training data (of course you are not allowed to use 2019). First, rewrite the function `train_test_plot_with_holidays`. Add one parameter `resample`. The parameter tells the function how the output of the model, i.e. `Y_test` and `prediction`, should be resampled before plotting. It is a string such as `1h` or `d`. You can use the `insert` method of pandas dataframes to insert your prediction into the `power_demand_at_hourly_test` dataframe and then do the resampling before you plot.

Now run the function in a loop, predicting 2019 with each of the years from 2015 to 2018 as training data, for daily and monthly resampling. We do not use 2020/2021 for training, as they are special years due to the lockdowns.

- In your opinion, which training years are best to predict 2019?
- There is one period of the year that is predicted very badly, independent of the training year. Which one? What could be the reason? Can you think of a way to improve the fit during this period?
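
The insert-then-resample step described above can be sketched on toy data. This is only an illustration, not the solution: the dataframe `toy_test`, the column name `load`, and the array `toy_prediction` are made up here and stand in for `power_demand_at_hourly_test` and the model output.

```python
import numpy as np
import pandas as pd

# Made-up hourly data for two days (stand-in for the real test dataframe).
index = pd.date_range("2019-01-01", periods=48, freq="h")
toy_test = pd.DataFrame({"load": np.arange(48, dtype=float)}, index=index)

# Hypothetical model output: one predicted value per row of the test data.
toy_prediction = toy_test["load"].to_numpy() + 1.0

# DataFrame.insert adds the new column in place at the given position.
toy_test.insert(len(toy_test.columns), "prediction", toy_prediction)

# Resample observation and prediction together, here to daily means.
resample = "D"
toy_resampled = toy_test.resample(resample).mean()
print(toy_resampled)
```

Because both columns live in the same dataframe, a single call to `.resample()` keeps observation and prediction aligned on the same time axis before plotting.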

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

# Exercise 3 - The impact of lockdowns

Here we want to understand how observed load in 2020 and 2021 differed from a prediction of load trained on 2019 data. The idea is that the difference between observation and prediction allows us to estimate the impact of the lockdowns over time.

Adapt the function developed in Exercise 2 so that it returns the sum of the observed load and the sum of the predicted load for the whole period in a list, i.e. the result of the function is a list with two elements: first the observed sum, then the predicted sum.

- Run the model for 2020, using 2019 data for training. Save the result of the function in `predicted_load_2020`. By what percentage did observed load differ from predicted load in 2020?
- Run the model for 2021, using 2019 data for training. Save the result of the function in `predicted_load_2021`. By what percentage did observed load differ from predicted load in 2021?
- Now simply calculate the ratio between total load in 2020 and total load in 2019, and also between January-March 2020 and January-March 2019. Is this similar to what you have calculated with your model? Where might differences come from?
- What's your take-home message from the exercise? Did Corona have a significant impact on electricity consumption? How did the difference develop over time, and is this related to the lockdowns (change the temporal aggregation to a format that best allows you to see differences over time)? Are there maybe other reasons than the lockdowns for the difference?
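
As a small aid for the percentage questions above, here is one possible way to turn a two-element result list into a percentage. The helper name `percent_below_prediction` and the numbers in `toy_result` are made up for illustration; they are not results from the model.

```python
def percent_below_prediction(observed_sum, predicted_sum):
    """Percentage by which the observation lies below the prediction.

    A positive value means observed load was lower than predicted load,
    a negative value means it was higher.
    """
    return (1 - observed_sum / predicted_sum) * 100


# Made-up example in the order [observed sum, predicted sum].
toy_result = [96.0, 100.0]
toy_percent = percent_below_prediction(*toy_result)
print(round(toy_percent, 2))
```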

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
assert round(predicted_load_2020[0] / predicted_load_2020[1], 2) == 0.96

# Exercise 4 - A European comparison

Which of the countries `["Austria", "Germany", "France", "United Kingdom", "Italy", "Slovenia"]` suffered most from the lockdowns, and was 2020 really a particularly bad year in terms of electricity demand? We use a very simple model: for each country in the list, we calculate the mean load per year in the period 2015-2020, express it relative to the mean load in 2019, and plot this ratio.

- First, select only the countries given above. Hint: `.isin()` may be useful here.
- Second, reduce the timeseries to 2015 until 2020.
- Third, group by country and year and calculate the mean load.
- Unstack your table so that each column is a country and each row is a year. Hint: the method `.unstack()` has a parameter that lets you select the index level on which to unstack.
- Divide each column of the unstacked table by the demand in 2019. Hint: you can select the 2019 data by using the `[]` operator and choosing the rows where the index is equal to 2019.
- Plot the resulting table.
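
The steps above can be sketched on a tiny made-up table. The column name `load`, all values, and the extra country "Spain" are assumptions for illustration only; the real dataset may use different column names. The sketch uses `.loc[2019]` to pick the reference year, which is one alternative to the `[]` hint.

```python
import pandas as pd

# Made-up long-format data: one row per country and timestamp.
toy = pd.DataFrame(
    {
        "AreaName": ["Austria", "Austria", "Austria", "Italy", "Italy", "Italy", "Spain"],
        "load": [10.0, 20.0, 12.0, 40.0, 50.0, 36.0, 99.0],
    },
    index=pd.to_datetime(
        ["2019-01-01", "2019-07-01", "2020-01-01",
         "2019-01-01", "2019-07-01", "2020-01-01", "2019-01-01"]
    ),
)

# Keep only the countries of interest and the years 2015 to 2020.
selected = toy[toy["AreaName"].isin(["Austria", "Italy"])]
selected = selected[(selected.index.year >= 2015) & (selected.index.year <= 2020)]

# Mean load per country and year, then one column per country.
yearly_mean = selected.groupby(["AreaName", selected.index.year])["load"].mean()
by_country = yearly_mean.unstack(level="AreaName")

# Divide every row by the 2019 row; the division aligns on the country columns.
relative = by_country / by_country.loc[2019]
print(relative)
```

Calling `relative.plot()` would then draw one line per country, with 2019 fixed at 1 for every country.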

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #