## Preparation: Download data
This homework uses a fairly large data set. The code below downloads it automatically. The download only needs to run once, because Datalore stores the file together with the notebook files.

In [None]:
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

In [None]:
# Workaround: Datalore does not allow publishing attached files, so we download the data instead.
def download_attached_files():
    import urllib.request  # a plain "import urllib" does not reliably load the request submodule
    import os.path
    fnames = {
              'entsoe-demand-shortened.pickle': 'https://files.boku.ac.at/filr/public-link/file-download/0d7483c9959b20360196809f11ff2d67/18707/-4160977441044749444/entsoe-demand-shortened.pickle'
    }
    for fname, url in fnames.items():
        if not os.path.exists(fname):
            print(f'Downloading: {url}')
            urllib.request.urlretrieve(url, filename=fname)
            print('Download finished!')
        else:
            print("File already exists, not downloading again.")

download_attached_files()

In [None]:
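power_demand = pd.read_pickle("entsoe-demand-shortened.pickle")

# Keep only the countries used in the exercises.
countries_selected = ["AT CTY", "DE CTY", "UK CTY", "ES CTY", "SE CTY", "IT CTY", "HR CTY"]
power_demand = power_demand[power_demand.AreaName.isin(countries_selected)]

# Average the values to hourly resolution, separately for each country.
power_demand = power_demand.set_index("AreaName", append=True)
power_demand = power_demand.groupby(
    [pd.Grouper(level='AreaName'), pd.Grouper(level='DateTime', freq='1h')]
).mean(numeric_only=True)

# Move the countries from the row index into the columns (one column per country).
power_demand = power_demand.unstack(level=-2)

# Drop the top column level so that the columns are labeled by country code only.
power_demand = power_demand.T.reset_index(level=0, drop=True).T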
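The reshaping steps above are easier to follow on a tiny table. The following is only a sketch on made-up data: the column name `TotalLoadValue`, the half-hourly time stamps and all numbers are assumptions for illustration and are not taken from the real file. It also uses `droplevel`, which should have the same effect as the transpose-and-reset step above.

```python
import pandas as pd

# Hypothetical long-format data: two half-hourly values for each of two areas.
idx = pd.DatetimeIndex(
    ["2020-01-01 00:00", "2020-01-01 00:30",
     "2020-01-01 00:00", "2020-01-01 00:30"],
    name="DateTime",
)
toy = pd.DataFrame(
    {"AreaName": ["AT CTY", "AT CTY", "DE CTY", "DE CTY"],
     "TotalLoadValue": [10.0, 20.0, 100.0, 300.0]},  # made-up numbers
    index=idx,
)

# Add AreaName as a second index level, then average per area and hour.
toy = toy.set_index("AreaName", append=True)
hourly = toy.groupby(
    [pd.Grouper(level="AreaName"), pd.Grouper(level="DateTime", freq="1h")]
).mean(numeric_only=True)

# Move the areas into the columns and drop the top column level.
wide = hourly.unstack(level=-2)
wide = wide.droplevel(0, axis=1)

print(wide)
```

The result has one row per hour and one column per area, which is the layout the exercises below work with.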

# Exercise 1 - Pandas

- How can you select a column in a pandas dataframe?
- How can you select a row in a pandas dataframe?
- What is a pandas dataframe index?
- Give a concrete example of how the pandas command `.groupby().mean()` works: describe an example table and show what the command does to it. Do not provide Python code; instead, show what the table looks like before and after applying the command.

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

# Exercise 2 - Calculate the ratio of Wednesday average consumption to Sunday average consumption for selected countries

In this exercise, calculate the ratio of Wednesday average consumption to Sunday average consumption for the following countries: Austria, Germany, United Kingdom, Spain, Sweden, Italy, Croatia. Use the data set `power_demand` from above.

(1) First reduce the data to only consider the period 2015-01-01 until 2024-12-31. The lecture slides may contain relevant code here.

(2) Then, group the data by weekday to calculate the mean per weekday. Use groupby and mean for that purpose.

(3) For every country, calculate the ratio of the Wednesday (day 2) mean to the Sunday (day 6) mean by dividing the two values, and store it in the variable `relation_wednesday_sunday`. Round the values to 2 decimal places.

(4) For which country is this ratio highest? What could this indicate?
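The mechanics of these steps can be tried out on synthetic data first. This is only a sketch under stated assumptions: the area name `XX`, the dates and the load values are made up, and it is not meant as the reference solution for `power_demand`.

```python
import pandas as pd

# Hypothetical hourly data for one made-up area over two weeks.
# 2021-01-04 is a Monday, so dayofweek runs 0 (Monday) to 6 (Sunday).
idx = pd.date_range("2021-01-04", periods=14 * 24, freq="h", name="DateTime")
toy = pd.DataFrame({"XX": 100.0}, index=idx)

# Pretend that consumption on Sundays is half as high as on other days.
toy.loc[toy.index.dayofweek == 6, "XX"] = 50.0

# Label-based slicing on a DatetimeIndex includes both end dates.
first_week = toy.loc["2021-01-04":"2021-01-10"]

# Mean per weekday, then Wednesday (2) divided by Sunday (6).
by_weekday = first_week.groupby(first_week.index.dayofweek).mean()
ratio = (by_weekday.loc[2] / by_weekday.loc[6]).round(2)

print(ratio)
```

Selecting a row of `by_weekday` with `.loc` yields a Series indexed by the column names, so the division is carried out for all columns at once.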

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # RUN THIS CELL TO CHECK YOUR RESULTS # # # # # 

from urllib.request import urlretrieve
import os.path
if not os.path.exists('check.py'):
    urlretrieve('https://files.boku.ac.at/filr/public-link/file-download/0d7483b89572915c0195813e2edf66f6/18652/4113939315873631059/check.py', filename='check.py')
from check import check_solution
check_solution([
    ("relation_wednesday_sunday[\"AT CTY\"]", 1.26)
], globals())

# Exercise 3 - Calculate the monthly average consumption as deviation from mean consumption

For the same countries as in exercise 2, calculate the monthly mean consumption relative to the mean consumption over the whole period (as a ratio, so that 1.0 corresponds to the overall mean). Plot the curves for all countries. Please use only data in the period 2015-01-01 to 2024-12-31.

(1) Calculate the average consumption by month by country. Use groupby and mean for that purpose.

(2) Calculate the average consumption by country by calculating the mean of each column. 

(3) Divide the result of (1) by (2) and observe how well broadcasting works here. Save the result in the variable `monthly_deviation_country_mw`.

(4) You can plot the DataFrame directly by calling `plot` on your DataFrame object. The parameter `figsize` controls the size of the figure. Search online to find out how to set this parameter to increase the figure size!

(5) How would you explain the difference between the curves of Croatia and Sweden?
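The broadcasting mentioned in step (3) can be illustrated with a minimal sketch. The column names `A` and `B` and all numbers are made up. The point is that dividing a DataFrame by a Series matches the Series index against the DataFrame columns.

```python
import pandas as pd

# Hypothetical monthly means for two made-up areas (rows: month number).
monthly = pd.DataFrame(
    {"A": [80.0, 120.0], "B": [30.0, 10.0]},
    index=pd.Index([1, 2], name="month"),
)

# Column means: a Series indexed by the column names (A -> 100, B -> 20).
overall = monthly.mean()

# Each column is divided by its own mean; the labels are matched by name.
relative = monthly / overall

print(relative)
```

Because the labels are aligned by name and not by position, the order of the entries in the Series does not matter.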

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # RUN THIS CELL TO CHECK YOUR RESULTS # # # # # 

from urllib.request import urlretrieve
import os.path
if not os.path.exists('check.py'):
    urlretrieve('https://files.boku.ac.at/filr/public-link/file-download/0d7483b89572915c0195813e2edf66f6/18652/4113939315873631059/check.py', filename='check.py')
from check import check_solution
check_solution([
    ("monthly_deviation_country_mw[\"AT CTY\"].iloc[1]", 1.12)
], globals())

# Exercise 4 - Calculate the average load per capita

Below you find a table with population data for our selected countries. Use it to calculate per capita consumption. Use only demand data in the period 2015-01-01 to 2024-12-31. The demand in the dataframe `power_demand` is in MW.

(1) Calculate the average load for each country by taking the average of each column.

(2) Divide the result by the population Series defined below, convert it to kW and then save the result in `per_capita_load_kw`. Observe how nicely broadcasting helps here.

(3) Plot the result as a bar plot. Search online to find the right function for a bar plot with pandas. Which country has the highest load, which the lowest? What may be the reason?

(4) Convert the load given in kW per capita to MWh per year per capita, assuming that a year has 8760 hours, i.e. neglecting leap years. Store the result in `per_capita_energy_mwh` and also plot the result.
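The unit conversions in steps (2) and (4) are plain arithmetic: 1 MW equals 1000 kW, and a constant load of 1 kW over 8760 hours corresponds to 8.76 MWh. The sketch below checks this on made-up numbers. The labels `A` and `B`, the loads and the population figures are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical mean loads in MW and population counts for two made-up areas.
mean_load_mw = pd.Series({"A": 5000.0, "B": 1000.0})
people = pd.Series({"B": 2_000_000, "A": 5_000_000})  # deliberately in a different order

# Series are aligned by label before dividing; MW -> kW is a factor of 1000.
kw_per_capita = mean_load_mw / people * 1000

# kW * hours per year gives kWh per year; dividing by 1000 gives MWh per year.
mwh_per_year = kw_per_capita * 8760 / 1000

print(kw_per_capita)
print(mwh_per_year)
```

The different ordering of the two Series has no effect on the result, because pandas matches the entries by their index labels.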

In [None]:
# number of inhabitants per country
population = pd.Series([8840521, 4087843, 82905782, 60421760, 46796540, 10175214, 66460344],
                        index=["AT CTY", "HR CTY", "DE CTY", "IT CTY", "ES CTY", "SE CTY", "UK CTY"])

# Note that a pandas.Series is like a single column of a pandas.DataFrame, similar to a 1-d NumPy array.

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # RUN THIS CELL TO CHECK YOUR RESULTS # # # # # 

from urllib.request import urlretrieve
import os.path
if not os.path.exists('check.py'):
    urlretrieve('https://files.boku.ac.at/filr/public-link/file-download/0d7483b89572915c0195813e2edf66f6/18652/4113939315873631059/check.py', filename='check.py')
from check import check_solution
check_solution([
    ("per_capita_energy_mwh[\"AT CTY\"]", 6.95),
    ("per_capita_load_kw[\"AT CTY\"]",0.79)
], globals())

# Exercise 5 - Calculate the hourly average consumption as deviation from mean consumption (optional bonus exercise)

Do the same as in exercise 3, but instead of the monthly consumption, derive the average consumption for each hour of the day (i.e. 24 values per country). In other words: how much is consumed in each of the 24 hours of a day, relative to the overall mean? Save the result in `hourly_deviation_country_mw`. Please use only data in the period 2015-01-01 to 2024-12-31.

Which country has the lowest, which the highest variability? What may be the reason for it?
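As a hint on the shape of the result, here is a small sketch on synthetic data. The area name `XX` and the values are made up. Grouping by the hour of the time stamp produces an index running from 0 to 23, so `.loc[0]` refers to the hour starting at midnight.

```python
import pandas as pd

# Hypothetical data: two days of hourly values, where the load equals the hour number.
idx = pd.date_range("2022-03-01", periods=48, freq="h", name="DateTime")
toy = pd.DataFrame({"XX": [float(h) for h in idx.hour]}, index=idx)

# Mean per hour of the day, then relative to the overall mean of the column.
hourly_mean = toy.groupby(toy.index.hour).mean()
hourly_relative = hourly_mean / toy.mean()

print(hourly_relative.loc[[0, 23]])
```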

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # RUN THIS CELL TO CHECK YOUR RESULTS # # # # # 

from urllib.request import urlretrieve
import os.path
if not os.path.exists('check.py'):
    urlretrieve('https://files.boku.ac.at/filr/public-link/file-download/0d7483b89572915c0195813e2edf66f6/18652/4113939315873631059/check.py', filename='check.py')
from check import check_solution
check_solution([
    ("hourly_deviation_country_mw[\"AT CTY\"].loc[0]", 0.80)
], globals())