WISO100303 / Johannes Schmidt & Peter Regner

# **An introduction to scientific programming**


# Please provide feedback to us!

On BOKU Online and on Menti:

<a href="https://www.menti.com/hvyn9aw7r9" target="_blank">https://www.menti.com/hvyn9aw7r9</a>

<a href="https://www.menti.com/dmzbk2193i" target="_blank">https://www.menti.com/dmzbk2193i</a>

# Final test: hints at end of lecture




# Predicting electricity demand

Predicting electricity demand is extremely important for operating power systems. System operators can only schedule resources (e.g. generation and transmission facilities) if they know which demand to expect.

Today, we develop a very simple model that can be used to predict electricity demand. We will not use it for operational forecasting. Instead, we use this simple approach to understand how the COVID-19 pandemic and the high gas and electricity prices in 2022 and 2023 have affected electricity demand.



# Scikit Learn
For that purpose, we use scikit-learn, a machine learning library for Python that builds on NumPy, SciPy and matplotlib. It can be used for
- classification (e.g. is there a cat in a photo? Caution: there are better libraries for that task),
- regression (e.g. how high will the electricity demand be under given conditions?),
- and clustering (e.g. which objects belong to the same group?).

It provides a huge toolbox for model selection (i.e. which of my models performs best?), data preprocessing (e.g. normalization), and dimensionality reduction.
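To make "model selection" and "data preprocessing" a bit more concrete, here is a minimal sketch on made-up data. It assumes two hypothetical input features on very different scales and uses `train_test_split` to hold out test samples and `StandardScaler` to normalize the features. The scaler is fitted on the training set only, so that no information from the test set leaks into the preprocessing.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(seed=0)
# Hypothetical data: 100 samples, 2 input features on very different scales
temperature = rng.uniform(-10, 35, 100)   # e.g. degrees Celsius
price = rng.uniform(0, 500, 100)          # e.g. EUR/MWh
X = np.column_stack([temperature, price])
y = rng.normal(size=100)                  # placeholder output feature

# Model selection: keep 25 % of the samples aside for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Preprocessing: learn mean and standard deviation on the training set only ...
scaler = StandardScaler().fit(X_train)
# ... and apply the same transformation to training and test set
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)  # (75, 2) (25, 2)
```

After scaling, each training feature has mean 0 and standard deviation 1. The test set is only approximately standardized, which is expected.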

We will only take a very brief look at scikit-learn, and we will use just one particular fitting algorithm, random forests, without explaining it further. If you are interested in the algorithmic details, this video gives a brief introduction: [random forests explained](https://www.youtube.com/watch?v=v6VJ2RO66Ag&t=33s).
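Even without the algorithmic details, using a random forest in scikit-learn follows the usual `fit` / `predict` pattern. The sketch below is a minimal illustration on synthetic data. The "load" (a daily cycle, lower on weekends, plus noise) and the two features hour of day and day of week are assumptions made up for this example, not the real ENTSO-E data. The score is computed with `r2_score` on samples the model has not seen during fitting.

```python
import numpy as np
from sklearn import ensemble
from sklearn.metrics import r2_score

rng = np.random.default_rng(seed=42)
n = 500
# Synthetic input features (assumption for illustration): hour of day and day of week
hour = rng.integers(0, 24, n)
weekday = rng.integers(0, 7, n)
X = np.column_stack([hour, weekday])
# Synthetic "load": daily cycle, lower on weekends (weekday 5 and 6), plus noise
y = (6000 + 1500 * np.sin((hour - 6) / 24 * 2 * np.pi)
     - 800 * (weekday >= 5) + rng.normal(0, 100, n))

# Fit on the first 400 samples, evaluate on the remaining 100
model = ensemble.RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:400], y[:400])
y_pred = model.predict(X[400:])

score = r2_score(y[400:], y_pred)
print(f"R^2 on held-out samples: {score:.3f}")
```

An $R^2$ close to 1 means the model explains most of the variation in the held-out data. Evaluating on the training samples instead would overstate how well the model generalizes.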

# Additional information about regression and the regression algorithm we use, i.e. random forests 

[Introduction to Machine Learning for Beginners](https://towardsdatascience.com/introduction-to-machine-learning-for-beginners-eed6024fdb08)

[An Introduction to Machine Learning Theory and Its Applications: A Visual Tutorial with Examples](https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer)

[Regression in Machine Learning](https://medium.datadriveninvestor.com/regression-in-machine-learning-296caae933ec)

# File download

Again, we need the pickle file that contains the load data. Please run the cell below to download it.

In [None]:
# workaround: Datalore does not allow publishing attached files, so we have to download them.
def download_attached_files():
    import os.path
    import urllib.request  # the submodule must be imported explicitly for urlretrieve
    fnames = {
              'entsoe-demand-shortened.pickle': 'https://files.boku.ac.at/filr/public-link/file-download/0d7483c9959b20360196809f11ff2d67/18707/-4160977441044749444/entsoe-demand-shortened.pickle'
    }
    for fname, url in fnames.items():
        if not os.path.exists(fname):
            print(f'Downloading: {url}')
            urllib.request.urlretrieve(url, filename=fname)
            print('Download finished!')
        else:
            print("File already exists, not downloading again.")

download_attached_files()

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn import ensemble
from sklearn.metrics import r2_score

In [None]:
def get_hourly_country_data(data, country):
    # select one country; the data may contain NAs, therefore interpolate
    ret_data = data[data["AreaName"] == country].interpolate()
    # resample to hourly means; not all hours may be complete (e.g. the last
    # 15 minutes may be missing), therefore interpolate once more
    ret_data = ret_data.resample("1h").mean(numeric_only=True).interpolate()

    return ret_data

power_demand = pd.read_pickle("entsoe-demand-shortened.pickle")

power_demand_at_hourly = get_hourly_country_data(power_demand, "AT CTY")
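To see what the two interpolation steps and the resampling in `get_hourly_country_data` do, here is a small sketch on eight made-up quarter-hourly values with one missing measurement. The column name `Load` is hypothetical and does not necessarily match the columns in the pickle file.

```python
import numpy as np
import pandas as pd

# Hypothetical quarter-hourly load values (MW) with one missing measurement
index = pd.date_range("2023-01-01 00:00", periods=8, freq="15min")
load = pd.DataFrame({"Load": [100.0, 110.0, np.nan, 130.0, 140.0, 150.0, 160.0, 170.0]},
                    index=index)

# Step 1: fill the gap linearly (the NaN becomes 120.0)
load = load.interpolate()
# Step 2: average the four quarter-hours of each hour, then fill any empty hours
hourly = load.resample("1h").mean(numeric_only=True).interpolate()
print(hourly)  # 00:00 -> 115.0, 01:00 -> 155.0
```

Each hourly value is the mean of the four quarter-hours starting at that hour, so the result is labelled by the beginning of each interval.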

In [None]:
# Let's have a look into the data...
power_demand_at_hourly.plot()

# Model fitting

We want to understand whether electricity load was lower than expected due to the COVID-19 lockdown in 2020 and the high gas prices in 2021, 2022 and 2023. We therefore need to know which electricity load we would have expected without the lockdown and the price crisis.

We do so by fitting a function to the electricity load, i.e. $y=f(x_1, x_2, \dots, x_n)$. Here, $y$ is the output feature, in our case the load, and $f$ is a function of the variables $x_1, \dots, x_n$, which we call input features in the following.
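In scikit-learn, the input features $x_1, \dots, x_n$ are passed as a 2-D array of shape `(n_samples, n_features)` and the output feature $y$ as a 1-D array of shape `(n_samples,)`. The following sketch uses six made-up observations of two hypothetical features only to illustrate these shapes; fitting a random forest then gives us a function $f$ that can be evaluated at new inputs.

```python
import numpy as np
from sklearn import ensemble

# Hypothetical example: 6 observations of n = 2 input features x_1, x_2
x1 = np.array([0, 6, 12, 18, 0, 12])
x2 = np.array([0, 0, 0, 0, 5, 5])
X = np.column_stack([x1, x2])                  # shape (n_samples, n_features)
Y = np.array([5.0, 6.5, 7.0, 6.8, 4.5, 5.5])   # output feature, shape (n_samples,)

print(X.shape, Y.shape)  # (6, 2) (6,)

# The fitted model plays the role of f: it maps (x_1, x_2) to a predicted y
f = ensemble.RandomForestRegressor(random_state=0).fit(X, Y)
print(f.predict([[12, 0]]))
```

`predict` also expects a 2-D input, which is why a single observation is written as `[[12, 0]]`.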

Let's see an interactive example of model fitting [here](https://observablehq.com/@grahampullan/interactive-curve-fitting).
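The comparison described at the beginning of this section (observed load versus the load we would have expected) can be sketched as follows: fit the model only on data from before an event, predict the "expected" load for the period afterwards, and compare it with the observed load. Everything here is synthetic and an assumption for illustration: the load shape, the features, the training cut-off and a hypothetical 10 % demand drop from 2020-03-16 onwards. It is not the real Austrian data.

```python
import numpy as np
import pandas as pd
from sklearn import ensemble

rng = np.random.default_rng(seed=1)
# Synthetic hourly load for 2019 and 2020: daily cycle, lower on weekends, plus noise
index = pd.date_range("2019-01-01", "2020-12-31 23:00", freq="1h")
hour = index.hour.to_numpy()
weekday = index.dayofweek.to_numpy()
load = (6000 + 1500 * np.sin((hour - 6) / 24 * 2 * np.pi)
        - 800 * (weekday >= 5) + rng.normal(0, 100, len(index)))
# Hypothetical event: demand drops by 10 % from 2020-03-16 onwards
load[index >= "2020-03-16"] *= 0.9

X = np.column_stack([hour, weekday])
is_train = index < "2020-01-01"

# Fit only on data before the event ...
model = ensemble.RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X[is_train], load[is_train])

# ... and predict the "expected" load for 2020
expected = model.predict(X[~is_train])
observed = load[~is_train]
after = index[~is_train] >= "2020-03-16"

deviation = (observed[after].sum() - expected[after].sum()) / expected[after].sum()
print(f"Relative deviation from expected load: {deviation:.1%}")
```

With this synthetic setup, the deviation should come out close to the assumed drop of -10 %. With real data, the result also depends on which input features are chosen and how well they explain the load.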



## Exercise 1

Thinking back to the last lecture: how can we build a model that predicts electricity load?

- First, which data in our data frame do we want to predict? How would you store these data in the output feature variable called `Y`?
- Second, which data explain the electricity load well, i.e. which input features are you fitting to?
- Can you store these data in a new NumPy array called `X`?

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #