# Exercise 1 - Syntax and English terminology

- Which symbol is used in Python after the function name to call a function? What is that symbol called in English?
- Which symbols are used in Python to define a numpy array? What are they called in English?
- Which symbol is used in Python at the end of the function header to start the code block defining the function? What is the symbol called in English?
- Which symbol is used in Python to calculate this expression: $x^n$? What is this expression called in English?

    ### BEGIN SOLUTION

Answers:
- `()` - parentheses (round brackets)
- `[]` inside `()`, as in `np.array([1, 2, 3])` - square brackets and parentheses
- `:` - colon
- `**` - $x$ to the power of $n$
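
The four answers can be seen together in a short sketch. The function name `power` and the sample values are hypothetical and only serve as an illustration:

```python
import numpy as np


# `:` (colon) ends the function header and starts the indented code block
def power(x, n):
    # `**` raises x to the power of n
    return x ** n


# `()` (parentheses) call a function
result = power(2, 3)

# `[]` (square brackets) hold the values, wrapped in the `()` of the call to np.array
values = np.array([1, 2, 3])
squares = power(values, 2)
```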

In [None]:
### END SOLUTION

# Exercise 2 - Write a function which calculates the solutions of quadratic equations

"Given a general quadratic equation of the form

$$ax^2+bx+c=0$$

with $x$ representing an unknown, $a$, $b$ and $c$ representing constants with $a \neq 0$, the quadratic formula is:

$$
    x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$

where the $\pm$ indicates that the quadratic equation has two solutions." [https://en.wikipedia.org/wiki/Quadratic_formula]

Create a function `solve_quadratic_equation(a, b, c)`. Use it to calculate the solutions of a quadratic equation with constants a=2.2, b=8.9 and c=5.6.

**Hint:** In Python, a function can return multiple values by separating them with commas, e.g. `return a, b, c` for variables `a`, `b` and `c`.
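
As a minimal sketch of the hint, the following hypothetical helper `min_and_max` (not part of the exercise) returns two values at once, which are then unpacked into two variables:

```python
def min_and_max(values):
    # Hypothetical helper: the comma-separated values are returned together as a tuple
    return min(values), max(values)


# The returned tuple can be unpacked into separate variables
smallest, largest = min_and_max([3, 1, 2])
```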


<small>This exercise has been copied from [python_course_2020](https://bitbucket.org/durozlikovski/python_course_2020), licensed under [BY-NC-SA-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).</small>

In [1]:
### BEGIN SOLUTION

In [2]:
import numpy as np


def solve_quadratic_equation(a, b, c):
    sq = np.sqrt(b * b - 4 * a * c)
    sol1 = (-b - sq) / (2 * a)
    sol2 = (-b + sq) / (2 * a)

    return sol1, sol2


solve_quadratic_equation(2.2, 8.9, 5.6)

A fancier solution raises a clearer error if `a` equals zero (i.e. it explicitly states what went wrong). It also uses `np.emath.sqrt`, which returns complex solutions if the discriminant is negative, whereas `np.sqrt` returns `nan` in that case.

In [11]:
import numpy as np


def solve_quadratic_equation(a, b, c):
    if a == 0:
        raise ValueError("invalid value for a, a equals zero")

    sq = np.emath.sqrt(b * b - 4 * a * c)
    sol1 = (-b - sq) / (2 * a)
    sol2 = (-b + sq) / (2 * a)

    return sol1, sol2


sol1, sol2 = solve_quadratic_equation(10, 8.9, 5.6)
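
One way to sanity-check such a solver, sketched here under the assumption that the function is defined as in the cell above, is to substitute the returned solutions back into $ax^2+bx+c$ and confirm that the result is zero up to rounding. The variable names `residuals` and `all_close` are chosen for illustration only:

```python
import numpy as np


def solve_quadratic_equation(a, b, c):
    if a == 0:
        raise ValueError("invalid value for a, a equals zero")
    sq = np.emath.sqrt(b * b - 4 * a * c)
    return (-b - sq) / (2 * a), (-b + sq) / (2 * a)


a, b, c = 10, 8.9, 5.6  # the discriminant is negative, so both solutions are complex
residuals = [a * x**2 + b * x + c for x in solve_quadratic_equation(a, b, c)]

# Each residual should be zero up to floating point rounding
all_close = all(np.isclose(r, 0) for r in residuals)
```

The same check works for real solutions, because `np.isclose` compares the absolute difference for both real and complex numbers.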

In [None]:
### END SOLUTION

In [12]:
# # # # # RUN THIS CELL TO CHECK YOUR RESULTS # # # # # 

from urllib.request import urlretrieve
import os.path
if not os.path.exists('check.py'):
    urlretrieve('https://raw.githubusercontent.com/inwe-boku/lecture-scientific-programming/refs/heads/main/check.py', filename='check.py')
from check import check_solution
# note: we only check values a, b, c where the discriminant is 0, so that x1 == x2, because
# the order of the returned solutions should not matter - we could also sort them, but that's a bit too much
check_solution([("solve_quadratic_equation(1,2,1)", (-1.0, -1.0))], globals())
check_solution([("solve_quadratic_equation(4,4,1)", (-0.5, -0.5))], globals())
check_solution([("solve_quadratic_equation(1,4,4)", (-2, -2))], globals())
check_solution([("solve_quadratic_equation(2,-2,0.5)", (0.5, 0.5))], globals())

# Exercise 3 - Determine fossil backup power generation in system with high shares of renewables

Introduction to some vocabulary:
- *Load* is the consumption of electricity in the network
- *Residual load* is the residual demand for electricity on the network, which cannot be covered by the generation of solar PV and wind power. The residual load has to be covered by thermal power plants fuelled by e.g. coal or gas.
- *Generation*: Actual production of a power plant at a certain point in time
- *Capacity*: Theoretical maximum generation of a power plant. E.g. a solar park with 1 MW *capacity* may *generate* only 0.1 MW in a certain hour because of low solar irradiance (e.g. due to low position of the sun or clouds).

You are asked to design the power system of Kakanien, which contains a high share of renewable generation. Assume that the residual load (i.e. the load after reducing it by photovoltaic (PV) and wind power electricity generation) is supplied by a gas power plant, i.e. the remaining gap in load after accounting for renewable generation is filled by gas power generation. The time series of load, wind, and PV generation are given below (`load_kakanien_mw`, `generation_wind_power_kakanien_mw`, `generation_pv_kakanien_mw`). Each value is the average power in the given time period (hourly data).

First, write a function `residual_load_mw(load_mw, generation_wind_power_mw, generation_pv_mw)` which calculates the residual load in the system, i.e. the load minus wind power and PV generation (`load_mw - generation_wind_power_mw - generation_pv_mw`). Use the function to calculate the residual load in Kakanien and store it in the variable `residual_load_kakanien_mw`. Plot the resulting time series.
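
The subtraction works directly on numpy arrays because arithmetic on arrays of equal length is applied element by element. A minimal sketch with hypothetical values for three hours (not the Kakanien data):

```python
import numpy as np

# Hypothetical values for three hours, in MW
load = np.array([100, 120, 110])
wind = np.array([30, 10, 50])
pv = np.array([0, 40, 20])

# Each element of the result is computed from the elements at the same position
residual = load - wind - pv
```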

Second, we want to know how big (in terms of capacity) the backup gas power plant has to be, and how much electricity it has to generate in total over the whole period. For the capacity, calculate the maximum of the residual load time series and store it in `gas_capacity_kakanien_mw`. `numpy` has a function to calculate the maximum of an array; find it by searching online. For the total generation, calculate the sum of the residual load time series and store it in `gas_generation_kakanien_mwh`. `numpy` also has a function to calculate the sum of an array; find it by searching online as well.

Bonus part: The approach described above may not work if the renewable generation becomes very high. Instead of `generation_wind_power_kakanien_mw`, use `generation_wind_power_bonus_mw` defined below and redo the exercise. Can you find the semantic bug in that case? Can you fix it? Store the results in `gas_capacity_bonus_mw` and `gas_generation_bonus_mwh`.

In [11]:
import numpy as np
import matplotlib.pyplot as plt

# each of the following time series contains 9 hourly values

# The demand in the system in MW is given:
load_kakanien_mw = np.array([7000, 7200, 7900, 8200, 8500, 8900, 9000, 8500, 7500])

# The production of the wind power plant is given by:
generation_wind_power_kakanien_mw = np.array([3, 2, 0.1, 0.3, 5, 5, 4.9, 4.7, 4.1]) * 400

# The production of the PV power plant is given by
generation_pv_kakanien_mw = np.array([0, 0, 0, 0.3, 0.7, 1, 1, 0.7, 0.3]) * 5000

In [12]:
### BEGIN SOLUTION

In [13]:
def residual_load_mw(load_mw, generation_wind_power_mw, generation_pv_mw):
    return load_mw - generation_wind_power_mw - generation_pv_mw

residual_load_kakanien_mw = residual_load_mw(load_kakanien_mw, generation_wind_power_kakanien_mw, generation_pv_kakanien_mw)
plt.plot(residual_load_kakanien_mw)
plt.xlabel('Time (hours)')
plt.ylabel('Residual load (MW)')

gas_capacity_kakanien_mw = np.max(residual_load_kakanien_mw)
gas_generation_kakanien_mwh = np.sum(residual_load_kakanien_mw)
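
The sum of a power time series in MW yields an energy in MWh only because every value is the average power over exactly one hour. A minimal sketch that makes the time step explicit, using hypothetical values and the assumed name `hours_per_step`:

```python
import numpy as np

# Hypothetical hourly average power in MW for three hours
power_mw = np.array([70, 70, 40])
hours_per_step = 1  # assumption: hourly data, as in the exercise

# Energy = power * duration, summed over all time steps
energy_mwh = np.sum(power_mw * hours_per_step)
```

With a different time resolution, e.g. 15-minute data, `hours_per_step` would be 0.25 and a plain sum would overestimate the energy by a factor of four.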

In [None]:
### END SOLUTION

In [14]:

from urllib.request import urlretrieve
import os.path
if not os.path.exists('check.py'):
    urlretrieve('https://raw.githubusercontent.com/inwe-boku/lecture-scientific-programming/refs/heads/main/check.py', filename='check.py')
from check import check_solution
check_solution([
    ("gas_capacity_kakanien_mw", 7_860),
    ("gas_generation_kakanien_mwh", 41_060),
], globals())

## Bonus part for exercise 3

In [16]:
generation_wind_power_bonus_mw = np.array([3, 2, 0.1, 0.3, 5, 5, 4.9, 4.7, 4.1]) * 1000

In [17]:
### BEGIN SOLUTION

In [18]:
residual_load_bonus_mw = residual_load_mw(load_kakanien_mw, generation_wind_power_bonus_mw, generation_pv_kakanien_mw)
plt.plot(residual_load_bonus_mw)

gas_capacity_bonus_mw = np.max(residual_load_bonus_mw)

residual_load_bonus_clipped_mw = np.clip(residual_load_bonus_mw, 0, gas_capacity_bonus_mw)

plt.plot(residual_load_bonus_clipped_mw)
plt.xlabel('Time (hours)')
plt.ylabel('Residual load (MW)')

gas_generation_bonus_mwh = np.sum(residual_load_bonus_clipped_mw)
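
The semantic bug is that hours with more renewable generation than load yield a negative residual load. A gas power plant cannot generate negative power, so these hours must count as zero instead of reducing the sum. As an alternative to `np.clip`, the same effect can be sketched with `np.maximum`; the values below are hypothetical:

```python
import numpy as np

# Hypothetical residual load in MW; negative values mean surplus renewable generation
residual_mw = np.array([4000, -1100, 300])

naive_sum = np.sum(residual_mw)  # surplus hours wrongly offset gas generation
gas_mw = np.maximum(residual_mw, 0)  # element-wise maximum with 0 removes the surplus
gas_generation_mwh = np.sum(gas_mw)
```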

In [None]:
### END SOLUTION

In [19]:
# # # # # RUN THIS CELL TO CHECK YOUR RESULTS # # # # # 

check_solution([
    ("gas_capacity_bonus_mw", 7_800),
    ("gas_generation_bonus_mwh", 25_600)
], globals())

# Exercise 4 - Spurious correlations

Reproduce a [_spurious correlations_](https://www.tylervigen.com/spurious-correlations) plot using `plt.plot()`.

Plot the time series `sociology_doctorates` and `space_launches` on a relative scale, i.e. as a percentage of the last data point (years on the x-axis, relative data on the y-axis; the last data point in both time series should be 100%). Add a second plot which displays the relation between both data sets as a scatter plot by using `plt.plot(dataset1, dataset2, 'o')` (data set 1 on the x-axis, data set 2 on the y-axis, both in absolute units, not percentages).

Instead of using the given time series, you can also use any other spurious correlation if you want to look for a different dataset.

Analyze both plots, come up with a wrong conclusion and explain why it is wrong.

Don't forget to label the axes!

![Spurious correlations](images/spurious-correlations.png) 

Source: https://www.tylervigen.com/spurious-correlations

Note: Having two y-axes is very often misleading. Hadley Wickham, statistician and chief scientist at RStudio, says:
> ["Also illustrates why I dislike two y axes"](https://twitter.com/hadleywickham/status/711891650058932225?lang=en)

See also [this Stack Overflow Q&A](https://stackoverflow.com/questions/3099219/ggplot-with-2-y-axes-on-each-side-and-different-scales/3101876#3101876).

In [21]:
import numpy as np

sociology_doctorates = np.array([601, 579, 572, 617, 566, 547, 597, 580, 536, 579, 576, 601, 664])
space_launches = np.array([54, 46, 42, 50, 43, 41, 46, 39, 37, 45, 45, 41, 54])
years = np.arange(1997, 2009 + 1)

In [22]:
### BEGIN SOLUTION

In [23]:
import matplotlib.pyplot as plt

plt.plot(years, 100 * sociology_doctorates / sociology_doctorates[-1], label='Sociology doctorates')
plt.plot(years, 100 * space_launches / space_launches[-1], label='Space launches')
plt.xlabel('Year')
plt.ylabel('Values relative to last data point (%)')
plt.legend();

In [24]:
plt.plot(space_launches, sociology_doctorates, 'o')
plt.xlabel('Space launches')
plt.ylabel('Sociology doctorates');
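
The strength of the linear relationship visible in the scatter plot can also be quantified. The following sketch, which goes beyond what the exercise asks for, computes the Pearson correlation coefficient with `np.corrcoef`. A value close to 1 only describes how well the points follow a straight line; it says nothing about whether one quantity causes the other:

```python
import numpy as np

sociology_doctorates = np.array([601, 579, 572, 617, 566, 547, 597, 580, 536, 579, 576, 601, 664])
space_launches = np.array([54, 46, 42, 50, 43, 41, 46, 39, 37, 45, 45, 41, 54])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is the coefficient
r = np.corrcoef(space_launches, sociology_doctorates)[0, 1]
print(f"Pearson correlation coefficient: {r:.2f}")
```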

In [None]:
### END SOLUTION

# Exercise 5 - Factorial recursion (optional bonus exercise)

Implement a function `factorial(n)` which calculates the factorial $n! = 1\cdot 2 \cdot \ldots \cdot n$ for integer values $n \geq 0$. Use a recursive implementation, i.e. `factorial()` calls itself inside its function body.

Note: If you run into a `RecursionError`, this means that `factorial(n)` keeps calling itself without ever stopping.

Bonus task: explain how one could easily implement wrong code which leads to a `RecursionError`.

In [26]:
### BEGIN SOLUTION

In [27]:
def factorial(n):
    if n > 1:
        return n * factorial(n - 1)
    else:
        return 1

In [28]:
factorial(5)
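
For the bonus task, one easy mistake is to forget the base case. The hypothetical function `broken_factorial` below sketches this: it calls itself with ever smaller values of `n` and never returns, until Python aborts with a `RecursionError`:

```python
def broken_factorial(n):
    # Hypothetical wrong version: no base case, so the recursion never stops
    return n * broken_factorial(n - 1)


try:
    broken_factorial(5)
    raised = False
except RecursionError:
    # Python aborts once the maximum recursion depth is exceeded
    raised = True
```

A similar error occurs if the recursive call does not move towards the base case, e.g. when calling `factorial(n)` instead of `factorial(n - 1)`.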

In [None]:
### END SOLUTION

In [29]:
# # # # # RUN THIS CELL TO CHECK YOUR RESULTS # # # # # 

from urllib.request import urlretrieve
import os.path
if not os.path.exists('check.py'):
    urlretrieve('https://raw.githubusercontent.com/inwe-boku/lecture-scientific-programming/refs/heads/main/check.py', filename='check.py')
from check import check_solution
check_solution([
    ("factorial(5)", 120),
    ("factorial(1)", 1),
    ("factorial(0)", 1),
], globals())