# Python review

This notebook covers all the Python you need to know _for our tutorial_ (today and tomorrow).

Python has a lot of nice features, but we'll only be using what is presented in this notebook.

I'm also assuming that all of this is familiar, so this is just a review. If so, we'll go through it quickly. If not, ask me to slow down and explain any features that are new to you.

To run cells along with me, click in the cell and type shift-enter or control-enter.

Order matters, so if you need to start over, select a cell and choose "Restart Kernel and Run up to Selected Cell" from the "Kernel" menu.

<br><br><br>

## Expressions and assignment

In [None]:
2 + 2

<br><br><br>

In [None]:
# 2 + 2

<br><br><br>

In [None]:
x = 2 + 2

In [None]:
x

<br><br><br>

| math | | Python |
|:--:|:--:|:--:|
| $x + y$ | | `x + y` |
| $x - y$ | | `x - y` |
| $x \times y$ | | `x * y` |
| $x \div y$ | | `x / y` |
| $x \bmod y$ | | `x % y` |
| $x^n$ | | `x**n` |
| $(x + y) \times z$ | | `(x + y) * z` |

<br><br><br>

In [None]:
print(f"addition:       {3 + 2 = }")
print(f"subtraction:    {3 - 2 = }")
print(f"multiplication: {3 * 2 = }")
print(f"division:       {3 / 2 = }")
print(f"modulo:         {3 % 2 = }")
print(f"exponentiation: {3**2  = }")
print(f"parentheses:    {(3 + 2) * 100 = } versus {3 + 2 * 100 = }")

<br><br><br>

$$ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

In [None]:
a = 0.1
b = 1.2
c = 2.0

# the whole numerator must be in parentheses before dividing by 2a
x_1 = (-b + (b**2 - 4*a*c)**0.5) / (2*a)
x_2 = (-b - (b**2 - 4*a*c)**0.5) / (2*a)

<br><br><br>

In [None]:
discriminant = b**2 - 4*a*c

x_1 = (-b + discriminant**0.5) / (2*a)
x_2 = (-b - discriminant**0.5) / (2*a)
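
One way to check a root is to substitute it back into $ax^2 + bx + c$, which should give (nearly) zero. The following is a minimal sketch with the same coefficients; the names `check_1` and `check_2` are only for illustration, and the whole numerator is in parentheses so that it matches the formula above.

```python
a = 0.1
b = 1.2
c = 2.0

discriminant = b**2 - 4*a*c

# divide the whole numerator by 2a, as in the formula
x_1 = (-b + discriminant**0.5) / (2*a)
x_2 = (-b - discriminant**0.5) / (2*a)

# substitute each root back into a*x**2 + b*x + c
check_1 = a*x_1**2 + b*x_1 + c
check_2 = a*x_2**2 + b*x_2 + c

print(x_1, check_1)
print(x_2, check_2)
```

The checks may not be exactly zero because of floating-point rounding, but they should be very small.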

<br><br><br>

## If-then-else

In [None]:
if True:
    print("yay")
else:
    print("boo")

In [None]:
if False:
    print("yay")
else:
    print("boo")

<br><br><br>

In [None]:
x = 5

In [None]:
x == 2 + 3

In [None]:
x == 2 * 3

<br><br><br>

In [None]:
if x == 2 + 3:
    print("yay")
else:
    print("boo")

In [None]:
if x == 2 * 3:
    print("yay")
else:
    print("boo")

<br><br><br>

In [None]:
x = 1

if x == 1:
    print("one")
elif x == 2:
    print("two")
elif x == 3:
    print("three")
else:
    print("whatever")

<br><br><br>

## For loops and while loops

In [None]:
for x in range(10):
    print(x)

<br><br><br>

In [None]:
x = 0
while x < 10:
    print(x)
    x += 1

<br><br><br>

A `for` loop over `range` is shorthand for the `while` loop above: it manages the counter for you, so you can't forget the `x += 1`. Use a `for` loop whenever you can!
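
A `while` loop is still the right tool when you can't know the number of steps in advance. As a small sketch (the variable names `value` and `steps` are made up for this example), here is a loop that halves a number until it drops below 1:

```python
value = 1000.0
steps = 0

# keep going until the condition fails; we don't know how many steps that takes
while value >= 1:
    value = value / 2
    steps += 1

print(steps, value)
```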

<br><br><br>

**Exercise:** If you have one penny at the beginning of the year and double it every day, how much money will you have after 30 days?

(Without using `**`.)

In [None]:
# Try to solve it here before using the next cell to show solutions.

print("???")

<details>
    <summary><b>Solution (do not peek!)</b></summary>

```python

money = 0.01

for day in range(30):
    money = money * 2

money
```
    
</details>

<br><br><br>

## Functions

In [None]:
def quadratic_formula(a, b, c):
    discriminant = b**2 - 4*a*c
    
    x_1 = (-b + discriminant**0.5) / (2*a)
    x_2 = (-b - discriminant**0.5) / (2*a)

    return x_1, x_2

In [None]:
quadratic_formula(0.1, 1.2, 2.0)

<br><br><br>

In [None]:
def compound_interest(principal, rate, num_periods):
    money = principal

    for period in range(num_periods):
        money = money * rate

    return money

In [None]:
compound_interest(0.01, 2, 30)

<br><br><br>

## Importing functions

In [None]:
import math

In [None]:
math.sqrt

In [None]:
a = 0.1
b = 1.2
c = 2.0

x_1 = (-b + math.sqrt(b**2 - 4*a*c)) / (2*a)
x_2 = (-b - math.sqrt(b**2 - 4*a*c)) / (2*a)

<br><br><br>

In [None]:
0.01 * 2**30

In [None]:
0.01 * math.pow(2, 30)

<br><br><br>

## Data types

In [None]:
type(123)

In [None]:
type(3.14)

In [None]:
type("hello")

In [None]:
type(True)

<br><br><br>

In [None]:
type(quadratic_formula)

In [None]:
type(math.sqrt)

<br><br><br>

In [None]:
1 + "2"

In [None]:
1 + True

<br><br><br>

In [None]:
isinstance(123, int)

In [None]:
isinstance(True, int)

In [None]:
isinstance(3.14, int)

In [None]:
isinstance("2", int)

<br><br><br>

In [None]:
issubclass(bool, int)

In [None]:
issubclass(int, float)
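
The cells above explain why `1 + True` works while `1 + "2"` does not: `bool` is a subclass of `int`, so `True` and `False` behave like 1 and 0 in arithmetic. A short sketch (the names `flags`, `how_many_true`, and `mixed` are made up for this example):

```python
flags = [True, False, True, True]

# each True counts as 1 and each False as 0
how_many_true = sum(flags)
print(how_many_true)

# int is not a subclass of float, but mixing them in arithmetic still works:
# the result is a float
mixed = 1 + 2.5
print(type(mixed))
```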

<br><br><br>

## Collection types

In [None]:
some_list = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]
some_list

In [None]:
type(some_list)

In [None]:
len(some_list)

<br><br><br>

In [None]:
some_dict = {"one": 1.1, "two": 2.2, "three": 3.3}
some_dict

In [None]:
type(some_dict)

In [None]:
len(some_dict)

<br><br><br>

In [None]:
some_list[2]

In [None]:
some_dict["two"]

<br><br><br>

In [None]:
some_list[2] = 222

In [None]:
some_list

<br><br><br>

In [None]:
some_dict["two"] = 222

In [None]:
some_dict

<br><br><br>

In [None]:
some_list.append("mixed types")

In [None]:
some_list

<br><br><br>

In [None]:
some_dict[123] = "mixed types"

In [None]:
some_dict

<br><br><br>

In [None]:
for x in some_list:
    print(x)

In [None]:
for x in some_dict:
    print(x)

<br><br><br>

In [None]:
for key in some_dict.keys():
    print(key)

In [None]:
for value in some_dict.values():
    print(value)

In [None]:
for key, value in some_dict.items():
    print(key, "   ->   ", value)

<br><br><br>

In [None]:
some_list[2:8]

<br><br><br>

**Exercise:** Before running the next cell, what will it do?

In [None]:
some_list[2:8][3]

<br><br><br>

Negative indexes count from the end of the list:

In [None]:
numbers = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]

In [None]:
numbers[:-1]

In [None]:
numbers[:-2]

In [None]:
numbers[:-3]

<br><br><br>

In [None]:
for i in range(6):
    print(numbers[ i : -(i + 1) ])
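
The same rule applies to single items as well as slices: `-1` is the last item, `-2` is the one before it, and so on. A small illustration with a made-up list named `letters`:

```python
letters = ["a", "b", "c", "d", "e"]

last = letters[-1]             # the last item
second_to_last = letters[-2]   # the item before the last one
last_two = letters[-2:]        # from the second-to-last item to the end
all_but_last = letters[:-1]    # everything except the last item

# a negative index -n means the same thing as len(letters) - n
same_thing = letters[:len(letters) - 1]

print(last, second_to_last, last_two, all_but_last)
```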

<br><br><br>

## Arrays (NumPy)

<img src="img/numpy-logo.svg" width="350">

In [None]:
import numpy as np

In [None]:
some_array = np.array([0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])
some_array

In [None]:
type(some_array)

In [None]:
len(some_array)

<br><br><br>

In [None]:
some_array[2]

In [None]:
some_array[2] = 222

In [None]:
some_array

In [None]:
some_array[2:8]

<br><br><br>

In [None]:
some_array[2] = "mixed types"

<br><br><br>

In [None]:
some_array.append(999)
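
The last two cells fail because a NumPy array has a single data type and a fixed length, unlike a list. If you do need a longer array, one option is `np.append`, which builds a new array instead of changing the old one. A minimal sketch, with made-up names `fixed` and `longer`:

```python
import numpy as np

fixed = np.array([0.0, 1.1, 2.2])

# np.append returns a new array; `fixed` itself is unchanged
longer = np.append(fixed, 999)

print(fixed)
print(longer)
print(longer.dtype)   # the integer 999 was stored as a floating-point number
```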

<br><br><br>

In [None]:
big_2d_array = np.random.randint(0, 100, (2000, 2000))
big_2d_array

<br><br><br>

In [None]:
big_2d_array[2, :]

In [None]:
big_2d_array[:, 2]

In [None]:
big_2d_array[0:3, 0:3]

In [None]:
big_2d_array[-3:, -3:]

<br><br><br>

In [None]:
small_2d_array = np.array([
    [  1,   2,   3,   4,   5],
    [ 10,  20,  30,  40,  50],
    [100, 200, 300, 400, 500],
])

In [None]:
np.sum(small_2d_array)

In [None]:
np.sum(small_2d_array, axis=0)

In [None]:
np.sum(small_2d_array, axis=1)
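
The `axis` argument names the dimension that gets summed away: `axis=0` adds up the rows, leaving one total per column, and `axis=1` adds up the columns, leaving one total per row. A smaller sketch that is easy to check by hand (the names `tiny`, `total`, `per_column`, and `per_row` are made up for this example):

```python
import numpy as np

tiny = np.array([
    [ 1,  2,  3],
    [10, 20, 30],
])

total = np.sum(tiny)               # every element: 66
per_column = np.sum(tiny, axis=0)  # 1+10, 2+20, 3+30
per_row = np.sum(tiny, axis=1)     # 1+2+3, 10+20+30

print(total, per_column, per_row)
```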

<br><br><br>

In [None]:
array1 = np.array([ 1,  2,  3,  4,  5])
array2 = np.array([10, 20, 30, 40, 50])

In [None]:
array1 + array2

In [None]:
array1 * array2

<br><br><br>

In [None]:
a = np.random.uniform(5, 10, 1000000)
b = np.random.uniform(10, 20, 1000000)
c = np.random.uniform(-0.1, 0.1, 1000000)

In [None]:
a

In [None]:
b

In [None]:
c

In [None]:
quadratic_formula(a, b, c)

<br><br><br>

In [None]:
(-b + np.sqrt(b**2 - 4*a*c)) / (2*a)

<br><br><br>

**Exercise:** Compute the differences between consecutive elements:

In [None]:
array = np.array([1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9])
array

<details>
    <summary><b>Hint!</b></summary>

<img src="img/flat-operation.svg" width="400">
<img src="img/shifted-operation.svg" width="400">

</details>

In [None]:
# Try to solve it here before using the next cell to show solutions.

print("???")

<details>
    <summary><b>Solution (do not peek!)</b></summary>

```python

array[1:] - array[:-1]
```
    
</details>

<br><br><br>

In [None]:
array > 4

<br><br><br>

In [None]:
array[array > 4]

<br><br><br>

In [None]:
random_numbers = np.random.normal(0, 1, 10000)
random_numbers

In [None]:
np.sqrt(random_numbers)

<br><br><br>

**Exercise:** Take the square root of only the _non-negative_ numbers in `random_numbers`.

In [None]:
# Try to solve it here before using the next cell to show solutions.

print("???")

<details>
    <summary><b>Solution (do not peek!)</b></summary>

```python

np.sqrt(random_numbers[random_numbers >= 0])
```
    
</details>

<br><br><br>

## Data frames (Pandas)

<img src="img/pandas-logo.svg" width="350">

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame({
    "integers": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "reals": [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9],
    "strings": ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"],
})
df

In [None]:
type(df)

In [None]:
len(df)

In [None]:
df.columns

In [None]:
df.index

<br><br><br>

In [None]:
df.loc[2]

<br><br><br>

In [None]:
df.loc[2:8]
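
Note that `.loc` slices by index _label_ and includes both endpoints, unlike list and array slices, which stop before the end. Pandas also has `.iloc`, which is not used elsewhere in this notebook and slices by position like a list. A short sketch comparing the two on a made-up data frame named `frame`:

```python
import pandas as pd

frame = pd.DataFrame({"reals": [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, 9.9]})

by_label = frame.loc[2:8]      # labels 2 through 8, including 8
by_position = frame.iloc[2:8]  # positions 2 up to, but not including, 8

print(len(by_label), len(by_position))
```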

<br><br><br>

In [None]:
df["reals"]

<br><br><br>

In [None]:
df[["reals", "strings"]].loc[2:8]

<br><br><br>

In [None]:
np.sqrt(df[["integers", "reals"]])

<br><br><br>

In [None]:
df["integers"].values

In [None]:
type(df["integers"].values)

<br><br><br>

In [None]:
another_df = pd.DataFrame(
    {
        "more integers": [20, 30, 40, 50],
        "more reals": [22.2, 33.3, 44.4, 55.5],
    },
    index=pd.RangeIndex(2, 6),
)
another_df

<br><br><br>

In [None]:
joined_df = df.join(another_df)
joined_df

<br><br><br>

In [None]:
joined_df.dropna()
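
By default, `join` matches rows by their index and keeps every row of the left data frame, filling in missing values (`NaN`) wherever the right data frame has no matching row. `dropna` then removes the rows that have missing values. A smaller sketch of the same steps, using made-up data frames `left` and `right`:

```python
import pandas as pd

left = pd.DataFrame({"x": [10, 20, 30, 40]})                        # index 0, 1, 2, 3
right = pd.DataFrame({"y": [1.5, 2.5]}, index=pd.RangeIndex(1, 3))  # index 1, 2

joined = left.join(right)    # 4 rows; "y" is missing where right has no row
complete = joined.dropna()   # only the rows without missing values

print(joined)
print(complete)
```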

<br><br><br>

The [penguin dataset](https://www.kaggle.com/code/parulpandey/penguin-dataset-the-new-iris)!

In [None]:
penguins = pd.read_csv("data/penguins.csv")
penguins

<br><br><br>

In [None]:
penguins[penguins["island"] == "Biscoe"]

<br><br><br>

In [None]:
penguins[penguins["island"] == "Biscoe"].dropna()

<br><br><br>

In [None]:
import matplotlib  # force it to load (JupyterLite only)

In [None]:
penguins.plot.scatter("bill_length_mm", "bill_depth_mm")

<br><br><br>

## Plotting (Matplotlib)

<img src="img/matplotlib-logo.svg" width="200">

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot([1, 2, 3, 5, 10], [1.1, 2.9, 3.1, 0.5, 8.0])

<br><br><br>

In [None]:
fig, ax = plt.subplots()

ax.plot([1, 2, 3, 5, 10], [1.1, 2.9, 3.1, 0.5, 8.0])

None

<br><br><br>

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.plot([1, 2, 3, 5, 10], [1.1, 2.9, 3.1, 0.5, 8.0])

ax2.scatter([1, 2, 3, 5, 10], [1.1, 2.9, 3.1, 0.5, 8.0])

None

<br><br><br>

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

ax1.plot([1, 2, 3, 5, 10], [1.1, 2.9, 3.1, 0.5, 8.0])
ax1.set_xlabel("we control the horizontal")
ax1.set_ylabel("we control the vertical")

ax2.scatter([1, 2, 3, 5, 10], [1.1, 2.9, 3.1, 0.5, 8.0])
ax2.set_xlim(-5, 15)
ax2.set_ylim(0, 20)

None

<br><br><br>

In [None]:
fig, ax = plt.subplots()

penguins.plot.scatter("bill_length_mm", "bill_depth_mm", ax=ax)

ax.set_xlabel("bill length (mm)")
ax.set_ylabel("bill depth (mm)")
ax.set_title("penguins!")

None

<br><br><br>

In [None]:
fig, ax = plt.subplots()

penguins[penguins["species"] == "Adelie"].plot.scatter("bill_length_mm", "bill_depth_mm", color="blue", ax=ax)
penguins[penguins["species"] == "Gentoo"].plot.scatter("bill_length_mm", "bill_depth_mm", color="orange", ax=ax)
penguins[penguins["species"] == "Chinstrap"].plot.scatter("bill_length_mm", "bill_depth_mm", color="green", ax=ax)

ax.set_xlabel("bill length (mm)")
ax.set_ylabel("bill depth (mm)")

None

<br><br><br>

**Exercise:** Plot bill depth (`bill_depth_mm`) versus bill length (`bill_length_mm`) in one plot and body mass (`body_mass_g`) versus flipper length (`flipper_length_mm`) in another plot beside it, colored by species (`species`), such that they look like this:

<img src="img/02-side-by-side-plots.svg" width="1000">

In [None]:
# Try to solve it here before using the next cell to show solutions.

print("???")

<details>
    <summary><b>Solution (do not peek!)</b></summary>

```python

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

penguins[penguins["species"] == "Adelie"].plot.scatter("bill_length_mm", "bill_depth_mm", color="blue", ax=ax1)
penguins[penguins["species"] == "Gentoo"].plot.scatter("bill_length_mm", "bill_depth_mm", color="orange", ax=ax1)
penguins[penguins["species"] == "Chinstrap"].plot.scatter("bill_length_mm", "bill_depth_mm", color="green", ax=ax1)
ax1.set_xlabel("bill length (mm)")
ax1.set_ylabel("bill depth (mm)")

penguins[penguins["species"] == "Adelie"].plot.scatter("flipper_length_mm", "body_mass_g", color="blue", ax=ax2)
penguins[penguins["species"] == "Gentoo"].plot.scatter("flipper_length_mm", "body_mass_g", color="orange", ax=ax2)
penguins[penguins["species"] == "Chinstrap"].plot.scatter("flipper_length_mm", "body_mass_g", color="green", ax=ax2)
ax2.set_xlabel("flipper length (mm)")
ax2.set_ylabel("body mass (g)")

None
```
    
</details>

<details>
    <summary><b>Slick solution (if you're really cool)</b></summary>

```python

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))

color = {"Adelie": "blue", "Gentoo": "orange", "Chinstrap": "green"}

for species in ["Adelie", "Gentoo", "Chinstrap"]:
    penguins[penguins["species"] == species].plot.scatter("bill_length_mm", "bill_depth_mm", color=color[species], ax=ax1)
    penguins[penguins["species"] == species].plot.scatter("flipper_length_mm", "body_mass_g", color=color[species], ax=ax2)

ax1.set_xlabel("bill length (mm)")
ax1.set_ylabel("bill depth (mm)")

ax2.set_xlabel("flipper length (mm)")
ax2.set_ylabel("body mass (g)")

None
```
    
</details>