<a href="https://colab.research.google.com/github/fsk-lab/scics/blob/main/09_Scientific_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python for Scientific Applications

In the first eight tutorials, we have learned about the most important concepts of programming in Python: Variables and data types, conditions, loops, functions, and classes. Moreover, we got to know the most important built-in data types (including `int`, `float`, `bool`, `None`, `list`, `tuple`, `set` and `dict`); and we learned how to use Python to interact with the operating system to navigate the file tree, and read/write files. In principle, this knowledge provides us with all the tools to solve various scientific tasks.

At the same time, Python is arguably the programming language with the most active community that constantly develops new tools that make our lives easier. There are a number of packages out there which contain useful functionality for scientific applications. Four of these packages will be introduced in the following tutorial:
* `numpy`: Numerical and logical operations on large arrays and matrices.
* `pandas`: Useful tools for tabular data.
* `scipy`: Advanced mathematical tools for scientific problems
* `matplotlib`: Plotting and visualization

## Numerical Operations with `numpy`

In science, we often deal with larger amounts of data – for example, an experimental spectrum can easily consist of tens of thousands of individual data points. In these scenarios, the built-in data types and their operations are not necessarily the most efficient solutions. In particular, looping over very long lists, and performing a mathematical operation on each list element, can become relatively inefficient.

The `numpy` library (short for "numerical Python") addresses these issues by providing highly efficient array structures and mathematical operations.

```
🧠 `numpy` makes a number of operations more efficient by providing specific
data types and implementing a large number of looping operations
in hardware-oriented languages like C or Fortran.
```

In a regular Python setup, `numpy` can be installed from the Python Package Index (*PyPI*) through
```
pip install numpy
```

Google Colab provides a `numpy` installation by default.

In this chapter, we will learn about the most important functionalities of `numpy`, and how to efficiently use it.

### `numpy` Arrays

#### Basics of `numpy` arrays

The fundamental data structure in `numpy` is called an **array** (data type: `ndarray`, short for "n-dimensional array"). We can think about an array like a list – on many occasions, it behaves very similarly.

Numpy arrays can be created from lists:

In [None]:
import numpy as np  # it is common practice to import numpy as the shortcut `np`

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(a)
print(type(a))

If we print it, it looks almost like a list; the only visible difference is that the elements are separated by spaces instead of commas.

If we want to access specific elements (or sub-arrays) from a `numpy` array, we can use **indexing** and **slicing** – just as we have learned it for the case of lists.

In [None]:
print(a[2])
print(a[2:6])

Note that, in principle, `numpy` arrays are mutable. That means, we can also use indexing and slicing to modify the value of a specific element in a `numpy` array.

In [None]:
a[2] = 100
print(a)

However, there are also a number of very notable differences between a list and a `numpy` array. Most importantly, `numpy` arrays have a fixed data type – meaning that all elements within the array must be of the same data type. We can get the data type of an array through the `dtype` attribute.

In [None]:
print(a.dtype)

This has an important consequence: New elements of the array can only be set to values that can be cast to the specific data type – otherwise, this will cause an error!

In [None]:
a[4] = "Test"

We can convert an array to a new data type using the `array.astype(new_type)` method, which returns a new array of the requested data type.
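As a minimal sketch of such a conversion (the array values are made up for illustration), the following cell turns an array of floats into an array of integers:

```python
import numpy as np

a = np.array([1.7, -2.3, 4.0])
print(a.dtype)  # float64

# astype returns a new array; the original array `a` is left unchanged
a_int = a.astype(int)

print(a_int)  # the decimal places are cut off: [ 1 -2  4]
print(a_int.dtype)
```

Note that the conversion to `int` truncates the decimal places instead of rounding them.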

#### Multi-dimensional Arrays

One of the key features of `numpy` is that it supports multi-dimensional arrays. So far, we have looked at a one-dimensional array, i.e. a list of values (or, in mathematical words, a *vector*). In `numpy`, we can also work with arrays of any other dimensionality. A two-dimensional array would be a table (or in mathematical wording, a *matrix*). Arrays of three or more dimensions are difficult to imagine – but they can become useful at times.

We can instantiate a two-dimensional array from a nested list:

In [None]:
b = np.array([[1, 2, 3], [4, 5, 6]])
# table of two rows and three columns

Each `numpy` array has the `shape` attribute, which tells us what the array looks like. For a 1D array, it is a tuple with one element – which is the length of the array. For a 2D array, it is a tuple with two elements – the first one being the number of rows, and the second one being the number of columns. This works similarly for higher-dimensional arrays.

In [None]:
print(a.shape)
print(b.shape)

In a 2D array, both dimensions of the array (often called the *axes*) can be indexed separately:

In [None]:
print(b[1, 2])

```
🔁  Remember that the ":" can be used to slice all elements from a list or
a string. We can use this to get e.g. all values from a specific row
or column of a `numpy` array!
```

```
🎮  Create a numpy array of the following matrix:

 1  4 -2
 0 -1  0
 7 -4  0
 1  2 -1

Predict the shape of the numpy array!

 Print out a slice of the matrix that represents...
 a) the first row
 b) the second column
```

In [None]:
# Try it out!

#### Creating Arrays with Special Values

Numpy provides several functions to create arrays filled with specific values:
* `np.zeros(shape)` creates an array of the given shape filled with zeros.
* `np.ones(shape)` creates an array of the given shape filled with ones.
* `np.full(shape, val)` creates an array of the given shape filled with the value `val`.
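* `np.random.rand(d0, d1, ...)` creates an array filled with random numbers from the [0, 1) interval. Note that, unlike the functions above, it takes the size of each dimension as a separate argument, not as a tuple.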

In [None]:
arr_1 = np.zeros((5, 7))
arr_2 = np.full((2, ), 2.0)

print(arr_1)
print(arr_2)

```
❗  Remember: A `shape` is always a tuple, even for 1D arrays!
```
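The remaining creation functions work in the same way. The following sketch (the variable names are chosen for illustration only) creates an array of ones and an array of random numbers with two rows and three columns each:

```python
import numpy as np

# The shape is passed as a tuple
ones = np.ones((2, 3))

# np.random.rand takes the size of each dimension as a separate argument
random_values = np.random.rand(2, 3)

print(ones)
print(random_values.shape)  # (2, 3)
```

Since the numbers are random, the content of `random_values` differs every time the cell is executed.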

#### Combining Arrays

In `numpy`, we can combine two arrays into a single array – provided that the shapes of the two arrays match. Let us consider the following example of two 2D arrays:

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

There could be two ways to combine these arrays.
1. We get a new array with four rows and two columns ("vertical stacking").
2. We get a new array with two rows and four columns ("horizontal stacking").

For these two scenarios, `numpy` provides the functions `np.hstack()` (**h**orizontal stacking) and `np.vstack()` (**v**ertical stacking). Both functions take a list or tuple of arrays as their argument.

In [None]:
ver_stacked = np.vstack([a, b])
print(ver_stacked)

In [None]:
hor_stacked = np.hstack([a, b])
print(hor_stacked)

```
❗ Note that, for stacking arrays, the shapes of both arrays need to match.
- For `np.vstack([a, b])`, `a` and `b` need to have the same number of columns.
- For `np.hstack([a, b])`, `a` and `b` need to have the same number of rows.
```

This may require changing the shape of an array (see below)!

```
🧠  `np.hstack` and `np.vstack` are special cases of `np.concatenate` for 2D arrays.
    `np.concatenate` provides a generalized interface for combining arrays of
    arbitrary dimensionality.

np.concatenate(
    arrays,  # list or tuple of arrays to combine
    axis  # axis along which to combine
)
```
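As a short sketch of the relationship described above, `np.concatenate` reproduces both stacking operations for the two example arrays when the appropriate `axis` is given:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axis=0 combines along the rows, like np.vstack([a, b])
rows_combined = np.concatenate([a, b], axis=0)

# axis=1 combines along the columns, like np.hstack([a, b])
cols_combined = np.concatenate([a, b], axis=1)

print(rows_combined.shape)  # (4, 2)
print(cols_combined.shape)  # (2, 4)
```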

#### 🧠 Changing the Shape of Arrays

At times, it can become necessary to change the shape of an array.

Here, the `array.reshape(m, n, ...)` method returns an array of the shape `(m, n, ...)` that contains the same data. The product `m * n * ...` must match the total number of elements.

In [None]:
a = np.array([1, 2, 3, 4, 5, 6])

# Reshape to 2 rows and 3 columns
b = a.reshape(2, 3)
print("Reshaped array (2x3):\n", b)

In addition to *reshaping*, we often want (or rather, need) to add or remove a dimension of size 1. This is usually necessary when we want to combine multiple arrays.

As an example, we want to turn an array (one-dimensional) into a matrix (two-dimensional) with a single column. This can be done with the `expand_dims` function.

In [None]:
a = np.array([1, 2, 3, 4, 5, 6])
print(a.shape)

b = np.expand_dims(a, axis=-1)
print(b.shape)

print(b)


Similarly, dimensions of size 1 can be removed using `np.squeeze()`.

```
🧠 Note that `expand_dims` and `squeeze` are just special cases of the
  `reshape` function discussed above!
```
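The following sketch illustrates this: it adds a dimension, removes it again with `np.squeeze()`, and checks that `reshape` gives the same column as `expand_dims` (the variable names are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

column = np.expand_dims(a, axis=-1)  # adds a dimension of size 1 at the end
flat = np.squeeze(column)            # removes all dimensions of size 1
same_as_column = a.reshape(6, 1)     # the same result via reshape

print(column.shape)  # (6, 1)
print(flat.shape)    # (6,)
print(np.array_equal(column, same_as_column))  # True
```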

### Basic Operations on `numpy` Arrays

`numpy`'s most important strength is its ability to support **vectorized** operations – which means we can apply certain operations to entire arrays without the explicit need to write a `for` loop. This not only makes our code more readable – but, most importantly, it is also computationally much (!) more efficient than the `for` loop.

#### Element-Wise Operations with Scalars

Let us start with a simple example – we have an array of ten numbers, and want to create a new array in which every element from the original array is multiplied by 2.

```
🔁 How would we do this with a normal Python `list`?
```

In [None]:
a = [1, 7, -2, 4, 3, 1, 5, 0, 2, -3]

# Complete the code using "classical" Python.

In `numpy`, this can be done in a single line – by "multiplying" the entire array with 2, which automatically performs an element-wise multiplication.

In [None]:
a = np.array(a)

b = 2 * a

print(b)

Such element-wise operations can be done not only with multiplication, but also with a number of further operators that we have learned about in the previous chapters, including:
* The basic mathematical operations `+`, `-`, `*`, `**`, `/`, `//`, `%`
* Comparisons like `>`, `>=`, `<`, `<=`, `==`
* Element-wise Boolean operations like `~`(NOT), `&` (AND), `|` (OR)


```
🎮 Predict the outcome of the following code cell!
```

In [None]:
a = np.array(
    [
        [1, -2, 4],
        [2, 0, -1]
    ]
)

b = a >= 0

print(b)

```
❗ Element-wise operations like `2 * a` or `a >= 0` are not performed in-place!
These operations create a new array, and the original array is not modified!
```

#### Advanced Element-Wise Mathematical Operations

In addition to the basic mathematical operations that we have known from "normal" Python, `numpy` provides us with *vectorized* versions of many further mathematical operations.
* Square root: `np.sqrt(array)`
* Exponential function: `np.exp(array)`
* Logarithms: `np.log(array)`, `np.log2(array)`, `np.log10(array)`, ...
* Trigonometric functions: `np.sin(array)`, `np.cos(array)`, `np.tan(array)`, `np.arcsin(array)`, `np.arccos(array)`, `np.arctan(array)`, `np.sinh(array)`, `np.cosh(array)`, `np.tanh(array)`
* Rounding operations: `np.round(array)`, `np.floor(array)`, `np.ceil(array)`
* and many more

Their usage is very similar to the "basic" element-wise operations.

In [None]:
a = np.array([1, 7, -2, 4, 3, 1, 5, 0, 2, -3])

b = np.sqrt(a)
c = np.sin(a)

print(b)
print(c)

This example shows us an important feature of `numpy`: The result of an undefined calculation (e.g. the square root of a negative number, or 0 divided by 0) is a value called **nan** (for "**n**ot **a** **n**umber"). `numpy` does not raise errors in these scenarios, it only prints a warning!

Similarly, `numpy` contains the `inf` value ("infinity"), which is the result of a floating-point calculation that is larger than what the data type can represent, or of dividing a non-zero number by zero.
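The following sketch shows how both special values come about, and how they can be detected with `np.isnan()` and `np.isinf()`. The `np.errstate` context manager is only used here to suppress the warnings that `numpy` would otherwise print:

```python
import numpy as np

a = np.array([4.0, -1.0, 0.0])

# Temporarily silence the RuntimeWarnings for invalid values and divisions by zero
with np.errstate(invalid="ignore", divide="ignore"):
    roots = np.sqrt(a)   # the square root of -1 is not a real number -> nan
    inverse = 1.0 / a    # 1 divided by 0 -> inf

print(roots)
print(inverse)

# nan is not equal to anything (not even to itself), so we need special functions
print(np.isnan(roots))    # [False  True False]
print(np.isinf(inverse))  # [False False  True]
```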

Moreover, `numpy` has a number of important mathematical constants stored, which we can directly use in calculations, including
* `np.pi` for the value of π
* `np.e` for the value of Euler's number e

In [None]:
print(np.pi)
print(np.e)

#### Element-Wise Operations between Multiple Arrays

So far, we have only seen (mathematical) operations that are performed on each element of a *single* array – e.g. multiplying each element by 2, or taking the square root of each element.

Importantly, `numpy` also supports element-wise operations between two arrays – as long as these arrays have the same shape (or shapes that `numpy` can stretch to a common shape, a mechanism called *broadcasting*).

As an example, we can do an element-wise addition of two arrays:

In [None]:
arr1 = np.array([1, 2, 3, 4])
arr2 = np.array([10, 20, 30, 40])

# Element-wise addition
print("Element-wise addition:", arr1 + arr2)

# Element-wise multiplication
print("Element-wise multiplication:", arr1 * arr2)

```
🎮  Find out what happens if the two arrays do not have matching shapes!
```

In [None]:
# Try it!

#### 🧠 Further Element-Wise Operations

* `np.diff(array)`: Each element at index *i* in the new array is the difference between the old array's elements at index *i+1* and *i*. The new array is therefore one element shorter than the old one. This is particularly useful for calculating numerical derivatives.
* `np.cumsum(array)`: Each element at index *i* in the new array is the cumulative sum of all elements in the old array from index 0 to *i*. This is particularly useful for calculating integrals.
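
As a small sketch with made-up data points, the two functions can be applied to the values of y = x² on an evenly spaced grid:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x ** 2  # [0, 1, 4, 9, 16]

# Differences between neighbouring elements (one element shorter than y)
dy = np.diff(y)
dx = np.diff(x)
slope = dy / dx  # a simple numerical approximation of the derivative

# Running total of all elements up to each index
running_total = np.cumsum(y)

print(dy)             # [1. 3. 5. 7.]
print(running_total)  # [ 0.  1.  5. 14. 30.]
```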

#### Reducing Operations



In addition to element-wise operations, `numpy` provides vectorized versions of further useful operations, including
* `np.sum()` for calculating the sum of all elements in an array
* `np.prod()` for calculating the product of all elements in an array
* `np.mean()` for calculating the mean of all elements in an array
* `np.max()` for finding the maximum of all elements in an array
* `np.min()` for finding the minimum of all elements in an array

In [None]:
a = np.array(
    [
        [1, -2, 4],
        [2, 0, -1]
    ]
)

print(np.sum(a))

Importantly, these operations can either be applied on the full array, or *along a dimension* of the array. In other words, we could also use the `np.sum()` function to calculate the sum of all values in each row of the array, or the sum of all values in each column of the array.

For this purpose, we have to specify the `axis` keyword in the function, specifying the dimension along which we want to consider all values – i.e. the dimension that should "disappear" after we perform the operation. In the example above, if we want to calculate the sum of all values in each row, we want to sum over all columns, so we would have to specify `axis = 1`.

In [None]:
a = np.array(
    [
        [1, -2, 4],
        [2, 0, -1]
    ]
)

print(np.sum(a, axis=1))

```
🎮  Predict the outcome of

print(np.sum(a, axis=0))
```

These functions are often referred to as "reducing" functions – since they reduce the dimensionality / shape of the resulting array by the dimension specified as `axis`.
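
The same `axis` logic applies to the other reducing functions. The following sketch uses a different example matrix `m` (made up for illustration) to calculate the mean of each column and the maximum of each row:

```python
import numpy as np

m = np.array(
    [
        [3, 8, 1],
        [6, 2, 7]
    ]
)

col_means = np.mean(m, axis=0)  # axis 0 (the rows) disappears -> one value per column
row_max = np.max(m, axis=1)     # axis 1 (the columns) disappears -> one value per row

print(col_means)  # [4.5 5.  4. ]
print(row_max)    # [8 7]
print(m.shape, col_means.shape, row_max.shape)  # (2, 3) (3,) (2,)
```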

### Selecting Elements from `numpy` Arrays

#### Array Indexing and Boolean Masks

In the introductory chapter on `numpy` arrays, we have seen that we can access elements from `numpy` arrays through indexing or slicing, as we have learned it for lists.

However, `numpy` provides two further, advanced ways of getting specific elements from a `numpy` array, which makes `numpy` a very powerful tool for performing complex operations in few, simple lines of code: **array indexing** and **boolean masks**.

In **array indexing**, we can provide an array of integer indices inside the square brackets. This will return us a new array of the same size as the index array – and at each position, it contains the value from the original array at the specific index position.

Let us illustrate that with a specific example:

In [None]:
a = np.array([1, 7, -2, 4, 3, 1, 5, 0, 2, -3])

indices = [1, 1, 2, 4, 8]

b = a[indices]
print(b)

A **boolean mask** is an array of the same shape as the original array. If we use this boolean mask in the square brackets, we get a new array back – which only contains those values from the original array where the mask was `True`.

Let us, again, illustrate that with an example:

In [None]:
a = np.array([1, 7, -2, 4, 3, 1, 5, 0, 2, -3])

mask = [True, False, True, False, False, False, True, False, False, True]

b = a[mask]
print(b)

This allows us to perform more complex logical operations on numpy arrays: As an example, we can select all elements from `a` which are greater than 1. This requires two steps:
1. We create a Boolean array via an element-wise comparison (see chapter on Element-wise operations).
2. We use this array as a Boolean mask to select the respective elements from `a`.

In [None]:
mask = a > 1
print(mask)

b = a[mask]
print(b)

In principle, we can write all of this in a single line:

In [None]:
b = a[a > 1]
# "b contains all elements from a where the value is greater than 1"

This feature of `numpy` arrays can be very useful for complex logical operations!
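
Several conditions can be combined with the element-wise Boolean operators `&`, `|` and `~`. The following sketch shows this for the array from above. Note that each comparison needs its own pair of parentheses, because `&` and `|` are evaluated before the comparison operators:

```python
import numpy as np

a = np.array([1, 7, -2, 4, 3, 1, 5, 0, 2, -3])

# AND: greater than 1 and, at the same time, smaller than 5
in_range = a[(a > 1) & (a < 5)]

# OR: smaller than 0 or greater than 5
outside = a[(a < 0) | (a > 5)]

# NOT: all elements that are not greater than 0
not_positive = a[~(a > 0)]

print(in_range)      # [4 3 2]
print(outside)       # [ 7 -2 -3]
print(not_positive)  # [-2  0 -3]
```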

#### Finding Special Indices in Numpy Arrays

Boolean comparisons allow us to get a Boolean mask that indicates which elements of an array satisfy a specific criterion.

We can get the corresponding indices using `np.where()`, which returns a tuple that contains one array of indices per dimension of the original array.

In [None]:
a = np.array([1, 7, -2, 4, 3, 1, 5, 0, 2, -3])

b = np.where(a > 1)
print(b)

Moreover, we can get the specific indices of the `max` or `min` operations – i.e. the index at which the maximum or minimum value can be found. The corresponding functions are called `np.argmax()` and `np.argmin()`.

In [None]:
a = np.array([1, 7, -2, 4, 3, 1, 5, 0, 2, -3])

print(np.argmax(a))
print(np.argmin(a))

Especially when dealing with larger (and/or numerical) data, it is highly recommended to work with `numpy` for storing arrays and matrices for two reasons:
1. The implementation in C / Fortran makes element-wise operations much faster and more efficient.
2. Vectorized operations and advanced indexing significantly simplify your code. This makes it more readable, and less prone to errors!
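
To get a feeling for the first point, the following sketch measures the time needed to double 100,000 numbers with a `for` loop and with `numpy`, using the `timeit` module from the standard library. The function names and the array size are chosen for illustration only, and the measured times depend on the machine:

```python
import timeit

import numpy as np

values = list(range(100_000))
array = np.array(values)


def double_with_loop():
    # "Classical" Python: visit every element in a for loop
    result = []
    for v in values:
        result.append(2 * v)
    return result


def double_with_numpy():
    # Vectorized: a single operation on the entire array
    return 2 * array


# Execute each function 20 times and measure the total time in seconds
loop_time = timeit.timeit(double_with_loop, number=20)
numpy_time = timeit.timeit(double_with_numpy, number=20)

print(f"for loop: {loop_time:.4f} s")
print(f"numpy:    {numpy_time:.4f} s")
```

Both functions calculate the same values; only the way in which the calculation is carried out differs.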

## Handling Tabular Data with `pandas`

In many practical use cases, data comes in the form of tables – where each row and column have specific names / headers. In principle, `numpy` already provides us with a toolbox to handle such data – a 2D `numpy` array would, in most of the cases, be suitable for storing, accessing and modifying data in a tabular format.

However, this is not always the most practical solution. Rows and columns can only be accessed by their index (and not by their row/column name); and the entire table would need to have a single data type.

To make handling of tabular data more practical, there is the `pandas` package – which we are going to learn about in this section.

Pandas can be installed from the Python Package Index (*PyPI*) using

```
pip install pandas
```

In a Colab environment, we do not need to worry about installing `pandas`. Since the package is so widespread (and useful), it is installed by default!

Typically, `pandas` is imported as the shortcut `pd`.

In [None]:
import pandas as pd

### `pandas` DataFrames

The central data type in `pandas` – which is used to represent tabular data – is called a **DataFrame**.

DataFrames are effectively tables of rows and columns. Each row and each column have specific row/column names and can be accessed by those. Notably, each column can have its own data type – which is very practical for many applications!

DataFrames can be created in Python directly (e.g. from a dictionary). Here, each key becomes a column name, and the values are the data for that column.

In [None]:
raw_data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
}

df = pd.DataFrame(raw_data)

df

```
💡  `pandas` has an excellent integration with IPython (i.e. the Python that is used for e.g. Colab or
Jupyter notebooks). Therefore, DataFrames, and a number of further `pandas` data structures,
are nicely displayed in a notebook if we only provide the variable name.
```

#### Loading and saving `pandas` DataFrames

Usually, we would not create a `pandas` DataFrame directly in the code. One of the most useful features of `pandas` is the ability to read data from external files – including both `.csv` files and Excel files – and save the data back to these formats.

In [None]:
df = pd.read_csv("sample_data/california_housing_train.csv")

df

*Microsoft* Excel files (`.xls` or `.xlsx`) can be read using `pd.read_excel(...)`.

Analogously, an existing `pandas` DataFrame can be written to a `.csv` or Excel file:

In [None]:
df.to_csv("test.csv")
df.to_excel("test.xlsx", index=False)

```
❗  The `index` argument specifies whether the row index (usually: the row
number) should be written to the file, too.
```

#### Inspecting `pandas` DataFrames

`pandas` DataFrames are most commonly used to load tabular data from external sources. Therefore, it is usually important to understand the structure and content of the dataset a bit better.

For this, a `pandas.DataFrame` provides a number of useful attributes and methods, including:
* The `df.info()` method, which provides the name, the data type, and the number of non-missing values for each column.
* The `df.describe()` method, which provides some summary statistics of the DataFrame.
* The `df.shape` attribute, which provides the shape of the DataFrame (i.e. the number of rows and columns). This is analogous to the `shape` attribute of a `numpy` array.

In [None]:
df.info()

In [None]:
df.describe()

In [None]:
df.shape

#### Accessing Data from `pandas` DataFrames

`pandas` supports a number of different ways to access individual cells, or segments of cells, from a DataFrame.

*Indexing* using the square brackets can be used to access a single column (index: column name), or multiple columns (index: list of column names) from the DataFrame. This gives us the individual column (a `pandas` `Series`), or a new DataFrame that contains only the selected columns.

In [None]:
latitude_column = df["latitude"]
latitude_column

In [None]:
coordinates = df[["longitude", "latitude"]]
coordinates

Individual cells, or custom ranges of cells can be accessed using the `loc` and `iloc` attributes, respectively, which can be indexed as follows:
* `df.loc[row_name, col_name]` returns the content of a cell based on row and column **names**
* `df.iloc[row_idx, col_idx]` returns the content of a cell based on row and column **index** (similar to indexing a `numpy` array).

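The `loc` and `iloc` attributes of a `pandas` DataFrame also support indexing by Boolean masks (as we have learned for `numpy` arrays), or indexing by slices. Note that a slice in `loc` includes the end label, whereas a slice in `iloc` excludes the end index (as we know it from lists).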

In [None]:
print(df.loc[10000, "total_bedrooms"], "\n\n")
print(df.loc[10000, ["longitude", "latitude"]], "\n\n")
print(df.iloc[:10, :])
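
In the housing dataset, the row names are identical to the row numbers, which hides the difference between `loc` and `iloc`. The following sketch therefore uses a small, made-up DataFrame `samples` with custom row names:

```python
import pandas as pd

# Hypothetical example data with custom row names "s1", "s2", "s3"
samples = pd.DataFrame(
    {"mass_g": [1.2, 0.8, 2.5], "purity": [0.99, 0.95, 0.97]},
    index=["s1", "s2", "s3"],
)

print(samples.loc["s2", "mass_g"])  # by row and column name -> 0.8
print(samples.iloc[1, 0])           # by row and column index -> the same cell

# A Boolean mask selects rows; the column is still selected by its name
heavy = samples.loc[samples["mass_g"] > 1.0, "purity"]
print(heavy)

# A slice in iloc selects the first two rows and all columns
print(samples.iloc[:2, :])
```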

The `df.loc` and `df.iloc` attributes can also be used to set the respective values in a `pandas` DataFrame (see below).

### Processing Data with `pandas`

`pandas` provides a number of useful tools to process data in Python.

#### Filtering DataFrames

As a simple example, we can use `pandas` to filter data – e.g. to keep only those rows of the housing dataset in which the median house value is greater than $300,000. For this, we can create a Boolean mask.

In [None]:
mask = df["median_house_value"] > 300000

mask

As in the case of `numpy` arrays, such a Boolean mask can be used for indexing – keeping only those rows in which the Boolean index is `True`.

In [None]:
filtered_houses = df[mask]

filtered_houses

We could write the same in one line:

In [None]:
filtered_houses = df[df["median_house_value"] > 300000]

filtered_houses

```
❗  Note that, in the new DataFrame obtained from this "filtering", the row names from the original table
are kept! To drop the original row numbers, and get a fresh set of row numbers, we can use

`df = df.reset_index(drop = True)`
```

Especially when reading data from files, having missing or invalid entries in a column is a common phenomenon. In `pandas`, we can filter for these entries using the `df.isna()` method – which checks which entries are `nan`. This method returns a Boolean mask – which we can then use to select, modify or remove values from the DataFrame.
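
As a minimal sketch, the following cell applies `isna()` to a single column of a small, made-up DataFrame `measurements`, and uses the resulting mask to keep only the rows with valid entries:

```python
import numpy as np
import pandas as pd

# Hypothetical measurement table in which two yields are missing
measurements = pd.DataFrame(
    {
        "sample": ["A", "B", "C", "D"],
        "yield": [0.82, np.nan, 0.64, np.nan],
    }
)

# Boolean mask: True wherever the yield is missing
missing = measurements["yield"].isna()
print(missing)
print(missing.sum())  # number of missing entries -> 2

# Keep only the rows in which the yield is NOT missing
valid_rows = measurements[~missing]
print(valid_rows)
```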

#### Modifying Data

The methods to access specific cells or groups of cells from a DataFrame can also be used to modify the data within a DataFrame.

In [None]:
df.loc[0, "longitude"] = 1000.0

df

The same can also be done for groups of cells, or even entire columns.

We can select an entire column, perform a mathematical operation on it, and then assign the newly obtained values to the column in the original dataframe. Similar to `numpy`, many mathematical operations are vectorized, meaning that they can be executed on a full row / column without the need to write a loop.

For example, we can increase all "latitudes" by a value of 1.2:

In [None]:
new_latitudes = df["latitude"] + 1.2
df["latitude"] = new_latitudes

df

```
🎮 For all houses where the median age is greater than 50, double the value of the median age!
```

In [None]:
# Try it!

#### Adding or Removing Rows or Columns

The existing methods for accessing or setting specific elements in a `pandas` DataFrame make it easy to add new columns or rows:
* A new column is added e.g. by `df[col_name] = [val1, val2, ...]`
* A new row is added e.g. by `df.loc[row_name, :] = [val1, val2, ...]`

Rows or columns are removed using the `df.drop()` method:
* `df.drop(labels, axis=0)` removes a row or multiple rows specified by `labels`.
* `df.drop(labels, axis=1)` removes a column or multiple columns specified by `labels`.

```
❗ Note that, by default, `df.drop()` returns a copy of the dataframe without the specified row(s) or column(s).
Alternatively, the `inplace` argument can be used to indicate that the dataframe object should be modified directly.

In other words, `df = df.drop(...)` and `df.drop(..., inplace=True)` have the same effect on `df`.
```
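
The following sketch puts these operations together for a small example table. The DataFrame `people` and its values are made up for illustration, and the new row is added in the shorthand form `df.loc[row_name] = [...]`:

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
)

# Add a new column (one value per existing row)
people["City"] = ["Berlin", "Zurich", "Vienna"]

# Add a new row with the row label 3 (one value per column)
people.loc[3] = ["Dana", 41, "Oslo"]

# drop returns modified copies; `people` itself is not changed
without_city = people.drop("City", axis=1)  # removes a column
without_bob = people.drop(1, axis=0)        # removes the row with label 1

print(without_city)
print(without_bob)
```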

## 🧠 Advanced Scientific Calculations: `scipy`

Chapter coming soon!

## Plotting with `matplotlib`

So far, we have mainly seen how to read and process data using Python. Ideally, we also want to make data visually accessible – by e.g. plotting graphs, surfaces, diagrams, and many more. In Python, this can be done using the `matplotlib` package.  

`matplotlib` can be installed from the Python Package Index (*PyPI*) using

```
pip install matplotlib
```

In a Colab environment, again, we do not need to worry about installing `matplotlib` – it is installed by default!

`matplotlib` is a very large and powerful Python library. In this tutorial, we will only cover some of the fundamentals of how to visualize data using `matplotlib`. The official package documentation provides many more insights on what can be done using `matplotlib` – and, in fact, Large Language Models like *ChatGPT* can be useful assistants when customizing the behavior of `matplotlib` figures.

Usually, we only import the `pyplot` submodule using the shortcut `plt`.

In [None]:
import matplotlib.pyplot as plt

The most straightforward way of plotting data in `matplotlib` is the scatter plot. We provide a number of `(x, y)` pairs, which are then plotted into a single diagram.


In [None]:
x = [1, 2, 3, 4, 5]
y = [0.5, 2, 4.5, 8, 12.5]

plt.scatter(x, y)
plt.show()  # visualizes the plot

```
💡 Note that `matplotlib` has a very advanced IPython integration. In notebooks (e.g. in Google Colab),
the plots are visualized directly, and a call to the `plt.show()` method is not strictly necessary.
```

We can also customize the plots, e.g. by adding axis labels, or a plot title:

In [None]:
x = [1, 2, 3, 4, 5]
y = [0.5, 2, 4.5, 8, 12.5]

plt.scatter(x, y)
plt.xlabel("The X Axis")
plt.ylabel("The Y Axis")
plt.title("The Title")
plt.show()

Alternatively, `matplotlib` allows us to do **line plots** using `plt.plot()`:

In [None]:
x = [1, 2, 3, 4, 5]
y = [0.5, 2, 4.5, 8, 12.5]

plt.plot(x, y)
plt.xlabel("The X Axis")
plt.ylabel("The Y Axis")
plt.title("The Title")
plt.show()

We can also draw multiple plots in the same figure, and create a legend.

In [None]:
x = [1, 2, 3, 4, 5]
points = [0.5, 2, 4.5, 8, 12.5]
line = [1, 4, 7, 10, 13]

plt.scatter(x, points, label="Points")
plt.plot(x, line, label="Line")
plt.xlabel("The X Axis")
plt.ylabel("The Y Axis")
plt.title("The Title")
plt.legend()
plt.show()

We can save such a plot to a file using the `plt.savefig()` function. Matplotlib automatically infers the file format from the file extension, and writes the file accordingly!

In [None]:
x = [1, 2, 3, 4, 5]
y = [0.5, 2, 4.5, 8, 12.5]

plt.plot(x, y)
plt.xlabel("The X Axis")
plt.ylabel("The Y Axis")
plt.title("The Title")
plt.savefig("test.png")

Plotting directly with `plt.plot()`, `plt.scatter()` etc. is useful when creating a quick, single plot in a single code cell. However, this approach has severe limitations, e.g. if we want to modify the same plot at different places of the code, if we want to include multiple plots in a single figure, etc.

---

Therefore, `matplotlib` provides us with an object-oriented way of handling figures. The `plt.subplots()` function generates a new `Figure` object (which represents the entire figure), as well as an `Axes` object, which represents the plot within that figure. While the distinction between `Figure` and `Axes` may not seem necessary right now, it will become important once we are looking at multiple sub-plots in a single figure.

We can then use the `Axes` object to do the same plotting operations as before:

In [None]:
x = [1, 2, 3, 4, 5]
y = [0.5, 2, 4.5, 8, 12.5]

fig, ax = plt.subplots()

ax.plot(x, y)
ax.set_xlabel("The X Axis")
ax.set_ylabel("The Y Axis")
ax.set_title("The Title")

fig.savefig("test.png")

```
❗  Note that the methods to set titles and axis labels have slightly different names now!
```

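As indicated above, `plt.subplots()` allows us to create multi-panel figures. The syntax for this is `plt.subplots(n_rows, n_columns)`. This function returns a `Figure` object, as well as a `numpy` array of `Axes` objects. In general, this array has the shape `(n_rows, n_columns)`. By default, however, dimensions of size 1 are dropped, so that a figure with a single row or a single column gives us a one-dimensional array.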

For example, if we want to create a plot with two panels next to each other, we could do this the following way:

In [None]:
x = [1, 2, 3, 4, 5]
y1 = [0.5, 2, 4.5, 8, 12.5]
y2 = [2, 4, 6, 8, 10]

fig, axs = plt.subplots(1, 2)  # one row, two columns

axs[0].plot(x, y1)
axs[0].set_xlabel("The X Axis")
axs[0].set_ylabel("The Y Axis")
axs[0].set_title("Plot Number 1")

axs[1].plot(x, y2)
axs[1].set_xlabel("The X Axis")
axs[1].set_ylabel("The Y Axis")
axs[1].set_title("Plot Number 2")

fig.tight_layout()  # This cleans up the figure and avoids e.g. overlap of panels!
plt.show()
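
For a grid with more than one row *and* more than one column, the array of `Axes` objects is two-dimensional and is indexed as `axs[row, column]`. The following sketch (with made-up data) compares both cases, and uses `axs.flat` to loop over all panels of a 2 x 2 grid:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
powers = [1, 2, 3, 4]

# One row, two columns: the array of Axes objects is one-dimensional
fig_row, axs_row = plt.subplots(1, 2)
print(axs_row.shape)  # (2,)

# Two rows, two columns: the array of Axes objects is two-dimensional
fig, axs = plt.subplots(2, 2)
print(axs.shape)  # (2, 2)

# axs.flat visits all panels one after another (row by row)
for ax, p in zip(axs.flat, powers):
    ax.plot(x, [v ** p for v in x])
    ax.set_title(f"x to the power of {p}")

# An individual panel is accessed via its row and column index
axs[1, 0].set_xlabel("The X Axis")

fig.tight_layout()
```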

`matplotlib` provides an enormous number of options to customize the appearance of a single plot. As such, functions like `plt.plot()` or `plt.scatter()` take additional keyword arguments, e.g.:
* `c: str` or `color: str` The color of the plot (either one of the internally defined colors, or any color as a hex-encoded string).
* `lw: float` or `linewidth: float` The line width.
* `alpha: float` The transparency of the plot (0: fully transparent; 1: not transparent)
* `marker: str` The marker used for representing points. In addition, marker properties can be set by `markeredgecolor`, `markeredgewidth`, `markerfacecolor`.

At the same time, the appearance of a plot (i.e. an `Axes` instance) can be customized with a number of methods:
* `set_title(title: str)`
* `set_xlabel(xlabel: str)`
* `set_ylabel(ylabel: str)`
* `set_xlim(left=None, right=None)`
* `set_ylim(bottom=None, top=None)`
* `set_xticks(...)` / `set_yticks(...)` set the positions at which the axis ticks are drawn.
* `set_xticklabels(...)` / `set_yticklabels(...)` set the values which are depicted at the axis ticks.
* ...  
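
The following sketch combines some of the keyword arguments and `Axes` methods listed above in a single plot. The data, colors and labels are chosen arbitrarily for illustration:

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [0.5, 2, 4.5, 8, 12.5]

fig, ax = plt.subplots()

# Keyword arguments customize the appearance of the line and its markers
ax.plot(
    x,
    y,
    color="tab:red",
    linewidth=2.0,
    alpha=0.7,
    marker="o",
    markerfacecolor="white",
    markeredgecolor="tab:red",
)

# Methods of the Axes object customize the plot as a whole
ax.set_xlim(0, 6)
ax.set_ylim(0, 15)
ax.set_xticks([1, 2, 3, 4, 5])
ax.set_xticklabels(["one", "two", "three", "four", "five"])
ax.set_title("A customized line plot")
```

In a notebook, the figure is displayed automatically at the end of the cell.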

The number of individual configurations in `matplotlib` is enormous – and the package documentation provides many resources on how to configure plots in `matplotlib`. As indicated above, LLMs can also be a great resource for that!

It should be noted that `matplotlib` is not limited to simple line or scatter plots – but can be used for a vast number of different plot types, including...
* 1D and 2D histograms
* bar charts
* box plots and violin plots
* pie charts
* grid visualizations and contour plots
* 3D plots and surfaces
* ...

The [package documentation](https://matplotlib.org/stable/plot_types/index.html) and the [example gallery](https://matplotlib.org/stable/gallery/index.html) provide a large catalogue of further options – along with examples on how to create them with `matplotlib`. Play around with it to get some hands-on experience!