All explanations for code cells are written above the code cells.

# Using `numpy`

## Importing `numpy` library

In [None]:
import numpy as np

## Creating Matrices

- A 2 by 3 matrix (2 rows, 3 columns) is created

In [None]:
x = np.array([[1, 2, 3],
              [4, 5, 6]])

In [None]:
print(x)
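
- A minimal sketch for confirming the dimensions of a matrix like the one above, using the `.shape` and `.ndim` attributes
- `m` is a name chosen only for this illustration so that it does not overwrite `x`

```python
import numpy as np

# `m` is a hypothetical name used only in this sketch
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.shape)  # (2, 3): 2 rows, 3 columns
print(m.ndim)   # 2: number of dimensions
```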

# Using `matplotlib`

- `%matplotlib inline` is a magic function
- Allows for inline plotting of graphs (below the cells)

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

- Creates an array of 20 elements from 0 to 19
- Similar to the list comprehension `[i for i in range(20)]`, but the result is a NumPy array instead of a list

In [None]:
x = np.arange(20)

In [None]:
print(x)

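- Checks whether a list comprehension produces the same values as the `x` that was generated
- `np.array_equal` compares the two element by element and returns a single boolean

In [None]:
print(np.array_equal(x, [i for i in range(20)]))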
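
- A small sketch of two ways to compare a NumPy array with a plain list
- The names `arr` and `lst` are chosen only for this illustration

```python
import numpy as np

# `arr` and `lst` are hypothetical names used only in this sketch
arr = np.arange(20)
lst = [i for i in range(20)]

# `==` compares element by element and returns one boolean per element
print(arr == lst)

# `np.array_equal` reduces the comparison to a single boolean
print(np.array_equal(arr, lst))
```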

---

- For each value in `x`, it computes `sin(x)`
- The results are stored in a new array `y` with the same length as `x`
- The input values are interpreted as radians

In [None]:
y = np.sin(x)

In [None]:
print(y)
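
- Because `np.sin` expects radians, angles in degrees need converting first
- A minimal sketch using `np.deg2rad`; the names `angles_deg` and `angles_rad` are chosen only for this illustration

```python
import numpy as np

# Hypothetical example angles, in degrees
angles_deg = np.array([0, 90, 180])

# Convert to radians before calling np.sin
angles_rad = np.deg2rad(angles_deg)

# Values are approximately 0, 1, 0 (up to floating-point rounding)
print(np.sin(angles_rad))
```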

## Plotting

- `marker` specifies which symbol is used to mark each data point
- The available markers are listed in the [documentation](https://matplotlib.org/stable/api/markers_api.html#:~:text=All%20possible%20markers%20are%20defined%20here%3A)

- Plotting `x` should return a straight-line graph
- Circles are the markers used

In [None]:
plt.plot(x, marker = "o")

- Plotting `y` returns the points for each value of `sin(x)`
- Crosses are the markers used

In [None]:
plt.plot(y, marker = "x")

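- Why does reversing `x` and `y` change the graph?
- The first argument is drawn on the horizontal axis and the second on the vertical axis
  - Swapping them plots the same points with the axes exchanged, which mirrors the graph across the diagonal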

In [None]:
plt.plot(x, y, marker = "x")

In [None]:
plt.plot(y, x, marker = "x")
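
- A rough sketch for inspecting what each call actually plots, by reading the data back from the line objects that `plot` returns
- The `Agg` backend is an assumption made so the sketch runs without a display; inside a notebook that line can be dropped
- `line_xy` and `line_yx` are names chosen only for this illustration

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(20)
y = np.sin(x)

fig, ax = plt.subplots()
(line_xy,) = ax.plot(x, y, marker="x")  # x on the horizontal axis, sin(x) on the vertical
(line_yx,) = ax.plot(y, x, marker="x")  # sin(x) on the horizontal axis, x on the vertical

# The horizontal data of one line is the vertical data of the other
print(np.array_equal(line_xy.get_xdata(), line_yx.get_ydata()))
print(np.array_equal(line_xy.get_ydata(), line_yx.get_xdata()))
```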

# Experiments with `iris`
In this part we will go through a simple machine learning application and create our first model. A hobby botanist would like to identify the species of iris flowers that she found. She has a training set of labelled flowers. The features are the length and width of the petals, and the length and width of the sepals, all measured in centimeters. 

There are three possible labels (species): Setosa, Versicolor, or Virginica. The iris dataset is a classical dataset in machine learning and statistics, collected by Ronald A. Fisher. It is included in scikit-learn in the `datasets` module. 

- The `iris` dataset is a precompiled dataset 

In [None]:
from sklearn.datasets import load_iris
from sklearn.utils import Bunch # For type hinting iris return type

- Loads the `iris` dataset 
- The return value can be one of the following types: `tuple[DataFrame | ndarray, Series | DataFrame | ndarray] | Bunch`
  - It normally returns a `Bunch` which is similar to a `dict`

In [None]:
iris: Bunch = load_iris()

- As mentioned before, `load_iris()` returns a `Bunch`, which is similar to a `dict`
- This means that each key has a value that it stores

In [None]:
iris.keys()

- Because `iris` is similar to a `dict`, some of the operations from a `dict` can be used
- `iris['DESCR']` finds what is stored in the `DESCR` key

In [None]:
print(iris['DESCR'])
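
- A short sketch of the dict-like behaviour described above
- A `Bunch` also exposes its keys as attributes, so `iris.data` and `iris['data']` refer to the same object

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Dictionary-style membership test on the keys
print("target" in iris)

# Key access and attribute access return the same object
print(iris["data"] is iris.data)
```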

- `target_names` is a NumPy array of strings containing the labels 
- In this case, it contains the species of flowers that need to be predicted (dependent variable)

In [None]:
print(iris['target_names'])

- `iris['feature_names']` returns a list of descriptions for each feature

In [None]:
print(iris['feature_names'])
# print(*iris['feature_names'], sep="\n") # Print each element in a new line

- `iris['data']` returns the data which is the matrix of features
- `.shape` returns the shape of the matrix which is the dimensions (rows and columns)
  - For this data, the matrix is 150 by 4, meaning there are 150 rows (one per flower) and 4 columns (one per feature)

In [None]:
print(iris['data'].shape) # Dimensions of the matrix
print(iris['data']) # Matrix of features

- Using slicing, it is possible to get a range of values (slice) from the data (matrix of features)

In [None]:
print(iris['data'][:5]) # Slice of the first 5 rows (indices 0 to 4)
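
- A minimal sketch of row and column slicing on a small made-up matrix, so the result is easy to check by eye
- `demo` is a hypothetical array used only for this illustration

```python
import numpy as np

# Hypothetical 4 by 3 matrix holding the values 0 to 11
demo = np.arange(12).reshape(4, 3)

print(demo[:2])    # first 2 rows, all columns
print(demo[:, 0])  # all rows, first column only
print(demo[:, 2])  # all rows, third column only
```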

- Contains the species of each flower that was measured
  - 0 = Setosa, 1 = Versicolor, 2 = Virginica

In [None]:
print(iris['target'])

- Returns the shape of the NumPy array
- There are 150 entries, the same as the number of rows in `iris['data']` 
  - This is because each entry in `target` is the label for the corresponding row of the data

In [None]:
print(iris['target'].shape)
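
- A small sketch of how the integer labels can be turned back into species names by indexing `target_names` with `target`
- `species` and `counts` are names chosen only for this illustration

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Index the array of names with the integer labels: one name per flower
species = iris["target_names"][iris["target"]]
print(species[:3])

# Count how many flowers carry each label
counts = np.bincount(iris["target"])
print(dict(zip(iris["target_names"], counts)))
```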

# Visualizing Data
- It is often a good idea to visualize your data:
  - To see if the task is easily solvable without machine learning
  - Or if the desired information might not be contained in the data
- Computer screens have only two dimensions, which allows us to only plot two (or maybe three) features at a time

- The *Matrix of Features* is usually denoted with `X`
- The *Dependent Variable* vector is denoted with `y`

In [None]:
X: np.ndarray = iris['data'] # Matrix of Features
y: np.ndarray = iris['target'] # Dependent Variable Vector

- The axes are sepal length and petal length
- `X[:, 0]` denotes:
  - `:` selects every row of the Matrix of Features `X`
  - `0` selects the first column (sepal length), which serves as a proxy for the size of the sepals
- `X[:, 2]` selects the third column (petal length), which serves as a proxy for the size of the petals
- `c=y` denotes:
  - `c` = colour
  - `y` = vector of labels
- `s=60` denotes the size of the dots

In [None]:
plt.scatter(X[:, 0], X[:, 2], c=y, s=60)

In [None]:
plt.scatter(X[:, 3], X[:, 2], c=y, s=30)

In [None]:
fig, ax = plt.subplots(3, 3, figsize=(15, 15))
plt.suptitle("iris pairplot")

for i in range(3):
    for j in range(3):
        ax[i, j].scatter(X[:, j], X[:, i + 1], c=y, s=60)
        ax[i, j].set_xticks(())
        ax[i, j].set_yticks(())
        if i == 2:
            ax[i, j].set_xlabel(iris['feature_names'][j])
        if j == 0:
            ax[i, j].set_ylabel(iris['feature_names'][i + 1])
        if j > i:
            ax[i, j].set_visible(False)

In [None]:
print(len(fig.axes)) # Number of axes in the figure
print(sum(not a.get_visible() for a in fig.axes)) # Number of invisible axes