# Module 1: Recap

## **Exercise 1.1: Numpy**

NumPy lets you work efficiently with arrays of values. Pandas is built on top of NumPy and adds further functionality and convenience.
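
Here "efficiently" mostly means vectorization. NumPy applies an operation to every element in a single call, so you do not need an explicit Python loop. The sketch below uses made-up values to contrast the two styles. For large arrays the vectorized version is typically much faster, because the loop runs in compiled code rather than in the Python interpreter.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]

# Plain Python: an explicit loop (here a list comprehension) over the list
squares_list = [v ** 2 for v in values]

# NumPy: a single vectorized operation on the whole array, no explicit loop
squares_array = np.array(values) ** 2

print(squares_list)
print(squares_array)
```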

In [None]:
import numpy as np
# Creating an example array of ones.
a = np.ones(10)
a

In [None]:
# Mathematical operations are applied to every element of the array
a + 1

In [None]:
# Divide each value in "a" by two.
# TODO

In [None]:
# Double every number in "a".
# TODO

In [None]:
# You can also create arrays from normal python lists
b = np.array([2, 4, 6, 8, 10, 1, 3, 5, 7, 9])
b

In [None]:
# Operations between arrays are also supported (element-wise for arrays of the same shape)
a - b
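
`a - b` works element by element because both arrays have the same shape (10 elements). NumPy can also combine arrays of different but compatible shapes through "broadcasting". In that case the smaller array is virtually repeated along the missing axis. The sketch below uses made-up arrays. It also shows that shapes which do not line up raise an error.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# The row is broadcast across both rows of the matrix
result = matrix + row
print(result)
# [[11 22 33]
#  [14 25 36]]

# Broadcasting compares trailing dimensions: 2 does not match 3
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("Incompatible shapes:", err)
```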

In [None]:
# Divide "a" by "b" and see what happens.
# TODO

In [None]:
# You can access specific elements inside an array by using the "[]"-operator
# This gives us the first element in the array
b[0]

In [None]:
# This gives you the last element in the array
b[-1]
# which is the same as b[len(b)-1]

In [None]:
# Return the third element in the array b
# TODO

In [None]:
# You can also access ranges of elements.
# Example: Accessing the second to fourth element (indices 1 to 3; the end index 4 is excluded).
b[1:4]
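
Slices follow the pattern `start:stop:step`. The `stop` index is excluded, and any of the three parts can be left out. The sketch below shows a few more patterns on a separate example array.

```python
import numpy as np

arr = np.arange(10)        # [0 1 2 3 4 5 6 7 8 9]

every_second = arr[::2]    # step of 2 -> [0 2 4 6 8]
reversed_arr = arr[::-1]   # a negative step walks backwards -> [9 8 ... 0]
last_three = arr[-3:]      # a negative start counts from the end -> [7 8 9]

print(every_second, reversed_arr, last_three)
```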

In [None]:
# Access all elements but the last
# TODO

In [None]:
# Multi-dimensional arrays are also possible with numpy
# You can reshape an array into a multi-dimensional array.
# Example: Reshaping the one-dimensional array "b" into a two-dimensional one.
c = b.reshape(2, 5)
c
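
`reshape` fills the new shape row by row, and the total number of elements must stay the same (2 × 5 = 10 for `c`). If you pass `-1` for one dimension, NumPy infers that dimension from the others. Here is a short sketch on an example array.

```python
import numpy as np

values = np.arange(1, 11)        # 10 elements: 1..10
grid = values.reshape(2, -1)     # -1 is inferred as 5
print(grid.shape)                # (2, 5)
print(grid)
# [[ 1  2  3  4  5]
#  [ 6  7  8  9 10]]

# The element count must match: 10 elements cannot form a 3 x 3 array
try:
    values.reshape(3, 3)
except ValueError as err:
    print(err)
```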

In [None]:
# Access the first row of array "c" (the first entry along axis 0).
c[0]

In [None]:
# Now access the second row of array "c".
# TODO

In [None]:
# You can select an element by specifying its position along both dimensions (axes), separated by a comma.
# Example: Selecting the element in the first row and second column.
c[0, 1]

In [None]:
# Now select the fourth column in the second row.
# TODO

In [None]:
# You can also select all elements along an axis by using ":".
# Example: Selecting all elements of the third column.
c[:, 2]
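
An integer index drops its axis, while a slice keeps it. That is why `c[:, 2]` returns a 1-D array, whereas `c[:, 2:3]` would return a 2-D array with a single column. The sketch below uses a separate example array and also selects a sub-block by slicing both axes.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns

col_1d = grid[:, 2]      # integer index -> shape (3,)
col_2d = grid[:, 2:3]    # slice -> shape (3, 1)
block = grid[0:2, 1:3]   # rows 0-1 and columns 1-2 -> shape (2, 2)

print(col_1d.shape, col_2d.shape, block.shape)
print(block)
```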

In [None]:
# Select all elements of the first column.
# TODO

## **Exercise 1.1.2: Pandas**

Pandas is an additional layer on top of NumPy. Internally it stores data in efficient NumPy arrays, but it adds named (label-based) indexing for rows and columns and other convenience functions.

In [None]:
import pandas as pd
# You can convert a 1-D or 2-D numpy array into a pandas DataFrame
df = pd.DataFrame(b)
df

In [None]:
# You can access the numpy array with the .values attribute
df.values
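
Current pandas documentation recommends the `DataFrame.to_numpy()` method over the `.values` attribute. For a DataFrame with a single numeric dtype, both return the same array. Because a DataFrame is always 2-D, the result keeps a column axis, as this minimal check shows.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.array([2, 4, 6]))

arr_values = frame.values
arr_method = frame.to_numpy()

print(arr_method.shape)                        # (3, 1): rows x columns
print(np.array_equal(arr_values, arr_method))  # True
```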

In [None]:
# DataFrames are always 2-D tables
# You can give names to columns and rows
df = pd.DataFrame(b, columns=['values'], index=[f'index {i}' for i in range(len(b))])
df

In [None]:
# You can add columns at any time
df['array_a'] = a
df
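
A NumPy array is assigned to a column by position, so its length must match the number of rows. A pandas Series behaves differently: it is aligned on the index labels, and rows without a matching label get `NaN`. The sketch below demonstrates both cases with made-up labels.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"x": [1, 2, 3]}, index=["r0", "r1", "r2"])

# A Series is aligned on the index labels: "r1" and "r2" match, "r0" gets NaN
extra = pd.Series([10, 20], index=["r1", "r2"])
frame["y"] = extra
print(frame)

# A plain NumPy array is assigned by position, so its length must match
try:
    frame["z"] = np.array([1, 2])
except ValueError as err:
    print("Length mismatch:", err)
```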

In [None]:
# You can access a column with the same "[]" syntax
df['array_a']

In [None]:
# Create the new column "added" which is the sum of "array_a" and "values"
# TODO

In [None]:
# You can select a row by its index label with .loc
df.loc['index 2']
# which gives you a single row of the DataFrame
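
`.loc` selects by label. `.iloc` selects by integer position, whatever the labels are. Given a single row key, both return that row as a pandas Series. The sketch below uses hypothetical labels. `.loc` also accepts a row label and a column label together.

```python
import pandas as pd

frame = pd.DataFrame({"score": [0.5, 0.7, 0.9]},
                     index=["first", "second", "third"])

by_label = frame.loc["second"]    # label-based
by_position = frame.iloc[1]       # position-based (0-indexed)

print(by_label.equals(by_position))   # True
print(frame.loc["second", "score"])   # single value: row label, column label
print(frame.iloc[-1])                 # last row, selected by position
```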

### **Exercise 1.1.3: Matplotlib**

Matplotlib will be the go-to visualization library for the following exercises.

We can use matplotlib directly or use the pandas plotting functions, which are built on matplotlib.
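
`DataFrame.plot` draws with matplotlib and returns a matplotlib `Axes` object. You can therefore keep customizing the plot with ordinary matplotlib methods. A minimal sketch with made-up data:

```python
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({"x": [1, 2, 3, 4], "y": [1, 2, 4, 8]})

# kind="line" is the default; x and y name the columns to plot
ax = data.plot(x="x", y="y", kind="line", title="Line plot via pandas")
ax.set_ylabel("y-axis")   # a regular matplotlib Axes method
plt.show()
```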

In [None]:
# Import matplotlib
import matplotlib.pyplot as plt

In [None]:
# Create a plot
plt.plot([1, 2, 3, 4], [1, 2, 4, 8])

In [None]:
# In Jupyter, the graph is plotted automatically when the cell ends.
# You can control when the plot is shown with plt.show()
# You can customize aspects of the plot with functions such as plt.xlabel(), plt.ylabel() and plt.title()
plt.plot([1, 2, 3, 4], [1, 2, 4, 8], label='line1')
plt.plot([1, 2, 3, 4], [1.5, 2.5, 4.5, 8.5], label='line2')
plt.xlabel('x-axis')
plt.ylabel('y-axis')
plt.title('Line plot')
# Show a legend with the label information of all lines
plt.legend()
plt.show()
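
The `plt.*` functions act on the "current" figure. Matplotlib also has an object-oriented interface: you create the figure and axes explicitly and call methods on them. This style becomes convenient once a figure has several subplots. Here is a sketch of the same line plot in that style.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.plot([1, 2, 3, 4], [1, 2, 4, 8], label="line1")
ax.plot([1, 2, 3, 4], [1.5, 2.5, 4.5, 8.5], label="line2")

# Methods on the Axes object replace plt.xlabel(), plt.ylabel(), plt.title()
ax.set_xlabel("x-axis")
ax.set_ylabel("y-axis")
ax.set_title("Line plot")
ax.legend()

plt.show()
```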

In [None]:
# There are different plotting functions available
# E.g., scatter plot
plt.scatter([1, 2, 3, 4], [1, 2, 4, 8])
plt.title('Scatter plot')

In [None]:
# Create a scatter plot from the "df" of the previous exercise
# Use the column "added" for the x-axis and the column "values" for the y-axis
# TODO

In [None]:
# Create the same plot by using df.plot.scatter()
# TODO

### **Exercise 1.1.4: Scikit-learn**

Scikit-learn (sklearn for short) is a Python machine learning library that includes many widely used machine learning models and a number of preprocessing methods.

It follows a consistent API that applies to nearly everything you can find inside sklearn.

Let's take a look at this API.
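
In practice the API comes down to a few method names shared across sklearn:

- configure the object in its constructor;
- learn from data with `fit`;
- apply the result with `transform` (preprocessing) or `predict` (models).

`fit_transform` combines the fit and transform steps. Values learned during `fit` are stored in attributes ending with an underscore. The sketch below illustrates the pattern with `MinMaxScaler`, a transformer not otherwise used in this notebook, on made-up data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[1.0], [3.0], [5.0]])

minmax = MinMaxScaler()            # 1. configure (default output range is [0, 1])
minmax.fit(data)                   # 2. learn: stores data_min_ and data_max_
scaled = minmax.transform(data)    # 3. apply: (x - min) / (max - min)

# fit_transform performs steps 2 and 3 in a single call
scaled_again = MinMaxScaler().fit_transform(data)

print(minmax.data_min_, minmax.data_max_)
print(scaled.ravel())
print(np.allclose(scaled, scaled_again))
```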

In [None]:
# This imports a function that generates a simple synthetic classification dataset to play around with
from sklearn.datasets import make_classification

x, y = make_classification(n_features=2, n_redundant=0, n_informative=2, random_state=1, n_clusters_per_class=1)
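
`make_classification` returns a feature matrix and a label vector. With the default `n_samples=100` and `n_classes=2`, and with `n_features=2` as set here, `x` has 100 rows and 2 columns. `y` holds one class label (0 or 1) per row. The quick check below regenerates the same data, which is identical because `random_state` is fixed.

```python
import numpy as np
from sklearn.datasets import make_classification

x, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)

print(x.shape)        # (100, 2)
print(y.shape)        # (100,)
print(np.unique(y))   # [0 1]
```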

In [None]:
# Visualize the dataset with a scatter plot. Use the two features as x-axis and y-axis and color the points by class
# TODO

In [None]:
# "Transformers" are used to preprocess the data before it is used in a model
from sklearn.preprocessing import StandardScaler

# First initialize a new StandardScaler object
scaler = StandardScaler()
# Call the fit method of the scaler and pass the x variable as parameter
scaler.fit(x)
# This will fit the scaler object on the data
# Now call the transform method of the scaler object and pass the x variable again
# Save the transformed x in a new variable
# TODO
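
`StandardScaler` standardizes each feature (column) separately. During `fit` it stores the column means in `mean_` and the standard deviations in `scale_`. `transform` then computes `(x - mean) / std`. The sketch below checks this against plain NumPy on a small made-up array. It uses a separate variable name so that the `scaler` above is not overwritten.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

demo = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

scaler_demo = StandardScaler().fit(demo)   # fit returns the fitted scaler

# Manual standardization; StandardScaler uses the population std (ddof=0),
# which is also NumPy's default for np.std
manual = (demo - demo.mean(axis=0)) / demo.std(axis=0)

print(np.allclose(scaler_demo.transform(demo), manual))  # True
print(scaler_demo.mean_)
```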

In [None]:
# Now that we have scaled the data, it can be used in an "estimator" (a model)
# Instead of "fit" and "transform", these estimators have the methods "fit" and "predict"
# Based on the data seen during training ("fit"), the model can make predictions ("predict")
from sklearn.linear_model import LogisticRegression
# Initialize the LogisticRegression model
# TODO
# Fit the LogisticRegression object on the scaled features and the labels y
# TODO
# Perform a prediction on the scaled features and save the results in a variable
# TODO

In [None]:
# Now that you have made a prediction, we have to test how well the model performs
# Sklearn provides a number of metrics to test your models
from sklearn.metrics import accuracy_score
# Use the accuracy_score to calculate the accuracy of your model
# Tip: The accuracy_score needs the true labels and the predicted labels as input
# TODO
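
Accuracy is the fraction of predictions that match the true labels. The toy example below uses hand-made labels and computes the score with `accuracy_score` and by hand.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0])   # one of the four predictions is wrong

print(accuracy_score(y_true, y_pred))  # 0.75
print(np.mean(y_true == y_pred))       # 0.75
```

Accuracy measured on the same data the model was fitted on tends to overestimate how well the model works on new, unseen data.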