# NumPy, Pandas, Scikit-learn

## NumPy: [NumPy](https://numpy.org/) [Quickstart](https://numpy.org/devdocs/user/quickstart.html)

From the last practical: go through the Quickstart and learn the basic NumPy commands. Why do you think we should use NumPy for further machine learning tasks?

1. NumPy runs numerical operations in compiled, vectorized code, which makes it much faster than plain Python loops when handling large amounts of data in machine learning.
2. It provides linear algebra operations such as matrix multiplication, which are common in machine learning.
3. NumPy arrays are the shared data format for other popular Python machine learning tools such as Pandas and Scikit-learn, so data moves easily between the different parts of a project.
4. Its `np.random` module generates random numbers, which is important for setting up initial values in algorithms or creating test data.
5. NumPy arrays store elements of a single data type in one contiguous block of memory instead of as separate Python objects, as regular lists do, so they need less memory when you work with lots of data.
6. Broadcasting lets you apply math operations to arrays of different but compatible shapes without explicit loops, which makes code cleaner and easier to understand.
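A few of the points above can be sketched in code. This is a minimal illustration, not part of the original exercise; the array values are hypothetical and chosen only to keep the output small.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
values = np.array([1.0, 2.0, 3.0, 4.0])

# Vectorized arithmetic: one expression acts on every element, no explicit loop
scaled = values * 2.5

# Broadcasting: a (3, 1) column combined with a (4,) row gives a (3, 4) grid
column = np.array([[0.0], [10.0], [20.0]])
grid = column + values

# Every element shares one dtype and lives in a single contiguous buffer
print(scaled)
print(grid.shape)
print(values.dtype, values.nbytes)
```

The same operations on nested Python lists would need explicit loops or comprehensions for each step.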

### Some more NumPy tasks

In [None]:
import numpy as np



#### Loops

In [None]:
# Use NumPy for loops, e.g. how can you loop over floating-point numbers in Python, such as (0.1, 0.2, 0.3, 0.4, ..., 1)?
# There are several ways to do that, but try it out with NumPy's arange.


# 1) Code the pythonic way for looping floating point numbers: [0.1, 0.2, 0.3, 0.4, ..., 1]

floating_list = [i / 10 for i in range(1, 11)]
print(floating_list)


# 2) Code the Numpy way:

float_array = np.arange(0.1, 1.1, 0.1)
print(float_array)

# This applies to more than just this example. Which do you find more readable?

'''For this specific example, both methods produce the values 0.1 to 1.0, although individual
values can differ in their last bits because of floating-point rounding.
The NumPy way often feels more concise and readable, especially when dealing with larger arrays or more complex operations.
Plus, it benefits from the optimized implementations within NumPy for better performance.'''
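One caveat with `np.arange` and a floating-point step: the NumPy documentation notes that the length of the result can be inconsistent because of rounding, and recommends `np.linspace` in such cases. The sketch below shows that alternative; it is an addition to the exercise, not a replacement for the `arange` task.

```python
import numpy as np

# np.linspace takes the number of points instead of a step size
# and includes the stop value by default
float_array = np.linspace(0.1, 1.0, num=10)
print(float_array)

# Compare with the list comprehension using a floating-point tolerance
floating_list = [i / 10 for i in range(1, 11)]
print(np.allclose(float_array, floating_list))
```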

#### List operations

In [None]:
# Define the list named _list the Pythonic way and the NumPy way

# 1) Pythonic way
_list = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(_list)
# 2) NumPy way (note: _list now refers to a NumPy array with shape (5, 2))
_list = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
print(_list)

#### Matrix multiplication

In [None]:
# Define the pythonic way for the following numpy expressions

#### Task 1
_list = [[2.1, 3.5, 7.2], [4.5, 6.6, 7.7]]
# 1) Numpy way

# Numpy transpose
_np_list = np.array(_list).T
print('Numpy transpose', _np_list)

# 2) Pythonic way for numpy transpose
# YOUR CODE...

_list = [[2.1, 3.5, 7.2], [4.5, 6.6, 7.7]]

_py_transpose = [[row[i] for row in _list] for i in range(len(_list[0]))]
print('Pythonic transpose:', _py_transpose)

#### Task 2
_list1 = [[2, 3, 7], [4, 6, 7]]
_list2 = [[3, 4], [7, 1], [1, 8]]
# 1) Numpy mat mul
_np_mul = np.matmul(_list1, _list2)
print('\nNumpy mat mul:\n', _np_mul)

# 2) Pythonic way

# YOUR CODE...

_list1 = [[2, 3, 7], [4, 6, 7]]
_list2 = [[3, 4], [7, 1], [1, 8]]

# Pythonic way for matrix multiplication
def matrix_multiply(a, b):
    result = []
    for i in range(len(a)):
        row = []
        for j in range(len(b[0])):
            total = 0  # named 'total' to avoid shadowing the built-in sum()
            for k in range(len(a[0])):
                total += a[i][k] * b[k][j]
            row.append(total)
        result.append(row)
    return result

_py_mul = matrix_multiply(_list1, _list2)
print('\nPythonic mat mul:\n', _py_mul)
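As a more compact variant of the nested loops above, the multiplication can also be written with `zip`, and the result checked against NumPy. This is a sketch using the same matrices as Task 2; the names `a`, `b`, `py_mul`, and `np_mul` are chosen here for illustration.

```python
import numpy as np

a = [[2, 3, 7], [4, 6, 7]]
b = [[3, 4], [7, 1], [1, 8]]

# zip(*b) iterates over the columns of b, i.e. the rows of its transpose
py_mul = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# The @ operator is equivalent to np.matmul for 2-D arrays
np_mul = np.array(a) @ np.array(b)

print(py_mul)  # [[34, 67], [61, 78]]
print(np.array_equal(np_mul, py_mul))
```

The same `zip(*matrix)` idiom also gives a one-line pure-Python transpose.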


## Pandas: [Pandas](https://pandas.pydata.org/) [Quickstart](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html)

In [None]:
import pandas as pd

In [None]:
# Download the churn dataset from Kaggle: https://www.kaggle.com/datasets/shubh0799/churn-modelling?resource=download
# 1) Use the .csv file and read it via pandas
df = pd.read_csv("./Churn_Modelling.csv")


# 2) Print the columns of the pandas table
print("Columns:", df.columns)


# 3) Drop the column 'CustomerId'
df.drop(columns=['CustomerId'], inplace=True)


# 4) Print the values for the columns 'Gender', 'Age', 'Tenure', 'Balance' only
print(df[['Gender', 'Age', 'Tenure', 'Balance']])


# 5) Return only the rows where Geography == 'France' and columns 'Gender', 'Age', 'Tenure', 'Balance'

france_data = df[df['Geography'] == 'France'][['Gender', 'Age', 'Tenure', 'Balance']]
print("France data:\n", france_data)
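# 6) Group by the columns 'Geography' and 'Gender' and use the mean function to aggregate the churn rate ('Exited' column)
# Select 'Exited' before calling mean(): averaging every column fails on non-numeric columns in recent pandas versions
grouped_data = df.groupby(['Geography', 'Gender'])['Exited'].mean()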
print("Grouped data:\n", grouped_data)
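The filtering and grouping steps can be tried without the Kaggle download. The sketch below uses a small hypothetical DataFrame whose rows are invented for illustration only; it is not real churn data.

```python
import pandas as pd

# Hypothetical toy rows that mimic a few columns of the churn dataset
toy = pd.DataFrame({
    "Geography": ["France", "France", "Spain", "Spain"],
    "Gender": ["Female", "Male", "Female", "Female"],
    "Surname": ["A", "B", "C", "D"],
    "Exited": [1, 0, 0, 1],
})

# Boolean mask and column list in a single .loc call
france = toy.loc[toy["Geography"] == "France", ["Gender", "Exited"]]

# Selecting 'Exited' first keeps text columns such as 'Surname' out of mean()
churn_rate = toy.groupby(["Geography", "Gender"])["Exited"].mean()

print(france)
print(churn_rate)
```

Using `.loc` with a mask and a column list does the row and column selection in one step, which is an alternative to the chained indexing in step 5.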

## Scikit-learn: [Scikit-learn](https://scikit-learn.org/stable/) [Quickstart](https://scikit-learn.org/stable/getting_started.html)

In [None]:
# We start with the k-nearest neighbors (kNN) algorithm from sklearn
from sklearn.neighbors import KNeighborsClassifier

In [None]:
# Use the following values and labels to fit the kNN classifier
X = np.arange(0, 9).reshape(9, 1)
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
# We know that we have three classes: values 0 to 2 have label 0; 3 to 5 have label 1; 6 to 8 have label 2

In [None]:
# Start the KNN algorithm with the values above
# YOUR CODE ...
# KNeighborsClassifier, X and y are already defined in the cells above

# Create the classifier with k = 3 and fit it to the training data
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)



In [None]:
# predict the label for value 4
# YOUR CODE ...

prediction = knn.predict([[4]])
print("Predicted label for value 4:", prediction[0])
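To see why the classifier chooses its label, the fitted model can report which training points it used. The sketch below repeats the setup so it runs on its own, and uses the scikit-learn methods `kneighbors` and `predict_proba`; the variable names are chosen here for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.arange(0, 9).reshape(9, 1)
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Distances to, and row indices of, the 3 nearest training points for value 4
distances, indices = knn.kneighbors([[4]])
neighbor_labels = [y[i] for i in indices[0]]

# Fraction of neighbors voting for each class, in the order of knn.classes_
probabilities = knn.predict_proba([[4]])

print("Neighbor indices:", indices[0])
print("Neighbor labels:", neighbor_labels)
print("Class probabilities:", probabilities[0])
```

For the value 4, the three nearest training points are 3, 4, and 5, which all carry label 1, so the vote is unanimous.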
