# NumPy, Pandas, Scikit-learn

## NumPy: [NumPy](https://numpy.org/) [Quickstart](https://numpy.org/devdocs/user/quickstart.html)

Carried over from the last practical: go through the Quickstart and learn the basic NumPy commands. Why do you think we should use NumPy for further machine learning tasks?

### Some more NumPy tasks

In [71]:
import numpy as np

#### Loops

In [None]:
# Use NumPy for loops, e.g. how can you loop over floating-point numbers in Python such as (0.1, 0.2, 0.3, 0.4, ..., 1)?
# Of course, there are other ways to do that, but try it out with NumPy's arange.


# 1) Code the pythonic way for looping floating point numbers: [0.1, 0.2, 0.3, 0.4, ..., 1]

import array as arr

a = arr.array('d', [])  # typed array of C doubles
for i in range(10):
    a.append((i + 1) / 10)

print(a)

# 2) Code the Numpy way:

a = np.arange(0.1, 1.1, 0.1)  # stop is exclusive, so 1.1 is used to reach 1.0
print(a)

# This applies to more than just this example. But which do you find more readable?
'''
Ans:
The NumPy version is more readable: arange takes three arguments inside a single pair of parentheses,
where the first is the start, the second is the stop (exclusive), and the third is the step size.
'''
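
One caveat with the `np.arange` approach: for a non-integer step, the NumPy documentation notes that the length of the result can be inconsistent because of floating-point rounding, and suggests `np.linspace` for such cases. A minimal sketch of that alternative, where the names `start`, `stop`, and `num` are chosen here purely for illustration:

```python
import numpy as np

# Illustrative values: ten evenly spaced points from 0.1 to 1.0.
start, stop, num = 0.1, 1.0, 10

# linspace takes the number of points instead of a step size
# and includes the stop value by default (endpoint=True).
a = np.linspace(start, stop, num)
print(a)

# The length is fixed by `num`, independent of floating-point rounding.
print(len(a))
```

Whether `arange` or `linspace` reads better is a matter of taste; `linspace` is mainly the safer choice when the exact number of points matters.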

#### List operations

In [None]:
# Define the list named _list in the Pythonic way and the NumPy way
_list = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

# 1) pythonic way

_list = []
n = 1
for i in range(5):
    rowList = []
    for j in range(2):
        rowList.append(n)
        n += 1
    _list.append(rowList)
print(_list)

# 2) Numpy way

_list = np.arange(1,11).reshape(5,2)
print(_list)
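
The nested loop in the Pythonic version can also be written as a list comprehension, and the NumPy result can be converted back to a nested list with `tolist()` to confirm that both approaches produce the same values. A small sketch, with `rows`, `cols`, `py_list`, and `np_list` as illustrative names that do not appear in the notebook:

```python
import numpy as np

rows, cols = 5, 2

# Pythonic: row r holds the values r*cols + 1 ... r*cols + cols.
py_list = [[r * cols + c + 1 for c in range(cols)] for r in range(rows)]

# NumPy: build 1..10 and reshape into 5 rows of 2 columns.
np_list = np.arange(1, rows * cols + 1).reshape(rows, cols)

print(py_list)
# tolist() turns the array back into nested Python lists for comparison.
print(np_list.tolist() == py_list)
```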


#### Matrix multiplication

In [None]:
# Write the Pythonic equivalent of the following NumPy expressions

#### Task 1
_list = [[2.1, 3.5, 7.2], [4.5, 6.6, 7.7]]
# 1) Numpy way

_list = np.array([[2.1, 3.5, 7.2],[4.5, 6.6, 7.7]])
print(_list)

# Numpy transpose
_np_list = np.array(_list).T
print('Numpy transpose', _np_list)

# 2) Pythonic way for numpy transpose

result = [[_list[j][i] for j in range(len(_list))] for i in range(len(_list[0]))]

for r in result:
    print(r)

#### Task 2
_list1 = [[2, 3, 7], [4, 6, 7]]
_list2 = [[3, 4], [7, 1], [1, 8]]
# 1) Numpy mat mul

_np_mul = np.matmul(_list1, _list2)
print('\nNumpy mat mul:\n', _np_mul)

# 2) Pythonic way

result = [[sum(a*b for a,b in zip(X_row,Y_col)) for Y_col in zip(*_list2)] for X_row in _list1]

for r in result:
    print(r)
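
To check that the pure-Python versions of the transpose and the matrix product agree with NumPy, the two results can be compared directly. The following is a minimal sketch using the Task 1 and Task 2 idioms on the Task 2 inputs; the names `list1`, `list2`, `transposed`, and `product` are chosen here for illustration:

```python
import numpy as np

list1 = [[2, 3, 7], [4, 6, 7]]
list2 = [[3, 4], [7, 1], [1, 8]]

# Pure-Python transpose: zip(*rows) groups the i-th element of every row.
transposed = [list(col) for col in zip(*list1)]

# Pure-Python matrix product: dot product of each row of list1
# with each column of list2.
product = [
    [sum(a * b for a, b in zip(row, col)) for col in zip(*list2)]
    for row in list1
]

# Compare against NumPy; the @ operator is shorthand for np.matmul on arrays.
print(np.array_equal(transposed, np.array(list1).T))
print(np.array_equal(product, np.array(list1) @ np.array(list2)))
print(product)
```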

## Pandas: [Pandas](https://pandas.pydata.org/) [Quickstart](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html)

In [None]:
import pandas as pd

In [None]:
# Download the churn dataset from Kaggle: https://www.kaggle.com/datasets/shubh0799/churn-modelling?resource=download
# 1) Use the .csv file and read it via pandas
df = pd.read_csv("./Churn_Modelling.csv")


# 2) Print the columns of the pandas table
list_of_col = list(df.columns)
print(list_of_col)

# 3) Drop the column 'CustomerId'
# drop returns a new DataFrame, so assign the result back to df
df = df.drop(columns=['CustomerId'])

# 4) Print the values for the columns 'Gender', 'Age', 'Tenure', 'Balance' only
g_a_t_b = df[['Gender', 'Age', 'Tenure', 'Balance']]
print(g_a_t_b.head())

# 5) Return only the rows where Geography == 'France' and columns 'Gender', 'Age', 'Tenure', 'Balance'
france_only = df.loc[df['Geography'] == 'France', ['Gender', 'Age', 'Tenure', 'Balance']]
print(france_only)

# 6) Group by the columns 'Geography' and 'Gender' and use the mean function to aggregate the churn rate ('Exited' column)
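
Task 6 can be approached with `groupby` followed by `mean`. The sketch below runs on a small made-up DataFrame rather than the Kaggle file, and it assumes the churn column is spelled `Exited`; the name `toy` and all values in it are hypothetical:

```python
import pandas as pd

# Made-up toy data for illustration only; not taken from the Kaggle file.
toy = pd.DataFrame({
    "CustomerId": [1, 2, 3, 4, 5, 6],
    "Geography": ["France", "France", "France", "Spain", "Spain", "Spain"],
    "Gender": ["Female", "Female", "Male", "Female", "Male", "Male"],
    "Exited": [1, 0, 0, 1, 1, 0],
})

# drop returns a new DataFrame, so the result has to be assigned.
toy = toy.drop(columns=["CustomerId"])

# The mean of a 0/1 column is the fraction of ones, i.e. the churn rate.
churn_rate = toy.groupby(["Geography", "Gender"])["Exited"].mean()
print(churn_rate)
```

Under the same assumption about the column name, the equivalent call on the notebook's `df` would be `df.groupby(['Geography', 'Gender'])['Exited'].mean()`.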


## Scikit-learn: [Scikit-learn](https://scikit-learn.org/stable/) [Quickstart](https://scikit-learn.org/stable/getting_started.html)

In [None]:
# We start the k nearest neighbor algorithm with sklearn
from sklearn.neighbors import KNeighborsClassifier

In [None]:
# Use the following values and labels to fit the kNN
X = np.arange(0, 9).reshape(9,1)
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
# We know that we have three clusters: [0 to 2] has label 0, [3 to 5] has label 1, and [6 to 8] has label 2

In [None]:
# Start the KNN algorithm with the values above
# fit needs both the samples X and their labels y
knn_model = KNeighborsClassifier(n_neighbors=3, algorithm='kd_tree').fit(X, y)


In [None]:
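# predict the label for value 4
print(knn_model.predict([[4]]))
'''
Ans:
The prediction is label 1.
The value 4 lies in the range [3 to 5], and its three nearest neighbors (3, 4, and 5) all have label 1,
so the majority vote among the neighbors is 1.
'''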
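
The fit and predict steps can also be run end to end. The following minimal sketch uses the same `X` and `y` as the cells above and additionally calls `kneighbors` to show which training points vote for the query value 4; the names `knn`, `query`, `distances`, and `indices` are chosen here for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.arange(0, 9).reshape(9, 1)
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# fit needs both the samples X and their labels y.
knn = KNeighborsClassifier(n_neighbors=3, algorithm="kd_tree").fit(X, y)

# predict expects a 2D array of shape (n_samples, n_features).
query = np.array([[4]])
print(knn.predict(query))

# kneighbors returns the distances to, and the training-set indices of,
# the nearest neighbors; here each index equals the X value it points to.
distances, indices = knn.kneighbors(query)
print(sorted(indices[0].tolist()))
print([y[i] for i in indices[0]])
```

Because all three neighbors carry the same label, the majority vote is unambiguous for this query.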