# NumPy, Pandas, Scikit-learn

## NumPy: [NumPy](https://numpy.org/) [Quickstart](https://numpy.org/devdocs/user/quickstart.html)

Continuing from the last practical: go through the Quickstart and learn the basic NumPy commands. Why do you think we should use NumPy for further machine learning tasks?
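
One possible answer to the question above, sketched under the assumption that the main benefit is vectorization: NumPy applies arithmetic to whole arrays in a single call, which is the kind of operation most machine learning code is built from. The names `features` and `weights` and their values are illustrative only and are not part of the practical.

```python
import numpy as np

# Illustrative data (an assumption, not from the practical).
features = [1.0, 2.0, 3.0]
weights = [0.5, 0.25, 2.0]

# Plain Python: explicit iteration over paired elements.
python_dot = sum(f * w for f, w in zip(features, weights))

# NumPy: one vectorized call on whole arrays.
numpy_dot = np.dot(np.array(features), np.array(weights))

# Element-wise arithmetic also needs no explicit loop.
scaled = np.array(features) * 2

print('Python dot:', python_dot)
print('NumPy dot:', numpy_dot)
print('Scaled:', scaled)
```

Both dot products give the same number; the NumPy version simply expresses the computation on arrays instead of individual elements.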

### Some more NumPy tasks

In [14]:
import numpy as np

#### Loops

In [2]:
# Use NumPy to loop over floating-point numbers, e.g. (0.0, 0.1, 0.2, 0.3, ..., 1.0).
# Python's range() only accepts integers, so a workaround is needed; try it with np.arange instead.


# 1) Code the pythonic way for looping floating point numbers: [0.0, 0.1, 0.2, 0.3, ..., 1.0]
python_ls = [i/10 for i in range(0, 11, 1)]
print('Python list', python_ls)

# 2) Code the Numpy way:
numpy_ls = np.arange(0, 1.1, 0.1)
print('Numpy list', numpy_ls)

# The same pattern applies beyond this example. Which version do you find more readable?
# > np.arange is more readable because it needs no extra calculation (here /10) and no explicit list iteration

Python list [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
Numpy list [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
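
With a non-integer step, the number of elements `np.arange` returns depends on floating-point rounding, so the end point can be included or dropped unexpectedly. The NumPy documentation suggests `np.linspace` for such cases. A minimal sketch of the same sequence with an explicit number of points (the variable names are illustrative):

```python
import numpy as np

# 11 evenly spaced points from 0.0 to 1.0; the end point is included by default.
numpy_linspace = np.linspace(0, 1, 11)
print('Numpy linspace', numpy_linspace)

# A sequence that starts at 0.1 instead: 10 points from 0.1 to 1.0.
from_point_one = np.linspace(0.1, 1.0, 10)
print('Starting at 0.1', from_point_one)
```

Stating the number of points rather than the step makes the length of the result independent of rounding.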


#### List operations

In [3]:
# Define the list named _list in the pythonic way and in the NumPy way
_list = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

# 1) pythonic way
python_ls = [[i, i+1] for i in range(1, 11, 2)]
print('Python list', python_ls)

# 2) Numpy way
numpy_ls = np.arange(1, 11).reshape(5,2)
print('Numpy list', numpy_ls)

Python list [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
Numpy list [[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]
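
To check that both constructions hold the same values, the array and the nested list can be compared directly. This is a small sketch; using `-1` in `reshape` is an optional variation that lets NumPy infer the number of rows.

```python
import numpy as np

python_ls = [[i, i + 1] for i in range(1, 11, 2)]

# -1 asks NumPy to infer this dimension from the array size (10 / 2 = 5 rows).
numpy_ls = np.arange(1, 11).reshape(-1, 2)

print('Shape:', numpy_ls.shape)
print('Same values:', np.array_equal(numpy_ls, np.array(python_ls)))
print('Back to a list:', numpy_ls.tolist() == python_ls)
```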


#### Matrix multiplication

In [4]:
# Define the pythonic way for the following numpy expressions

#### Task 1
_list = [[2.1, 3.5, 7.2], [4.5, 6.6, 7.7]]
# 1) Numpy way

# Numpy transpose
_np_list = np.array(_list).T
print('Numpy transpose', _np_list)

# 2) Pythonic way for numpy transpose
# YOUR CODE...
python_ls = []
for col in range(len(_list[0])):
    rowT = []
    for row in range(len(_list)):
        rowT.append(_list[row][col])
    python_ls.append(rowT)
print('Python transpose', python_ls)




#### Task 2
_list1 = [[2, 3, 7], [4, 6, 7]]
_list2 = [[3, 4], [7, 1], [1, 8]]
# 1) Numpy mat mul
_np_mul = np.matmul(_list1, _list2)
print('\nNumpy mat mul:\n', _np_mul)

# 2) Pythonic way
# YOUR CODE...
_python_mul = []
for mrow in range(len(_list1)): # matrix row = 2 (len of _list1)
    _row = []
    for mcol in range(len(_list2[0])): # matrix col = 2 (col of _list2)
        _sum = 0
        for i in range(len(_list1[0])): # to iterate through col of _list1 or row of _list2
            _sum += (_list1[mrow][i] * _list2[i][mcol])
        
        _row.append(_sum)
    _python_mul.append(_row)

print('Python mat mul', _python_mul)



Numpy transpose [[2.1 4.5]
 [3.5 6.6]
 [7.2 7.7]]
Python transpose [[2.1, 4.5], [3.5, 6.6], [7.2, 7.7]]

Numpy mat mul:
 [[34 67]
 [61 78]]
Python mat mul [[34, 67], [61, 78]]
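
The nested loops above spell out every index. As an alternative sketch, the same results can be written more compactly with `zip` and comprehensions; this is one idiomatic option, not the only valid solution to the tasks.

```python
import numpy as np

_list = [[2.1, 3.5, 7.2], [4.5, 6.6, 7.7]]
_list1 = [[2, 3, 7], [4, 6, 7]]
_list2 = [[3, 4], [7, 1], [1, 8]]

# zip(*rows) groups the i-th element of every row, i.e. it yields the columns.
python_transpose = [list(col) for col in zip(*_list)]

# Each entry is the dot product of a row of _list1 with a column of _list2.
python_mul = [
    [sum(a * b for a, b in zip(row, col)) for col in zip(*_list2)]
    for row in _list1
]

# For 2-D arrays the @ operator is equivalent to np.matmul.
numpy_mul = np.array(_list1) @ np.array(_list2)

print('Python transpose', python_transpose)
print('Python mat mul', python_mul)
print('Numpy mat mul (@):\n', numpy_mul)
```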


## Pandas: [Pandas](https://pandas.pydata.org/) [Quickstart](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html)

In [6]:
import pandas as pd

In [30]:
# Download the churn dataset from Kaggle: https://www.kaggle.com/datasets/shubh0799/churn-modelling?resource=download
# 1) Use the .csv file and read it via pandas
df = pd.read_csv("./Churn_Modelling.csv")

# 2) Print the columns of the pandas table
print('\n2)\n', 'Columns:', df.columns)

# 3) Drop the column 'CustomerId'
df.drop(columns=["CustomerId"], inplace=True)
# print('\n3)\n', df.head())

# 4) Print the values for the columns 'Gender', 'Age', 'Tenure', 'Balance' only
print('\n4)\n', df.loc[:, ['Gender', 'Age', 'Tenure', 'Balance']].head())

# 5) Return only the rows where Geography == 'France' and columns 'Gender', 'Age', 'Tenure', 'Balance'
france_filtered = df.loc[df['Geography'] == 'France', ['Gender', 'Age', 'Tenure', 'Balance']]
# print('\n5)\n', france_filtered.head())

# 6) Group by the columns 'Geography' and 'Gender' and use the mean function to aggregate the churn rate ('Exited' column)
grouped_mean = df.groupby(['Geography', 'Gender'])['Exited'].mean()
# print('\n6)\n', grouped_mean)



2)
 Columns: Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
       'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
       'IsActiveMember', 'EstimatedSalary', 'Exited'],
      dtype='object')

4)
    Gender  Age  Tenure    Balance
0  Female   42       2       0.00
1  Female   41       1   83807.86
2  Female   42       8  159660.80
3  Female   39       1       0.00
4  Female   43       2  125510.82
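
The prints for steps 5 and 6 are commented out above, so their results are not shown. The sketch below illustrates both patterns on a tiny hand-made table. The rows in `toy` are hypothetical and are NOT taken from `Churn_Modelling.csv`; only the column names follow the dataset.

```python
import pandas as pd

# Hypothetical toy rows for illustration only (not from Churn_Modelling.csv).
toy = pd.DataFrame({
    'Geography': ['France', 'France', 'Spain', 'Spain', 'France'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Female'],
    'Age': [42, 35, 41, 29, 50],
    'Exited': [1, 0, 0, 1, 0],
})

# Step 5 pattern: boolean row mask and column list in a single .loc call.
france = toy.loc[toy['Geography'] == 'France', ['Gender', 'Age']]
print(france)

# Step 6 pattern: the mean of a 0/1 column per group is that group's churn rate.
churn_rate = toy.groupby(['Geography', 'Gender'])['Exited'].mean()
print(churn_rate)
```

Because `Exited` only contains 0 and 1, its group mean is the fraction of customers in that group who left.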


## Scikit-learn: [Scikit-learn](https://scikit-learn.org/stable/) [Quickstart](https://scikit-learn.org/stable/getting_started.html)

In [19]:
# We start with the k-nearest neighbors (kNN) algorithm from scikit-learn
from sklearn.neighbors import KNeighborsClassifier

In [20]:
# Use the following values and labels to fit the kNN classifier
X = np.arange(0, 9).reshape(9,1)
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
# we know that we have three classes: values 0 to 2 have label 0; 3 to 5 have label 1; 6 to 8 have label 2

In [25]:
# Start the KNN algorithm with the values above
# YOUR CODE ...
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

In [29]:
# predict the label for value 4
# YOUR CODE ...
test = np.array([4]).reshape(1,1)
predicted = knn.predict(test)
print('Predicted label:', predicted)

Predicted label: [1]
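
To see why the value 4 gets label 1, it can help to inspect which training points were used for the vote. The sketch below refits the same classifier and uses `kneighbors` and `predict_proba`, both methods of `KNeighborsClassifier`; the variable names are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.arange(0, 9).reshape(9, 1)
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

test = np.array([[4]])

# Distances to, and row indices of, the 3 nearest training samples.
distances, indices = knn.kneighbors(test)
print('Neighbor values:', X[indices[0]].ravel())
print('Neighbor labels:', [y[i] for i in indices[0]])

# Share of neighbor votes per class, in the order given by knn.classes_.
proba = knn.predict_proba(test)
print('Classes:', knn.classes_)
print('Vote shares:', proba)
```

The three nearest neighbors of 4 are the training values 3, 4 and 5, which all carry label 1, so the vote is unanimous.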
