# NumPy, Pandas, Scikit-Learn

## NumPy: [NumPy](https://numpy.org/) [Quickstart](https://numpy.org/devdocs/user/quickstart.html)

Continuing from the last practical: work through the Quickstart and learn the basic NumPy commands. Why do you think we should use NumPy for further machine learning tasks?

### Some more NumPy tasks

In [6]:
import numpy as np

#### Loops

In [15]:
# Use NumPy for loops, e.g. how can you loop over floating-point numbers in Python such as (0.1, 0.2, 0.3, 0.4, ..., 1)?
# There are several ways to do this, but try it out with NumPy's arange.


# 1) Code the Pythonic way of looping over floating-point numbers: [0.1, 0.2, 0.3, 0.4, ..., 1]
array1 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]
print("Python:", end=" ")
for element in array1:
    print(str(element) + ",", end=" ")

# 2) Code the NumPy way:
print("\n \nNumpy:")
array2 = np.array(array1)
array2

# The same holds beyond this example. Which version do you find more readable?

Python: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 
 
Numpy:


array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
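
The task above asks for `np.arange`, while the cell builds the array from an existing list. A minimal sketch of generating the same values directly could look like the following. Stepping over integers and dividing afterwards is one possible way to avoid rounding surprises with a float step (a choice made for this sketch, not the only option), and `np.linspace` is an alternative when the number of points is known. The names `steps` and `grid` are illustrative.

```python
import numpy as np

# Step over the integers 1..10 and scale, so the endpoint 1.0 is included reliably
steps = np.arange(1, 11) / 10

# Alternative: ask for exactly 10 evenly spaced points between 0.1 and 1.0
grid = np.linspace(0.1, 1.0, 10)

# Looping over the array works just like looping over a list
for value in steps:
    print(f"{value:.1f}", end=" ")
print()
```

Both arrays hold ten values, so either can replace the hand-written list in the cell above.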

#### List operations

In [17]:
# Build the list named _list the Pythonic way and the NumPy way
_list = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

# 1) Pythonic way
list1 = []
for element in _list:
    list1.append(element)
print("Python:", list1, end="\n")

# 2) NumPy way
print("\nNumpy:")
list2 = np.array(_list)
list2

Python: [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

Numpy:


array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])
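
Copying a list looks much the same in both styles. The difference becomes clearer with element-wise work. The sketch below is an illustration that goes slightly beyond the task: it doubles every entry and sums each column, once with plain Python and once with NumPy. The names `doubled_py`, `doubled_np`, and `arr` are illustrative.

```python
import numpy as np

_list = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

# Pure Python: a nested comprehension doubles every entry
doubled_py = [[2 * value for value in row] for row in _list]

# NumPy: the multiplication is applied to the whole array at once
arr = np.array(_list)
doubled_np = arr * 2

print(arr.shape)        # number of rows and columns
print(doubled_np)
print(arr.sum(axis=0))  # one sum per column
```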

#### Matrix multiplication

In [20]:
# Write the pure-Python equivalent of the following NumPy expressions

#### Task 1
_list = [[2.1, 3.5, 7.2], [4.5, 6.6, 7.7]]
# 1) Numpy way

# Numpy transpose
_np_list = np.array(_list).T
print('Numpy transpose', _np_list)

# 2) Pythonic way for NumPy transpose
# YOUR CODE...

list1 = []
columns = len(_list[0])
for column in range(columns):
    # Collect this column's entry from every row, so any number of rows works
    list1.append([row[column] for row in _list])

print("\nPython:", list1, end="\n")

#### Task 2
_list1 = [[2, 3, 7], [4, 6, 7]]
_list2 = [[3, 4], [7, 1], [1, 8]]
# 1) Numpy mat mul
_np_mul = np.matmul(_list1, _list2)
print('\nNumpy mat mul:\n', _np_mul)

# 2) Pythonic way

# YOUR CODE...

result_matrix = []
for row1 in range(len(_list1)):
    summed_row = []

    for column in range(len(_list2[0])):
        total = 0  # named total to avoid shadowing the built-in sum()

        # k runs over the columns of _list1 and the rows of _list2
        for k in range(len(_list2)):
            total += _list1[row1][k] * _list2[k][column]
        summed_row.append(total)
    result_matrix.append(summed_row)

print("\nPython:", result_matrix)

Numpy transpose [[2.1 4.5]
 [3.5 6.6]
 [7.2 7.7]]

Python: [[2.1, 4.5], [3.5, 6.6], [7.2, 7.7]]

Numpy mat mul:
 [[34 67]
 [61 78]]

Python: [[34, 67], [61, 78]]


## Pandas: [Pandas](https://pandas.pydata.org/) [Quickstart](https://pandas.pydata.org/docs/getting_started/intro_tutorials/index.html)

In [7]:
import pandas as pd

In [39]:
# Download the churn dataset from Kaggle: https://www.kaggle.com/datasets/shubh0799/churn-modelling?resource=download
# 1) Use the .csv file and read it via pandas
df = pd.read_csv("./Churn_Modelling.csv")

# 2) Print the columns of the pandas table
# print (df)
print ("Answer 2: \n")
print(df.columns, "\n")

# 3) Drop the column 'CustomerId'
print("Answer 3: \n")
df = df.drop(columns="CustomerId")
print(df, "\n")

# 4) Print the values for the columns 'Gender', 'Age', 'Tenure', 'Balance' only

print ("Answer 4: \n")
print (df[['Gender', 'Age', 'Tenure', 'Balance']], "\n")

# 5) Return only the rows where Geography == 'France' and columns 'Gender', 'Age', 'Tenure', 'Balance'

print ("Answer 5: \n")
print(df.loc[df['Geography'] == 'France', ['Gender', 'Age', 'Tenure', 'Balance']], "\n")

# 6) Group by the columns 'Geography' and 'Gender' and use the mean function to aggregate the churn rate ('Exited' column)
print ("Answer 6: \n")
print(df.groupby(['Geography', 'Gender'])['Exited'].mean())


Answer 2: 

Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
       'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
       'IsActiveMember', 'EstimatedSalary', 'Exited'],
      dtype='object') 

Answer 3: 

      RowNumber    Surname  CreditScore Geography  Gender  Age  Tenure  \
0             1   Hargrave          619    France  Female   42       2   
1             2       Hill          608     Spain  Female   41       1   
2             3       Onio          502    France  Female   42       8   
3             4       Boni          699    France  Female   39       1   
4             5   Mitchell          850     Spain  Female   43       2   
...         ...        ...          ...       ...     ...  ...     ...   
9995       9996   Obijiaku          771    France    Male   39       5   
9996       9997  Johnstone          516    France    Male   35      10   
9997       9998        Liu          709    France  Female   36       7   
9998     
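
The same steps can be tried without downloading the dataset. The sketch below uses a tiny DataFrame with made-up rows (the values are invented for illustration and are not taken from `Churn_Modelling.csv`) to show dropping a column, filtering rows and columns with `.loc`, and computing a per-group mean. The names `toy`, `france`, and `churn_rate` are illustrative.

```python
import pandas as pd

# Made-up rows for illustration only; they are not taken from Churn_Modelling.csv
toy = pd.DataFrame({
    "CustomerId": [1, 2, 3, 4],
    "Geography": ["France", "Spain", "France", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Age": [42, 35, 51, 29],
    "Exited": [1, 0, 0, 0],
})

# drop returns a new DataFrame, so the result is assigned back
toy = toy.drop(columns="CustomerId")

# .loc filters rows with a boolean mask and selects columns in one step
france = toy.loc[toy["Geography"] == "France", ["Gender", "Age"]]
print(france)

# The mean of a 0/1 column per group is the share of customers who left
churn_rate = toy.groupby(["Geography", "Gender"])["Exited"].mean()
print(churn_rate)
```

Because `Exited` only holds 0 and 1, its group mean can be read directly as a churn rate.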

## Scikit-learn: [Scikit-learn](https://scikit-learn.org/stable/) [Quickstart](https://scikit-learn.org/stable/getting_started.html)

In [8]:
# We start with the k-nearest neighbors (kNN) algorithm from scikit-learn
from sklearn.neighbors import KNeighborsClassifier

In [9]:
# Use the following values and labels to fit the kNN classifier
X = np.arange(0, 9).reshape(9, 1)
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
# There are three classes: values 0 to 2 have label 0, 3 to 5 have label 1, and 6 to 8 have label 2

In [10]:
# Fit the kNN classifier on the values above
# YOUR CODE ...
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

In [None]:
# Predict the label for the value 4
# YOUR CODE ...
print(clf.predict([[4]]))
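
To see why the classifier answers the way it does, it can help to look at the neighbors behind a prediction. The sketch below repeats the setup from the cells above so that it runs on its own, predicts several points at once, and inspects the votes and the nearest training points for the value 4. The query points in `queries` are chosen for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.arange(0, 9).reshape(9, 1)
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X, y)

# Predict several query points at once; each inner list is one sample
queries = [[1], [4], [7]]
labels = clf.predict(queries)
print(labels)

# Share of the 3 nearest neighbors that vote for each class
print(clf.predict_proba([[4]]))

# Distances to, and row indices of, the 3 nearest training points
distances, indices = clf.kneighbors([[4]])
print(distances, indices)
```

For the value 4, the three nearest training points are 3, 4, and 5, which all carry label 1, so the vote is unanimous.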