We start by importing the required modules and loading the training dataset into a pandas DataFrame.

In [18]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

dataset_train = '../datasets/dataset_train.csv'
df = pd.read_csv(dataset_train)

# Model training

## Multinomial logistic regression

When the target has only two possible outcomes, the classification is ```binomial```.

```Multinomial```: the target variable has three or more possible classes.
Our dataset has a discrete number of possible outcomes, `4`: ['Ravenclaw', 'Slytherin', 'Gryffindor', 'Hufflepuff'].

```One-vs-All classification``` is a method for multi-class classification.
It splits the multi-class problem into `multiple binary classifier models`, one per class.

Since our dataset has k = 4 class labels, One-vs-All classification uses k = 4 ```binary classifiers```.


In [19]:
df['Hogwarts House'].unique().tolist()

['Ravenclaw', 'Slytherin', 'Gryffindor', 'Hufflepuff']


### Binary classifier

To build our model, each ```binary classifier``` has its own set of ```weights```, $W =(𝑤_1,𝑤_2,⋯,𝑤_𝑛)$.

For any given student, characterized by 𝑛 features, the input values are $X =(𝑥_1,𝑥_2,⋯,𝑥_𝑛)$.

```z``` is the ```dot product``` of the inputs ```X``` and the weights ```W```, plus the bias parameter 𝑏.

$$
\mathbf{z} = 𝑏 + 𝑥_1.𝑤_1 + 𝑥_2.𝑤_2 + ... +𝑥_n.𝑤_n
$$

$$
\mathbf{z} = b + \mathbf{X} \cdot \mathbf{W}
$$

with
$$
\mathbf{X} \cdot \mathbf{W} = \sum_{i=1}^n 𝑥_i 𝑤_i
$$
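As a minimal sketch of the formula above, the linear score z can be computed with NumPy. The feature values, weights, and bias below are made up for illustration and do not come from the dataset.

```python
import numpy as np

# Hypothetical values for illustration only (not taken from the dataset)
x = np.array([0.5, -1.0, 2.0])   # feature values x_1..x_n for one student
w = np.array([1.0, 0.5, -0.25])  # weights w_1..w_n of one binary classifier
b = 0.1                          # bias

# z = b + sum_i x_i * w_i, written out term by term
z_explicit = b + sum(x_i * w_i for x_i, w_i in zip(x, w))

# The same quantity using the dot product
z_dot = b + np.dot(x, w)
```

Both expressions give the same scalar, so the dot product form is only a more compact way of writing the sum.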

### Sigmoid function

The ```sigmoid function```, also called the ```logistic function```, maps input values from the whole real line into the interval (0, 1).

$$ y = g(z) = \frac{1}{1 + e^{-z}} = \frac{e^z}{1 + e^z}$$

This formula represents the `probability of observing the output y = 1` of a Bernoulli random variable. This variable is either 1 or 0:
$$
y \in \{0,1\}
$$

The ```sigmoid function``` transforms ```z``` into a value between 0 and 1.
The resulting ```predicted output``` is the probability that a student belongs to the class of the ```binary classifier```.

$$
y = g(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(X.W + 𝑏)}}
$$
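The properties of the logistic function can be checked with a short sketch. The input values are arbitrary and chosen only to illustrate the shape of the curve.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary inputs for illustration
z = np.array([-5.0, 0.0, 5.0])
p = sigmoid(z)

# The second form of the formula, e^z / (1 + e^z)
p_alt = np.exp(z) / (1.0 + np.exp(z))
```

Here `p[1]` equals 0.5 because g(0) = 1/2, every output lies strictly between 0 and 1, and the symmetry g(-z) = 1 - g(z) holds.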

Since we have 4 possible outcomes, we need 4 ```binary classifiers``` [(Gryffindor, not Gryffindor), (Ravenclaw, not Ravenclaw), ...] and therefore 4 sets of weights.


## Pre-training data processing

- `DataFrame.drop()`: keep only the 10 meaningful features, independent of each other
- `DataFrame.dropna()`: drop the rows that contain NaN, which leaves 1333 rows
- standardize each feature with the z-score method

In [20]:

df.drop(df.columns[2:6], inplace=True, axis = 1)
excluded_features = ["Arithmancy", "Defense Against the Dark Arts", "Care of Magical Creatures"]
df.drop(excluded_features, inplace=True, axis=1)
df.dropna(inplace=True)
df.head()

Unnamed: 0,Index,Hogwarts House,Astronomy,Herbology,Divination,Muggle Studies,Ancient Runes,History of Magic,Transfiguration,Potions,Charms,Flying
0,0,Ravenclaw,-487.886086,5.72718,4.722,272.035831,532.484226,5.231058,1039.788281,3.790369,-232.79405,-26.89
1,1,Slytherin,-552.060507,-5.987446,-5.612,-487.340557,367.760303,4.10717,1058.944592,7.248742,-252.18425,-113.45
2,2,Ravenclaw,-366.076117,7.725017,6.14,664.893521,602.585284,3.555579,1088.088348,8.728531,-227.34265,30.42
3,3,Gryffindor,697.742809,-6.497214,4.026,-537.001128,523.982133,-4.809637,920.391449,0.821911,-256.84675,200.64
4,4,Gryffindor,436.775204,-7.820623,2.236,-444.262537,599.324514,-3.444377,937.434724,4.311066,-256.3873,157.98


In [21]:
def standardize(arr: pd.Series):
    """z-score method, using pandas std"""
    mean = arr.mean()
    std = arr.std()
    return (arr - mean) / std


df_class = df['Hogwarts House'].copy(deep=True)
df_train = df.drop(df.columns[:2], axis=1)
df_std_train = df_train.agg(lambda feature: standardize(feature))
df_std_train.head()


Unnamed: 0,Astronomy,Herbology,Divination,Muggle Studies,Ancient Runes,History of Magic,Transfiguration,Potions,Charms,Flying
0,-1.019405,0.86751,0.366766,1.01501,0.341729,0.50466,0.220663,-0.70162,1.193099,-0.508231
1,-1.142486,-1.376697,-2.140728,-0.547946,-1.205529,0.251192,0.657019,0.412017,-1.012445,-1.395502
2,-0.785784,1.250242,0.710837,1.823594,1.000191,0.126793,1.320875,0.888527,1.813171,0.079217
3,1.254526,-1.474355,0.197885,-0.650158,0.261869,-1.759797,-2.499039,-1.657499,-1.542783,1.824033
4,0.754013,-1.727884,-0.23645,-0.459282,0.969563,-1.451893,-2.110816,-0.53395,-1.490523,1.386752
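One detail worth keeping in mind: `Series.std()` in pandas uses the sample standard deviation (`ddof=1`) by default, whereas `np.std` uses the population standard deviation (`ddof=0`). The sketch below, on made-up grades, checks that a z-scored column has mean 0 and standard deviation 1 under the pandas convention.

```python
import numpy as np
import pandas as pd

# Hypothetical grades for illustration only
grades = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# z-score with the pandas default (sample standard deviation, ddof=1)
z_scores = (grades - grades.mean()) / grades.std()

mean_after = z_scores.mean()
std_after = z_scores.std()

# The two libraries disagree unless ddof is set explicitly
pandas_std = grades.std()              # ddof=1
numpy_std = np.std(grades.to_numpy())  # ddof=0
```

The choice between the two conventions only rescales each feature by a constant factor, but the same convention has to be used on any data later fed to the trained model.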


In [9]:
df_std_train['Real Output'] = df_class
df_std_train.head()

Unnamed: 0,Astronomy,Herbology,Divination,Muggle Studies,Ancient Runes,History of Magic,Transfiguration,Potions,Charms,Flying,Real Output
0,-1.019787,0.867836,0.366904,1.015391,0.341857,0.50485,0.220746,-0.701884,1.193547,-0.508422,Ravenclaw
1,-1.142914,-1.377213,-2.141531,-0.548152,-1.205982,0.251287,0.657265,0.412172,-1.012825,-1.396026,Slytherin
2,-0.786079,1.250711,0.711103,1.824278,1.000567,0.126841,1.321371,0.88886,1.813851,0.079247,Ravenclaw
3,1.254996,-1.474908,0.197959,-0.650402,0.261967,-1.760458,-2.499977,-1.658121,-1.543362,1.824717,Gryffindor
4,0.754296,-1.728533,-0.236538,-0.459455,0.969927,-1.452438,-2.111608,-0.53415,-1.491082,1.387273,Gryffindor


### Example for One-vs-All

The actual class y is set to 1 or 0:
- 1 = the student is in the house,
- 0 = the student is in another house

In [11]:
houses = df['Hogwarts House'].unique()
# df['Hogwarts House'] == houses[0] gives a boolean Series;
# np.where maps True to 1 and False to 0
y_actual = np.where(df['Hogwarts House'] == houses[0], 1, 0)
y_actual

array([1, 0, 1, ..., 0, 0, 0])
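The same idea extends to all houses at once. The sketch below builds one 0/1 label vector per house from a small made-up column that stands in for `df['Hogwarts House']`. The names `house_col` and `labels` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['Hogwarts House']
house_col = pd.Series(
    ["Ravenclaw", "Slytherin", "Ravenclaw", "Gryffindor", "Hufflepuff"]
)

# One 0/1 label vector per house: 1 = in the house, 0 = in another house
labels = {
    house: np.where(house_col == house, 1, 0)
    for house in house_col.unique()
}

# Each student belongs to exactly one house, so the vectors sum to 1 per row
row_sums = sum(labels.values())
```

Each of these vectors is the target of one binary classifier.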

## Model training

For each `classifier`:
- Input values of 𝑛 features $\mathbf{X}  =(𝑥_1,𝑥_2,⋯,𝑥_𝑛)$
- Weights $\mathbf{W} =(𝑤_1,𝑤_2,⋯,𝑤_𝑛)$
- 𝑏, the bias parameter

so that

$$\mathbf{z}= 𝑏 + 𝑥_1.𝑤_1 + 𝑥_2.𝑤_2 + ... +𝑥_n.𝑤_n = 𝑏 + \mathbf{X} \cdot \mathbf{W} = b + \sum_{i=1}^n 𝑥_i 𝑤_i$$

$\mathbf{z}$ is fed to the logistic function 𝑔, which outputs the predicted probability of 𝑦 being equal to 1.

$$
\mathbf{y} = g(\mathbf{z}) = \frac{1}{1 + e^{-\mathbf{z}}} = \frac{1}{1 + e^{-(𝑏 + \mathbf{X} \cdot \mathbf{W})}}
$$

### Input : X Array

A column of ones is added to the `x_train` array so that the bias is multiplied by 1.

$$
\begin{pmatrix}
\ 1 \\
\ x_1 \\
\ 𝑥_2 \\
\ ⋯ \\
\ 𝑥_𝑛 \\
\end{pmatrix}
$$

#### Weights

Matrix of zeros, of shape (one intercept + number of features) x (number of classifiers k). For one classifier, the transposed weight vector is:

$$
\begin{pmatrix}
\ b & \ 𝑤_1 & \  𝑤_2 & \ ⋯ & \ 𝑤_𝑛 \\ 
\end{pmatrix}
$$

### Dot product for one output 
$$
\mathbf{z} = 
\begin{pmatrix}
\ b & \ 𝑤_1 & \  𝑤_2 & \ ⋯ & \ 𝑤_𝑛 \\ 
\end{pmatrix}
\cdot
\begin{pmatrix}
\ 1 \\
\ x_1 \\
\ 𝑥_2 \\
\ ⋯ \\
\ 𝑥_𝑛 \\
\end{pmatrix}
=  𝑏 + 𝑥_1.𝑤_1 + 𝑥_2.𝑤_2 + ... +𝑥_n.𝑤_n
$$
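To illustrate why the column of ones is useful, the sketch below checks that folding the bias into the weight vector gives the same scores as adding it separately. All values and the names `X_aug` and `theta` are made up for this example.

```python
import numpy as np

# Hypothetical values: 3 students, 2 features
X = np.array([[0.5, -1.0], [2.0, 0.0], [-1.5, 1.0]])
w = np.array([1.0, -2.0])
b = 0.3

# Prepend a column of ones to the inputs and the bias to the weights
ones = np.ones((X.shape[0], 1))
X_aug = np.concatenate((ones, X), axis=1)  # shape (3, 3)
theta = np.concatenate(([b], w))           # (b, w_1, w_2)

z_separate = b + X @ w    # bias added explicitly
z_folded = X_aug @ theta  # bias handled by the column of ones
```

With this trick the bias is updated by the same gradient step as the other weights, so no separate update rule is needed.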

### Initializing Model

- Standardized data in a numpy array
- A column of ones added to the left for the `intercept` or bias


In [13]:
# Inputs: the cleaned (NaN-free), not yet standardized training features and the house labels
df_std_train = df_train.agg(lambda course: standardize(course))
x_train_std = np.array(df_std_train)
ones = np.ones((len(x_train_std), 1), dtype=float)
x_train = np.concatenate((ones, x_train_std), axis=1)
features = df_std_train.columns.tolist()
df_class = df['Hogwarts House'].copy(deep=True)
houses = df_class.unique().tolist()
# One row per parameter (intercept first, then the features), one column per house
w_indexes = df_std_train.columns.insert(0, 'Intercept')
df_weights = pd.DataFrame(0, index=w_indexes, columns=houses)
df_weights.head(11)

Unnamed: 0,Ravenclaw,Slytherin,Gryffindor,Hufflepuff
Intercept,0,0,0,0
Astronomy,0,0,0,0
Herbology,0,0,0,0
Divination,0,0,0,0
Muggle Studies,0,0,0,0
Ancient Runes,0,0,0,0
History of Magic,0,0,0,0
Transfiguration,0,0,0,0
Potions,0,0,0,0
Charms,0,0,0,0


In [14]:
df_std_train.head()

Unnamed: 0,Astronomy,Herbology,Divination,Muggle Studies,Ancient Runes,History of Magic,Transfiguration,Potions,Charms,Flying
0,-1.019787,0.867836,0.366904,1.015391,0.341857,0.50485,0.220746,-0.701884,1.193547,-0.508422
1,-1.142914,-1.377213,-2.141531,-0.548152,-1.205982,0.251287,0.657265,0.412172,-1.012825,-1.396026
2,-0.786079,1.250711,0.711103,1.824278,1.000567,0.126841,1.321371,0.88886,1.813851,0.079247
3,1.254996,-1.474908,0.197959,-0.650402,0.261967,-1.760458,-2.499977,-1.658121,-1.543362,1.824717
4,0.754296,-1.728533,-0.236538,-0.459455,0.969927,-1.452438,-2.111608,-0.53415,-1.491082,1.387273


# Training one-vs-all

In [15]:
def sigmoid(arr: np.ndarray) -> np.ndarray:
    """Logistic function, applied element-wise."""
    return 1 / (1 + np.exp(-arr))

def update_weight_loss(weights, learning_rate, grad_desc):
    """One gradient descent step."""
    return weights - learning_rate * grad_desc

def train_one_vs_all(house, df_class, features, x_train, learning_rate, epochs):
    """
    Train the binary classifier `house` vs. all other houses with
    batch gradient descent and return its weights (intercept first).
    """
    y_actual = np.where(df_class == house, 1, 0)
    # Weights start at 1: one per feature plus one for the intercept
    weights = np.ones(len(features) + 1)
    for _ in range(epochs):
        z_output = np.dot(x_train, weights)
        h_pred = sigmoid(z_output)
        # Gradient term: X^T (h - y) / m
        tmp = np.dot(x_train.T, (h_pred - y_actual))
        grad_desc = tmp / y_actual.shape[0]
        weights = update_weight_loss(weights, learning_rate, grad_desc)
    return weights


learning_rate = 0.1
epochs = 1000
for house in houses:
    weights = train_one_vs_all(house, df_class, features, x_train, learning_rate, epochs)
    df_weights[house] = weights
print("alpha = ", learning_rate, "  iterations =", epochs)
df_weights.head(11)

alpha =  0.1   iterations = 1000


Unnamed: 0,Ravenclaw,Slytherin,Gryffindor,Hufflepuff
Intercept,-2.063096,-3.078773,-2.551754,-1.838897
Astronomy,-0.624236,-0.670175,0.744075,2.292742
Herbology,0.251666,-1.008444,-1.105922,1.198677
Divination,-0.124341,-1.982947,0.20734,0.582539
Muggle Studies,1.658823,-0.162779,0.083572,-0.767586
Ancient Runes,1.302175,-0.275755,1.209117,-1.222594
History of Magic,0.456105,0.376232,-0.460339,1.317395
Transfiguration,0.740743,0.885948,-0.592045,1.13428
Potions,-0.037885,0.618717,0.033944,-0.453786
Charms,0.96041,-0.606031,-0.472412,-0.395965


We now have the parameters (biases and weights) of our logistic regression model!
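One common way to use the four classifiers together, sketched here as an assumption since the notebook does not show prediction, is to evaluate every classifier on a student and keep the house with the highest probability. The helper `predict_house` and the toy weights are hypothetical.

```python
import numpy as np
import pandas as pd

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_house(x_with_ones: np.ndarray, weights: pd.DataFrame) -> np.ndarray:
    """For each row, return the house whose classifier gives the highest probability."""
    probabilities = sigmoid(x_with_ones @ weights.to_numpy())  # shape (m, k)
    return weights.columns.to_numpy()[np.argmax(probabilities, axis=1)]

# Hypothetical weights: 1 intercept + 2 features, 2 houses
toy_weights = pd.DataFrame(
    {"Ravenclaw": [0.0, 2.0, 0.0], "Slytherin": [0.0, -2.0, 0.0]},
    index=["Intercept", "f1", "f2"],
)
# Two standardized students, with the leading column of ones
toy_x = np.array([[1.0, 1.5, 0.0], [1.0, -0.5, 1.0]])

predicted = predict_house(toy_x, toy_weights)
```

With a weights table laid out like `df_weights` (one column per house, intercept in the first row), the same function would apply to `x_train`.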
