# CSC 578D / Data Mining / Fall 2018 / University of Victoria
# Python Notebook explaining Assignment 02 / Problem 01

**Author:** Andreas P. Koenzen (akoenzen => uvic.ca)
<br>
**Version:** 0.1

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from IPython.display import display, HTML
HTML('<style>table {float:left} table th,td {font-size:100%}</style>')

In [None]:
data = pd.DataFrame(data={
    'GPA': [1, 0.9, 0.9, 0.7, 0.6],
    'GRE': [1, 1, 0.875, 0.75, 0.875],
    'Dummy': [1, 1, 1, 1, 1],
    'y': [1, 1, 1, -1, -1]
})
data

## Plot the data:

In [None]:
_, ax = plt.subplots()
ax.set(xlabel='GRE', ylabel='GPA')
data.plot.scatter(x='GRE',
                  y='GPA',
                  c='y',
                  colormap='jet',
                  ax=ax)
plt.title('GPA vs. GRE Plot')
plt.show();

## Solution to Problem #1 of Assignment #2:
### (4 points) Logistic Regression (by hand):
Suppose we have the following normalized dataset:

$
\begin{array}{lllr}
GPA  &GRE   &Dummy  &y  \\ \hline
1    &1     &1      &1  \\
0.9  &1	    &1	    &1  \\
0.9	 &0.875	&1	    &1  \\
0.7	 &0.75	&1	    &-1 \\
0.6	 &0.875	&1	    &-1 \\
\end{array}
$
			
and want to construct a Logistic Regression classifier using Gradient Descent for Maximum Likelihood. If we start with an all zero weight vector, what will the weight vector be after the first iteration? (Consider kappa=2)
Show the details of your calculations. This is a pencil and paper exercise. See the posted Excel spreadsheet for an example.

#### Notes:
- At most 3 significant digits are used for this exercise.
- Results are rounded up if the 3rd decimal digit is >= 5 (for example, 0.4375 becomes 0.44).
- Since the probability threshold is not specified, I will use $P\left(y \mid \vec{x}\right) > 0.5$.
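
The threshold rule in the last note can be sketched as follows. This is a minimal illustration, assuming labels in {-1, +1}; the helper names `sigmoid` and `predict_label` are hypothetical and are not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(w, x, threshold=0.5):
    """Return +1 if P(y=1 | x) exceeds the threshold, otherwise -1."""
    p_positive = sigmoid(np.dot(w, x))
    return 1 if p_positive > threshold else -1

# With all-zero weights every point has P(y=1 | x) = 0.5, which does not
# exceed the strict threshold, so this sketch returns -1.
w_zero = np.zeros(3)
x_first = np.array([1.0, 1.0, 1.0])  # GPA, GRE, Dummy of row 0
print(sigmoid(np.dot(w_zero, x_first)), predict_label(w_zero, x_first))
```

Because the comparison is strict, a probability of exactly 0.5 is assigned to the negative class here; that tie-breaking choice is an assumption of this sketch.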

## Initial Values:

- The value of Kappa (Rate of descent) is: $\kappa = 2$
- The initial weight vector is: 
$
\vec{w} = 
\left[
    \begin{array}{ccc}
    0.0^{(x_2)} & 0.0^{(x_1)} & 0.0^{(x_0)}
    \end{array}
\right]
$

## Iteration 1:

### i = 0

$\frac{y_i \vec{x_i}^{\,2}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (1 x 1) / (1 + exp(1 x \[(0 x 1) + (0 x 1) + (0 x 1)\])) = **0.5**

$\frac{y_i \vec{x_i}^{\,1}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (1 x 1) / (1 + exp(1 x \[(0 x 1) + (0 x 1) + (0 x 1)\])) = **0.5**

$\frac{y_i \vec{x_i}^{\,0}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (1 x 1) / (1 + exp(1 x \[(0 x 1) + (0 x 1) + (0 x 1)\])) = **0.5**

$E(\vec{w})$ = ln(1 + exp(-1 x \[(0 x 1) + (0 x 1) + (0 x 1)\])) = **0.69**

$P\left(y=1 \mid \vec{x}\right)$ = 1 / (1 + exp(-1 x \[(0 x 1) + (0 x 1) + (0 x 1)\])) = **0.5**
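
The same per-row arithmetic can be checked numerically. The sketch below computes the three gradient terms, the error and the probability for a single example; the function name `row_terms` is a hypothetical helper introduced only for illustration.

```python
import numpy as np

def row_terms(x, y, w):
    """Per-row gradient terms, error and P(y | x) for one example."""
    margin = y * np.dot(w, x)                      # y_i * w^T x_i
    grad_terms = (y * x) / (1.0 + np.exp(margin))  # one term per feature
    error = np.log(1.0 + np.exp(-margin))
    prob = 1.0 / (1.0 + np.exp(-margin))           # probability of the observed label
    return grad_terms, error, prob

x0 = np.array([1.0, 1.0, 1.0])  # GPA, GRE, Dummy for i = 0
terms, error, prob = row_terms(x0, 1, np.zeros(3))
print(terms, round(error, 2), prob)
```

Calling `row_terms` with the remaining rows and their labels should give the values worked out by hand in the following subsections.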

### i = 1

$\frac{y_i \vec{x_i}^{\,2}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (1 x 0.9) / (1 + exp(1 x \[(0 x 0.9) + (0 x 1) + (0 x 1)\])) = **0.45**

$\frac{y_i \vec{x_i}^{\,1}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (1 x 1) / (1 + exp(1 x \[(0 x 0.9) + (0 x 1) + (0 x 1)\])) = **0.5**

$\frac{y_i \vec{x_i}^{\,0}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (1 x 1) / (1 + exp(1 x \[(0 x 0.9) + (0 x 1) + (0 x 1)\])) = **0.5**

$E(\vec{w})$ = ln(1 + exp(-1 x \[(0 x 0.9) + (0 x 1) + (0 x 1)\])) = **0.69**

$P(y=1 \mid \vec{x})$ = 1 / (1 + exp(-1 x \[(0 x 0.9) + (0 x 1) + (0 x 1)\])) = **0.5**

### i = 2

$\frac{y_i \vec{x_i}^{\,2}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (1 x 0.9) / (1 + exp(1 x \[(0 x 0.9) + (0 x 0.875) + (0 x 1)\])) = **0.45**

$\frac{y_i \vec{x_i}^{\,1}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (1 x 0.875) / (1 + exp(1 x \[(0 x 0.9) + (0 x 0.875) + (0 x 1)\])) = **0.44**

$\frac{y_i \vec{x_i}^{\,0}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (1 x 1) / (1 + exp(1 x \[(0 x 0.9) + (0 x 0.875) + (0 x 1)\])) = **0.5**

$E(\vec{w})$ = ln(1 + exp(-1 x \[(0 x 0.9) + (0 x 0.875) + (0 x 1)\])) = **0.69**

$P(y=1 \mid \vec{x})$ = 1 / (1 + exp(-1 x \[(0 x 0.9) + (0 x 0.875) + (0 x 1)\])) = **0.5**

### i = 3

$\frac{y_i \vec{x_i}^{\,2}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (-1 x 0.7) / (1 + exp(-1 x \[(0 x 0.7) + (0 x 0.75) + (0 x 1)\])) = **-0.35**

$\frac{y_i \vec{x_i}^{\,1}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (-1 x 0.75) / (1 + exp(-1 x \[(0 x 0.7) + (0 x 0.75) + (0 x 1)\])) = **-0.38**

$\frac{y_i \vec{x_i}^{\,0}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (-1 x 1) / (1 + exp(-1 x \[(0 x 0.7) + (0 x 0.75) + (0 x 1)\])) = **-0.5**

$E(\vec{w})$ = ln(1 + exp(1 x \[(0 x 0.7) + (0 x 0.75) + (0 x 1)\])) = **0.69**

$P(y=-1 \mid \vec{x})$ = 1 / (1 + exp(1 x \[(0 x 0.7) + (0 x 0.75) + (0 x 1)\])) = **0.5**

### i = 4

$\frac{y_i \vec{x_i}^{\,2}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (-1 x 0.6) / (1 + exp(-1 x \[(0 x 0.6) + (0 x 0.875) + (0 x 1)\])) = **-0.3**

$\frac{y_i \vec{x_i}^{\,1}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (-1 x 0.875) / (1 + exp(-1 x \[(0 x 0.6) + (0 x 0.875) + (0 x 1)\])) = **-0.44**

$\frac{y_i \vec{x_i}^{\,0}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ = (-1 x 1) / (1 + exp(-1 x \[(0 x 0.6) + (0 x 0.875) + (0 x 1)\])) = **-0.5**

$E(\vec{w})$ = ln(1 + exp(1 x \[(0 x 0.6) + (0 x 0.875) + (0 x 1)\])) = **0.69**

$P(y=-1 \mid \vec{x})$ = 1 / (1 + exp(1 x \[(0 x 0.6) + (0 x 0.875) + (0 x 1)\])) = **0.5**

### Computation table:

| i | GPA | GRE | Dummy | y | $\frac{y_i \vec{x_i}^{\,2}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ | $\frac{y_i \vec{x_i}^{\,1}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ | $\frac{y_i \vec{x_i}^{\,0}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}$ | Error | $P(y \mid \vec{x})$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1   | 1     | 1 | 1  | 0.5 | 0.5 | 0.5 | 0.69 | 0.5 |
| 1 | 0.9 | 1     | 1 | 1  | 0.45 | 0.5 | 0.5 | 0.69 | 0.5 |
| 2 | 0.9 | 0.875 | 1 | 1  | 0.45 | 0.44 | 0.5 | 0.69 | 0.5 |
| 3 | 0.7 | 0.75  | 1 | -1 | -0.35 | -0.38 | -0.5 | 0.69 | 0.5 |
| 4 | 0.6 | 0.875 | 1 | -1 | -0.30 | -0.44 | -0.5 | 0.69 | 0.5 |
| - |  -  |   -   | - | -  | $\sum$ 0.75 | $\sum$ 0.62 | $\sum$ 0.5 | $\frac{1}{n}\sum$ 0.69 | - |
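
The whole table can also be rebuilt programmatically as a cross-check. The sketch below assumes the all-zero starting weights; the column names `term_GPA`, `term_GRE` and `term_Dummy` are hypothetical labels chosen for this example.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({
    'GPA': [1, 0.9, 0.9, 0.7, 0.6],
    'GRE': [1, 1, 0.875, 0.75, 0.875],
    'Dummy': [1, 1, 1, 1, 1],
    'y': [1, 1, 1, -1, -1],
})
w_start = np.zeros(3)  # all-zero starting weights (GPA, GRE, Dummy)

features = table[['GPA', 'GRE', 'Dummy']].to_numpy(dtype=float)
labels = table['y'].to_numpy(dtype=float)
margins = labels * (features @ w_start)  # y_i * w^T x_i for every row

# One gradient-term column per feature, plus the error and P(y | x).
terms = (labels[:, None] * features) / (1.0 + np.exp(margins))[:, None]
for j, name in enumerate(['GPA', 'GRE', 'Dummy']):
    table['term_' + name] = terms[:, j]
table['error'] = np.log(1.0 + np.exp(-margins))
table['prob'] = 1.0 / (1.0 + np.exp(-margins))

column_sums = terms.sum(axis=0)
mean_error = table['error'].mean()
print(table.round(2))
print(column_sums, round(mean_error, 2))
```

Without intermediate rounding the GRE column sums to 0.625, whereas the hand-computed table gives 0.62 because each entry was rounded before summing.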

### Results of iteration 1:

$E(\vec{w}) = \frac{1}{n} \sum_{i=0}^{n-1} \ln{(1 + e^{-y_i\vec{w}^{\,T}\vec{x_i}})} = 0.69$

$\nabla_{E}(\vec{w}) = (\frac{-1}{5}) \left[\begin{array}{ccc}0.75 & 0.62 & 0.5\end{array}\right] = \left[\begin{array}{ccc}-0.15 & -0.124 & -0.1\end{array}\right]$
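
For reference, the per-row terms in the computation table come from differentiating the mean error term by term. A short derivation, using the same symbols as the rest of this notebook and multiplying numerator and denominator by $e^{y_i\vec{w}^{\,T}\vec{x_i}}$ in the last step:

$
\begin{aligned}
E(\vec{w}) &= \frac{1}{n} \sum_{i=0}^{n-1} \ln\left(1 + e^{-y_i\vec{w}^{\,T}\vec{x_i}}\right) \\
\nabla_{E}(\vec{w}) &= \frac{1}{n} \sum_{i=0}^{n-1} \frac{-y_i\vec{x_i}\, e^{-y_i\vec{w}^{\,T}\vec{x_i}}}{1 + e^{-y_i\vec{w}^{\,T}\vec{x_i}}} \\
&= \frac{-1}{n} \sum_{i=0}^{n-1} \frac{y_i\vec{x_i}}{1 + e^{y_i\vec{w}^{\,T}\vec{x_i}}}
\end{aligned}
$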

#### Update the vector w:

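$\vec{w} = \vec{w} - \kappa\nabla_{E}(\vec{w}) = \left[\begin{array}{ccc}0.0 & 0.0 & 0.0\end{array}\right] - 2 \left[\begin{array}{ccc}-0.15 & -0.124 & -0.1\end{array}\right] \approx \left[\begin{array}{ccc}0.3 & 0.25 & 0.2\end{array}\right]$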
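
The update arithmetic can be reproduced directly from the rounded column sums of the computation table. This is a small sketch of that single step; the variable names are chosen for illustration only.

```python
import numpy as np

kappa = 2
n = 5
w_old = np.zeros(3)                  # order: GPA, GRE, Dummy
sums = np.array([0.75, 0.62, 0.5])   # rounded column sums from the table

gradient = (-1.0 / n) * sums         # gradient of E(w)
w_new = w_old - kappa * gradient     # one gradient-descent update
print(gradient, w_new, np.round(w_new, 2))
```

The GRE weight comes out as 0.248 from the rounded sums and is reported as 0.25 after rounding to two decimals.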

## Verification using Python:

In [None]:
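kappa = 2                                # rate of descent
w = np.array([[0.0, 0.0, 0.0]])          # initial weights, order: GPA, GRE, Dummy
x = np.array(data.values[:, :3])         # features, shape (5, 3)
y = np.array(data.values[:, 3:])         # labels, shape (5, 1)
# One gradient-descent step: w <- w - kappa * gradient of E(w)
w = w - (kappa * (-1 / len(x)) * np.sum((y * x) / (1 + np.exp(y * (x @ w.T))), axis=0, keepdims=True))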
print("Vector w after the first iteration is: {}".format(np.squeeze(w)))
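
As a further sanity check, the mean error can be compared before and after the update; a descent step should not increase it. The sketch below is self-contained and uses two hypothetical helpers, `mean_error` and `gradient_step`, which are not part of the original solution.

```python
import numpy as np

def mean_error(w, x, y):
    """Mean logistic error (1/n) * sum ln(1 + exp(-y_i * w^T x_i))."""
    return float(np.mean(np.log(1.0 + np.exp(-y * (x @ w)))))

def gradient_step(w, x, y, kappa):
    """One gradient-descent update of w with step size kappa."""
    weights = 1.0 / (1.0 + np.exp(y * (x @ w)))            # one factor per row
    grad = (-1.0 / len(x)) * np.sum((y * weights)[:, None] * x, axis=0)
    return w - kappa * grad

x_demo = np.array([[1, 1, 1], [0.9, 1, 1], [0.9, 0.875, 1],
                   [0.7, 0.75, 1], [0.6, 0.875, 1]])       # GPA, GRE, Dummy
y_demo = np.array([1, 1, 1, -1, -1], dtype=float)

w_start = np.zeros(3)
w_next = gradient_step(w_start, x_demo, y_demo, kappa=2)
print(w_next)
print(mean_error(w_start, x_demo, y_demo), mean_error(w_next, x_demo, y_demo))
```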

In [None]:
_, ax = plt.subplots()
ax.set(xlabel='GRE', ylabel='GPA')
data.plot.scatter(
    x='GRE',
    y='GPA',
    c='y',
    colormap='jet',
    ax=ax)
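# np.asscalar was removed from NumPy; .item() returns the same Python scalar.
w2 = w[0][0].item()  # weight for GPA
w1 = w[0][1].item()  # weight for GRE
w0 = w[0][2].item()  # weight for Dummy (bias)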
plt.plot(
    np.linspace(0.0, 1.0, num=10),
    [((-(w1 * k) - (w0)) / w2) for k in np.linspace(0.0, 1.0, num=10)],
    c="red"
)
plt.title('GPA vs. GRE Plot')
plt.show();

***
# END