<center><img src='https://drive.google.com/uc?id=1_utx_ZGclmCwNttSe40kYA6VHzNocdET' height="60"></center>

AI TECH - Akademia Innowacyjnych Zastosowań Technologii Cyfrowych. Programu Operacyjnego Polska Cyfrowa na lata 2014-2020
<hr>

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>

<center>
Projekt współfinansowany ze środków Unii Europejskiej w ramach Europejskiego Funduszu Rozwoju Regionalnego
Program Operacyjny Polska Cyfrowa na lata 2014-2020,
Oś Priorytetowa nr 3 "Cyfrowe kompetencje społeczeństwa" Działanie  nr 3.2 "Innowacyjne rozwiązania na rzecz aktywizacji cyfrowej"
Tytuł projektu:  „Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (AI Tech)”
    </center>

# Logistic regression

In this exercise you will train a logistic regression model via gradient descent in two simple scenarios.

The general setup is as follows:
* we are given a set of pairs $(x, y)$, where $x \in R^D$ is a vector of real numbers representing the features, and $y \in \{0,1\}$ is the target,
* for a given $x$ we model the probability of $y=1$ by $h(x):=g(w^Tx)$, where $g$ is the sigmoid function: $g(z) = \frac{1}{1+e^{-z}}$,
* to find the right $w$ we will optimize the so called logarithmic loss: $J(w) = -\frac{1}{n}\sum_{i=1}^n y_i \log{h(x_i)} + (1-y_i) \log{(1-h(x_i))}$,
* with the loss function in hand we can improve our guesses iteratively:
    * $w_j^{t+1} = w_j^t - \text{step_size} \cdot \frac{\partial J(w)}{\partial w_j}$,
* we can end the process after some predefined number of epochs (or when the changes are no longer meaningful).

Let's start with the simplest example - linear separated points on a plane.

In [68]:
import numpy as np

np.random.seed(123)

# these parametrize the line
a = 0.3
b = -0.2
c = 0.001

# True/False mapping
def lin_rule(x, noise=0.):
    return a * x[0] + b * x[1] + c + noise < 0.

# Just for plotting
def get_y_fun(a, b, c):
    def y(x):
        return - x * a / b - c / b
    return y

lin_fun = get_y_fun(a, b, c)

In [69]:
# Training data

n = 500
range_points = 1
sigma = 0.05

X = range_points * 2 * (np.random.rand(n, 2) - 0.5)
y = [lin_rule(x, sigma * np.random.normal()) for x in X]

print(X[:10])
print(y[:10])

[[ 0.39293837 -0.42772133]
 [-0.54629709  0.10262954]
 [ 0.43893794 -0.15378708]
 [ 0.9615284   0.36965948]
 [-0.0381362  -0.21576496]
 [-0.31364397  0.45809941]
 [-0.12285551 -0.88064421]
 [-0.20391149  0.47599081]
 [-0.63501654 -0.64909649]
 [ 0.06310275  0.06365517]]
[False, True, False, False, False, True, False, True, True, False]


Let's plot the data.

In [70]:
import plotly.express as px

# plotly has a problem with coloring boolean values, hence stringify
# see https://community.plotly.com/t/plotly-express-scatter-color-not-showing/25962
fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))
x_range = [np.min(X[:, 0]), np.max(X[:, 1])]
fig.add_scatter(x=x_range, y=list(map(lin_fun, x_range)), name='ground truth border')
fig.show()

Now, let's implement and train a logistic regression model. You should obtain accuracy of at least 96%.

In [71]:
################################################################
# TODO: Implement logistic regression and compute its accuracy #
################################################################
a = 10. # our initial guess for _a
b = 10. # our initial guess for _b
c = 10. # our initial guess for _c
lr = 0.1 # step size

n_epochs = 2000 # number of passes over the training data

def sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))

def predict(a, b, c, xs=X):
    return [sigmoid(a * x[0] + b * x[1] + c) >= 0.5 for x in xs]

def evaluate(a, b, c, xs=X, ys=y):
    return np.sum(np.array(ys) == predict(a, b, c, xs)) / len(ys)

def get_gradient(a, b, c, xs=X, ys=y):
    num_of_obs = len(ys)
    lin_fun_vals = np.array(sigmoid(np.array(a * xs[:, 0] + b * xs[:, 1] + np.full((num_of_obs, ), c))))
    aux = lin_fun_vals - np.ones(num_of_obs) * ys
    g_a = (1 / num_of_obs) * np.dot(aux, np.array(xs[:, 0]))
    g_b = (1 / num_of_obs) * np.dot(aux, np.array(xs[:, 1]))
    g_c = (1 / num_of_obs) * np.sum(aux)

    return [g_a, g_b, g_c]

accs = [evaluate(a, b, c)]

for i in range(n_epochs):
    [g_a, g_b, g_c] = get_gradient(a, b, c, X, y)
    a = a - lr * g_a
    b = b - lr * g_b
    c = c - lr * g_c

    acc = evaluate(a, b, c, X, y)
    accs.append(acc)

    print(f'Iter: {i:>3} Acc: {acc:8.8f} a: {a:8.5f}, b: {b:8.5f}, c: {c:8.5f}')

Iter:   0 Acc: 0.47400000 a:  9.97271, b: 10.00636, c:  9.96313
Iter:   1 Acc: 0.47400000 a:  9.94541, b: 10.01269, c:  9.92633
Iter:   2 Acc: 0.47200000 a:  9.91808, b: 10.01897, c:  9.88959
Iter:   3 Acc: 0.47200000 a:  9.89073, b: 10.02522, c:  9.85291
Iter:   4 Acc: 0.47400000 a:  9.86336, b: 10.03142, c:  9.81630
Iter:   5 Acc: 0.47400000 a:  9.83596, b: 10.03759, c:  9.77976
Iter:   6 Acc: 0.47400000 a:  9.80855, b: 10.04372, c:  9.74328
Iter:   7 Acc: 0.47400000 a:  9.78112, b: 10.04981, c:  9.70686
Iter:   8 Acc: 0.47400000 a:  9.75366, b: 10.05586, c:  9.67051
Iter:   9 Acc: 0.47400000 a:  9.72618, b: 10.06187, c:  9.63422
Iter:  10 Acc: 0.47600000 a:  9.69869, b: 10.06784, c:  9.59800
Iter:  11 Acc: 0.47600000 a:  9.67117, b: 10.07378, c:  9.56185
Iter:  12 Acc: 0.47400000 a:  9.64364, b: 10.07967, c:  9.52576
Iter:  13 Acc: 0.47600000 a:  9.61608, b: 10.08552, c:  9.48973
Iter:  14 Acc: 0.47600000 a:  9.58850, b: 10.09133, c:  9.45378
Iter:  15 Acc: 0.47600000 a:  9.56091, b

Let's visually asses our model. We can do this by using our estimates for $a,b,c$.

In [72]:
#################################################################
# TODO: Pass your estimates for a,b,c to the get_y_fun function #
#################################################################
e_a = -7.16953
e_b = 4.98727
e_c =  0.05301
lin_fun2 = get_y_fun(e_a, e_b, e_c)

fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))
x_range = [np.min(X[:, 0]), np.max(X[:, 1])]
fig.add_scatter(x=x_range, y=list(map(lin_fun, x_range)), name='ground truth border')
fig.add_scatter(x=x_range, y=list(map(lin_fun2, x_range)), name='estimated border')
fig.show()

Let's now complicate the things a little bit and make our next problem nonlinear.

In [73]:
# Parameters of the ellipse
s1 = 1.
s2 = 2.
r = 0.75
m1 = 0.15
m2 = 0.125

# 0/1 mapping, checks whether we are inside the ellipse
def circle_rule(x, y, noise=0.):
    return 1 if s1 * (x - m1) ** 2 + s2 * (y - m2) ** 2 + noise < r ** 2 else 0

In [74]:
# Training data

n = 500
range_points = 1

sigma = 0.1

X = range_points * 2 * (np.random.rand(n, 2) - 0.5)

y = [circle_rule(x, y, sigma * np.random.normal()) for x, y in X]

print(X[:10])
print(y[:10])

[[ 0.18633789  0.87560968]
 [-0.81999293  0.61838609]
 [ 0.22604784  0.28001611]
 [ 0.9846182  -0.35783437]
 [-0.27962406  0.07170775]
 [ 0.2501677  -0.37650776]
 [ 0.41264707 -0.8357508 ]
 [-0.61039043 -0.97349628]
 [ 0.49924022  0.89579621]
 [ 0.537422   -0.65425777]]
[0, 0, 1, 0, 1, 1, 0, 0, 0, 0]


Let's plot the data.

In [75]:
import plotly.graph_objects as go

fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))

xgrid = np.arange(np.min(X[:, 0]), np.max(X[:, 0]), 0.003)
ygrid = np.arange(np.min(X[:, 1]), np.max(X[:, 1]), 0.003)
contour =  go.Contour(
        z=np.vectorize(circle_rule)(*np.meshgrid(xgrid, ygrid, indexing="ij")),
        x=xgrid,
        y=ygrid
    )
fig.add_trace(contour)
fig.show()

Now, let's train a logistic regression model to tackle this problem. Note that we now need a nonlinear decision boundary. You should obtain accuracy of at least 90%.

Hint:
<sub><sup><sub><sup><sub><sup>
Use feature engineering.
</sup></sub></sup></sub></sup></sub>

In [80]:
################################################################
# TODO: Implement logistic regression and compute its accuracy #
################################################################
a = 0. # our initial guess for _a
b = 0. # our initial guess for _b
c = 0. # our initial guess for _c
d = 0. # our initial guess for _d
e = 0.
lr = 0.1 # step size

n_epochs = 3000 # number of passes over the training data

def sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))

def predict(a, b, c, d, e, xs=X):
    return [sigmoid(a * x[0] + b * x[1] + c * x[2] + d * x[3] + e) >= 0.5 for x in xs]

def evaluate(a, b, c, d, e, xs=X, ys=y):
    return np.sum(np.array(ys) == predict(a, b, c, d, e, xs)) / len(ys)

def get_gradient(a, b, c, d, e, xs=X, ys=y):
    num_of_obs = len(ys)
    lin_fun_vals = np.array(sigmoid(np.array(a * xs[:, 0] + b * xs[:, 1] + c * xs[:, 2] + d * xs[:, 3] + np.full((num_of_obs, ), e))))
    aux = lin_fun_vals - np.ones(num_of_obs) * ys
    g_a = (1 / num_of_obs) * np.dot(aux, np.array(xs[:, 0]))
    g_b = (1 / num_of_obs) * np.dot(aux, np.array(xs[:, 1]))
    g_c = (1 / num_of_obs) * np.dot(aux, np.array(xs[:, 2]))
    g_d = (1 / num_of_obs) * np.dot(aux, np.array(xs[:, 3]))
    g_e = (1 / num_of_obs) * np.sum(aux)

    return [g_a, g_b, g_c, g_d, g_e]

Z = np.hstack((X**2, X[:, 0].reshape(500, -1), X[:, 1].reshape(500, -1)))
accs = [evaluate(a, b, c, d, e, Z)]

for i in range(n_epochs):
    [g_a, g_b, g_c, g_d, g_e] = get_gradient(a, b, c, d, e, Z, y)
    a = a - lr * g_a
    b = b - lr * g_b
    c = c - lr * g_c
    d = d - lr * g_d
    e = e - lr * g_e

    acc = evaluate(a, b, c, d, e, Z, y)
    accs.append(acc)

    print(f'Iter: {i:>3} Acc: {acc:8.8f} a: {a:8.5f}, b: {b:8.5f}, c: {c:8.5f}, d: {d:8.5f}, e: {d:8.5f}')

Iter:   0 Acc: 0.70600000 a: -0.01161, b: -0.01413, c:  0.00549, d:  0.00496, e:  0.00496
Iter:   1 Acc: 0.70600000 a: -0.02294, b: -0.02798, c:  0.01093, d:  0.00987, e:  0.00987
Iter:   2 Acc: 0.70600000 a: -0.03400, b: -0.04156, c:  0.01630, d:  0.01473, e:  0.01473
Iter:   3 Acc: 0.70600000 a: -0.04480, b: -0.05489, c:  0.02163, d:  0.01954, e:  0.01954
Iter:   4 Acc: 0.70600000 a: -0.05535, b: -0.06796, c:  0.02689, d:  0.02430, e:  0.02430
Iter:   5 Acc: 0.70600000 a: -0.06566, b: -0.08078, c:  0.03211, d:  0.02902, e:  0.02902
Iter:   6 Acc: 0.70600000 a: -0.07574, b: -0.09337, c:  0.03727, d:  0.03368, e:  0.03368
Iter:   7 Acc: 0.70600000 a: -0.08559, b: -0.10573, c:  0.04238, d:  0.03830, e:  0.03830
Iter:   8 Acc: 0.70600000 a: -0.09521, b: -0.11786, c:  0.04744, d:  0.04287, e:  0.04287
Iter:   9 Acc: 0.70600000 a: -0.10463, b: -0.12979, c:  0.05245, d:  0.04740, e:  0.04740
Iter:  10 Acc: 0.70600000 a: -0.11384, b: -0.14150, c:  0.05741, d:  0.05188, e:  0.05188
Iter:  11 

Let's visually asses our model.

Contrary to the previous scenario, converting our weights to parameters of the ground truth curve may not be straightforward. It's easier to just provide predictions for a set of points in $R^2$.

In [84]:
h = .02

xgrid = np.arange(np.min(X[:, 0]), np.max(X[:, 0]), h)
ygrid = np.arange(np.min(X[:, 1]), np.max(X[:, 1]), h)

xx, yy = np.meshgrid(xgrid, ygrid, indexing="ij")
X_plot = np.c_[xx.ravel(), yy.ravel()]

print(X_plot.shape)

_X = np.concatenate([X_plot, X_plot**2], axis=1)

Z = np.hstack((_X**2, _X[:, 0].reshape(-1, 1), _X[:, 1].reshape(-1, 1)))
preds = np.array(predict(a, b, c, d, e, Z))
print(preds.shape)


(10000, 2)
(10000,)


In [83]:
fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))

xx, yy = np.meshgrid(xgrid, ygrid, indexing="ij")

contour = go.Contour(z=preds.reshape(len(xgrid), len(ygrid)), x=xgrid, y=ygrid)
fig.add_trace(contour)
fig.show()

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>