<center><img src='https://drive.google.com/uc?id=1_utx_ZGclmCwNttSe40kYA6VHzNocdET' height="60"></center>

AI TECH - Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (Academy of Innovative Applications of Digital Technologies). Operational Programme Digital Poland 2014-2020
<hr>

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>

<center>
Project co-financed by the European Union from the European Regional Development Fund
Operational Programme Digital Poland 2014-2020,
Priority Axis 3 "Digital competences of society", Measure 3.2 "Innovative solutions for digital activation"
Project title: "Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (AI Tech)"
    </center>

# Logistic regression

In this exercise you will train a logistic regression model via gradient descent in two simple scenarios.

The general setup is as follows:
* we are given a set of pairs $(x, y)$, where $x \in R^D$ is a vector of real numbers representing the features, and $y \in \{0,1\}$ is the target,
* for a given $x$ we model the probability of $y=1$ by $h(x):=g(w^Tx)$, where $g$ is the sigmoid function: $g(z) = \frac{1}{1+e^{-z}}$,
* to find the right $w$ we will minimize the so-called logarithmic loss: $J(w) = -\frac{1}{n}\sum_{i=1}^n \left[ y_i \log{h(x_i)} + (1-y_i) \log{(1-h(x_i))} \right]$,
* with the loss function in hand we can improve our guesses iteratively by gradient descent with learning rate $\eta$:
    * $w_j^{t+1} = w_j^{t} - \eta \cdot \frac{\partial J(w)}{\partial w_j}$

* we can end the process after some predefined number of epochs (or when the changes are no longer meaningful).
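
For the loss above, the partial derivatives have the closed form $\frac{\partial J(w)}{\partial w_j} = \frac{1}{n}\sum_{i=1}^n (h(x_i) - y_i)\, x_{ij}$. The sketch below compares this formula with a central finite-difference estimate on small random data. The names `log_loss`, `log_loss_grad`, `X_demo`, `y_demo` and `w_demo` are illustrative only and are not part of the exercise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    # Mean negative log-likelihood J(w) for h(x) = sigmoid(w^T x)
    h = sigmoid(X @ w)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def log_loss_grad(w, X, y):
    # Closed-form gradient: (1/n) * X^T (h - y)
    return X.T @ (sigmoid(X @ w) - y) / len(y)

# Small random problem, used only to check the formula
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = rng.integers(0, 2, size=20).astype(float)
w_demo = rng.normal(size=3)

# Central finite differences along each coordinate direction
eps = 1e-6
numeric_grad = np.array([
    (log_loss(w_demo + eps * e, X_demo, y_demo)
     - log_loss(w_demo - eps * e, X_demo, y_demo)) / (2 * eps)
    for e in np.eye(3)
])
analytic_grad = log_loss_grad(w_demo, X_demo, y_demo)

max_abs_diff = np.max(np.abs(numeric_grad - analytic_grad))
print(max_abs_diff)
```

A difference close to zero suggests the analytic gradient can be used directly in the update rule, which is cheaper than differentiating numerically at every step.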

Let's start with the simplest example: points on a plane that are (almost) linearly separable.

In [47]:
import numpy as np

np.random.seed(123)

# these parametrize the line
a = 0.3
b = -0.2
c = 0.001

# True/False mapping
def lin_rule(x, noise=0.):
    return a * x[0] + b * x[1] + c + noise < 0.

# Just for plotting
def get_y_fun(a, b, c):
    def y(x):
        return - x * a / b - c / b
    return y

lin_fun = get_y_fun(a, b, c)

In [48]:
# Training data

n = 500
range_points = 1
sigma = 0.05

X = range_points * 2 * (np.random.rand(n, 2) - 0.5)
y = [lin_rule(x, sigma * np.random.normal()) for x in X]

print(X[:10])
print(y[:10])

[[ 0.39293837 -0.42772133]
 [-0.54629709  0.10262954]
 [ 0.43893794 -0.15378708]
 [ 0.9615284   0.36965948]
 [-0.0381362  -0.21576496]
 [-0.31364397  0.45809941]
 [-0.12285551 -0.88064421]
 [-0.20391149  0.47599081]
 [-0.63501654 -0.64909649]
 [ 0.06310275  0.06365517]]
[np.False_, np.True_, np.False_, np.False_, np.False_, np.True_, np.False_, np.True_, np.True_, np.False_]


Let's plot the data.

In [49]:
import plotly.express as px

# plotly has a problem with coloring boolean values, hence stringify
# see https://community.plotly.com/t/plotly-express-scatter-color-not-showing/25962
fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))
x_range = [np.min(X[:, 0]), np.max(X[:, 0])]
fig.add_scatter(x=x_range, y=list(map(lin_fun, x_range)), name='ground truth border')
fig.show()

Now, let's implement and train a logistic regression model. You should obtain an accuracy of at least 96%.

In [50]:
################################################################
# TODO: Implement logistic regression and compute its accuracy #
################################################################

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_loss(y, y_predicted):
    # Mean negative log-likelihood (log loss) of the predicted probabilities
    n = len(y)
    loss = - (1 / n) * np.sum(y * np.log(y_predicted) + (1 - y) * np.log(1 - y_predicted))
    return loss

def logistic_regression(X, y, lr=0.2, n_iters=100000):
    X = np.array(X)
    y = np.array(y)
    # X -- input data, shape (n, d): n data points with d features each
    n, d = X.shape
    # we try to learn d weights and 1 bias
    weights = np.zeros(d)
    bias = 0

    for i in range(n_iters):
        linear_model = np.dot(X, weights) + bias
        y_predicted = sigmoid(linear_model)
        loss = logistic_loss(y, y_predicted)
        if i % 10000 == 0:
            print(f'loss at iteration {i}: {loss}')
        # Compute gradients
        dw = (1 / n) * np.dot(X.T, (y_predicted - y))
        db = (1 / n) * np.sum(y_predicted - y)

        # Update weights and bias
        weights -= lr * dw
        bias -= lr * db
        
    return weights, bias

weights, bias = logistic_regression(X, y)

loss at iteration 0: 0.6931471805599454
loss at iteration 10000: 0.09522073987942631
loss at iteration 20000: 0.09227419897277017
loss at iteration 30000: 0.09164066737008737
loss at iteration 40000: 0.09145769412500761
loss at iteration 50000: 0.09139807238937943
loss at iteration 60000: 0.09137743061612798
loss at iteration 70000: 0.09137004220469953
loss at iteration 80000: 0.0913673464861133
loss at iteration 90000: 0.09136635174825711


In [52]:
def accuracy(y, y_predicted):
    y_pred_labels = [1 if prob >= 0.5 else 0 for prob in y_predicted]
    acc = np.sum(np.array(y_pred_labels).astype(int) == np.array(y).astype(int)) / len(y)
    return acc

print(f'Learned weights: {weights}, bias: {bias}')
print(f'Accuracy: {accuracy(y, sigmoid(np.dot(X, weights) + bias))}')

Learned weights: [-16.51144727  11.43987835], bias: 0.07709772601718659
Accuracy: 0.966


Let's visually assess our model. We can do this by using the learned weights and bias as our estimates for $a,b,c$.
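
The model's decision boundary is the set of points where $w^Tx + \text{bias} = 0$, that is, where the predicted probability equals 0.5. A line $a x_0 + b x_1 + c = 0$ does not change when $(a, b, c)$ is rescaled, so the learned parameters need not match $a, b, c$ numerically. Comparing slope and intercept is more informative. The sketch below does this with the values printed in the training output. The names `a_true`, `w_learned` and so on are illustrative.

```python
import numpy as np

# Ground-truth line parameters from the data-generation cell
a_true, b_true, c_true = 0.3, -0.2, 0.001

# Learned parameters, copied from the printed training output
w_learned = np.array([-16.51144727, 11.43987835])
b_learned = 0.07709772601718659

# Rewrite both lines as x1 = slope * x0 + intercept and compare
slope_true, intercept_true = -a_true / b_true, -c_true / b_true
slope_est, intercept_est = -w_learned[0] / w_learned[1], -b_learned / w_learned[1]

print(f"slope:     true {slope_true:.3f}, estimated {slope_est:.3f}")
print(f"intercept: true {intercept_true:.3f}, estimated {intercept_est:.3f}")
```

Note that the learned weights have the opposite sign to $(a, b)$: the label is `True` when $a x_0 + b x_1 + c < 0$, while the model predicts class 1 when its linear score is positive.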

In [53]:
#################################################################
# TODO: Pass your estimates for a,b,c to the get_y_fun function #
#################################################################
lin_fun2 = get_y_fun(weights[0], weights[1], bias)

fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))
x_range = [np.min(X[:, 0]), np.max(X[:, 0])]
fig.add_scatter(x=x_range, y=list(map(lin_fun, x_range)), name='ground truth border')
fig.add_scatter(x=x_range, y=list(map(lin_fun2, x_range)), name='estimated border')
fig.show()

Let's now complicate things a little and make our next problem nonlinear.

In [55]:
# Parameters of the ellipse
s1 = 1.
s2 = 2.
r = 0.75
m1 = 0.15
m2 = 0.125

# 0/1 mapping, checks whether we are inside the ellipse
def circle_rule(x, y, noise=0.):
    return 1 if s1 * (x - m1) ** 2 + s2 * (y - m2) ** 2 + noise < r ** 2 else 0

In [56]:
# Training data

n = 500
range_points = 1

sigma = 0.1

X = range_points * 2 * (np.random.rand(n, 2) - 0.5)

# use x1, x2 for the coordinates so the comprehension does not shadow the label list y
y = [circle_rule(x1, x2, sigma * np.random.normal()) for x1, x2 in X]

print(X[:10])
print(y[:10])

[[ 0.18633789  0.87560968]
 [-0.81999293  0.61838609]
 [ 0.22604784  0.28001611]
 [ 0.9846182  -0.35783437]
 [-0.27962406  0.07170775]
 [ 0.2501677  -0.37650776]
 [ 0.41264707 -0.8357508 ]
 [-0.61039043 -0.97349628]
 [ 0.49924022  0.89579621]
 [ 0.537422   -0.65425777]]
[0, 0, 1, 0, 1, 1, 0, 0, 0, 0]


Let's plot the data.

In [57]:
import plotly.graph_objects as go

fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))

xgrid = np.arange(np.min(X[:, 0]), np.max(X[:, 0]), 0.003)
ygrid = np.arange(np.min(X[:, 1]), np.max(X[:, 1]), 0.003)
contour =  go.Contour(
        z=np.vectorize(circle_rule)(*np.meshgrid(xgrid, ygrid, indexing="ij")),
        x=xgrid,
        y=ygrid
    )
fig.add_trace(contour)
fig.show()

Now, let's train a logistic regression model to tackle this problem. Note that we now need a nonlinear decision boundary. You should obtain an accuracy of at least 90%.

Hint:
<sub><sup><sub><sup><sub><sup>
Use feature engineering.
</sup></sub></sup></sub></sup></sub>
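
To see why extra features can help, expand the ellipse condition $s_1 (x_1 - m_1)^2 + s_2 (x_2 - m_2)^2 < r^2$: it becomes a linear inequality in the features $[x_1, x_2, x_1^2, x_2^2]$. The sketch below checks this numerically on random points, ignoring the label noise used in the training data. The names `inside_ellipse`, `w_exact`, `b_exact`, `pts` and `feats` are illustrative and not part of the exercise.

```python
import numpy as np

# Ellipse parameters from the data-generation cell
s1, s2, r, m1, m2 = 1.0, 2.0, 0.75, 0.15, 0.125

def inside_ellipse(x1, x2):
    # Noise-free version of the ground-truth rule
    return s1 * (x1 - m1) ** 2 + s2 * (x2 - m2) ** 2 < r ** 2

# Expanding the squares gives a rule that is linear in [x1, x2, x1^2, x2^2]:
# 2*s1*m1*x1 + 2*s2*m2*x2 - s1*x1^2 - s2*x2^2 + (r^2 - s1*m1^2 - s2*m2^2) > 0
w_exact = np.array([2 * s1 * m1, 2 * s2 * m2, -s1, -s2])
b_exact = r ** 2 - s1 * m1 ** 2 - s2 * m2 ** 2

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(1000, 2))
feats = np.column_stack([pts[:, 0], pts[:, 1], pts[:, 0] ** 2, pts[:, 1] ** 2])

# Fraction of points on which the linear rule and the ellipse rule agree
linear_rule = feats @ w_exact + b_exact > 0
agreement = np.mean(linear_rule == inside_ellipse(pts[:, 0], pts[:, 1]))
print(agreement)
```

Since a linear rule in these features reproduces the ellipse, logistic regression trained on them can in principle recover a boundary of the same shape, up to an overall scale of the weights.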

In [69]:
################################################################
# TODO: Implement logistic regression and compute its accuracy #
################################################################

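def add_feat_engineering(X):
    # Quadratic features let a linear model represent an elliptical boundary.
    # The last feature (x0 + x1) is a linear combination of the first two,
    # so it adds no expressive power; it is kept to match the results below.
    X_features = np.array([[x[0], x[1], x[0]**2, x[1]**2, (x[0]+x[1])] for x in X])
    return X_features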

def predict(weights, bias, X):
    X = add_feat_engineering(X)
    linear_model = np.dot(X, weights) + bias
    y_predicted = sigmoid(linear_model)
    y_predicted_labels = [1 if prob >= 0.5 else 0 for prob in y_predicted]
    return np.array(y_predicted_labels)

In [70]:
X_features = add_feat_engineering(X)
_w, _b = logistic_regression(X_features, y)
print(f'Learned weights: {_w}, bias: {_b}')
print(f'Accuracy: {accuracy(y, sigmoid(np.dot(X_features, _w) + _b))}')

loss at iteration 0: 0.6931471805599454
loss at iteration 10000: 0.11835589361472289
loss at iteration 20000: 0.10294436518883925
loss at iteration 30000: 0.09682188047297215
loss at iteration 40000: 0.09354047433298113
loss at iteration 50000: 0.09152446566395563
loss at iteration 60000: 0.09018361772211038
loss at iteration 70000: 0.08924414078783355
loss at iteration 80000: 0.0885611725536617
loss at iteration 90000: 0.08805085807688025
Learned weights: [ 5.14443338e-03  4.44327553e+00 -1.71656277e+01 -3.45951934e+01
  4.44841996e+00], bias: 8.928731546517305
Accuracy: 0.962


Let's visually assess our model.

Contrary to the previous scenario, converting our weights to parameters of the ground truth curve may not be straightforward. It's easier to just provide predictions for a set of points in $R^2$.

In [71]:
h = .02

xgrid = np.arange(np.min(X[:, 0]), np.max(X[:, 0]), h)
ygrid = np.arange(np.min(X[:, 1]), np.max(X[:, 1]), h)

xx, yy = np.meshgrid(xgrid, ygrid, indexing="ij")
X_plot = np.c_[xx.ravel(), yy.ravel()]

print(X_plot.shape)

# predict() applies add_feat_engineering itself, so pass the raw grid points
preds = predict(_w, _b, X_plot)
print(preds.shape)


(10000, 2)
(10000,)


In [72]:
fig = px.scatter(x=X[:, 0], y=X[:, 1], color=list(map(str, y)))

xx, yy = np.meshgrid(xgrid, ygrid, indexing="ij")

contour = go.Contour(z=preds.reshape(len(xgrid), len(ygrid)), x=xgrid, y=ygrid)
fig.add_trace(contour)
fig.show()
