# Logistic regression exercises

We will use a toy dataset from `sklearn` to illustrate fitting a logistic regression classifier.

**1. Create a toy dataset using the `make_blobs` function from `sklearn` (see [here](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html)). The dataset should have 150 samples from 2 clusters (centers), using a cluster standard deviation of 3.0. Save the input data points and labels into numpy arrays `X` and `y`, and print their shapes.**

**Create a scatter plot of the dataset, colouring the points according to their cluster label.**

In [None]:
from sklearn.datasets import make_blobs

# 150 two-dimensional points drawn from 2 Gaussian clusters with standard deviation 3.0
X, y = make_blobs(n_samples=150, centers=2, cluster_std=3.0)

In [None]:
print(X.shape, y.shape)

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(5, 3))
# Colour each point by its cluster label (0 or 1)
plt.scatter(X[:, 0], X[:, 1], c=y, s=50, cmap='spring')
plt.show()
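`make_blobs` draws fresh random points on every call, so the results in this notebook change from run to run. A small sketch, assuming a fixed seed is acceptable for the exercise (`random_state=0` is an arbitrary choice, not part of the task), showing that a seed makes the data reproducible and confirming the shapes and the even split of the 150 samples across the two clusters:

```python
import numpy as np
from sklearn.datasets import make_blobs

# Two calls with the same (arbitrary) seed produce identical datasets
X_a, y_a = make_blobs(n_samples=150, centers=2, cluster_std=3.0, random_state=0)
X_b, y_b = make_blobs(n_samples=150, centers=2, cluster_std=3.0, random_state=0)

print(X_a.shape, y_a.shape)   # (150, 2) (150,)
print(np.unique(y_a))         # [0 1]
print(np.bincount(y_a))       # samples per cluster
assert np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b)
```

Without `random_state`, each run of the notebook uses a different dataset, which is fine for the exercise but makes numbers hard to compare.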

**2. Create training and test splits with a 75/25 ratio. Make scatter plots showing the training and test splits separately, again colouring the points according to their label.**

In [None]:
from sklearn.model_selection import train_test_split

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.25)

In [None]:
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(10, 3))
ax1.set_title("Training set")
ax1.scatter(Xtrain[:, 0], Xtrain[:, 1], c=ytrain, s=50, cmap='spring')

ax2.set_title("Test set")
ax2.scatter(Xtest[:, 0], Xtest[:, 1], c=ytest, s=50, cmap='spring')

plt.show()
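A possible refinement, not required by the exercise: with `test_size=0.25`, scikit-learn rounds the number of test samples up, so 150 points give 38 test and 112 training points. Passing `stratify=y` keeps the class proportions of the full dataset in both splits, and a `random_state` (here an arbitrary `0`) makes the split reproducible. A minimal sketch:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=150, centers=2, cluster_std=3.0, random_state=0)

# test_size=0.25 -> ceil(0.25 * 150) = 38 test points, 112 training points.
# stratify=y (optional) preserves the 50/50 class balance in both splits.
Xtrain, Xtest, ytrain, ytest = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

print(Xtrain.shape, Xtest.shape)   # (112, 2) (38, 2)
print("Training class counts:", np.bincount(ytrain))
print("Test class counts:", np.bincount(ytest))
```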

**3. Use the `LogisticRegression` class from `sklearn` (see [here](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html); its interface is similar to that of the `LinearRegression` class) to train a logistic regression classifier on your toy dataset.**

**Print the coefficients and intercept from your learned model.**

In [None]:
from sklearn.linear_model import LogisticRegression

blobs_model = LogisticRegression()
blobs_model.fit(Xtrain, ytrain)

In [None]:
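# For a binary problem coef_ has shape (1, 2) (one weight per feature) and intercept_ has shape (1,)
print("Coefficients:", blobs_model.coef_)
print("Intercept:", blobs_model.intercept_)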
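For a binary problem the predicted probability of class 1 is the logistic sigmoid of $\mathbf{w}^\top\mathbf{x} + b$, with $\mathbf{w}$ = `coef_[0]` and $b$ = `intercept_[0]`. The sketch below regenerates its own data with an arbitrary `random_state=0` (so its numbers will differ from yours), recomputes `predict_proba` by hand from the learned parameters, and checks that thresholding at 0.5 reproduces `predict`:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=150, centers=2, cluster_std=3.0, random_state=0)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(Xtrain, ytrain)

w = model.coef_[0]        # shape (2,): one weight per input feature
b = model.intercept_[0]   # scalar bias

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

z = Xtest @ w + b                          # linear score, same as decision_function
p_manual = sigmoid(z)                      # probability of class 1, by hand
p_sklearn = model.predict_proba(Xtest)[:, 1]

print(np.allclose(p_manual, p_sklearn))    # True
# Predicting class 1 exactly when the probability exceeds 0.5
print(np.array_equal(model.predict(Xtest), (p_manual > 0.5).astype(int)))
```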

**4. Display the training and test data points (separately), together with the decision boundary of your logistic regression classifier. The decision boundary is the set $\{\mathbf{x}: f_\theta(\mathbf{x}) = 0.5\}$, where $f_\theta(\mathbf{x})$ is the predicted probability that $\mathbf{x}$ belongs to class 1.**
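Why the boundary is a straight line: assuming the standard parameterization $f_\theta(\mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)$, where $\mathbf{w} = (w_1, w_2)$ and $b$ correspond to `coef_[0]` and `intercept_[0]`, the 0.5 level set reduces to a linear equation in $x_1$ and $x_2$:

$$
\begin{aligned}
\sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{2}
&\iff e^{-z} = 1 \iff z = 0, \\
f_\theta(\mathbf{x}) = 0.5
&\iff w_1 x_1 + w_2 x_2 + b = 0
\iff x_2 = \frac{-b - w_1 x_1}{w_2} \quad (w_2 \neq 0).
\end{aligned}
$$

The last expression is what the plotting code evaluates as `(-b - w[0] * x1) / w[1]`. If $w_2$ were close to zero the boundary would be nearly vertical, and plotting $x_1 = -b / w_1$ would be the more stable choice.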

In [None]:
import numpy as np

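# Learned weight vector w = (w_1, w_2) and bias b
w = blobs_model.coef_[0]
b = blobs_model.intercept_[0]

# On the boundary w_1*x_1 + w_2*x_2 + b = 0, so x_2 = (-b - w_1*x_1) / w_2
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
x2 = (-b - w[0] * x1) / w[1]

fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(10, 3))
ax1.set_title("Training set")
ax1.scatter(Xtrain[:, 0], Xtrain[:, 1], c=ytrain, s=50, cmap='spring')
ax1.plot(x1, x2, c='b', linewidth=3)

ax2.set_title("Test set")
ax2.scatter(Xtest[:, 0], Xtest[:, 1], c=ytest, s=50, cmap='spring')
ax2.plot(x1, x2, c='b', linewidth=3)

# Keep the (shared) y-axis on the data even if the boundary line is steep
ax1.set_ylim(X[:, 1].min() - 1, X[:, 1].max() + 1)

plt.show()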
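A quick sanity check on the plotted line, assuming it is computed as in the cell above: every point on it should receive a predicted probability of 0.5, or equivalently a `decision_function` value of 0. This self-contained sketch fits on its own seeded copy of the full dataset (`random_state=0` is an arbitrary choice):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=150, centers=2, cluster_std=3.0, random_state=0)
model = LogisticRegression().fit(X, y)
w, b = model.coef_[0], model.intercept_[0]

# Points on the line x2 = (-b - w1*x1) / w2, as in the plotting cell
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
x2 = (-b - w[0] * x1) / w[1]
line_points = np.column_stack([x1, x2])

p = model.predict_proba(line_points)[:, 1]
scores = model.decision_function(line_points)

print(np.allclose(p, 0.5))       # True: the line is the 0.5 level set
print(np.allclose(scores, 0.0))  # True: the linear score vanishes on it
```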

**5. Calculate the accuracy of your logistic regression classifier on the training and test datasets.**

In [None]:
preds_train = blobs_model.predict(Xtrain)
preds_test = blobs_model.predict(Xtest)

In [None]:
preds_train.shape, preds_test.shape

In [None]:
print(f"Accuracy on training set: {(ytrain == preds_train).mean():.4f}")
print(f"Accuracy on test set: {(ytest == preds_test).mean():.4f}")
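The comparison `(ytest == preds_test).mean()` is the usual definition of accuracy: the fraction of points whose predicted label matches the true one. scikit-learn exposes the same quantity through `sklearn.metrics.accuracy_score` and through the classifier's `score` method. A minimal sketch, again on seeded data (`random_state=0` is arbitrary), confirming that the three agree:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=150, centers=2, cluster_std=3.0, random_state=0)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(Xtrain, ytrain)

preds_test = model.predict(Xtest)
manual = (ytest == preds_test).mean()          # fraction of correct predictions
via_metric = accuracy_score(ytest, preds_test)  # same value from sklearn.metrics
via_score = model.score(Xtest, ytest)           # classifiers' score() is mean accuracy

print(f"manual={manual:.4f}, accuracy_score={via_metric:.4f}, score={via_score:.4f}")
```

Using `accuracy_score` or `score` avoids shape mistakes, since both validate that the label arrays have matching lengths.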