## Exercise 3. GLMs, Logistic Regression

1. Convince yourself that, if we encode the binary outcome as $Y \in \{\pm 1\}$, the (conditional) log-likelihood under the logistic regression model simplifies to:

$L(w) := \log P(Y \mid X, w) = \sum_i \log \sigma(y_i \Phi(x_i)^T w)$

2. Convince yourself that $\sigma'(x) = \sigma(x) (1 - \sigma(x))$ for the logistic function $\sigma(x) = 1 / (1 + e^{-x})$. What is $\nabla_w L(w)$?
3. Fit a binary logistic regression model to the problem of distinguishing 4s vs 7s in the MNIST dataset.
    - See, e.g., the cuML API reference: [https://docs.rapids.ai/api/cuml/nightly/api.html#logistic-regression](https://docs.rapids.ai/api/cuml/nightly/api.html#logistic-regression)
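
Parts 1 and 2 can be sanity-checked numerically. Since $P(Y = 1 \mid x, w) = \sigma(\Phi(x)^T w)$ and $1 - \sigma(z) = \sigma(-z)$, both outcomes have probability $\sigma(y \, \Phi(x)^T w)$. With $\frac{d}{dz} \log \sigma(z) = 1 - \sigma(z)$, the chain rule suggests $\nabla_w L(w) = \sum_i (1 - \sigma(y_i \Phi(x_i)^T w)) \, y_i \, \Phi(x_i)$. The following is a minimal sketch that compares this candidate gradient, and the identity for $\sigma'$, against central finite differences. The toy data and the helper names (`log_likelihood`, `grad_log_likelihood`, `finite_difference`) are made up for illustration and are not part of the exercise.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, Phi, y):
    """L(w) = sum_i log sigma(y_i * Phi_i^T w), with y_i in {-1, +1}."""
    margins = y * (Phi @ w)
    # log sigma(m) = -log(1 + exp(-m)), computed stably via logaddexp.
    return -np.sum(np.logaddexp(0.0, -margins))

def grad_log_likelihood(w, Phi, y):
    """Candidate gradient: sum_i (1 - sigma(y_i Phi_i^T w)) * y_i * Phi_i."""
    margins = y * (Phi @ w)
    return Phi.T @ ((1.0 - sigmoid(margins)) * y)

def finite_difference(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        g[j] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return g

# Toy problem (hypothetical data, only used for the check).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 5))        # feature matrix, one row per sample
y = rng.choice([-1.0, 1.0], size=50)  # labels encoded as +-1
w = rng.normal(size=5)

# Check sigma'(z) = sigma(z) * (1 - sigma(z)) at a few points.
z = np.linspace(-3.0, 3.0, 7)
eps = 1e-6
sigma_prime_fd = (sigmoid(z + eps) - sigmoid(z - eps)) / (2.0 * eps)
sigma_identity_ok = np.allclose(
    sigma_prime_fd, sigmoid(z) * (1.0 - sigmoid(z)), atol=1e-8
)

# Check the candidate gradient against finite differences of L(w).
grad_fd = finite_difference(lambda v: log_likelihood(v, Phi, y), w)
gradient_ok = np.allclose(grad_fd, grad_log_likelihood(w, Phi, y), atol=1e-5)

print("sigma' identity holds:", sigma_identity_ok)
print("gradient matches finite differences:", gradient_ok)
```

A finite-difference check like this does not replace the derivation, but it catches sign and indexing mistakes quickly.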


In [1]:
import numpy as np
import matplotlib.pyplot as plt
import sklearn
from sklearn.datasets import fetch_openml

import cupy as cp
import cuml
from cuml.linear_model import LogisticRegression
from cuml.model_selection import train_test_split

# Download the data (70,000 flattened 28x28 images; the labels arrive as strings).
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)

In [None]:
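# Move the data onto the device.
# CuPy arrays expose the CUDA array interface, which cuML (among other libraries) understands.
# fetch_openml returns the labels as strings, so convert them to numbers on the host first.
X_d = cp.asarray(X, dtype=cp.float32)
y_d = cp.asarray(y.astype(np.float32))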

# Scale pixel values from [0, 255] to [0, 1].
X_d /= 255.0

# Set up the binary classification problem 4 vs 7.
class_0 = 4
class_1 = 7
idx_ = cp.logical_or((y_d == class_0), (y_d == class_1))
X_ = X_d[idx_, :]
# Relabel in one step: class_0 -> 0.0, class_1 -> 1.0.
y_ = (y_d[idx_] == class_1).astype(cp.float32)

# Prepare the train and test data.
X_train, X_test, y_train, y_test = train_test_split(X_, y_, random_state=77)

# 
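To connect the fitting step back to parts 1 and 2, here is a minimal CPU-only sketch of logistic regression trained by plain gradient ascent on the mean log-likelihood in the $\pm 1$ encoding. It is an illustration under stated assumptions, not what cuML's `LogisticRegression` does internally: the function names (`fit_logistic_gd`, `predict`), the step size, the iteration count, and the synthetic two-blob data are all hypothetical choices, and it runs on NumPy toy data rather than on the MNIST arrays prepared above.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(X):
    """Append a constant feature so the model can learn an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_logistic_gd(X, y01, lr=0.1, n_iter=500):
    """Gradient ascent on the mean log-likelihood; y01 holds labels in {0, 1}."""
    y_pm = 2.0 * y01 - 1.0          # recode {0, 1} -> {-1, +1}
    Phi = add_bias(X)
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        margins = y_pm * (Phi @ w)
        # Gradient of (1/n) * sum_i log sigma(y_i * Phi_i^T w).
        grad = Phi.T @ ((1.0 - sigmoid(margins)) * y_pm) / Phi.shape[0]
        w += lr * grad              # ascent step: we maximize the likelihood
    return w

def predict(X, w):
    """Predict class 1 where Phi^T w > 0, i.e. where sigma(Phi^T w) > 0.5."""
    return (add_bias(X) @ w > 0.0).astype(np.float64)

# Synthetic stand-in for the two digit classes: two Gaussian blobs in 2D.
rng = np.random.default_rng(77)
n = 200
X_toy = np.vstack([
    rng.normal(loc=-1.0, size=(n, 2)),   # class 0
    rng.normal(loc=+1.0, size=(n, 2)),   # class 1
])
y_toy = np.concatenate([np.zeros(n), np.ones(n)])

w_hat = fit_logistic_gd(X_toy, y_toy)
accuracy = np.mean(predict(X_toy, w_hat) == y_toy)
print(f"training accuracy on the toy data: {accuracy:.3f}")
```

The labels are kept in {0, 1} at the interface, matching the relabeling used above, and are recoded to $\pm 1$ only inside the fitting routine, where the compact form of the log-likelihood is convenient.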