In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("ps11.ipynb")

# Problem Set 11
## Logistic regression, automatic differentiation, and neural networks

In this problem set you will study binary classification using logistic regression, and implement neural networks.

In [2]:
import sklearn
import sklearn.preprocessing
import sklearn.model_selection
import sklearn.linear_model
import numpy as np
import torch 
from torch import nn
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
from tqdm import tqdm
import matplotlib.pyplot as plt
rng_seed = 507
torch.manual_seed(rng_seed)

<torch._C.Generator at 0x7f8211663eb0>

In [3]:
from sklearn import datasets
digits = datasets.load_digits()
X = digits.data
y = digits.target
n, p = X.shape
n, p

(1797, 64)

## Question 1: Binary classification

In this exercise you will use `sklearn` to build a binary classifier.

Logistic regression assumes the probability model 

$$\mathbb{P}(y_i=1\mid \mathbf{x}_i) = \sigma(\mathbf{x}_i^T\boldsymbol{\beta}),$$ 

where $\mathbf{x}_i$ are rows of the data matrix $\mathbf{X}\in\mathbb{R}^{n\times p}$, $\mathbf{y}\in\{0,1\}^n$ is a vector of binary responses, and

$$\sigma(z)=\frac{1}{1+\exp(-z)}$$ 

is the *sigmoid* function, which maps real numbers into the interval $(0,1)$. 

In simple terms, logistic regression applies the sigmoid function to the output of a linear model, turning a real-valued score into a probability. The scikit-learn library makes such models straightforward to fit, which is what you will explore in this exercise.
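
To make the probability model concrete, here is a minimal NumPy sketch of the sigmoid and the resulting class probabilities. The names `X_toy` and `beta_toy` and their values are hypothetical, chosen only for illustration; they are not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data and coefficients, for illustration only.
X_toy = np.array([[1.0, 2.0],
                  [0.0, 0.0],
                  [-1.0, -2.0]])
beta_toy = np.array([1.0, 0.5])

scores = X_toy @ beta_toy             # linear predictor x_i^T beta: [2, 0, -2]
probs = sigmoid(scores)               # P(y_i = 1 | x_i)
labels = (probs >= 0.5).astype(int)   # classify by thresholding at 0.5

print(np.round(probs, 4))  # [0.8808 0.5    0.1192]
print(labels)              # [1 1 0]
```

A score of zero maps to probability 0.5, so thresholding the probability at 0.5 is the same as thresholding the linear score at zero.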

The classifier will take as input an $8\times 8$ grayscale image of a handwritten digit from the `digits` dataset loaded above (flattened to $p=64$ features), and return `1` if the image represents the number 5, and `0` otherwise.

*Note*: various algorithms implemented in `sklearn` are randomized. To utilize the same randomness as we did when generating the solutions (and hence, to ensure that your output passes the test cases), use `random_state=1` wherever necessary when calling `sklearn` methods.

**1(a)** (1 pt) Using the digits data loaded above, create a standardized version of `X` where each column has zero mean and variance one. (Hint: use the `sklearn.preprocessing` module.)

In [4]:
standard_scaler = sklearn.preprocessing.StandardScaler()
Xs = standard_scaler.fit_transform(X)
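
As a rough cross-check of what the scaler does, the sketch below standardizes a small matrix by hand. The helper `standardize` and the array `X_toy` are hypothetical names introduced here. The sketch assumes the population standard deviation (`ddof=0`) and leaves constant columns centered but unscaled, which is how `StandardScaler` treats zero-variance features.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance.

    Columns with zero variance are only centered (their scale is set to 1),
    which avoids division by zero.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)        # population standard deviation (ddof=0)
    std[std == 0.0] = 1.0      # constant columns: divide by 1 instead of 0
    return (X - mean) / std

# Hypothetical toy matrix; the third column is constant.
X_toy = np.array([[1.0, 10.0, 7.0],
                  [2.0, 20.0, 7.0],
                  [3.0, 60.0, 7.0]])
Xs_toy = standardize(X_toy)

print(Xs_toy.mean(axis=0))  # approximately zero in every column
print(Xs_toy.std(axis=0))   # approximately [1, 1, 0]
```

The constant-column case matters for image data, where some border pixels may be blank in every image; such columns end up with variance zero rather than one after scaling.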

In [5]:
grader.check("1a")

**1(b)** (1 pt) Using the digits data loaded above, create a vector `y5` which equals `1` if the corresponding image is of the number 5, and `0` otherwise.

In [6]:
y5 = (y == 5).astype(int)

In [7]:
grader.check("1b")

**1(c)**(1pt) Using `sklearn.model_selection.train_test_split`, divide the data into 70% training data and 30% test data. To ensure that your output matches our tests, pass the option `random_state=1` into the method.

In [13]:
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(Xs, y5, test_size=0.3, random_state=1)

In [14]:
len(y5 == 1)  # length of the boolean mask, i.e. n; use (y5 == 1).sum() to count the 5s

1797

In [15]:
grader.check("1c")

**1(d)**(2pt) Use `sklearn.linear_model.LogisticRegression` to train a binary classifier on the *training data only*. 

In [16]:
clf = sklearn.linear_model.LogisticRegression()
clf.fit(X_train, y_train)

In [17]:
grader.check("1d")

<!-- BEGIN QUESTION -->

**1(e)**(1pt) How accurate is your trained classifier on the held-out data `X_test`/`y_test`? Use a confusion matrix. 

References: 
* https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
* https://en.wikipedia.org/wiki/Confusion_matrix

Assign the four counts to variables called `TP`, `FP`, `FN`, and `TN`.

In [25]:
from sklearn.metrics import confusion_matrix
y_test_pred = clf.predict(X_test)

In [26]:
TN, FP, FN, TP = confusion_matrix(y_test, y_test_pred).ravel()

In [28]:
confusion_matrix([0, 1, 0, 1], [1, 1, 1, 0])

array([[0, 2],
       [1, 1]])
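
For binary labels, `confusion_matrix` puts true classes in rows and predicted classes in columns, so `.ravel()` yields the counts in the order TN, FP, FN, TP. The sketch below recomputes the same four counts by hand for the toy labels used above; it is only meant to illustrate the layout, and the accuracy formula is the standard one.

```python
import numpy as np

y_true = np.array([0, 1, 0, 1])
y_pred = np.array([1, 1, 1, 0])

# Count each (true, predicted) combination directly.
TN = int(np.sum((y_true == 0) & (y_pred == 0)))
FP = int(np.sum((y_true == 0) & (y_pred == 1)))
FN = int(np.sum((y_true == 1) & (y_pred == 0)))
TP = int(np.sum((y_true == 1) & (y_pred == 1)))

cm = np.array([[TN, FP],
               [FN, TP]])
accuracy = (TP + TN) / (TP + TN + FP + FN)

print(cm)        # [[0 2]
                 #  [1 1]]
print(accuracy)  # 0.25
```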

In [29]:
grader.check("1e")

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**1(f)**(2pt) The amount of regularization can be varied by setting `LogisticRegression(C=C)`, where `C` is the inverse of the regularization strength (smaller values mean a stronger penalty). What happens to the test error that you computed in the previous step as you vary `C`? Can you find a setting of `C` that results in lower test error than the default value `C=1`?

In general, a smaller `C` means stronger regularization: the coefficients are shrunk more, so the model has lower variance but may underfit. Conversely, a larger `C` means weaker regularization, which lets the model fit the training data more closely but may lead to overfitting. 

A larger `C` will generally lower the training error, but the test error may go either up or down. The only way to know whether some setting of `C` beats the default `C=1` on the test set is to fit the model for several values of `C` and compare their test errors. 
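
One way to carry out that comparison is sketched below. The grid `C_grid` and the setting `max_iter=5000` are assumptions made for this illustration, not values prescribed by the assignment, and the preprocessing is repeated so that the block runs on its own.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Rebuild the standardized features and the "is it a 5?" labels.
digits = load_digits()
Xs = StandardScaler().fit_transform(digits.data)
y5 = (digits.target == 5).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    Xs, y5, test_size=0.3, random_state=1
)

# Hypothetical log-spaced grid of C values; any similar grid would do.
C_grid = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
test_errors = {}
for C in C_grid:
    clf = LogisticRegression(C=C, max_iter=5000)  # extra iterations for large C
    clf.fit(X_train, y_train)
    test_errors[C] = 1.0 - clf.score(X_test, y_test)  # misclassification rate

for C, err in test_errors.items():
    print(f"C={C:g}: test error = {err:.4f}")
```

Choosing `C` by looking at the test set gives an optimistic error estimate; in practice a validation split or cross-validation on the training data would be the more careful choice.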

<!-- END QUESTION -->

## Question 2


**2(a)**(2pt) In this problem you will learn how to compute derivatives using PyTorch. 
**Note:** If you do not use PyTorch tensors to solve these problems, hidden tests will fail.

Create a function called `get_grad` that receives a floating-point value `x` and returns the derivative of the following function evaluated at `x`:

$$y = x^2 + 3x + 5$$



In [30]:
def get_grad(x):
    return 2 * x + 3
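
The cell above returns the hand-derived derivative. Since the note asks for PyTorch tensors, an autograd-based version might look like the sketch below. The name `get_grad_autograd` is hypothetical, and returning a Python float via `.item()` is an assumption, because the return type expected by the hidden tests is not stated.

```python
import torch

def get_grad_autograd(x):
    """Return dy/dx for y = x^2 + 3x + 5 at x, computed by autograd."""
    x_t = torch.tensor(float(x), requires_grad=True)
    y = x_t ** 2 + 3 * x_t + 5
    y.backward()             # populates x_t.grad with dy/dx
    return x_t.grad.item()   # convert the 0-dim tensor to a Python float

print(get_grad_autograd(2.0))  # 7.0, matching the closed form 2 * 2 + 3
```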

In [31]:
grader.check("2a")

**2(b)**(2pt) 
We now extend the previous idea to partial derivatives of a function of two variables:
$$f(u,v) = vu + u^3$$

Write a function `get_grads` that takes two floating-point numbers, `u` and `v`, and returns the partial derivatives of the function with respect to `u` and `v` as a tuple.

In [34]:
def get_grads(u, v):
    return (v + 3 * u * u, u)
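
As with 2(a), the same result can be obtained from autograd instead of the closed form. The sketch below uses the hypothetical name `get_grads_autograd` and assumes the tuple is ordered as (derivative with respect to `u`, derivative with respect to `v`), matching the cell above.

```python
import torch

def get_grads_autograd(u, v):
    """Return (df/du, df/dv) for f(u, v) = v*u + u^3, computed by autograd."""
    u_t = torch.tensor(float(u), requires_grad=True)
    v_t = torch.tensor(float(v), requires_grad=True)
    f = v_t * u_t + u_t ** 3
    f.backward()   # fills u_t.grad and v_t.grad with the partial derivatives
    return (u_t.grad.item(), v_t.grad.item())

print(get_grads_autograd(2.0, 1.0))  # (13.0, 2.0): v + 3u^2 = 13 and u = 2
```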

In [35]:
grader.check("2b")

**2(c)**(2pt) Extend the from-scratch linear regression model shown in class by adding a bias term `b`.
The linear function is then

$$y = w * x + b$$


In [41]:
X = torch.arange(-3, 3, 0.1).view(-1, 1)
f = 1 * X - 1
Y = f + 0.1 * torch.randn(X.size())
w = torch.tensor(-10.0, requires_grad = True)
b = torch.tensor(10.0, requires_grad = True)
lr = 0.1
loss_list = []

In [42]:
def criterion(yhat, y):
    return torch.mean((yhat - y) ** 2)

In [52]:
def forward(x):
    yhat = w * x + b
    return yhat

In [58]:
def train_model(epochs, X, Y, lr):
    
    optimizer = torch.optim.SGD([w, b], lr)
    
    for epoch in range(epochs):
        Yhat = forward(X)
        
        # calculate the loss per iteration
        loss = criterion(Yhat, Y)

        # store the loss at every iteration
        loss_list.append(loss.item())
        
        # backward pass: compute gradient 
        loss.backward()
        
        optimizer.step()
            
        w.grad.zero_()
        b.grad.zero_()
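
For reference, here is a self-contained sketch of the same loop that can be run outside the notebook. It uses `optimizer.zero_grad()` in place of the two manual `grad.zero_()` calls, a fixed seed, and 100 epochs; all three choices are assumptions made for this illustration. With the data generated from $y = x - 1$, the fitted parameters should end up close to $w = 1$ and $b = -1$.

```python
import torch

torch.manual_seed(0)  # arbitrary seed so the sketch is reproducible

# Synthetic data from y = 1 * x - 1 plus small Gaussian noise.
X = torch.arange(-3, 3, 0.1).view(-1, 1)
Y = 1 * X - 1 + 0.1 * torch.randn(X.size())

w = torch.tensor(-10.0, requires_grad=True)
b = torch.tensor(10.0, requires_grad=True)
optimizer = torch.optim.SGD([w, b], lr=0.1)

losses = []
for epoch in range(100):
    optimizer.zero_grad()                    # clear gradients from the previous step
    loss = torch.mean((w * X + b - Y) ** 2)  # mean squared error
    loss.backward()                          # compute d(loss)/dw and d(loss)/db
    optimizer.step()                         # gradient step on w and b
    losses.append(loss.item())

print(f"w = {w.item():.3f}, b = {b.item():.3f}, final loss = {losses[-1]:.4f}")
```

Because the whole dataset is used in every step, this is full-batch gradient descent even though the optimizer class is named `SGD`.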

In [59]:
grader.check("2c")

**2(d)**(2pt)
In this problem, you will learn to use the deep learning framework PyTorch.
We'll be using the Fashion MNIST dataset, which consists of 28x28 grayscale images, each showing one of 10 different articles of clothing.


In [60]:
from torchvision import datasets as visiondata
training_data = visiondata.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

Run this cell to view a random sample from the training dataset.

In [61]:
labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3

<Figure size 800x800 with 0 Axes>

Here are some helper functions used for this assignment.

In [62]:
def train_loop(model, transform_fn, loss_fn, optimizer, dataloader, num_epochs):
    tbar = tqdm(range(num_epochs))
    for _ in tbar:
        loss_total = 0.
        for i, (x, y) in enumerate(dataloader):
            x = transform_fn(x)
            pred = model(x)
            loss = loss_fn(pred, y.squeeze(-1))
            ## Parameter updates
            model.zero_grad()
            loss.backward()
            optimizer.step()

            loss_total += loss.item()
        tbar.set_description(f"Train loss: {loss_total/len(dataloader)}")
        
    return loss_total/len(dataloader)

In [63]:
def calculate_test_accuracy(model, transform_fn, test_dataloader):
    y_true = []
    y_pred = []
    tf = nn.Flatten()
    for (xi, yi) in test_dataloader:
        xi = transform_fn(xi)
        pred = model(xi)
        yi_pred = pred.argmax(-1)
        y_true.append(yi)
        y_pred.append(yi_pred)
    y_true = torch.cat(y_true, dim = 0)
    y_pred = torch.cat(y_pred, dim = 0)

    accuracy = (y_true == y_pred).float().mean()
    return accuracy

The network consists of a linear input layer, an activation function, and a linear output layer followed by a log-softmax. Write a class called `MultiClassNN` that subclasses `nn.Module`. This module contains one attribute, `net`, which is an `nn.Sequential` object that is called in the `.forward(x)` method. Your task is to write the `__init__()` method to correctly construct `net`.

For example, if num_features=784, num_hidden=256, num_classes=10:

>>> mlp = MultiClassNN(28**2, 256, 10)
>>> mlp.net

Sequential(
  (0): Linear(in_features=784, out_features=256, bias=True)
  (1): Sigmoid()
  (2): Linear(in_features=256, out_features=10, bias=True)
  (3): LogSoftmax(dim=-1)
)

In [64]:
class MultiClassNN(nn.Module):
    def __init__(self, num_features, num_hidden, num_classes):
        """
        Arguments:
            num_features: The number of features in the input.
            num_hidden: Number of hidden features in the hidden layer.
            num_classes: Number of possible classes in the output
        """
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, num_hidden, bias=True),
            nn.Sigmoid(),
            nn.Linear(num_hidden, num_classes, bias=True),
            nn.LogSoftmax(dim = -1)
        )
        
    def forward(self, x):
        return self.net(x)

In [65]:
grader.check("2d")

**2(e)**(1pt) 
Construct a `DataLoader` object of the Fashion MNIST training dataset.

In [66]:
train_dataloader = DataLoader(training_data, batch_size = 128, shuffle = True, num_workers = 0)
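
To see what a `DataLoader` yields without downloading anything, the sketch below uses a synthetic stand-in for Fashion MNIST. The tensors `images` and `labels` and the dataset size of 300 are hypothetical; only the 1x28x28 image shape and the batch size of 128 mirror the cell above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in: 300 random 1x28x28 "images" with labels in 0..9.
images = torch.rand(300, 1, 28, 28)
labels = torch.randint(0, 10, (300,))
toy_loader = DataLoader(TensorDataset(images, labels), batch_size=128, shuffle=True)

flatten = nn.Flatten()  # keeps the batch dimension, flattens the rest
batch_shapes = []
for x, y in toy_loader:
    batch_shapes.append((tuple(x.shape), tuple(flatten(x).shape), tuple(y.shape)))

print(len(toy_loader))  # 3 batches: 128 + 128 + 44
print(batch_shapes[0])  # ((128, 1, 28, 28), (128, 784), (128,))
```

The last batch is smaller than `batch_size` unless `drop_last=True` is passed, which is why code that consumes batches should not hard-code the batch dimension.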


<!-- BEGIN QUESTION -->

**2(f)** 
(3pt) Initialize a `MultiClassNN` object called `mlp` and train it using the `train_loop()` function given at the beginning of the assignment (do not modify the `train_loop()` function). We will test your `mlp` object on unseen test data.

Hints:
-  You need to initialize a `torch.optim.Optimizer` object for gradient descent. The standard choice is `torch.optim.Adam` with a learning rate of `1e-3`.
-  You need to flatten the Fashion MNIST images before passing them to `MultiClassNN`. This should be done with the `transform_fn` argument to `train_loop`. Try `nn.Flatten()`.
-  The outputs of `MultiClassNN` are the log probabilities of each class, so to train your model you should use the negative log-likelihood loss, `nn.NLLLoss()`, as the loss function.
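
The last hint can be checked numerically: applying `nn.NLLLoss` to log-softmax outputs gives the same value as applying `nn.CrossEntropyLoss` to the raw scores. The tensors `logits` and `targets` below are hypothetical values made up for this sketch.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical raw scores: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 5])   # hypothetical class labels

log_probs = nn.LogSoftmax(dim=-1)(logits)
nll = nn.NLLLoss()(log_probs, targets)        # loss on log-probabilities
ce = nn.CrossEntropyLoss()(logits, targets)   # loss on raw scores
manual = -log_probs[torch.arange(4), targets].mean()  # average of -log p(true class)

print(torch.allclose(nll, ce), torch.allclose(nll, manual))  # True True
```

This is why a network ending in `LogSoftmax` pairs with `nn.NLLLoss()`, whereas a network that outputs raw scores would pair with `nn.CrossEntropyLoss()` instead.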

In [75]:
mlp = MultiClassNN(28**2, 256, 10)
mlp_optimizer = torch.optim.Adam(mlp.parameters(), lr = 1e-3)
loss_fn = nn.NLLLoss()  # expects log-probabilities, which LogSoftmax provides
transform_fn = nn.Flatten()  # (batch, 1, 28, 28) -> (batch, 784)
train_loop(mlp, transform_fn, loss_fn, mlp_optimizer, train_dataloader, 20)

Train loss: 0.21252329909661685: 100%|██████████| 20/20 [03:46<00:00, 11.32s/it]


0.21252329909661685

<!-- END QUESTION -->



## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Upload this .zip file to Gradescope for grading.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)