## Implement a binary classifier with `torch.nn.Linear`

In [None]:
import torch as pt
from torch import nn
import matplotlib.pyplot as plt
%matplotlib inline
pt.manual_seed(42);

## Start by practicing boolean operations on boolean tensors. Don't worry, there is a point to this exercise :)

In [None]:
a = pt.randn(10)
a

## Create a boolean tensor named `b` by converting the `a` tensor such that `b` holds `True` for positive values of `a` and `False` for negative and zero values of `a`
* **hint:** `b` must have the same shape as `a`

In [None]:
b = a > 0
b

## Create another boolean tensor named `c` with values that are the opposite of `b`
* **hint:** you can use a different logical expression with the values of the `a` tensor or use the `~` operator

In [None]:
c = ~b
c

## Create 2 tensors that are the logical `and` as well as `or` of `b` and `c`. For each of the tensors, count the number of the `True` values in the tensor
* **hint:** on boolean tensors, `&` is the element-wise logical `and` while `|` is the element-wise logical `or`

In [None]:
(b & c).sum(), (b | c).sum()
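Because `c` is the complement of `b`, the two counts are predictable: no element can be `True` in both tensors, and every element is `True` in one of them. The self-contained sketch below re-creates the tensors with an arbitrary seed (an assumption made only so the example is repeatable) to illustrate this.

```python
import torch as pt

pt.manual_seed(0)  # arbitrary seed, used only to make this sketch repeatable
a = pt.randn(10)
b = a > 0          # True where a is positive
c = ~b             # element-wise complement of b

both = b & c       # element-wise AND: never True, since c is the opposite of b
either = b | c     # element-wise OR: always True for the same reason

print(both.sum().item(), either.sum().item())  # 0 10
```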

## Create a tensor named `d` by concatenating `b` and `c` using the `pt.cat` function, which takes a list of tensors

In [None]:
d = pt.cat([b, c])
d

## Reshape the `d` tensor to be 3 dimensional with the shape of `[5, 2, 2]` and save the result to tensor `e`

In [None]:
e = d.reshape(5, 2, 2)
e

## Create a tensor `f` that contains the count of the `True` values along the last (trailing) dimension of the `e` tensor while keeping the original number of dimensions
* **hint:** check out the `keepdim` parameter (PyTorch also accepts the NumPy-style spelling `keepdims`)

In [None]:
f = e.sum(dim = 2, keepdims = True)
f
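To see what keeping the dimension changes, the sketch below sums a small boolean tensor with and without it. The tensor values are made up for illustration and are not the ones produced earlier in the notebook.

```python
import torch as pt

# Made-up boolean tensor with the same [5, 2, 2] shape as `e`
e = pt.tensor([True, False] * 10).reshape(5, 2, 2)

kept = e.sum(dim=2, keepdim=True)  # reduced dimension stays, with length 1
dropped = e.sum(dim=2)             # reduced dimension is removed

print(kept.shape)     # torch.Size([5, 2, 1])
print(dropped.shape)  # torch.Size([5, 2])
```

Both results hold the same counts. Only the shape differs.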

## Use the `squeeze` method on the tensor with the sum of the `True` values and confirm that its shape changed to 2 dimensions instead of 3

In [None]:
f.shape, f.squeeze().shape

The `squeeze` method is useful when you need to remove the dimensions of a tensor that have length `1`. For example, if you have a tensor with the shape `[3, 1, 4]`, the 2nd dimension can be `squeeze`d away to give the shape `[3, 4]`.
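A minimal sketch of this behavior, using zero-filled tensors purely as placeholders:

```python
import torch as pt

t = pt.zeros(3, 1, 4)

print(t.squeeze().shape)       # torch.Size([3, 4])
print(t.squeeze(dim=1).shape)  # torch.Size([3, 4])
print(t.squeeze(dim=0).shape)  # torch.Size([3, 1, 4]), dim 0 has length 3 so nothing changes

u = pt.zeros(1, 1, 4)
print(u.squeeze().shape)       # torch.Size([4]), every length-1 dimension is removed
```

Passing `dim` explicitly can be a safer habit, since a bare `squeeze()` removes every length-1 dimension, including ones you may have wanted to keep.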

## Next, get started on generating data for your spam/not-spam classification problem

In [None]:
NUM = 50

#X spam data points
Xs = pt.normal(0, 2, [NUM, 2]) - 3
plt.scatter(Xs[:, 0], Xs[:, 1], color = 'orange');

#X not spam data points
Xns = pt.normal(0, 3, [NUM, 2]) + 3
plt.scatter(Xns[:, 0], Xns[:, 1], color = 'blue');

plt.xlim([-10, 10])
plt.ylim([-10, 10]);

## Create a tensor array `X` with spam and not spam data values having the shape `[100, 2]`

In [None]:
X = pt.cat([Xs, Xns])
X.shape

## Create a `y` tensor with positive/negative values for the spam/not spam data in the `X` tensor. Let's have `1` be spam, and `-1` not spam.

In [None]:
ys = pt.ones([len(Xs)])
yns = -1 * pt.ones([len(Xns)])
y = pt.cat([ys, yns])
y

## Create a model using `nn.Linear`. Disable the `bias` term in the model.

In [None]:
model = nn.Linear(2, 1, bias = False)

## Implement the `forward` method for the model. Don't forget to check the shape of your predictions!

In [None]:
def forward(X):
  return model(X).squeeze()

y_pred = forward(X)
y_pred
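The shape check matters because `nn.Linear(2, 1)` returns a column of shape `[100, 1]`, while the labels have shape `[100]`. Subtracting the two without squeezing silently broadcasts to `[100, 100]`. The sketch below demonstrates this with random stand-in data (an assumption, not the notebook's spam data).

```python
import torch as pt
from torch import nn

pt.manual_seed(0)          # arbitrary seed for repeatability
X = pt.randn(100, 2)       # stand-in features
y = pt.ones(100)           # stand-in labels
model = nn.Linear(2, 1, bias=False)

raw = model(X)             # shape [100, 1]
flat = raw.squeeze()       # shape [100]

print((raw - y).shape)     # torch.Size([100, 100]), broadcast rather than element-wise
print((flat - y).shape)    # torch.Size([100])
```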

## Implement the `loss` method to return the mean squared error of your predictions

In [None]:
def loss(y_pred, y):
  return ((y_pred - y) ** 2).mean()
  
loss(y_pred, y)
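One way to sanity-check a hand-written loss is to compare it with PyTorch's built-in `mse_loss`, whose default reduction is the mean. The predictions below are random placeholders rather than real model output.

```python
import torch as pt
from torch.nn import functional as F

def loss(y_pred, y):
    # Mean of the squared differences
    return ((y_pred - y) ** 2).mean()

pt.manual_seed(0)                        # arbitrary seed for repeatability
y_pred = pt.randn(100)                   # placeholder predictions
y = pt.cat([pt.ones(50), -pt.ones(50)])  # labels in the notebook's +1 / -1 convention

manual = loss(y_pred, y)
builtin = F.mse_loss(y_pred, y)

print(pt.allclose(manual, builtin))
```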

## Implement a `metric` method that takes the model predictions and the actual values and returns the accuracy (i.e. the fraction of correct predictions).

In [None]:
def metric(y_pred, y):
  return (((y > 0) & (y_pred > 0)) | ((y <= 0) & (y_pred <= 0))).sum() / float(len(y))

metric(y_pred, y)
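The accuracy can also be written as a single sign comparison: a prediction is correct when `y_pred > 0` and `y > 0` agree. The sketch below checks that this shorter form (named `metric_sign_match` here, a hypothetical helper) matches the original on a tiny hand-made example.

```python
import torch as pt

def metric(y_pred, y):
    # Same logic as the notebook's version, with explicit grouping
    return (((y > 0) & (y_pred > 0)) | ((y <= 0) & (y_pred <= 0))).sum() / float(len(y))

def metric_sign_match(y_pred, y):
    # Hypothetical shorter form: correct when both sides agree on "positive or not"
    return ((y_pred > 0) == (y > 0)).float().mean()

y = pt.tensor([1.0, 1.0, -1.0, -1.0])
y_pred = pt.tensor([0.3, -0.2, -0.7, 0.0])  # second prediction is wrong

print(metric(y_pred, y).item(), metric_sign_match(y_pred, y).item())  # 0.75 0.75
```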

## Implement a `for` loop that does 10 iterations of gradient descent, printing out the MSE and the accuracy for each iteration
* **hint:** don't forget to call the `zero_grad` method on your model
* **hint:** update the weights using the `weight.data` attribute of your model

In [None]:
LEARNING_RATE = 0.03

for _ in range(10):
  y_pred = forward(X)

  mse = loss(y_pred, y)
  accuracy = metric(y_pred, y)

  print("Loss: ", mse.item(), " Accuracy: ", accuracy.item())

  model.zero_grad()
  mse.backward()

  model.weight.data -= LEARNING_RATE * model.weight.grad
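The hint above updates the weights through `weight.data`. A commonly used alternative is to do the in-place update inside a `pt.no_grad()` block, which likewise keeps the update out of the autograd graph. The sketch below uses synthetic clusters with unit variance (an assumption, not the notebook's exact data) and checks that the loss goes down over 10 steps.

```python
import torch as pt
from torch import nn

pt.manual_seed(0)  # arbitrary seed for repeatability

# Synthetic stand-in data: two clusters labelled +1 and -1
X = pt.cat([pt.randn(50, 2) - 3, pt.randn(50, 2) + 3])
y = pt.cat([pt.ones(50), -pt.ones(50)])

model = nn.Linear(2, 1, bias=False)
LEARNING_RATE = 0.03

losses = []
for _ in range(10):
    mse = ((model(X).squeeze() - y) ** 2).mean()
    losses.append(mse.item())

    model.zero_grad()   # clear gradients left over from the previous step
    mse.backward()      # populate model.weight.grad

    with pt.no_grad():  # update the parameter without recording the operation
        model.weight -= LEARNING_RATE * model.weight.grad

print(losses[0], losses[-1])
```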

## Re-render the original scatter plot with spam/not spam data points and add the decision boundary line on the plot.

* **hint:** to pass tensors to `plt.plot` you need to convert them to `numpy` arrays using `.detach().numpy()`

In [None]:
plt.scatter(Xs[:, 0], Xs[:, 1], color = 'orange');
plt.scatter(Xns[:, 0], Xns[:, 1], color = 'blue');

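# The model has no bias, so the decision boundary is w0 * x1 + w1 * x2 = 0,
# i.e. x2 = -(w0 / w1) * x1
w0, w1 = model.weight.data[0, 0].item(), model.weight.data[0, 1].item()
x1 = pt.linspace(-8, 8, 100).detach().numpy()
x2 = -(w0 / w1) * x1
plt.plot(x1, x2, color = 'black', linewidth = 5)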

plt.ylim([-10, 10])
plt.xlim([-10, 10]);

In [None]:
model.weight
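With no bias term, the model's output for a point $(x_1, x_2)$ is $w_0 x_1 + w_1 x_2$, so the decision boundary is the set of points where that expression equals zero, which is the line $x_2 = -(w_0 / w_1)\,x_1$ through the origin. The sketch below verifies this numerically with hypothetical weights (made-up values, not the ones learned above).

```python
import torch as pt

# Hypothetical weights in the same [1, 2] layout as model.weight
w = pt.tensor([[-0.4, -0.1]])

x1 = pt.linspace(-8, 8, 5)
x2 = -(w[0, 0] / w[0, 1]) * x1      # points on the candidate boundary line

points = pt.stack([x1, x2], dim=1)  # shape [5, 2]
scores = points @ w.T               # model output for each point, shape [5, 1]

print(scores.abs().max().item())    # close to zero, up to floating-point error
```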

## Compare the weights discovered by gradient descent to the weights according to the analytical solution for the problem
* you need the least-squares formula $(X^T X)^{-1} X^T y$
* **hint:** use `@` for matrix multiplication

In [None]:
model.weight.data,  (X.T @ X).inverse()  @ X.T @ y
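The same analytical weights can be obtained without forming the inverse explicitly, by solving the linear system $(X^T X)\,w = X^T y$ with `pt.linalg.solve`. This is a sketch on synthetic stand-in data (an assumption, not the notebook's tensors), and it checks that both routes agree.

```python
import torch as pt

pt.manual_seed(0)  # arbitrary seed for repeatability

# Synthetic stand-in data: two clusters labelled +1 and -1
X = pt.cat([pt.randn(50, 2) - 3, pt.randn(50, 2) + 3])
y = pt.cat([pt.ones(50), -pt.ones(50)])

w_inverse = (X.T @ X).inverse() @ X.T @ y    # formula as written in the exercise
w_solve = pt.linalg.solve(X.T @ X, X.T @ y)  # solves (X^T X) w = X^T y directly

print(w_inverse, w_solve)
```

Solving the system is generally considered better numerical practice than inverting, although for a 2 x 2 matrix like this one the difference is negligible.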

Copyright 2021 CounterFactual.AI LLC. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.