In [None]:
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

1. Make a binary classification dataset with Scikit-Learn's ```make_moons()``` function.
- For consistency, the dataset should have 1000 samples and a random state of 42.
- Turn the data into PyTorch tensors. Split the data into training and test sets using ```train_test_split``` with 80% training and 20% testing.

In [None]:
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import torch

x_samples, y_samples = make_moons(n_samples=1000, random_state=42)

x_samples, y_samples = torch.tensor(x_samples, dtype=torch.float), torch.tensor(y_samples, dtype=torch.float)
x_train, x_test, y_train,  y_test = train_test_split(x_samples, y_samples, train_size=0.8, random_state=42)
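Before building a model, it can help to confirm that the split has the expected shapes, dtypes, and class balance. This is a minimal sketch that assumes the same `make_moons` and `train_test_split` settings as the cell above; the class-balance check is an extra step, not part of the exercise.

```python
import torch
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Recreate the dataset with the same settings as the exercise
x_np, y_np = make_moons(n_samples=1000, random_state=42)
x_all = torch.tensor(x_np, dtype=torch.float)
y_all = torch.tensor(y_np, dtype=torch.float)
x_train, x_test, y_train, y_test = train_test_split(
    x_all, y_all, train_size=0.8, random_state=42
)

# Each sample has 2 features; labels are 0.0/1.0 floats, which BCEWithLogitsLoss expects
print(x_train.shape, x_test.shape)  # torch.Size([800, 2]) torch.Size([200, 2])
print(y_train.dtype, y_train.unique())

# make_moons divides n_samples evenly between the two moons
print(int(y_all.sum().item()), "of", len(y_all), "samples are class 1")  # 500 of 1000 samples are class 1
```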

2. Build a model by subclassing ```nn.Module``` that incorporates non-linear activation functions and is capable of fitting the data you created in 1.
- Feel free to use any combination of PyTorch layers (linear and non-linear) you want.

In [None]:
import torch.nn as nn

class ccNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear_1 = nn.Linear(2, 10)
        self.linear_2 = nn.Linear(10, 10)
        self.linear_3 = nn.Linear(10, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.linear_1(x)
        x = self.relu(x)
        x = self.linear_2(x)
        x = self.relu(x)
        x = self.linear_3(x)

        return x

model = ccNet().to(device)
model
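A quick forward pass on dummy data shows what the model returns: one raw logit per sample, with shape `(batch, 1)`. That is why the training loop calls `.squeeze()` before comparing logits with labels of shape `(batch,)`. The sketch swaps in `nn.Sequential` as a compact alternative with the same layer sizes as `ccNet` (the exercise itself asks for an `nn.Module` subclass); `seq_model` and `dummy_batch` are illustrative names.

```python
import torch
import torch.nn as nn

torch.manual_seed(42)

# Same layer sizes as ccNet: 2 -> 10 -> 10 -> 1, with ReLU in between
seq_model = nn.Sequential(
    nn.Linear(2, 10),
    nn.ReLU(),
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 1),
)

dummy_batch = torch.randn(5, 2)  # 5 samples, 2 features each
logits = seq_model(dummy_batch)
print(logits.shape)                # torch.Size([5, 1])
print(logits.squeeze(dim=1).shape) # torch.Size([5])

# Weights plus biases: (2*10 + 10) + (10*10 + 10) + (10*1 + 1) = 151
n_params = sum(p.numel() for p in seq_model.parameters())
print(n_params)  # 151
```

Using `squeeze(dim=1)` rather than a bare `squeeze()` is a small safety choice: with a batch of one sample, a bare `squeeze()` would collapse the output to a 0-dimensional tensor.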

3. Set up a loss function and optimizer suitable for binary classification to use when training the model.

In [None]:
loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)
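`nn.BCEWithLogitsLoss` takes raw logits and applies the sigmoid internally, so the model does not need a sigmoid layer. The sketch below checks that it matches an explicit sigmoid followed by `nn.BCELoss` on ordinary inputs, and shows why the fused version is preferred for extreme logits. The random logits and labels are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(8)                      # raw model outputs (no sigmoid applied)
targets = torch.randint(0, 2, (8,)).float()  # 0/1 labels stored as floats

# BCEWithLogitsLoss applies the sigmoid internally...
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
# ...so it should agree with an explicit sigmoid followed by plain BCELoss
loss_two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
print(torch.allclose(loss_with_logits, loss_two_step, atol=1e-5))  # True

# With an extreme logit the two-step version breaks down: sigmoid(200.0) rounds to 1.0
# in float32, log(1 - 1.0) is -inf, and BCELoss clamps that log to -100
big_logit, zero_label = torch.tensor([200.0]), torch.tensor([0.0])
print(nn.BCEWithLogitsLoss()(big_logit, zero_label).item())       # 200.0
print(nn.BCELoss()(torch.sigmoid(big_logit), zero_label).item())  # 100.0
```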

4. Create a training and testing loop to fit the model you created in 2 to the data you created in 1.
- To measure model accuracy, you can create your own function or use the accuracy function in TorchMetrics.
- Train the model for long enough for it to reach over 96% accuracy.
- Every 10 epochs, the training loop should print the model's training and test set loss and accuracy.
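If you prefer a hand-written accuracy function over TorchMetrics, a minimal sketch is to count exact matches between rounded predictions and labels. The name `accuracy_fn` is our own choice; unlike TorchMetrics, this version returns a percentage rather than a fraction in [0, 1].

```python
import torch

def accuracy_fn(y_true: torch.Tensor, y_pred: torch.Tensor) -> float:
    """Percentage of predictions that exactly match the labels."""
    correct = torch.eq(y_true, y_pred).sum().item()
    return correct / len(y_pred) * 100

# Logits -> probabilities -> 0/1 labels, the same conversion the training loop uses
logits = torch.tensor([2.0, -1.0, 0.5, -3.0])
preds = torch.round(torch.sigmoid(logits))  # tensor([1., 0., 1., 0.])
labels = torch.tensor([1.0, 0.0, 0.0, 0.0])
print(accuracy_fn(labels, preds))  # 75.0
```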

In [None]:
import torchmetrics

torch.manual_seed(42)
epochs = 1000
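# Recent torchmetrics versions require `task`; the metric returns a fraction in [0, 1]
acc_fn = torchmetrics.Accuracy(task="binary").to(device)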

x_train, y_train = x_train.to(device), y_train.to(device)
x_test, y_test = x_test.to(device), y_test.to(device)

for epoch in range(epochs):
    ## Training
    model.train()

    train_logits = model(x_train).squeeze()
    train_preds = torch.round(torch.sigmoid(train_logits))

    train_loss = loss_fn(train_logits, y_train)
    train_acc = acc_fn(train_preds, y_train.int())  # TorchMetrics expects (preds, target)

    optimizer.zero_grad()

    train_loss.backward()

    optimizer.step()

    model.eval()
    with torch.inference_mode():

        test_logits = model(x_test).squeeze()
        test_preds = torch.round(torch.sigmoid(test_logits))

        test_loss = loss_fn(test_logits, y_test)
        test_acc = acc_fn(test_preds, y_test.int())

    if epoch % 10 == 0:
        # Accuracy is returned as a fraction, so scale it to a percentage for display
        print(f"Epoch: {epoch} | Loss: {train_loss:.5f}, Accuracy: {train_acc * 100:.2f}% | Test loss: {test_loss:.5f}, Test acc: {test_acc * 100:.2f}%")

5. Make predictions with your trained model and plot them using the ```plot_decision_boundary()``` function created in this notebook.

In [None]:
import requests
from pathlib import Path 

# Download helper functions from Learn PyTorch repo (if not already downloaded)
if Path("helper_functions.py").is_file():
  print("helper_functions.py already exists, skipping download")
else:
  print("Downloading helper_functions.py")
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py", "wb") as f:
    f.write(request.content)

from helper_functions import plot_predictions, plot_decision_boundary

In [None]:
import matplotlib.pyplot as plt

# Plot decision boundaries for training and test sets
plt.figure(figsize=(10, 5))
plt.subplot(1,2,1)
plt.title('Train')
plot_decision_boundary(model, x_train, y_train)
plt.subplot(1,2,2)
plt.title('Test')
plot_decision_boundary(model, x_test, y_test)
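`plot_decision_boundary()` is imported from the downloaded `helper_functions.py`, so its code is not shown here. Decision-boundary plots are commonly built by predicting a label at every point of a dense grid that covers the data and then shading the result. The hypothetical `grid_predictions` helper below sketches that idea; the grid size, the 0.1 margin, and the 0.5 threshold are our assumptions, and the downloaded helper may differ in its details.

```python
import torch
import torch.nn as nn

def grid_predictions(model: nn.Module, X: torch.Tensor, steps: int = 101):
    """Hypothetical helper: predict a class label at every point of a grid covering X.
    Assumes the model and X are on the CPU and X has exactly 2 features."""
    x_min, x_max = X[:, 0].min().item() - 0.1, X[:, 0].max().item() + 0.1
    y_min, y_max = X[:, 1].min().item() - 0.1, X[:, 1].max().item() + 0.1
    xx, yy = torch.meshgrid(
        torch.linspace(x_min, x_max, steps),
        torch.linspace(y_min, y_max, steps),
        indexing="xy",
    )
    grid = torch.stack([xx.ravel(), yy.ravel()], dim=1)  # (steps * steps, 2)

    model.eval()
    with torch.inference_mode():
        logits = model(grid)
    if logits.shape[1] == 1:
        labels = (torch.sigmoid(logits.squeeze(1)) > 0.5).long()  # binary: threshold probabilities
    else:
        labels = logits.argmax(dim=1)                              # multi-class: highest logit wins
    return xx, yy, labels.reshape(xx.shape)

# Example with an untrained model that has 2 inputs and 1 output logit
torch.manual_seed(0)
demo_model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
X_demo = torch.randn(20, 2)
xx, yy, zz = grid_predictions(demo_model, X_demo, steps=50)
print(xx.shape, zz.shape)  # torch.Size([50, 50]) torch.Size([50, 50])
```

Passing `xx`, `yy`, and `zz` to `plt.contourf` would shade each predicted region, and a scatter plot of the data on top shows where the boundary falls relative to the samples.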

6. Replicate the Tanh (hyperbolic tangent) activation function in pure PyTorch.

In [None]:
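import torch

def tanh(z: torch.Tensor) -> torch.Tensor:
    # tanh(z) = (e^z - e^-z) / (e^z + e^-z), using torch ops instead of NumPy
    return (torch.exp(z) - torch.exp(-z)) / (torch.exp(z) + torch.exp(-z))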
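A hand-written tanh can be checked against `torch.tanh`. The direct formula is fine for moderate inputs, but `torch.exp(z)` overflows float32 once z is larger than about 88, giving `inf / inf = nan`. The `tanh_stable` variant below is our own rewrite of the same function using `exp(-2|z|)`, which never overflows; it is a sketch, not how PyTorch implements `torch.tanh` internally.

```python
import torch

def tanh(z: torch.Tensor) -> torch.Tensor:
    # Direct formula: (e^z - e^-z) / (e^z + e^-z)
    return (torch.exp(z) - torch.exp(-z)) / (torch.exp(z) + torch.exp(-z))

def tanh_stable(z: torch.Tensor) -> torch.Tensor:
    # Equivalent form: tanh(z) = sign(z) * (1 - e^(-2|z|)) / (1 + e^(-2|z|))
    e = torch.exp(-2 * z.abs())
    return torch.sign(z) * (1 - e) / (1 + e)

z = torch.linspace(-5, 5, 11)
print(torch.allclose(tanh(z), torch.tanh(z), atol=1e-6))         # True
print(torch.allclose(tanh_stable(z), torch.tanh(z), atol=1e-6))  # True

big = torch.tensor([100.0])
print(tanh(big))         # tensor([nan]): exp(100) overflows float32 to inf, and inf / inf is nan
print(tanh_stable(big))  # tensor([1.])
```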

7. Create a multi-class dataset using the spirals data creation function from CS231n
- Construct a model capable of fitting the data (you may need a combination of linear and non-linear layers).
- Build a loss function and optimizer capable of handling multi-class data (optional: use the Adam optimizer instead of SGD).
- Make a training and testing loop for the multi-class data and train a model on it to reach over 95% testing accuracy.
- Plot the decision boundaries on the spirals dataset from your model predictions.

In [None]:
# Code for creating a spiral dataset from CS231n
import numpy as np
import matplotlib.pyplot as plt

N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes

X = np.zeros((N*K, D)) # data matrix (each row = single example)
y = np.zeros(N*K, dtype='uint8') # class labels
for j in range(K):
    ix = range(N*j, N*(j+1))
    r = np.linspace(0.0, 1, N) # radius
    t = np.linspace(j*4, (j+1)*4, N) + np.random.randn(N) * 0.2 # theta
    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
    y[ix] = j

# Let's visualize the data
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()

In [None]:
from sklearn.model_selection import train_test_split

X, y = torch.tensor(X, dtype=torch.float).to(device), torch.tensor(y, dtype=torch.long).to(device)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=42)

In [None]:
import torch.nn as nn

class ccNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear_1 = nn.Linear(2, 10)
        self.linear_2 = nn.Linear(10, 10)
        self.linear_3 = nn.Linear(10, 3)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.linear_1(x)
        x = self.relu(x)
        x = self.linear_2(x)
        x = self.relu(x)
        x = self.linear_3(x)

        return x

model = ccNet().to(device)
model

In [None]:
import torchmetrics

loss_fn = nn.CrossEntropyLoss()
acc_fn = torchmetrics.Accuracy(task="multiclass", num_classes=3).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
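`nn.CrossEntropyLoss` expects raw logits of shape `(batch, num_classes)` and integer class indices of dtype `torch.long`, which is why the labels were converted with `dtype=torch.long`. The sketch below, using made-up logits and labels, checks that it equals `log_softmax` followed by the negative log-likelihood loss, and that taking `argmax` directly on logits gives the same predictions as taking it on softmax probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # 4 samples, 3 classes (raw scores)
targets = torch.tensor([0, 2, 1, 2])  # class indices, dtype torch.long

# CrossEntropyLoss = log_softmax over the class dimension + negative log-likelihood
ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))  # True

# Softmax preserves the ordering of logits, so both argmaxes pick the same class
probs = torch.softmax(logits, dim=1)
print(torch.equal(logits.argmax(dim=1), probs.argmax(dim=1)))  # True
```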

In [None]:
torch.manual_seed(42)

epochs = 100

for epoch in range(epochs):
    model.train()

    train_logits = model(X_train)
    train_preds = torch.argmax(train_logits, dim=1)

    train_loss = loss_fn(train_logits, y_train)
    train_acc = acc_fn(train_preds, y_train)  # TorchMetrics expects (preds, target)

    optimizer.zero_grad()
    train_loss.backward()
    optimizer.step()

    model.eval()
    with torch.inference_mode():
        test_logits = model(X_test)
        test_preds = torch.argmax(test_logits, dim=1)

        test_loss = loss_fn(test_logits, y_test)
        test_acc = acc_fn(test_preds, y_test)

    if epoch % 10 == 0:
        print(f'Epoch: {epoch} | Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.2f} | Test Loss: {test_loss:.4f} | Test Acc: {test_acc:.2f}')

In [None]:
from helper_functions import plot_decision_boundary

In [None]:
plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.title('Train')
plot_decision_boundary(model, X_train, y_train)
plt.subplot(1,2,2)
plt.title('Test')
plot_decision_boundary(model, X_test, y_test)

1. Write down 3 problems where you think machine classification could be useful.

    1. Medical diagnosis: Machine classification could help diagnose diseases or conditions from a patient's symptoms, test results, and other clinical data. This could make diagnoses faster and more consistent, and let doctors focus on the more complex and nuanced aspects of patient care.
    2. Spam filtering: Machine classification could be used to automatically identify and filter out spam emails, based on patterns and characteristics that are typical of spam messages. This could help individuals and organizations reduce the amount of time and effort they spend sorting through and deleting unwanted emails.
    3. Sentiment analysis: Machine classification could be used to analyze text data, such as social media posts or online reviews, in order to determine the sentiment expressed in the text. For example, a classifier could be trained to identify whether a tweet or review is positive, negative, or neutral, based on the words and phrases used. This could be useful for companies looking to gauge public opinion about their products or services, or for researchers studying trends in public sentiment.

2. Research the concept of "momentum" in gradient-based optimizers (like SGD or Adam). What does it mean?


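In gradient-based optimization algorithms, momentum means accumulating past gradients into a running, exponentially decaying average (often called the velocity) and stepping along that average instead of the current gradient alone. This can help the optimizer make faster and more stable progress toward a minimum of the loss function.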

In stochastic gradient descent (SGD), momentum is incorporated by adding a fraction of the previous update to the current update. Along directions where successive gradients agree, these contributions add up and the effective step grows; along directions where gradients keep flipping sign, they partly cancel, which damps oscillation. This can speed up convergence and may help the optimizer move through flat regions and saddle points, or past shallow local minima.
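In symbols, one common form is v_t = μ·v_{t-1} + g_t followed by θ_t = θ_{t-1} - η·v_t, where g_t is the current gradient, μ the momentum coefficient, and η the learning rate. This matches the update PyTorch's `torch.optim.SGD` documents for `momentum` with the default `dampening=0` and `nesterov=False`; some texts write the equivalent v_t = μ·v_{t-1} - η·g_t, θ_t = θ_{t-1} + v_t. The sketch below implements one step by hand (`sgd_momentum_step` is our own name) and compares it with PyTorch on the toy function f(w) = w².

```python
import torch

def sgd_momentum_step(param, grad, velocity, lr=0.1, momentum=0.9):
    # The velocity is a decaying sum of past gradients plus the current one
    velocity = momentum * velocity + grad
    # The parameter moves along the velocity rather than the raw gradient
    param = param - lr * velocity
    return param, velocity

# Minimize f(w) = w^2 (gradient 2w) starting from w = 5
w_manual = torch.tensor([5.0])
v = torch.zeros(1)

w_torch = torch.tensor([5.0], requires_grad=True)
opt = torch.optim.SGD([w_torch], lr=0.1, momentum=0.9)

for _ in range(20):
    w_manual, v = sgd_momentum_step(w_manual, 2 * w_manual, v)

    opt.zero_grad()
    loss = (w_torch ** 2).sum()
    loss.backward()
    opt.step()

print(w_manual, w_torch.detach())
print(torch.allclose(w_manual, w_torch.detach(), atol=1e-5))
```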

In the Adam optimization algorithm, momentum takes the form of the first-moment estimate: an exponential moving average of past gradients (written m_t in the original Adam paper; some courses call it v). Each step mixes the previous average with the new gradient, so the weight on older gradients decays exponentially over time. Adam also tracks a second-moment estimate, an exponential moving average of the squared gradients (written v_t in the paper; s in some courses). After correcting both averages for their bias toward zero in the first steps, Adam divides the first by the square root of the second. This gives each parameter its own effective learning rate: parameters with consistently large gradients have their steps scaled down, and parameters with small gradients take relatively larger ones.
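The Adam update can be written out in a few lines. The sketch below (`adam_step` is our own name) uses the default hyperparameters listed in the `torch.optim.Adam` documentation (β1 = 0.9, β2 = 0.999, ε = 1e-8) with a learning rate of 0.01 chosen for illustration, and compares the result with PyTorch on f(w) = sum(w²).

```python
import torch

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of gradients (the momentum term)
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: exponential moving average of squared gradients
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: m and v start at zero, so early averages are too small
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: large recent gradient magnitudes shrink the effective step
    param = param - lr * m_hat / (torch.sqrt(v_hat) + eps)
    return param, m, v

# Minimize f(w) = sum(w^2) (gradient 2w) and compare with torch.optim.Adam
w_manual = torch.tensor([5.0, -3.0])
m = torch.zeros(2)
v = torch.zeros(2)

w_torch = torch.tensor([5.0, -3.0], requires_grad=True)
opt = torch.optim.Adam([w_torch], lr=0.01)

for t in range(1, 51):  # Adam's step counter starts at 1
    w_manual, m, v = adam_step(w_manual, 2 * w_manual, m, v, t)

    opt.zero_grad()
    (w_torch ** 2).sum().backward()
    opt.step()

print(w_manual, w_torch.detach())
print(torch.allclose(w_manual, w_torch.detach(), atol=1e-5))
```

Because the step is roughly lr times a ratio of averaged gradient to its root-mean-square, early Adam steps have a magnitude close to the learning rate regardless of how large the raw gradients are.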

Overall, momentum can help gradient-based optimizers make more stable and efficient progress toward a minimum of the loss function, especially when gradients are noisy (as with small mini-batches) or the loss surface has long, narrow valleys where plain gradient descent tends to zigzag.