# Mini Workshop 1: Predicting Student Success

Welcome to your first neural network! In this notebook, you'll train a simple AI model to predict whether a student passes or fails based on how much they studied and slept.

Later, you'll explore how to expand this into predicting actual letter grades (A, B, C, D, F).

## Step 1: Import Libraries

In [None]:
# Import the libraries we need for this notebook
import torch  # PyTorch: a popular deep learning library
import torch.nn as nn  # nn: tools for building neural networks
import matplotlib.pyplot as plt  # For plotting graphs
import numpy as np  # For working with numbers and arrays

## Step 2: Define the Dataset (Pass/Fail)

Here we build a toy dataset that simulates records of students who passed or failed an exam. Each student is described by two numbers: how many hours they studied and how many hours they slept the night before the exam.

In [None]:
# Our dataset: each row is [hours_studied, hours_slept]
X = torch.tensor([
    [1.0, 2.0], [3.0, 1.0], [5.0, 5.0], [6.0, 4.0],
    [8.0, 7.0], [2.0, 6.0], [7.0, 3.0]
])
# Labels: 0 = fail, 1 = pass
y = torch.tensor([
    [0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [1.0]
])

## Step 3: Define the Model

This is a very simple model. It takes two inputs, combines them linearly (a weighted sum plus a bias), and passes the result through a Sigmoid function. The output is a number between 0 and 1, which we interpret as the probability (0% to 100%) that the student passes.

In [None]:
# Build a simple neural network model
model = nn.Sequential(
    nn.Linear(2, 1),  # Linear layer: 2 inputs (study, sleep) -> 1 output
    nn.Sigmoid()      # Squash output to be between 0 and 1 (probability)
)
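
To make "linearly combined and fed into a Sigmoid" concrete, the sketch below recomputes one prediction by hand and compares it with the model's own output. The names `model_demo` and `student` are made up for this illustration, and the seed is fixed only so the example is repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed so the random weights are repeatable

# Hypothetical stand-in with the same architecture as the model above
model_demo = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())

student = torch.tensor([[5.0, 5.0]])  # [hours_studied, hours_slept]

linear = model_demo[0]
w = linear.weight  # shape (1, 2): one weight per input
b = linear.bias    # shape (1,): a single bias

with torch.no_grad():
    z = student @ w.T + b             # weighted sum of the inputs plus the bias
    manual = 1 / (1 + torch.exp(-z))  # Sigmoid written out by hand
    from_model = model_demo(student)  # what the model computes itself

print("By hand:   ", manual.item())
print("From model:", from_model.item())
```

The two numbers should agree, which shows that the whole model is a single weighted sum followed by a squashing function.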

## Step 4: Loss and Optimization

Now that we’ve defined the model, we need to tell it how to **learn** from its mistakes. This is where the **loss function** and **optimizer** come in:

- **Loss Function (`BCELoss`)**:  
  The loss function measures how far off the model's prediction is from the true label.  
  - We're using **Binary Cross Entropy Loss**, which is commonly used for binary classification problems (like pass/fail or yes/no).
  - The closer the predicted value is to the true label, the lower the loss.

- **Optimizer (`Adam`)**:  
  The optimizer updates the model’s weights based on the computed loss.
  - We’re using the **Adam optimizer**, which is a popular and effective optimization algorithm.
  - The `lr` (learning rate) controls how big each update step is — too small and training is slow; too large and it may overshoot.

Together, the loss function tells the model how wrong it is, and the optimizer figures out how to adjust the model to do better next time.


In [None]:
# Set up how the model learns
loss_fn = nn.BCELoss()  # Binary Cross Entropy: good for yes/no problems
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # Adam: a way to update the model
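
As a rough illustration of what `BCELoss` measures, the sketch below computes binary cross entropy by hand for three made-up predictions and compares the result with PyTorch's built-in version. The tensors `y_true` and `y_prob` are invented for this example and are not part of the workshop dataset.

```python
import torch
import torch.nn as nn

# Made-up labels and predicted pass probabilities (illustration only)
y_true = torch.tensor([[1.0], [0.0], [1.0]])
y_prob = torch.tensor([[0.9], [0.2], [0.4]])

# Binary cross entropy: -[y * log(p) + (1 - y) * log(1 - p)], averaged over examples
manual_bce = -(y_true * torch.log(y_prob)
               + (1 - y_true) * torch.log(1 - y_prob)).mean()
builtin_bce = nn.BCELoss()(y_prob, y_true)

print("By hand: ", manual_bce.item())
print("Built-in:", builtin_bce.item())

# A confident correct prediction costs less than an unsure one
good = nn.BCELoss()(torch.tensor([0.9]), torch.tensor([1.0]))
poor = nn.BCELoss()(torch.tensor([0.4]), torch.tensor([1.0]))
print("Loss at p=0.9:", good.item(), "| loss at p=0.4:", poor.item())
```

The last two lines show the idea from the text: the closer the predicted probability is to the true label, the lower the loss.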

## Step 5: Visualizing the Decision Boundary

After training a neural network, it's helpful to **see what it has learned**. One way to do this is by plotting a **decision boundary** — a line (or surface) that separates different prediction regions in the input space.

This function creates a visualization of that boundary:

- It builds a grid of (x, y) points covering the input space (in our case: hours studied vs. hours slept).
- It asks the model to make predictions at each point on the grid.
- It then colors the regions based on the model’s output — showing where the model predicts **pass (1)** or **fail (0)**.
- Finally, it overlays the original data points so we can see how well the boundary matches reality.

This is a powerful way to build intuition about what the neural network is doing under the hood!

In [None]:
def plot_decision_boundary(model, X, y):
    # This function draws the decision boundary of the model
    x_min, x_max = 0, 10
    y_min, y_max = 0, 10
    # Create a grid of points to cover the input space
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                         np.linspace(y_min, y_max, 100))
    grid = torch.tensor(np.c_[xx.ravel(), yy.ravel()], dtype=torch.float32)
    with torch.no_grad():
        preds = model(grid).reshape(xx.shape)  # Model predictions for each point
    plt.contourf(xx, yy, preds, levels=50, cmap='RdBu', alpha=0.6)  # Color regions
    plt.scatter(X[:,0], X[:,1], c=y.squeeze(), cmap='RdBu', edgecolor='k')  # Plot data
    plt.xlabel('Hours Studied')
    plt.ylabel('Hours Slept')
    plt.title('Decision Boundary')
    plt.show()
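
For a model made of one linear layer and a Sigmoid, the 50% boundary is a straight line: the prediction is exactly 0.5 wherever the weighted sum equals zero. The sketch below uses hand-picked weights (an assumption for illustration, not values the workshop model will learn) and a hypothetical helper `boundary_sleep` to find a point on that line.

```python
import torch
import torch.nn as nn

# Hypothetical model with hand-set weights so the arithmetic is easy to follow
demo = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
with torch.no_grad():
    demo[0].weight.copy_(torch.tensor([[1.0, 0.5]]))  # w_study = 1.0, w_sleep = 0.5
    demo[0].bias.fill_(-6.0)                          # bias = -6.0

w1, w2 = demo[0].weight[0].tolist()
b = demo[0].bias.item()

def boundary_sleep(hours_studied):
    # Solve w1 * study + w2 * sleep + b = 0 for sleep
    return -(w1 * hours_studied + b) / w2

x1 = 4.0
x2 = boundary_sleep(x1)
with torch.no_grad():
    p = demo(torch.tensor([[x1, x2]])).item()

print(x2, p)  # 4.0 0.5
```

Points on one side of this line get a pass probability above 0.5, and points on the other side get one below 0.5, which is the color change you see in the plot.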

## Step 6: Train the Model and Watch Decision Boundary Evolve

Now it's time to **train the model and watch it learn**.

This loop runs the full training process step-by-step, and also shows us how the decision boundary evolves:

- Every few steps, we **plot the decision boundary** to see how the model is changing.
- We also **track the loss**, which tells us how far off the model's predictions are from the correct labels.
- On each step:
  - The model makes predictions (`y_pred`)
  - We calculate how wrong those predictions are (`loss`)
  - We compute gradients and update the model weights to improve

As training progresses:
- The boundary becomes more accurate and aligns better with the data.
- The loss values should go down — a sign that the model is improving.

This is a great way to build **intuition** about how training works — not just as numbers, but visually.

In [None]:
# Train the model and watch how it learns
losses = []  # Keep track of loss at each step
for epoch in range(101):  # Do 101 training steps
    if epoch < 10 and epoch % 2 == 0:
        plot_decision_boundary(model, X, y)  # Show boundary early in training
    y_pred = model(X)  # Model's predictions for our data
    loss = loss_fn(y_pred, y)  # How wrong is the model?
    losses.append(loss.item())  # Save the loss
    optimizer.zero_grad()  # Clear previous gradients
    loss.backward()  # Compute new gradients
    optimizer.step()  # Update the model
    if epoch % 2 == 0:
        print(f'Epoch {epoch}, Loss: {loss.item()}')  # Print progress
plot_decision_boundary(model, X, y)  # Final boundary
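
Loss is not the only way to check progress. The sketch below retrains a fresh copy of the same kind of model on the same seven students and then measures training accuracy by treating any probability of 0.5 or more as a "pass". All names ending in `_demo` or starting with `demo_` are made up for this example, and the fixed seed is an assumption to make reruns repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # assumption: fixed seed so reruns behave the same

# Same toy data as in Step 2, copied here so the block runs on its own
X_demo = torch.tensor([[1.0, 2.0], [3.0, 1.0], [5.0, 5.0], [6.0, 4.0],
                       [8.0, 7.0], [2.0, 6.0], [7.0, 3.0]])
y_demo = torch.tensor([[0.0], [0.0], [1.0], [1.0], [1.0], [0.0], [1.0]])

demo_model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())
demo_loss_fn = nn.BCELoss()
demo_optimizer = torch.optim.Adam(demo_model.parameters(), lr=0.1)

demo_losses = []
for step in range(101):
    loss = demo_loss_fn(demo_model(X_demo), y_demo)  # how wrong is the model?
    demo_losses.append(loss.item())
    demo_optimizer.zero_grad()  # clear previous gradients
    loss.backward()             # compute new gradients
    demo_optimizer.step()       # update the weights

with torch.no_grad():
    predicted_labels = (demo_model(X_demo) >= 0.5).float()  # 1 = pass, 0 = fail
    accuracy = (predicted_labels == y_demo).float().mean().item()

print(f"First loss: {demo_losses[0]:.3f}, last loss: {demo_losses[-1]:.3f}")
print(f"Training accuracy: {accuracy:.2f}")
```

Keep in mind that this is accuracy on the training data itself, so it says little about how the model would do on students it has never seen.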

### Plot the Loss

We can also plot the loss recorded at each training step to see how the model improved over time.

In [None]:
# Plot how the loss changed during training
plt.plot(losses)
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss over Epochs')
plt.show()

## Step 7: Try It Yourself

Create your own test example! Use values for `hours_studied` and `hours_slept` to see if the model thinks the student will pass.

In [None]:
# Try your own input!
your_input = torch.tensor([[3.0, 3.0]])  # Change these numbers to test different students
print("Predicted pass probability:", model(your_input).item())

## Bonus: Predicting Letter Grades (A–F)

Let's take it a step further. What if we want to predict the actual letter grade a student will get? This turns our binary classification problem into a **multiclass classification** problem.

### Mini Challenge!

Can you fill in the input and output dimensions for `model_multi` below? Consider the data that goes in and the number of classes we are trying to predict in the output.

In [None]:
# Multiclass dataset: [hours_studied, hours_slept]
X_multi = torch.tensor([
    [1.0, 2.0], [3.0, 1.0], [5.0, 5.0], [6.0, 4.0], [8.0, 7.0], [7.0, 3.0],
    [2.0, 3.0], [4.0, 2.0], [6.0, 2.0], [8.0, 2.0], [3.0, 7.0], [5.0, 8.0], [7.0, 8.0],
    [9.0, 5.0], [1.0, 8.0], [4.0, 6.0], [6.0, 8.0], [8.0, 4.0]
])
# Labels: 0=F, 1=D, 2=C, 3=B, 4=A
y_multi = torch.tensor([
    0, 1, 2, 3, 4, 3,
    0, 1, 2, 3, 1, 2, 4,
    4, 0, 2, 4, 3
])

# Build a model for multiclass (grades)
model_multi = nn.Sequential(
    ### YOUR CODE HERE ###
    # NOTE: no Sigmoid() layer will be used here, because softmax will be applied by the loss function
)


When building a neural network for **multiclass classification** (predicting more than two possible classes, like letter grades A–F), we use a different loss function than for binary (yes/no) problems.

- **BCELoss (Binary Cross Entropy Loss)** is designed for binary classification, where each example belongs to one of two classes (e.g., pass/fail). It expects a single probability (between 0 and 1) for each example, so the model itself must end with a Sigmoid activation to squash its output. `BCELoss` does not apply the Sigmoid for you.

- **CrossEntropyLoss** is designed for multiclass classification, where each example belongs to one of several classes (e.g., grades 0–4). It expects the model to output a vector of raw scores (called logits), one for each class. CrossEntropyLoss internally applies the Softmax function to these logits to turn them into probabilities, and then compares them to the true class label.
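
To illustrate the claim that `CrossEntropyLoss` applies Softmax internally, the sketch below computes the loss by hand from raw logits and compares it with the built-in version. The `logits` and `targets` values are invented for this example.

```python
import torch
import torch.nn as nn

# Made-up raw scores (logits) for two students across five grades (0=F ... 4=A)
logits = torch.tensor([[2.0, 0.5, 0.1, -1.0, 0.0],
                       [0.0, 0.2, 3.0, 0.1, -0.5]])
targets = torch.tensor([0, 2])  # true class index for each student

# Softmax (in log form), then pick out the log-probability of the true class
log_probs = torch.log_softmax(logits, dim=1)
manual_ce = -log_probs[torch.arange(len(targets)), targets].mean()

builtin_ce = nn.CrossEntropyLoss()(logits, targets)

print("By hand: ", manual_ce.item())
print("Built-in:", builtin_ce.item())
```

Because the Softmax step lives inside the loss, the model only needs to output one raw score per class, and the labels are plain integer class indices rather than probabilities.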

In [None]:
# loss and optimizer
loss_fn_multi = nn.CrossEntropyLoss()  # Good for multiclass problems
optimizer_multi = torch.optim.Adam(model_multi.parameters(), lr=0.1)

In [None]:
# Train the multiclass model
for epoch in range(100):
    y_pred_multi = model_multi(X_multi)  # Model's predictions
    loss = loss_fn_multi(y_pred_multi, y_multi)  # How wrong is the model?
    optimizer_multi.zero_grad()  # Clear previous gradients
    loss.backward()  # Compute new gradients
    optimizer_multi.step()  # Update the model
    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Loss = {loss.item():.4f}")  # Print progress

### Try Your Own Grade Example

In [None]:
# Try your own grade example
my_student = torch.tensor([[5.0, 2.0]])  # Change these numbers to test different students
predicted_grade = torch.argmax(model_multi(my_student), dim=1)  # Pick the grade with the highest score (also the highest probability after softmax)
print("Predicted grade (0=F to 4=A):", predicted_grade.item())
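
If you would like to see a probability for every grade instead of only the winning index, one option is to apply Softmax yourself. The sketch below uses a made-up logits tensor (`example_logits`) in place of a real model output, so it does not give away the Mini Challenge. The `grade_letters` list is also an assumption that follows the 0=F to 4=A labeling above.

```python
import torch

grade_letters = ['F', 'D', 'C', 'B', 'A']  # index 0 = F ... index 4 = A

# Hypothetical raw scores for one student (stand-in for model_multi(my_student))
example_logits = torch.tensor([[0.1, 0.3, 2.0, 1.0, -0.5]])

probs = torch.softmax(example_logits, dim=1)  # turn scores into probabilities
best_index = torch.argmax(probs, dim=1).item()

print("Predicted grade:", grade_letters[best_index])  # Predicted grade: C
for letter, p in zip(grade_letters, probs[0].tolist()):
    print(f"  {letter}: {p:.2f}")
```

The probabilities sum to 1, and the largest one always belongs to the same class as the largest raw score.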

### Visualizing the Multiclass Decision Boundary

Let's plot the decision boundary for the multiclass model to see how it separates the different letter grades.

In [None]:
def plot_multiclass_decision_boundary(model, X, y, num_classes=5):
    # Plot the decision boundary for multiclass model
    x_min, x_max = 0, 10
    y_min, y_max = 0, 10
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    grid = torch.tensor(np.c_[xx.ravel(), yy.ravel()], dtype=torch.float32)
    with torch.no_grad():
        preds = model(grid)
        preds = torch.argmax(preds, dim=1).reshape(xx.shape)
    plt.figure(figsize=(7, 5))
    plt.contourf(xx, yy, preds, levels=np.arange(num_classes+1)-0.5, cmap='Spectral', alpha=0.6)
    scatter = plt.scatter(X[:,0], X[:,1], c=y, cmap='Spectral', edgecolor='k', s=80)
    plt.xlabel('Hours Studied')
    plt.ylabel('Hours Slept')
    plt.title('Multiclass Decision Boundary (Grades)')
    plt.colorbar(scatter, ticks=range(num_classes), label='Grade (0=F, 4=A)')
    plt.show()

def plot_multiclass_decision_boundary_continuous(model, X, y, num_classes=5):
    # Plot a smooth, continuous decision boundary for multiclass model
    x_min, x_max = 0, 10
    y_min, y_max = 0, 10
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    grid = torch.tensor(np.c_[xx.ravel(), yy.ravel()], dtype=torch.float32)
    with torch.no_grad():
        logits = model(grid)
        probs = torch.softmax(logits, dim=1)  # Convert to probabilities
        color_vals = (probs * torch.arange(num_classes, dtype=torch.float32)).sum(dim=1).reshape(xx.shape)  # Probability-weighted average grade, for smooth color
    plt.figure(figsize=(7, 5))
    cont = plt.contourf(xx, yy, color_vals, levels=np.linspace(-0.5, 4.5, 101), cmap='Spectral', alpha=0.9)
    scatter = plt.scatter(X[:,0], X[:,1], c=y, cmap='Spectral', edgecolor='k', s=80)
    plt.xlabel('Hours Studied')
    plt.ylabel('Hours Slept')
    plt.title('Multiclass Decision Boundary (Continuous Colors)')
    cbar = plt.colorbar(cont, ticks=[0, 1, 2, 3, 4], boundaries=np.linspace(-0.5, 4.5, 101))
    cbar.ax.set_yticklabels(['F', 'D', 'C', 'B', 'A'])  # Show letter grades
    cbar.set_label('Predicted Grade')
    plt.show()



In [None]:
# Plot both types of decision boundaries
plot_multiclass_decision_boundary(model_multi, X_multi, y_multi)
plot_multiclass_decision_boundary_continuous(model_multi, X_multi, y_multi)

In [None]:
# Optional (Google Colab only): disconnect the runtime when you are finished

from google.colab import runtime
runtime.unassign()