# 🤖 Binary Classification with a Neural Network

In this exercise, we’ll simulate a real-world scenario:

> **Can we predict whether a student will pass a course based on the time spent studying?**

We’ll build a neural network to tackle this problem using just two inputs:

- `Lecture Hours`: how much time a student spent attending lectures  
- `Project Hours`: how much time was invested in project work

---

### What you’ll learn:

✅ How to generate synthetic data  
✅ How to train a neural network for binary classification  
✅ What **binary cross-entropy** means and why we use it  
✅ How to **visualize the decision boundary** of a model  
✅ And most importantly... how neural networks "think" 👀

Ready? Let’s dive in! 🚀


## Imports

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras import models, layers, Input

## 📊 Generate a Sample Dataset

In this section, we simulate a simple dataset representing students preparing for an exam.  
Each student is characterized by:

- The number of **lecture hours** attended.
- The number of **project hours** completed.

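We assume that both features influence the probability of passing the course.  
The pass/fail outcome is generated with a **logistic model**: the log-odds of passing are a weighted sum of the two features minus a constant, the sigmoid function turns them into a probability, and each student's outcome is then drawn at random with that probability. Logistic models like this are a standard choice for binary classification tasks.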
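As a quick sanity check on the generating model, the sketch below evaluates the same formula for a few hand-picked students. It reuses only the coefficients from the cell below (0.65, 0.35 and the offset 26.5); the helper name `pass_probability` is our own. Note that the offset places an average student (30 lecture hours, 20 project hours) exactly on the 50% line, because 0.65 · 30 + 0.35 · 20 - 26.5 = 0.

```python
import numpy as np

def pass_probability(lecture_hours, project_hours):
    """Hypothetical helper: same logistic formula as the data-generation cell."""
    # Log-odds of passing: a weighted sum of the two features minus an offset
    z = 0.65 * lecture_hours + 0.35 * project_hours - 26.5
    # The sigmoid turns log-odds into a probability in (0, 1)
    return 1 / (1 + np.exp(-z))

p_average = pass_probability(30, 20)  # the mean of both feature distributions
p_low = pass_probability(18, 10)      # a student well below average

print(f"Average student: {p_average:.3f}")  # 0.500
print(f"Low-effort student: {p_low:.6f}")
```

Each extra lecture hour raises the log-odds by 0.65 and each extra project hour by 0.35, so in this simulation lecture attendance weighs almost twice as much as project work.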


In [None]:
# Set seed
np.random.seed(42)

# Number of students
n_students = 200

# Input features
lecture_hours = np.random.normal(30, 5, n_students)
project_hours = np.random.normal(20, 3, n_students)

# Logistic model for pass probability
z = (
    0.65 * lecture_hours +
    0.35 * project_hours -
    26.5
)
prob_pass = 1 / (1 + np.exp(-z))

# Simulate pass/fail outcome
pass_course = np.random.binomial(1, prob_pass)

# Create dataframe
df = pd.DataFrame({
    'lecture_hours': lecture_hours.round(2),
    'project_hours': project_hours.round(2),
    'pass_course': pass_course
})

# Preview
df.head()

In [None]:
df['pass_course'].value_counts()

### Dataset Scatter Plot

In [None]:
# Colors: red for 1 (PASS), blue for 0 (FAIL)
colors = ['blue' if label == 0 else 'red' for label in df['pass_course']]

plt.figure(figsize=(8, 6))
plt.scatter(df['lecture_hours'], df['project_hours'], c=colors, edgecolor='k')
plt.xlabel("Lecture Hours")
plt.ylabel("Project Hours")
plt.title("Student Dataset")
plt.grid(True)

# Manual legend
import matplotlib.patches as mpatches
legend_labels = [mpatches.Patch(color='red', label='Pass (1)'),
                 mpatches.Patch(color='blue', label='Fail (0)')]
plt.legend(handles=legend_labels)

plt.show()



## 🧪 Train-Test Split and Feature Standardization

To evaluate our model properly, we divide the dataset into two parts:

- **Training set**: used to train the model.
- **Test set**: used to assess how well the model generalizes to unseen data.

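We also **standardize the features** so that both variables (lecture hours and project hours) have zero mean and unit variance.  
This is important because many machine learning algorithms, including neural networks and logistic regression trained by gradient descent, are sensitive to the scale of their inputs. The scaler is fitted on the training set only and then applied unchanged to the test set, so no information from the test data leaks into training.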
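Under the hood, standardization maps each feature value $x$ to $(x - \mu)/\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation computed on the training set. The NumPy sketch below mimics that computation (scikit-learn's `StandardScaler` uses the population standard deviation, i.e. `ddof=0`); the random demo data and the `fit_standardizer` helper are illustrative assumptions, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data with the same layout as our features: (n_samples, 2)
X_train_demo = np.column_stack([rng.normal(30, 5, 160), rng.normal(20, 3, 160)])
X_test_demo = np.column_stack([rng.normal(30, 5, 40), rng.normal(20, 3, 40)])

def fit_standardizer(X):
    """Hypothetical helper: per-column mean and population std (ddof=0)."""
    return X.mean(axis=0), X.std(axis=0)

# Statistics come from the training set only...
mu, sigma = fit_standardizer(X_train_demo)
X_train_std = (X_train_demo - mu) / sigma
# ...and are reused, unchanged, on the test set
X_test_std = (X_test_demo - mu) / sigma

print(X_train_std.mean(axis=0))           # close to zero in both columns
print(X_train_std.std(axis=0).round(6))   # [1. 1.]
```

The test set is deliberately not re-centred on its own statistics, so its standardized mean will be close to, but not exactly, zero.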


In [None]:
# Split features and target
X = df.drop(columns='pass_course')
y = df['pass_course']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

## 🧠 Build the Neural Network

Now we define a simple **Multilayer Perceptron (MLP)** to predict whether a student will pass the course.  
This kind of neural network is made up of **fully connected layers** and is well-suited for tabular data like ours.

---

### 📐 Network Architecture

Our model includes:

- **Input layer**: receives the two features (`lecture_hours` and `project_hours`).
- **First hidden layer**: 16 neurons with ReLU activation.
- **Second hidden layer**: 8 neurons, also with ReLU.
- **Output layer**: 1 neuron with **sigmoid** activation, which outputs a probability between 0 and 1 (suitable for binary classification).

---

> 🔍 **Why ReLU?**  
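> ReLU (Rectified Linear Unit), defined as $\text{ReLU}(x) = \max(0, x)$, is a commonly used activation function. It is cheap to compute and introduces the non-linearity the network needs to learn patterns that a purely linear model cannot capture.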

> 🎯 **Why sigmoid at the output?**  
> The sigmoid function $\sigma(z) = 1 / (1 + e^{-z})$ squashes any real number into the interval (0, 1), so the single output can be read directly as the probability of passing.
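To make the architecture concrete, here is a minimal NumPy sketch of the forward pass for a 2 → 16 → 8 → 1 network with ReLU hidden layers and a sigmoid output. The weights are random placeholders, not trained values, and the `forward` helper is hypothetical; it applies the same `activation(x @ W + b)` rule that a Keras `Dense` layer uses. Counting the weights also shows how small this model is: (2 · 16 + 16) + (16 · 8 + 8) + (8 · 1 + 1) = 193 trainable parameters, which should match the total reported by `model.summary()`.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and zeroes out negatives
    return np.maximum(0, x)

def sigmoid(x):
    # Squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(42)
layer_sizes = [2, 16, 8, 1]  # input -> hidden 1 -> hidden 2 -> output

# Random placeholder weights: W has shape (inputs, units), b has shape (units,)
params = [(rng.normal(0, 0.5, (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(X):
    """Hypothetical forward pass: ReLU on hidden layers, sigmoid on the output."""
    h = X
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W_out, b_out = params[-1]
    return sigmoid(h @ W_out + b_out)

n_params = sum(W.size + b.size for W, b in params)
X_demo = rng.normal(size=(5, 2))  # five standardized "students"
probs = forward(X_demo)

print(n_params)     # 193
print(probs.shape)  # (5, 1)
```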


In [None]:
# Build the MLP model
model = models.Sequential([
    Input(shape=(X_train.shape[1],)),
    layers.Dense(16, activation='relu'),
    layers.Dense(8, activation='relu'),
    layers.Dense(1, activation='sigmoid')  # Binary classification
])

## 📉 Binary Cross Entropy

To train our binary classifier, we use **Binary Cross Entropy (BCE)** as the loss function.

---

### 🧮 What is Binary Cross Entropy?

Binary Cross Entropy measures the difference between the predicted probabilities and the actual class labels (0 or 1).  
It is defined as:

$$
\text{BCE} = -\left[y \cdot \log(\hat{y}) + (1 - y) \cdot \log(1 - \hat{y})\right]
$$

Where:
- $y$ is the true label (0 or 1)
- $\hat{y}$ is the predicted probability (between 0 and 1)

---

> 📌 **Intuition**:  
> BCE penalizes predictions that are far from the true label.  
> For example, if the true label is 1 and the model predicts 0.01, the loss will be large.  
> If it predicts 0.99, the loss will be small — meaning the model is doing well.

> 🔁 The goal of training is to minimize the average of this loss over the training examples (Keras averages it over each mini-batch).
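The intuition above is easy to check numerically. The sketch below implements the BCE formula in NumPy and averages it over a small batch. The `binary_cross_entropy` helper, the clipping constant `eps` (our own guard against `log(0)`) and the toy labels are illustrative choices.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Hypothetical helper: mean BCE over a batch of labels and probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log() never receives exactly 0 or 1
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    per_example = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return per_example.mean()

print(round(binary_cross_entropy([1], [0.01]), 3))  # 4.605 (confident and wrong)
print(round(binary_cross_entropy([1], [0.99]), 3))  # 0.01 (confident and right)

# Averaged over a toy batch of four students
y_batch = [1, 0, 1, 0]
y_hat_batch = [0.9, 0.2, 0.6, 0.4]
print(round(binary_cross_entropy(y_batch, y_hat_batch), 3))
```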



In [None]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


## 🏋️ Train the Model

Now it’s time to train our neural network!

---

### ⚙️ Model Compilation

Before training, we **compile the model** by specifying:

- **Loss function**: we use `Binary Crossentropy`, which is suitable for binary classification tasks.
- **Optimizer**: `Adam` is a widely used optimizer that adapts the step size for each weight individually, using running averages of past gradients and of their squares.
- **Metrics**: we track `accuracy` to measure how often the model correctly predicts the outcome.
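To illustrate what adapting the step size means, the sketch below applies the Adam update rule to a single parameter minimizing the toy loss $(w - 3)^2$. It uses the default hyperparameters from the original Adam paper ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$); the learning rate, the toy loss and the number of steps are arbitrary choices, and the Keras implementation may differ in small details such as its default `epsilon`.

```python
import numpy as np

def grad(w):
    # Gradient of the toy loss (w - 3)^2
    return 2 * (w - 3)

w = 0.0
lr, beta1, beta2, eps = 0.05, 0.9, 0.999, 1e-8
m, v = 0.0, 0.0  # running averages of the gradient and of the squared gradient

for t in range(1, 3001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g        # first moment (momentum-like)
    v = beta2 * v + (1 - beta2) * g ** 2   # second moment (gradient scale)
    m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    # Dividing by sqrt(v_hat) rescales the step: parameters with large gradients
    # take relatively smaller steps, those with small gradients relatively larger ones
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(round(w, 3))
```

The Keras optimizer applies the same idea to every weight in the network at once, keeping a separate `m` and `v` for each one.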

---

### 🚀 Fit the Model

We train the model using the `.fit()` method, providing:

- **Training data** (`X_train_scaled`, `y_train`)
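- **Validation data** (`X_test_scaled`, `y_test`): used to monitor generalization during training. For simplicity, this notebook reuses the test set for validation; in a stricter workflow you would hold out a separate validation set so that the test set remains completely unseen until the final evaluation.
- **Epochs**: the number of full passes through the training data
- **Batch size**: how many samples are processed before each weight update (Keras uses 32 if none is specified)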

> 📌 Training will produce a history object containing accuracy and loss values for both training and validation sets — very helpful for diagnosing underfitting or overfitting.
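It helps to work out how many weight updates training actually performs. With 200 students and `test_size=0.2`, the training set has 160 rows; with the Keras default batch size of 32 that gives 5 updates per epoch, or 250 over 50 epochs. The sketch below does this arithmetic and also shows, in plain NumPy, how one epoch's shuffled indices could be cut into mini-batches (Keras shuffles the training data each epoch by default; the variable names here are illustrative).

```python
import math
import numpy as np

n_train = 160     # 80% of the 200 simulated students
batch_size = 32   # Keras default when batch_size is not specified
epochs = 50

steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 5 250

# One epoch, sketched: shuffle the row indices, then cut them into batches.
# Each batch would trigger one forward pass, one loss computation and one
# weight update.
rng = np.random.default_rng(0)
indices = rng.permutation(n_train)
batches = [indices[i:i + batch_size] for i in range(0, n_train, batch_size)]
print([len(b) for b in batches])  # [32, 32, 32, 32, 32]
```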


In [None]:
# Train the model
history = model.fit(
    X_train_scaled, y_train,
    epochs=50,
    batch_size=32,  # same as the Keras default, made explicit here
    validation_data=(X_test_scaled, y_test),
    verbose=1,
)

# Evaluate on test data
loss, accuracy = model.evaluate(X_test_scaled, y_test, verbose=0)
print(f"Test accuracy: {accuracy:.2f}")

In [None]:
# Plot training & validation accuracy and loss
plt.figure(figsize=(12, 5))

# Accuracy
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Acc')
plt.plot(history.history['val_accuracy'], label='Validation Acc')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

# Loss
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

plt.tight_layout()
plt.show()


## 🧠 How does the Neural Network make decisions?

Once the model is trained, we can **visualize how it separates the two classes** — students predicted to pass vs those predicted to fail.

The plot below shows:

- The input space (`Lecture Hours` vs `Project Hours`)
- A **colored background** based on the predicted outcome
- The **decision boundary** (black line), where the predicted probability of passing is exactly 0.5

> 🧭 Try reading this plot like a map:  
> Each dot is a student. Where they fall in the space determines the model's prediction.

Can you spot the students that fall on the “wrong” side? 😉
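Because we generated the data ourselves, we also know the *true* decision boundary and can compare it with what the network learns. Setting the data-generating log-odds to zero, $0.65 \cdot \text{lecture} + 0.35 \cdot \text{project} - 26.5 = 0$, gives a straight line. The sketch below evaluates that generating formula (not the trained model) with the same grid recipe as the plotting cell: build a `meshgrid`, flatten it with `ravel`, compute probabilities, and `reshape` back. The helper names and axis ranges are illustrative assumptions.

```python
import numpy as np

def true_pass_probability(lecture_hours, project_hours):
    """Hypothetical helper: the data-generating logistic model, not the trained MLP."""
    z = 0.65 * lecture_hours + 0.35 * project_hours - 26.5
    return 1 / (1 + np.exp(-z))

def boundary_project_hours(lecture_hours):
    """Project hours at which the true pass probability is exactly 0.5."""
    return (26.5 - 0.65 * lecture_hours) / 0.35

# Same recipe as the plotting cell: meshgrid -> ravel -> predict -> reshape
# (the axis ranges here are arbitrary illustrative choices)
xx, yy = np.meshgrid(np.linspace(15, 45, 300), np.linspace(10, 30, 300))
probs = true_pass_probability(xx.ravel(), yy.ravel()).reshape(xx.shape)

print(probs.shape)                             # (300, 300)
print(round(boundary_project_hours(30.0), 6))  # 20.0
```

A well-trained MLP should produce a boundary close to this line. Since the true boundary is linear, strong curvature in the learned boundary would likely indicate that the network is fitting noise in the sample rather than the underlying pattern.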


In [None]:
# Grid of points covering the data range, with a 1-hour margin on each side
x_min, x_max = df['lecture_hours'].min() - 1, df['lecture_hours'].max() + 1
y_min, y_max = df['project_hours'].min() - 1, df['project_hours'].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                     np.linspace(y_min, y_max, 300))

# Build a DataFrame with the same column names and apply the training-set scaling
grid_points = pd.DataFrame({
    'lecture_hours': xx.ravel(),
    'project_hours': yy.ravel()
})
grid_scaled = scaler.transform(grid_points)

# Model prediction: probability of passing for every grid point
Z = model.predict(grid_scaled, verbose=0).reshape(xx.shape)

# Plot
plt.figure(figsize=(8, 6))

# Background colours for the predicted classes (RdBu_r: fail = blue, pass = red,
# matching the colours of the dataset scatter plot)
plt.contourf(xx, yy, Z, levels=[0, 0.5, 1], alpha=0.4, cmap=plt.cm.RdBu_r)

# Contour line where the predicted probability equals 0.5
contours = plt.contour(xx, yy, Z, levels=[0.5], colors='black', linewidths=2)
plt.clabel(contours, fmt={0.5: 'Decision Boundary'}, inline=True, fontsize=10)

# Actual data points, coloured by their true label
plt.scatter(df['lecture_hours'], df['project_hours'],
            c=df['pass_course'], cmap=plt.cm.RdBu_r, edgecolor='k')

plt.xlabel("Lecture Hours")
plt.ylabel("Project Hours")
plt.title("Neural Network Decision Boundary")
plt.grid(True)
plt.show()


In [None]:
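# Predict the outcome for a new student (inputs are raw hours, not scaled)
def predict_outcome(lecture_hours, project_hours):
    """Print the model's pass probability and the resulting PASS/FAIL label."""
    input_df = pd.DataFrame([{
        'lecture_hours': lecture_hours,
        'project_hours': project_hours
    }])

    # Apply the same scaling that was fitted on the training set
    input_scaled = scaler.transform(input_df)
    prob = model.predict(input_scaled, verbose=0).flatten()[0]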

    print(f"\nPredicted probability of passing: {prob:.2f}")
    if prob >= 0.5:
        print("Prediction: PASS")
    else:
        print("Prediction: FAIL")


In [None]:
LECTURE_HOURS = 18
PROJECT_HOURS = 10

# Try it out!
predict_outcome(LECTURE_HOURS, PROJECT_HOURS)