## Part C: Model Training & Evaluation 

In [6]:
part_c = '''
Part C: Model Training & Evaluation

• Manual Train-Test Split:
  - Shuffle the dataset and split it into training and testing sets (e.g., 80/20).
  - Use NumPy operations only; no external libraries for splitting.

• Training the Model:
  - Run Gradient Descent on the training data.
  - Learn optimal weights and bias parameters.

• Predictions:
  - Use the trained model to make predictions on the test set.
  - Convert predicted probabilities into class labels using a threshold (default 0.5).

• Evaluation Metrics:
  - Manually compute Accuracy, Precision, and Recall.
  - Explain why Recall is especially important in cancer detection scenarios.
'''


### Requirements from previous files

In [7]:
# ============================================================
# Part C: Model Training & Evaluation
# Setup & Required Components (Self-Contained)
# ============================================================

# -------------------------
# Imports
# -------------------------
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


# -------------------------
# Load Dataset
# -------------------------
df = pd.read_csv("Wisconsin.csv")


# -------------------------
# Separate Features & Target
# -------------------------
X = df.drop(columns=['target'])
y = df['target'].values


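# -------------------------
# Feature Scaling (Min-Max)
# -------------------------
# Note: the min and max are computed over the full dataset, before the
# train-test split, so the scaling uses information from the test rows.
X_min = X.min()
X_max = X.max()
X_scaled = (X - X_min) / (X_max - X_min)  # assumes no feature is constant (range > 0)
X_scaled = X_scaled.values  # convert to NumPy array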


# -------------------------
# Logistic Regression Helpers (from scratch)
# -------------------------

def sigmoid(z):
    """Sigmoid activation function"""
    return 1 / (1 + np.exp(-z))


def predict_proba(X, w, b):
    """Hypothesis function to predict probabilities"""
    z = np.dot(X, w) + b
    return sigmoid(z)


def compute_cost(y_true, y_pred):
    """Binary Cross Entropy cost function"""
    m = y_true.shape[0]
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    
    cost = -(1 / m) * np.sum(
        y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)
    )
    return cost


def compute_gradients(X, y, y_pred):
    """Compute gradients for weights and bias"""
    m = X.shape[0]
    dw = (1 / m) * np.dot(X.T, (y_pred - y))
    db = (1 / m) * np.sum(y_pred - y)
    return dw, db


# -------------------------
# Setup Summary
# -------------------------
keep_me_in_loop = '''
• Loaded the Breast Cancer dataset for model training and evaluation.
• Separated features and target variable.
• Applied Min-Max feature scaling manually.
• Reimplemented all logistic regression components from scratch.
• Prepared a fully self-contained setup for Part C.
'''
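
The setup above scales every feature with the minimum and maximum of the whole dataset. One possible variant, sketched here only as an illustration, learns those statistics from the training rows alone and reuses them for the test rows, and also guards against a constant column. The helper names `minmax_fit` and `minmax_transform` and the toy arrays are hypothetical and are not part of the original notebook.

```python
import numpy as np

def minmax_fit(X_train):
    """Return per-feature minimum and range computed from training rows only."""
    col_min = X_train.min(axis=0)
    col_range = X_train.max(axis=0) - col_min
    # A constant column has zero range; use 1 so it maps to 0 instead of NaN.
    col_range = np.where(col_range == 0, 1.0, col_range)
    return col_min, col_range

def minmax_transform(X, col_min, col_range):
    """Scale X with statistics learned elsewhere (here: the training set)."""
    return (X - col_min) / col_range

# Toy data, made up for illustration: the third column is constant.
X_train_demo = np.array([[1.0, 10.0, 5.0],
                         [3.0, 30.0, 5.0],
                         [2.0, 20.0, 5.0]])
X_test_demo = np.array([[4.0, 15.0, 5.0]])

col_min, col_range = minmax_fit(X_train_demo)
X_train_s = minmax_transform(X_train_demo, col_min, col_range)
X_test_s = minmax_transform(X_test_demo, col_min, col_range)

print(X_train_s)
print(X_test_s)  # test values may fall outside [0, 1]
```

With this approach, test values can land outside the [0, 1] interval, which is expected: the test set is scaled the same way new, unseen data would be.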


### Manual Train-Test Split 

In [8]:
# ============================================================
# Manual Train-Test Split (NumPy Only)
# ============================================================

# Set split ratio
train_ratio = 0.8

# Number of samples
m = X_scaled.shape[0]

# Generate shuffled indices
# (no random seed is set, so the split and the metrics below vary between runs)
indices = np.random.permutation(m)

# Compute split index
train_size = int(train_ratio * m)

# Split indices
train_indices = indices[:train_size]
test_indices = indices[train_size:]

# Create training and testing sets
X_train = X_scaled[train_indices]
y_train = y[train_indices]

X_test = X_scaled[test_indices]
y_test = y[test_indices]


In [9]:
keep_me_in_loop = '''
Here is what we did step by step:

1. We decided how much data to use for training and testing.
   - We chose 80% for training and 20% for testing using train_ratio = 0.8.

2. We found the total number of data samples.
   - m = X_scaled.shape[0] gives the number of rows (data points).

3. We created a list of numbers from 0 to m−1 and shuffled them randomly.
   - This is done using np.random.permutation(m).
   - Shuffling helps avoid any order bias in the data.

4. We calculated how many samples should go into the training set.
   - train_size = int(train_ratio * m).

5. We split the shuffled indices into two parts:
   - The first part is used for training.
   - The remaining part is used for testing.

6. Using these indices, we created the actual training data:
   - X_train contains the input features for training.
   - y_train contains the corresponding output labels.

7. Similarly, we created the testing data:
   - X_test contains the input features for testing.
   - y_test contains the corresponding output labels.

8. This method ensures:
   - Data is randomly split.
   - Training and testing sets do not overlap, because each index appears
     exactly once in the shuffled list.
   - The model is evaluated on samples it was not trained on.
'''
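
The split steps above can be wrapped in a small function with a fixed seed so that the same split is produced on every run. This is a minimal sketch, not the notebook's own code: the function name `manual_train_test_split`, the `seed` parameter, and the toy arrays are assumptions introduced for illustration. It uses `np.random.default_rng`, NumPy's seeded generator, in place of the unseeded `np.random.permutation` call.

```python
import numpy as np

def manual_train_test_split(X, y, train_ratio=0.8, seed=0):
    """Shuffle row indices with a seeded generator and split them once."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    indices = rng.permutation(m)           # shuffled 0 .. m-1
    train_size = int(train_ratio * m)      # number of training rows
    train_idx = indices[:train_size]
    test_idx = indices[train_size:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx], train_idx, test_idx

# Toy data, made up for illustration: 10 samples, 2 features.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10) % 2

X_tr, X_te, y_tr, y_te, tr_idx, te_idx = manual_train_test_split(X_demo, y_demo)

print(X_tr.shape, X_te.shape)
print("overlap:", set(tr_idx.tolist()) & set(te_idx.tolist()))
```

Returning the index arrays as well makes it easy to confirm that the two sets are disjoint and together cover every row.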


### Training the Model using Gradient Descent

In [10]:
# ============================================================
# Training the Logistic Regression Model (Training Set Only)
# ============================================================

# Initialize parameters
m, n = X_train.shape
w = np.zeros(n)
b = 0.0

learning_rate = 0.01
iterations = 1000

cost_history = []

# Gradient Descent loop
for i in range(iterations):
    
    # Forward propagation
    y_pred = predict_proba(X_train, w, b)
    
    # Compute cost
    cost = compute_cost(y_train, y_pred)
    cost_history.append(cost)
    
    # Compute gradients
    dw, db = compute_gradients(X_train, y_train, y_pred)
    
    # Update parameters
    w -= learning_rate * dw
    b -= learning_rate * db


In [11]:
keep_me_in_loop = '''
Here is what happens during model training step by step:

1. We first get the shape of the training data.
   - m is the number of training samples (rows).
   - n is the number of features (columns).

2. We initialize the model parameters:
   - w (weights) are set to zeros for all features.
   - b (bias) is set to 0.
   - This means the model starts with no prior knowledge: every initial
     prediction is sigmoid(0) = 0.5.

3. We choose training settings:
   - learning_rate controls how big each update step is.
   - iterations decides how many times we repeat training.

4. We create an empty list called cost_history.
   - This will store the loss value at every iteration.
   - It helps us see whether the model is learning.

5. We start the Gradient Descent loop.
   - The loop runs for the given number of iterations.

6. Forward propagation:
   - Using the current values of w and b, we predict probabilities
     for the training data using predict_proba().
   - This gives y_pred (model predictions).

7. Cost calculation:
   - We compute how wrong the predictions are using compute_cost().
   - The cost value is stored in cost_history.

8. Gradient calculation:
   - We calculate how much w and b should change
     using compute_gradients().
   - dw is the gradient for weights.
   - db is the gradient for bias.

9. Parameter update:
   - We update w by moving it in the opposite direction of dw.
   - We update b by moving it in the opposite direction of db.
   - The learning rate controls how large these updates are.

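10. After many iterations:
    - The cost should gradually decrease.
    - The model moves toward values of w and b that minimize the loss
      on the training data. With a fixed number of iterations it may stop
      before fully converging, so cost_history should be checked.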
'''
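
The training loop described above can be checked on a small synthetic problem. The sketch below is self-contained and only illustrative: the function name `train_logistic` and the random toy data are assumptions, while the update rule, learning rate, and iteration count mirror the cell above. It confirms two things the steps imply: with zero weights the first cost equals ln 2 (about 0.693), and the cost goes down as training proceeds.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation function."""
    return 1 / (1 + np.exp(-z))

def train_logistic(X, y, learning_rate=0.01, iterations=1000):
    """Batch gradient descent for logistic regression, starting from zeros."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    cost_history = []
    for _ in range(iterations):
        y_pred = sigmoid(X @ w + b)                      # forward pass
        y_clip = np.clip(y_pred, 1e-15, 1 - 1e-15)       # avoid log(0)
        cost = -np.mean(y * np.log(y_clip) + (1 - y) * np.log(1 - y_clip))
        cost_history.append(cost)
        error = y_pred - y
        w -= learning_rate * (X.T @ error) / m           # weight update
        b -= learning_rate * error.mean()                # bias update
    return w, b, cost_history

# Toy data, made up for illustration: features in [0, 1],
# label is 1 when the first feature exceeds 0.5.
rng = np.random.default_rng(0)
X_demo = rng.random((200, 3))
y_demo = (X_demo[:, 0] > 0.5).astype(float)

w, b, cost_history = train_logistic(X_demo, y_demo)

print("first cost:", cost_history[0])
print("last cost: ", cost_history[-1])
```

If the last cost is still falling noticeably, more iterations or a larger learning rate may be worth trying.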


### Predictions on the Test Set

In [12]:
# ============================================================
# Predictions on Test Set
# ============================================================

# Predict probabilities on test data
y_test_proba = predict_proba(X_test, w, b)

# Convert probabilities to class labels using threshold 0.5
y_test_pred = (y_test_proba >= 0.5).astype(int)


In [13]:
keep_me_in_loop = '''
Here is what we did step by step for predictions on the test set:

1. We use the trained model parameters (w and b).
   - These values were learned during training using gradient descent.

2. We pass the test input data (X_test) into the model.
   - predict_proba() calculates the probability of each sample
     belonging to the positive class (class 1).

3. The output y_test_proba contains values between 0 and 1.
   - Each value is the model's estimated probability
     that the sample belongs to class 1.

4. We then convert probabilities into actual class labels.
   - If probability ≥ 0.5, we predict class 1.
   - If probability < 0.5, we predict class 0.

5. The comparison (y_test_proba >= 0.5) gives True or False.
   - We convert True to 1 and False to 0 using astype(int).

6. The final output y_test_pred contains only 0s and 1s.
   - These are the predicted class labels for the test data.

7. These predictions can now be used to:
   - Calculate accuracy
   - Build a confusion matrix
   - Evaluate how well the model performs on unseen data
'''
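
The 0.5 threshold used above is a default, not a requirement. As a rough illustration of how the choice affects the result, the sketch below counts recall and false positives at two thresholds. The function name `recall_and_fp` and the label and probability arrays are made up for this example and do not come from the dataset.

```python
import numpy as np

def recall_and_fp(y_true, y_proba, threshold):
    """Return (recall, false positive count) for a given decision threshold."""
    y_pred = (y_proba >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    recall = tp / (tp + fn) if (tp + fn) != 0 else 0.0
    return float(recall), int(fp)

# Made-up labels and predicted probabilities for illustration.
y_true_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_proba_demo = np.array([0.9, 0.6, 0.45, 0.35, 0.4, 0.2, 0.1, 0.55])

recall_default, fp_default = recall_and_fp(y_true_demo, y_proba_demo, 0.5)
recall_lower, fp_lower = recall_and_fp(y_true_demo, y_proba_demo, 0.3)

print("threshold 0.5:", recall_default, fp_default)
print("threshold 0.3:", recall_lower, fp_lower)
```

On this toy data, lowering the threshold finds more of the positive cases at the price of more false alarms, which is the usual trade-off.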


### Evaluation Metrics – Accuracy, Precision, Recall (manual)

In [14]:
# ============================================================
# Confusion Matrix Components
# ============================================================

TP = np.sum((y_test == 1) & (y_test_pred == 1))
TN = np.sum((y_test == 0) & (y_test_pred == 0))
FP = np.sum((y_test == 0) & (y_test_pred == 1))
FN = np.sum((y_test == 1) & (y_test_pred == 0))

TP, TN, FP, FN


(np.int64(35), np.int64(69), np.int64(0), np.int64(10))

In [15]:
# ============================================================
# Evaluation Metrics
# ============================================================

accuracy = (TP + TN) / (TP + TN + FP + FN)

precision = TP / (TP + FP) if (TP + FP) != 0 else 0

recall = TP / (TP + FN) if (TP + FN) != 0 else 0

accuracy, precision, recall


(np.float64(0.9122807017543859),
 np.float64(1.0),
 np.float64(0.7777777777777778))

In [16]:
keep_me_in_loop = '''
Here is what we did step by step to evaluate the model:

1. First, we compare the true labels (y_test) with the predicted labels (y_test_pred).
   - This helps us understand how many predictions are correct or wrong.

2. We calculate the four parts of the confusion matrix:

   a) True Positive (TP):
      - The model predicted 1 and the actual value is also 1.
      - This means the model correctly identified a positive case.

   b) True Negative (TN):
      - The model predicted 0 and the actual value is also 0.
      - This means the model correctly identified a negative case.

   c) False Positive (FP):
      - The model predicted 1 but the actual value is 0.
      - This means the model gave a false alarm.

   d) False Negative (FN):
      - The model predicted 0 but the actual value is 1.
      - This means the model missed a positive case.

3. We then calculate Accuracy:
   - Accuracy is the fraction of all predictions that were correct.
   - Formula: (TP + TN) / (TP + TN + FP + FN)

4. Next, we calculate Precision:
   - Precision is the fraction of predicted positives that were actually positive.
   - Formula: TP / (TP + FP)
   - We check that (TP + FP) is not zero to avoid division by zero.

5. Then, we calculate Recall:
   - Recall is the fraction of actual positives that the model found.
   - Formula: TP / (TP + FN)
   - We again check that (TP + FN) is not zero.

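6. Finally, we display Accuracy, Precision, and Recall as the cell output.
   - These values help us understand how well the model performs on test data.

7. Why Recall matters most in cancer detection:
   - A False Negative is a patient with cancer who is told they are healthy,
     so the disease goes untreated. A False Positive usually leads only to
     further tests.
   - Here Recall is about 0.78 (10 of 45 positives missed) even though
     Precision is 1.0, so Accuracy alone hides the main weakness.
   - This reading assumes label 1 marks the malignant class in Wisconsin.csv.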
'''
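
As a worked check of the formulas above, the confusion-matrix counts reported in this run (TP = 35, TN = 69, FP = 0, FN = 10) can be plugged into a small helper. The function name `classification_metrics` is hypothetical. The counts are taken from the cell output shown earlier and will differ on another run, since the split is random.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Counts from the run shown above: 114 test samples in total.
accuracy, precision, recall = classification_metrics(tp=35, tn=69, fp=0, fn=10)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
# prints: accuracy=0.9123 precision=1.0000 recall=0.7778
```

The numbers match the notebook output: 104 of 114 predictions are correct, no negative is flagged as positive, and 35 of the 45 actual positives are found.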
