In [2]:
import pandas as pd

# TODO: load the glass dataset into df
df = pd.read_csv("glass.csv")

In [3]:
# TODO: check number of rows and columns
df.shape

(214, 10)

In [4]:
# TODO: see column names
df.columns

Index(['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'Type'], dtype='object')

In [5]:
# TODO: look at first few rows
df.head()


        RI     Na    Mg    Al     Si     K    Ca   Ba   Fe  Type
0  1.52101  13.64  4.49  1.10  71.78  0.06  8.75  0.0  0.0     1
1  1.51761  13.89  3.60  1.36  72.73  0.48  7.83  0.0  0.0     1
2  1.51618  13.53  3.55  1.54  72.99  0.39  7.78  0.0  0.0     1
3  1.51766  13.21  3.69  1.29  72.61  0.57  8.22  0.0  0.0     1
4  1.51742  13.27  3.62  1.24  73.08  0.55  8.07  0.0  0.0     1


## Step 1: Data Preprocessing & Binary Label Creation
- What this step does: Loads the glass data, checks its shape/columns, converts the target column (Type) into a binary 0/1 format (where Type == 1 is our positive class), and separates the features (X) from the labels (y).

- Why it is needed: The model we build is a binary classifier, so the multiclass target (Type) must be reduced to a single 0/1 outcome. Converting the DataFrame columns to NumPy arrays also lets the later steps use matrix operations such as X @ w directly.

In [6]:
# TODO: create binary labels
df["y"] = (df["Type"] == 1).astype(int)

# TODO: remove original Type column
df = df.drop(columns=["Type"])


In [7]:
# TODO: separate features and labels
X = df.drop(columns=["y"]).values
y = df["y"].values
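
As a quick sanity check on the label conversion, the sketch below applies the same expression to a tiny made-up frame and counts how many rows land in each class. The `toy` DataFrame and its values are illustrative assumptions, not rows from glass.csv.

```python
import pandas as pd

# Made-up rows for illustration only (not taken from glass.csv)
toy = pd.DataFrame({"RI": [1.52, 1.51, 1.53, 1.52], "Type": [1, 2, 1, 7]})

# Same conversion as in the notebook: Type == 1 becomes 1, everything else 0
toy["y"] = (toy["Type"] == 1).astype(int)

# Count rows per class to see how balanced the binary target is
counts = toy["y"].value_counts().to_dict()
print(counts)
```

Running `df["y"].value_counts()` on the real data in the same way would show how many Type 1 samples there are compared with all other types combined.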

## Step 2: Train-Test Split

- What this step does: Divides the dataset into two chunks: 80% for training the model and 20% kept hidden for testing.

- Why it is needed: If we evaluate the model on the same data it learned from, it's like giving a student the answer key before a test. Splitting the data ensures we can measure how well the model generalizes to completely new, unseen data.

In [9]:
from sklearn.model_selection import train_test_split
# TODO: split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
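
One optional variation, not used in the notebook above, is to pass `stratify=y` so that the train and test sets keep the same share of positive labels. The sketch below shows the effect on made-up data; `X_demo` and `y_demo` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 50 samples, 15 of them positive (30%)
X_demo = np.arange(100, dtype=float).reshape(50, 2)
y_demo = np.array([1] * 15 + [0] * 35)

# stratify=y_demo keeps the 30% positive rate in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(len(y_te), y_te.mean(), y_tr.mean())
```

With a small dataset, a purely random split can leave the test set with noticeably more or fewer positives than the training set, which makes accuracy numbers harder to interpret.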


## Step 3: Feature Scaling

- What this step does: Standardizes the features so they all have a mean of 0 and a standard deviation of 1 using StandardScaler.

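- Why it is needed: The features sit on very different scales (in the rows shown earlier, RI is about 1.5 while Si is about 72). Without scaling, the gradient is dominated by the features with larger values, so Gradient Descent zigzags and converges slowly. The scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training.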

In [10]:
from sklearn.preprocessing import StandardScaler
# TODO: scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
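
To see what the scaler does, the sketch below standardizes two made-up features on very different scales and prints the resulting column means and standard deviations. The arrays and the `_demo` names are assumptions for illustration; the real notebook uses `X_train` and `X_test`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two made-up features on very different scales
X_tr_demo = np.column_stack([rng.normal(1.5, 0.003, 40), rng.normal(72.0, 0.8, 40)])
X_te_demo = np.column_stack([rng.normal(1.5, 0.003, 10), rng.normal(72.0, 0.8, 10)])

scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr_demo)  # learns mean/std from training data
X_te_scaled = scaler_demo.transform(X_te_demo)      # reuses the training mean/std

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
print(X_te_scaled.mean(axis=0))
```

The training columns come out with mean 0 and standard deviation 1 up to rounding. The test columns are close to that but not exact, because they are scaled with the training statistics rather than their own.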

## Step 4: Sigmoid and Probability Prediction
- What this step does: Computes the linear score z = Xw + b (a matrix-vector product, written X @ w + b in the code) and passes it through the sigmoid function sigmoid(z) = 1 / (1 + e^{-z}).

- Why it is needed: The linear equation outputs raw scores that can range from -infinity to +infinity. The sigmoid function gracefully "squashes" these scores into a range strictly between 0 and 1, allowing us to interpret the output as a probability.

In [11]:
import numpy as np

def sigmoid(z):
    # TODO: return sigmoid of z
    return 1 / (1 + np.exp(-z))
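
The plain formula gives the right values, but for very negative z the call `np.exp(-z)` overflows and NumPy emits a RuntimeWarning. A possible alternative, sketched below under the assumption that z is a NumPy array, evaluates each half of the input in the form that cannot overflow. The name `stable_sigmoid` is hypothetical and not part of the notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that avoids overflow by never exponentiating a large positive number."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) is at most 1, so the usual formula is safe
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, use the equivalent form exp(z) / (1 + exp(z))
    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)
    return out

demo = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(demo)
```

For the standardized glass features the scores stay small, so the simple version above is adequate; the stable form matters mainly when inputs are unscaled or weights grow large.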


In [12]:
def predict_proba(X, w, b):
    # TODO: compute z using weights and bias
    z = X @ w + b
    # TODO: convert z to probability
    p = sigmoid(z)
    return p


## Step 5: Loss Function
- What this step does: Computes the binary cross-entropy (log loss) between the model's predicted probabilities (p) and the true labels (y): the negative mean of y * log(p) + (1 - y) * log(1 - p).

- Why it is needed: Unlike the basic perceptron, which only checks whether a prediction is right or wrong, log loss penalizes confident mistakes heavily: the penalty grows without bound as the predicted probability of the true class approaches 0. The loss is also smooth and convex in the weights, so Gradient Descent can follow its gradient toward the minimum.

In [13]:
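def loss(y, p):
    # TODO: compute binary cross entropy
    # Clip p away from exactly 0 and 1 so np.log never returns -inf
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))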
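
A small worked example makes the "confidently wrong" penalty concrete. The sketch below evaluates the same formula for a single positive example (y = 1) at three predicted probabilities; `bce` is a stand-in name for the loss and the probabilities are made up.

```python
import numpy as np

def bce(y, p):
    # Binary cross-entropy, same formula as the notebook's loss
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y_true = np.array([1.0])

loss_confident_right = bce(y_true, np.array([0.9]))  # -ln(0.9), about 0.105
loss_unsure = bce(y_true, np.array([0.5]))           # -ln(0.5), about 0.693
loss_confident_wrong = bce(y_true, np.array([0.1]))  # -ln(0.1), about 2.303

print(loss_confident_right, loss_unsure, loss_confident_wrong)
```

Moving from a confident correct prediction to a confident wrong one raises the loss by a factor of roughly twenty, whereas a right/wrong count would only change by one.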


## Step 6: Weight Updates & Training Loop
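- What this step does: Iteratively computes the gradient of the loss with respect to the weights (w) and bias (b). The gradient points in the direction of steepest increase in loss, so w and b are moved a small step in the opposite direction, with the learning rate (lr) setting the step size. For log loss with a sigmoid output, the gradient is X^T (p - y) / n for w and mean(p - y) for b.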

- Why it is needed: This is the actual learning engine. By repeating this process over multiple epochs, the model slowly adjusts its weights until the loss function is minimized.

In [14]:
def update_weights(X, y, w, b, lr):
    # TODO: compute predictions
    p = predict_proba(X, w, b)
    # TODO: compute error
    error = p - y
    # TODO: update weights and bias
    w = w - lr * (X.T @ error) / len(y)
    b = b - lr * np.mean(error)
    return w, b
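
One way to gain confidence in the update rule is a finite-difference gradient check. The sketch below compares the analytic gradient used in `update_weights` with a numerical estimate on random made-up data; all names ending in `_demo` or starting with `num_` are assumptions for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce_at(X, y, w, b):
    # Loss as a function of the parameters
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = (rng.random(20) > 0.5).astype(float)
w_demo = rng.normal(size=3)
b_demo = 0.1

# Analytic gradient, the same expressions as in update_weights
error = sigmoid(X_demo @ w_demo + b_demo) - y_demo
grad_w = X_demo.T @ error / len(y_demo)
grad_b = np.mean(error)

# Central finite differences, one coordinate at a time
eps = 1e-6
num_grad_w = np.array([
    (bce_at(X_demo, y_demo, w_demo + eps * e, b_demo)
     - bce_at(X_demo, y_demo, w_demo - eps * e, b_demo)) / (2 * eps)
    for e in np.eye(3)
])
num_grad_b = (bce_at(X_demo, y_demo, w_demo, b_demo + eps)
              - bce_at(X_demo, y_demo, w_demo, b_demo - eps)) / (2 * eps)

print(np.max(np.abs(grad_w - num_grad_w)), abs(grad_b - num_grad_b))
```

If the two gradients agree to several decimal places, the hand-derived formula matches the loss it is supposed to minimize.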


In [19]:
# TODO: initialize weights and bias
w = np.zeros(X_train.shape[1])
b = 0.0
lr = 0.1
epochs = 100
for _ in range(epochs):
    w, b = update_weights(X_train, y_train, w, b, lr)
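
The loop above does not record the loss, so it gives no direct evidence that training is converging. A minimal sketch of how the loss could be tracked per epoch is shown below on synthetic data; the data, `true_w`, and the `_demo` names are assumptions, and in the notebook the same idea would use `X_train`, `y_train`, and the `loss` function.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic, roughly standardized data with a made-up true weight vector
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
true_w = np.array([1.5, -2.0, 0.5, 0.0])
y_demo = (X_demo @ true_w + rng.normal(scale=0.5, size=200) > 0).astype(float)

w_demo = np.zeros(4)
b_demo = 0.0
lr = 0.1
history = []

for epoch in range(100):
    p = sigmoid(X_demo @ w_demo + b_demo)
    history.append(bce(y_demo, p))  # loss before this epoch's update
    error = p - y_demo
    w_demo = w_demo - lr * (X_demo.T @ error) / len(y_demo)
    b_demo = b_demo - lr * np.mean(error)

print(history[0], history[-1])
```

The first recorded value is ln(2), about 0.693, because zero weights predict 0.5 for every sample. A loss that falls steadily from there suggests the learning rate is reasonable; a loss that oscillates or grows suggests it is too large.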


In [20]:
def predict_label(p, threshold=0.5):
    return (p >= threshold).astype(int)
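
A short usage sketch for `predict_label`: turn probabilities into labels and measure accuracy. The probabilities and labels below are made up; in the notebook, the probabilities would presumably come from `predict_proba(X_test, w, b)` and be compared with `y_test`.

```python
import numpy as np

def predict_label(p, threshold=0.5):
    return (p >= threshold).astype(int)

# Made-up predicted probabilities and true labels
p_demo = np.array([0.92, 0.40, 0.55, 0.08, 0.50])
y_demo = np.array([1, 1, 0, 0, 1])

labels = predict_label(p_demo)             # a probability of exactly 0.5 counts as positive
accuracy = np.mean(labels == y_demo)       # fraction of correct predictions

stricter = predict_label(p_demo, threshold=0.6)  # higher threshold, fewer positives

print(labels, accuracy, stricter)
```

Raising the threshold trades false positives for false negatives, so the default of 0.5 is a choice rather than a requirement.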


- How this differs from the perceptron
- Why the sigmoid matters
- What problem still remains unsolved

Unlike a standard perceptron, which outputs a hard binary step (0 or 1) and changes its weights only when a prediction is wrong, this logistic regression model outputs a continuous probability and updates its weights by gradient descent on a smooth loss function. The sigmoid function matters because it provides that smooth, differentiable transition: it maps any linear score into the open interval (0, 1), so the model can express degrees of confidence instead of only hard guesses. However, despite this probabilistic upgrade, one major problem remains unsolved: logistic regression is still fundamentally a linear classifier. Its decision boundary is a straight line in two dimensions (a hyperplane in general), so it cannot correctly classify every point in a dataset that is not linearly separable (like the XOR problem we talked about earlier) unless we step up to a Multi-Layer Perceptron (Neural Network).
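
The XOR limitation can be checked directly. The sketch below trains the same kind of model on the four XOR points using the same update rule as this notebook; the `_xor` names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# The four XOR points: the label is 1 only when exactly one input is 1
X_xor = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_xor = np.array([0.0, 1.0, 1.0, 0.0])

w_xor = np.zeros(2)
b_xor = 0.0
for _ in range(1000):
    error = sigmoid(X_xor @ w_xor + b_xor) - y_xor
    w_xor = w_xor - 0.1 * (X_xor.T @ error) / len(y_xor)
    b_xor = b_xor - 0.1 * np.mean(error)

p_xor = sigmoid(X_xor @ w_xor + b_xor)
acc_xor = np.mean((p_xor >= 0.5).astype(int) == y_xor)

print(p_xor, acc_xor)
```

On this data the gradient is zero at the starting point, so the model stays at a probability of 0.5 for every input. No choice of w and b can do better than three correct points out of four, because no single straight line separates the two XOR classes.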