<a href="https://www.kaggle.com/code/sacrum/ml-labs-03-logistic-regression-from-scratch?scriptVersionId=178243455" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# 1. Logistic Regression

In this section we implement the basic functions used to train a logistic regression model.



## Deriving the Logistic Loss Function

In logistic regression, we aim to model the probability of a binary outcome (y) given a set of features (x). The logistic function (g(z)) transforms the linear combination of weights (w) and features (x) into a probability between 0 and 1. 
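
As a minimal sketch of the mapping described above, the snippet below computes g(z) for one input. The weights and features are hypothetical values chosen only so that the linear combination comes out to zero.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and features, for illustration only
w = np.array([0.5, -0.25])
x = np.array([2.0, 4.0])

z = np.dot(w, x)     # linear combination w^T x
y_hat = sigmoid(z)   # squashed into the open interval (0, 1)
print(z, y_hat)
```

Large positive values of z push the output towards 1 and large negative values push it towards 0.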

Here's how we derive the logistic loss function:

**1. Noting the desired outcome:**

The desired outcome is to have the predicted probability (ŷ) closely match the actual label (y). If y is 1, we want ŷ to be close to 1. Conversely, if y is 0, we want ŷ to be close to 0.

**2. Choosing a loss function:**

A common choice for measuring the difference between the predicted and actual values in logistic regression is the **log loss** (also called binary cross-entropy). It penalizes the model for both underestimating and overestimating the true probability.

**3. Formulating the loss function:**

The log loss for a single data point is:

```
L(y, ŷ) = - (y * log(ŷ) + (1 - y) * log(1 - ŷ))
```

where:

* **L(y, ŷ):** Loss for a single data point
* **y:** True label (0 or 1)
* **ŷ:** Predicted probability

**4. Explanation:**

* When y = 1, the second term vanishes and the loss reduces to -log(ŷ), which is small when ŷ is close to 1.
* When y = 0, the first term vanishes and the loss reduces to -log(1 - ŷ), which is small when ŷ is close to 0.
* Conversely, if the model underestimates or overestimates the probability, the corresponding term in the loss function becomes larger, penalizing the model.
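
To make the behaviour above concrete, here is a small sketch that evaluates the single-point loss for confident-correct and confident-wrong predictions. The helper name `log_loss_single` and the probabilities are illustrative assumptions, not part of the lab code.

```python
import math

def log_loss_single(y, y_hat):
    """Log loss for one data point with label y in {0, 1}."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# (label, predicted probability) pairs chosen for illustration
for y, y_hat in [(1, 0.9), (1, 0.1), (0, 0.1), (0, 0.9)]:
    print(f"y={y}, y_hat={y_hat}: loss={log_loss_single(y, y_hat):.3f}")
```

A prediction of 0.9 for a positive example costs about 0.105, while a prediction of 0.1 for the same example costs about 2.303, so confident mistakes are penalized much more heavily.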

**5. Average loss for multiple data points:**

To evaluate the model's performance on the entire dataset, we calculate the average loss over all data points:

```
Loss = (1/N) * sum(L(y_i, ŷ_i)) for i in N data points
```

This average loss is used during the training process to optimize the model's weights (w) using gradient descent algorithms. By minimizing the average loss, the model learns to predict probabilities that better match the true labels.
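
A rough sketch of the dataset-level loss, assuming the labels and predictions are held in NumPy arrays. The function name `mean_log_loss` is hypothetical. With every prediction at 0.5, the mean loss equals ln 2 regardless of the labels.

```python
import numpy as np

def mean_log_loss(y, y_hat):
    """Average the per-sample log loss over all data points."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    per_sample = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    return per_sample.mean()

# Illustrative labels; an untrained model that outputs 0.5 everywhere
y = [0, 1, 0, 0, 1, 0]
y_hat = [0.5] * 6
print(round(mean_log_loss(y, y_hat), 3))  # 0.693
```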

## Deriving the Update Rule for Logistic Regression

The update rule for logistic regression uses the **gradient descent algorithm** to adjust the model's weights (w) based on the calculated loss function. Here's the derivation:

**1. Gradient of the loss function:**

Gradient descent moves each weight (w_j) in the direction of the negative gradient of the loss, since that is the direction in which the loss decreases fastest.

For the logistic loss function, the partial derivative with respect to w_j for a single data point is:

```
∂L(y, ŷ)/∂w_j = (ŷ - y) * φ_j(x)
```

where:

* **∂L/∂w_j:** j-th component of the gradient ∇_w L with respect to the weight vector w
* **φ_j(x):** j-th feature in the feature vector φ(x)

**2. Update rule using gradient descent:**

Following the principle of gradient descent, we update each weight by subtracting the learning rate (η) multiplied by the corresponding gradient component:

```
w_j_new = w_j_old - η * ∂L(y, ŷ)/∂w_j
```

Substituting the gradient expression:

```
w_j_new = w_j_old - η * (ŷ - y) * φ_j(x)
```
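
One way to gain confidence in the expression (ŷ - y) * φ_j(x) is to compare it against a finite-difference estimate of the loss gradient. The sketch below does this for a single data point. The weights, features, and label are hypothetical values chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, x, y):
    """Log loss of one data point as a function of the weights."""
    y_hat = sigmoid(np.dot(w, x))
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Hypothetical weights, features, and label
w = np.array([0.1, -0.2, 0.05])
x = np.array([1.0, 2.0, -1.5])
y = 1.0

# Analytic gradient from the derivation: (y_hat - y) * x
analytic = (sigmoid(np.dot(w, x)) - y) * x

# Central finite differences, one weight at a time
eps = 1e-6
numeric = np.zeros_like(w)
for j in range(len(w)):
    step = np.zeros_like(w)
    step[j] = eps
    numeric[j] = (loss(w + step, x, y) - loss(w - step, x, y)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))
```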

**3. Update for all data points:**

To update the weights based on the entire dataset, we perform the update rule for each data point and average the updates:

```
w_j_new = w_j_old - η * (1/N) * sum((ŷ_i - y_i) * φ_j(x_i)) for i in N data points
```

This update rule iteratively adjusts the weights based on the difference between the predicted and actual values, leading the model towards minimizing the overall loss and improving its performance in predicting probabilities for future data.
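
The averaged update can be written compactly with matrix operations. This is a sketch under the assumption that the features are stacked row-wise in a matrix `Phi`. The function name `batch_update` and the toy data are illustrative, not taken from the lab.

```python
import numpy as np

def batch_update(w, Phi, y, eta):
    """One gradient descent step using the gradient averaged over all rows."""
    y_hat = 1.0 / (1.0 + np.exp(-(Phi @ w)))   # predictions for every row
    grad = Phi.T @ (y_hat - y) / len(y)        # (1/N) * sum((y_hat_i - y_i) * phi(x_i))
    return w - eta * grad

# Toy data: three samples, two features
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])

w_new = batch_update(np.zeros(2), Phi, y, eta=0.1)
print(w_new)
```

Starting from zero weights, every prediction is 0.5, so only the first weight moves: the second feature appears once with each label and its contributions cancel.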

## Dataset

| Feature | Individual 1 | Individual 2 | Individual 3 | Individual 4 | Individual 5 | Individual 6 |
|---|---|---|---|---|---|---|
| Income (in Lacs) | 4.5 | 8.5 | 6 | 3 | 10 | 5.5 |
| Age | 34 | 42 | 29 | 24 | 50 | 36 |  
| Debt (in Lacs) | 10 | 15 | 20 | 5 | 30 |8 |
| Experience (Years) | 5 | 10 | 4 | 2 | 20 | 6 |
| Approval | No | Yes | No | No | Yes | No |

## Gradient Descent

The gradient descent update rule for logistic regression proceeds in the following steps:

1. **Initialize weights**: Start with some initial values for the weights.

2. **Set learning rate**: Choose a small number (e.g., 0.01) as the learning rate. This determines how much the weights are updated in each iteration.

3. **Set number of epochs**: Decide how many complete passes through the data to make (e.g., 1000).

4. **Set batch size**: Define the number of data points you want to use in each batch update (e.g., 32).

5. **For each epoch** (complete pass through the data):
   - For each batch of data (of size batch_size):
     - Compute the gradient of the loss function with respect to each weight.
     - Update each weight by subtracting the learning rate times the batch-averaged gradient (the summed gradient multiplied by the inverse of the batch size).

6. Repeat the above step for the specified number of epochs.
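
The batching in step 5 can be sketched as a small generator that yields the index range of each batch. The name `batch_bounds` is a hypothetical helper, and the sample count of 6 is chosen only for illustration.

```python
def batch_bounds(n_samples, batch_size):
    """Yield (start, end) index pairs that cover n_samples in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

n_samples = 6
for batch_size in (6, 1, 4):
    bounds = list(batch_bounds(n_samples, batch_size))
    print(f"batch_size={batch_size}: {len(bounds)} update(s) per epoch, {bounds}")
```

With 6 samples, a batch size of 6 gives one update per epoch, a batch size of 1 gives six, and a batch size of 4 gives two, where the last batch holds only the remaining 2 samples.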

## Applying Gradient Descent

W = [0, 0, 0, 0] (one weight per feature)

ŷ = g(wᵀ φ(x))

L(y, ŷ) = - (y * log(ŷ) + (1 - y) * log(1 - ŷ))

Loss = (1/N) * sum(L(y_i, ŷ_i)) for i in N data points

∂L(y, ŷ)/∂w_j = (ŷ - y) * φ_j(x)

w_j_new = w_j_old - η * ∂L(y, ŷ)/∂w_j

### Apply Batch GD

> When the batch size equals the number of samples in the data, gradient descent becomes batch gradient descent.

Epochs = 5
	  
Batch Size = 6

Learning Rate = 0.001

- Epoch 1
    - Iteration 1
        - ŷ = [0.5 0.5 0.5 0.5 0.5 0.5]
        - loss = 0.693
        - gradient = [ 0.042  2.583 -0.167 -1.083]
        - w = [-0.    -0.003  0.     0.001]
- Epoch 2
    - Iteration 1
        - ŷ = [0.48  0.476 0.483 0.485 0.474 0.479]
        - loss = 0.686
        - gradient = [-0.094  1.819 -0.486 -1.263]
        - w = [ 0.    -0.004  0.001  0.002]
- Epoch 3
    - Iteration 1
        - ŷ = [0.467 0.462 0.474 0.476 0.462 0.465]
        - loss = 0.681
        - gradient = [-0.17   1.383 -0.66  -1.361]
        - w = [ 0.    -0.006  0.001  0.004]
- Epoch 4
    - Iteration 1
        - ŷ = [0.459 0.454 0.469 0.469 0.457 0.457]
        - loss = 0.677
        - gradient = [-0.213  1.133 -0.754 -1.413]
        - w = [ 0.    -0.007  0.002  0.005]
- Epoch 5
    - Iteration 1
        - ŷ = [0.453 0.449 0.466 0.464 0.456 0.45 ]
        - loss = 0.673
        - gradient = [-0.237  0.988 -0.802 -1.439]
        - w = [ 0.001 -0.008  0.003  0.007]
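
The first epoch of the trace above can be reproduced with a few lines of NumPy. This sketch assumes the features are used unscaled in the order Income, Age, Debt, Experience, that there is no bias term, and that the weights start at zero. These assumptions are inferred from the printed values rather than stated in the lab.

```python
import numpy as np

# Rows are individuals 1-6; columns are Income, Age, Debt, Experience
X = np.array([
    [4.5, 34, 10, 5],
    [8.5, 42, 15, 10],
    [6.0, 29, 20, 4],
    [3.0, 24, 5, 2],
    [10.0, 50, 30, 20],
    [5.5, 36, 8, 6],
])
y = np.array([0, 1, 0, 0, 1, 0], dtype=float)  # Approval: No = 0, Yes = 1

w = np.zeros(4)
eta = 0.001

y_hat = 1.0 / (1.0 + np.exp(-(X @ w)))
loss = np.mean(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))
grad = X.T @ (y_hat - y) / len(y)   # averaged over the whole batch
w = w - eta * grad

print("y_hat    =", np.round(y_hat, 3))
print("loss     =", np.round(loss, 3))
print("gradient =", np.round(grad, 3))
print("w        =", np.round(w, 3))
```

The rounded loss and gradient agree with Epoch 1 of the trace (0.693 and [0.042, 2.583, -0.167, -1.083]).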

### Apply Stochastic GD

> When the batch size equals 1, gradient descent becomes stochastic gradient descent.

Epochs = 2
	  
Batch Size = 1

Learning Rate = 0.001

- Epoch 1
    - Iteration 1
        - ŷ = [0.5]
        - loss = 0.693
        - gradient = [ 2.25 17.    5.    2.5 ]
        - w = [-0.002 -0.017 -0.005 -0.002]
    - Iteration 2
        - ŷ = [0.303]
        - loss = 1.194
        - gradient = [ -5.925 -29.275 -10.455  -6.97 ]
        - w = [0.004 0.012 0.005 0.004]
    - Iteration 3
        - ŷ = [0.624]
        - loss = 0.977
        - gradient = [ 3.742 18.085 12.473  2.495]
        - w = [-0.    -0.006 -0.007  0.002]
    - Iteration 4
        - ŷ = [0.457]
        - loss = 0.611
        - gradient = [ 1.372 10.978  2.287  0.915]
        - w = [-0.001 -0.017 -0.009  0.001]
    - Iteration 5
        - ŷ = [0.248]
        - loss = 1.396
        - gradient = [ -7.525 -37.623 -22.574 -15.049]
        - w = [0.006 0.021 0.013 0.016]
    - Iteration 6
        - ŷ = [0.728]
        - loss = 1.303
        - gradient = [ 4.006 26.221  5.827  4.37 ]
        - w = [ 0.002 -0.005  0.007  0.012]
- Epoch 2
    - Iteration 1
        - ŷ = [0.49]
        - loss = 0.673
        - gradient = [ 2.204 16.654  4.898  2.449]
        - w = [-0.    -0.022  0.003  0.009]
    - Iteration 2
        - ŷ = [0.311]
        - loss = 1.168
        - gradient = [ -5.857 -28.94  -10.336  -6.891]
        - w = [0.006 0.007 0.013 0.016]
    - Iteration 3
        - ŷ = [0.636]
        - loss = 1.01
        - gradient = [ 3.814 18.436 12.714  2.543]
        - w = [ 0.002 -0.012  0.     0.014]
    - Iteration 4
        - ŷ = [0.44]
        - loss = 0.579
        - gradient = [ 1.319 10.549  2.198  0.879]
        - w = [ 0.001 -0.022 -0.002  0.013]
    - Iteration 5
        - ŷ = [0.288]
        - loss = 1.244
        - gradient = [ -7.118 -35.589 -21.353 -14.236]
        - w = [0.008 0.014 0.019 0.027]
    - Iteration 6
        - ŷ = [0.7]
        - loss = 1.202
        - gradient = [ 3.847 25.184  5.596  4.197]
        - w = [ 0.004 -0.012  0.014  0.023]
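
The stochastic trace can be sketched the same way, updating the weights after every individual. As before, the unscaled feature order (Income, Age, Debt, Experience), the zero initial weights, the missing bias term, and visiting the individuals in table order are assumptions inferred from the printed numbers.

```python
import numpy as np

# Rows are individuals 1-6; columns are Income, Age, Debt, Experience
X = np.array([
    [4.5, 34, 10, 5],
    [8.5, 42, 15, 10],
    [6.0, 29, 20, 4],
    [3.0, 24, 5, 2],
    [10.0, 50, 30, 20],
    [5.5, 36, 8, 6],
])
y = np.array([0, 1, 0, 0, 1, 0], dtype=float)

w = np.zeros(4)
eta = 0.001
y_hats = []

# One epoch of stochastic gradient descent: one update per data point
for x_i, y_i in zip(X, y):
    y_hat = 1.0 / (1.0 + np.exp(-np.dot(w, x_i)))
    grad = (y_hat - y_i) * x_i
    w = w - eta * grad
    y_hats.append(y_hat)
    print(np.round(y_hat, 3), np.round(grad, 3), np.round(w, 3))
```

Because each update is driven by a single example, the weights swing back and forth between the "Yes" and "No" individuals, which is the noise visible in the per-iteration loss values.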

## Evaluation

Accuracy = Number of correct predictions / Total number of predictions

For Batch GD
```
preds = [0.449 0.446 0.465 0.46  0.457 0.446]
y = [0 1 0 0 1 0]
accuracy = 0.67
```

For Stochastic GD
```
preds = [0.468 0.494 0.513 0.461 0.58  0.462]
y = [0 1 0 0 1 0]
accuracy = 0.67
```
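
The accuracy figures above can be checked with a short sketch. It assumes a decision threshold of 0.5 for turning probabilities into labels. The lab does not state the threshold, so treat it as an assumption.

```python
import numpy as np

def accuracy(preds, y, threshold=0.5):
    """Fraction of predictions that match the labels after thresholding."""
    labels = (np.asarray(preds) >= threshold).astype(int)
    return np.mean(labels == np.asarray(y))

y = [0, 1, 0, 0, 1, 0]
batch_preds = [0.449, 0.446, 0.465, 0.46, 0.457, 0.446]
sgd_preds = [0.468, 0.494, 0.513, 0.461, 0.58, 0.462]

print(round(accuracy(batch_preds, y), 2))  # 0.67
print(round(accuracy(sgd_preds, y), 2))    # 0.67
```

Under this threshold the Batch GD model predicts "No" for everyone and is right on the four "No" individuals, while the Stochastic GD model gets a different set of four right, so equal accuracy does not mean identical predictions.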

| Gradient Descent Method | Accuracy | Epochs | Batch Size | Learning Rate |
|--------------------------|----------|--------|------------|---------------|
| Batch GD                 | 0.67     | 5      | 6          | 0.001         |
| Stochastic GD            | 0.67     | 2      | 1          | 0.001         |


Both Batch Gradient Descent (Batch GD) and Stochastic Gradient Descent (Stochastic GD) achieved the same accuracy of 0.67. However, they differ in their training approach:

1. **Batch GD**:
   - Uses a batch size of 6, meaning it processes 6 examples at a time before updating the weights.
   - Runs for 5 epochs, meaning it goes through the entire dataset 5 times.
   - Generally more stable but may take longer to converge due to processing the entire dataset before updating weights.

2. **Stochastic GD**:
   - Uses a batch size of 1, meaning it updates the weights after processing each example.
   - Runs for 2 epochs, meaning it goes through the entire dataset twice.
   - Can be faster to update weights but may be more noisy and less stable due to the frequent updates.

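In this specific case, both methods reached the same accuracy. Batch GD made 5 weight updates (one per epoch over 5 epochs) and its loss decreased steadily from 0.693 to 0.673. Stochastic GD made 12 updates (six per epoch over 2 epochs), but its per-sample loss fluctuated between 0.579 and 1.396, which illustrates how single-example updates are noisier.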

# 2. Implementation in Code

### Setting Up

In [1]:
import numpy as np

In [2]:
# this function prints the vector with its shape

def print_vector(vector, vector_name):
	print(f""">> {vector_name}.shape\n{vector.shape}\n\n>> {vector_name}\n{vector}\n""")

In [3]:
# Initialize Dummy Data

n_samples = 10
n_inp_dims = 4
n_out_dims = 2

weight = np.random.rand(n_inp_dims, n_out_dims)
feature_vector = np.random.rand(n_samples, n_inp_dims)
y_true = np.random.randint(0, 2, size=(n_samples, n_out_dims))

In [4]:
print_vector(weight, "weight")
print_vector(feature_vector, "feature_vector")
print_vector(y_true, "y_true")

>> weight.shape
(4, 2)

>> weight
[[0.08462148 0.08141992]
 [0.45315634 0.04563395]
 [0.15407874 0.61881182]
 [0.11324862 0.0707232 ]]

>> feature_vector.shape
(10, 4)

>> feature_vector
[[0.4881384  0.09957008 0.56177337 0.44587574]
 [0.3978054  0.14561364 0.12273675 0.77941388]
 [0.13857465 0.68819973 0.32443448 0.49721472]
 [0.63147704 0.47253981 0.71493472 0.09589861]
 [0.16562338 0.80759302 0.31699851 0.51826292]
 [0.92587542 0.87732898 0.84075179 0.63889094]
 [0.79657433 0.30944903 0.18860998 0.3099062 ]
 [0.8262513  0.12186659 0.58946221 0.18716427]
 [0.64662448 0.17446101 0.50208342 0.23448922]
 [0.84886928 0.88548916 0.52341297 0.69545429]]

>> y_true.shape
(10, 2)

>> y_true
[[0 0]
 [0 1]
 [0 1]
 [1 0]
 [0 1]
 [1 0]
 [0 0]
 [0 0]
 [0 1]
 [0 1]]



### Logistic regression single prediction
Create a function that takes two vectors as input (weight and feature vector) and returns the logistic regression prediction for that input.

In [5]:
def logistic_regression_prediction(weight, feature_vector):
    z = np.dot(feature_vector, weight)
    return 1 / (1 + np.exp(-z))

In [6]:
single_predictions = logistic_regression_prediction(weight, feature_vector[0])

In [7]:
print_vector(single_predictions, "single_predictions")

>> single_predictions.shape
(2,)

>> single_predictions
[0.55563862 0.6043094 ]



### Logistic regression vector prediction
Create a function that takes a matrix and a vector as input (weight vector and feature matrix) and returns the logistic regression prediction vector for the whole training set.

In [8]:
def logistic_regression_vector_prediction(weight_vector, feature_matrix):
    z = np.dot(feature_matrix, weight_vector)
    return 1 / (1 + np.exp(-z))

In [9]:
predictions = logistic_regression_vector_prediction(weight, feature_vector)

In [10]:
print_vector(predictions, "predictions")

>> predictions.shape
(10, 2)

>> predictions
[[0.55563862 0.6043094 ]
 [0.55152329 0.5424247 ]
 [0.6058464  0.56920701]
 [0.59594264 0.62765527]
 [0.61952121 0.57031938]
 [0.6632501  0.66393335]
 [0.56753311 0.55418884]
 [0.55901438 0.61083836]
 [0.5591449  0.59577428]
 [0.65305694 0.61836214]]



### Logistic Loss
Now create a function that takes a vector of predictions and a vector of actual values as input and returns the Logistic Loss.

In [11]:
def logistic_loss(predictions, actual_values):
    epsilon = 1e-15
    return np.mean(
        - (actual_values * np.log(predictions + epsilon)
        + (1 - actual_values) * np.log(1 - predictions + epsilon))
    )
    # return -np.sum(actual_values * np.log(predictions))

In [12]:
loss = logistic_loss(predictions, y_true)

In [13]:
print_vector(loss, "loss")

>> loss.shape
()

>> loss
0.7732834723198788



### Logistic Gradient
Now create a function that takes the feature matrix, a vector of predictions, and a vector of actual values as input and returns the gradient of the logistic loss with respect to the weights.

In [14]:
def logistic_gradient(feature_matrix, predictions, actual_values):
    n = len(predictions)
    return np.dot(feature_matrix.T, predictions - actual_values) / n

In [15]:
grad = logistic_gradient(feature_vector, predictions, y_true)

In [16]:
print_vector(grad, "grad")

>> grad.shape
(4, 2)

>> grad
[[ 0.19401336  0.13539837]
 [ 0.14860966  0.00666195]
 [ 0.12598983  0.10761695]
 [ 0.19118205 -0.01138301]]



### Gradient Descent Algorithm
This function performs mini-batch gradient descent to optimize a logistic regression model. The `batch_size` argument selects between the batch, stochastic, and mini-batch variants.

In [17]:

"""
Batch gradient descent: batch_size=len(X)
Stochastic gradient descent: batch_size=1
Mini-batch gradient descent: batch_size=32
"""

def gradient_descent(
		X_train, y_train,
		X_test=None, y_test=None,
		batch_size=32, epochs=10, learning_rate=0.01):

	n, m = X_train.shape
	o = y_train.shape[-1]

	# initialize random weights
	weights = np.random.rand(m, o)

	for epoch in range(epochs):

		# TRAINING
		train_loss = 0
		for iteration in range(0, n, batch_size):

			batch_start = iteration
			batch_end = iteration + batch_size

			x_batch = X_train[batch_start:batch_end]
			y_batch = y_train[batch_start:batch_end]

			predictions = logistic_regression_vector_prediction(weights, x_batch)

			gradient = logistic_gradient(x_batch, predictions, y_batch)
			weights -= learning_rate * gradient
			
			batch_loss = logistic_loss(predictions, y_batch)
			train_loss += batch_loss

		# TESTING
		test_loss = None
		if X_test is not None and y_test is not None:
			predictions = logistic_regression_vector_prediction(weights, X_test)
			test_loss = logistic_loss(predictions, y_test)

		print(f"epoch {epoch+1}/{epochs} | Train Loss {train_loss} | Test Loss {test_loss}")

	return weights

In [18]:
weights = gradient_descent(feature_vector, y_true, learning_rate=0.5, batch_size=len(y_true))

epoch 1/10 | Train Loss 1.039921165423216 | Test Loss None
epoch 2/10 | Train Loss 0.9578626259001194 | Test Loss None
epoch 3/10 | Train Loss 0.8898419841054629 | Test Loss None
epoch 4/10 | Train Loss 0.834478803597446 | Test Loss None
epoch 5/10 | Train Loss 0.7900042757312599 | Test Loss None
epoch 6/10 | Train Loss 0.75454341208784 | Test Loss None
epoch 7/10 | Train Loss 0.7263325811142264 | Test Loss None
epoch 8/10 | Train Loss 0.7038382545361648 | Test Loss None
epoch 9/10 | Train Loss 0.6857936552926904 | Test Loss None
epoch 10/10 | Train Loss 0.6711863321124855 | Test Loss None


# 3. MNIST Dataset
In this section we apply the logistic regression functions implemented from scratch above to the MNIST dataset to predict handwritten digits.

In [19]:
from sklearn.datasets import fetch_openml
import pandas as pd

# Load the MNIST dataset
mnist = fetch_openml('mnist_784', version=1, cache=True, parser='auto')

# Pandas data frame with feature vectors
X = mnist.data

# Scale pixel values
X = X / 255.

# Labels
y = mnist.target

# Labels converted to integers
y = y.astype(int)

# one hot encoding
y = pd.get_dummies(y).astype(int)

# train test split
X_train = X.iloc[:50_000]
X_test = X.iloc[50_000:]
y_train = y.iloc[:50_000]
y_test = y.iloc[50_000:]

# show shapes
print("X_train.shape:", X_train.shape)
print("y_train.shape:", y_train.shape)
print("X_test.shape:", X_test.shape)
print("y_test.shape:", y_test.shape)

X_train.shape: (50000, 784)
y_train.shape: (50000, 10)
X_test.shape: (20000, 784)
y_test.shape: (20000, 10)
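
The one-hot step is what gives `y_train` its 10 columns: each column is a separate 0/1 target, so the weight matrix learns one sigmoid classifier per digit. The notebook uses `pd.get_dummies` for this. The sketch below shows an equivalent NumPy construction on a few made-up labels.

```python
import numpy as np

# Made-up integer class labels for illustration
labels = np.array([0, 2, 1, 2])
n_classes = 3

# Row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(n_classes, dtype=int)[labels]
print(one_hot)
```

Taking `argmax` along each row recovers the original labels, which is also how the per-class probabilities are turned back into a single predicted class later on.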


### Applying Gradient Descent

In [20]:
batch_gd_weights = gradient_descent(
	X_train, y_train,
	X_train, y_train,
	batch_size=len(X_train),
	epochs=10,
	learning_rate=0.8,
)

epoch 1/10 | Train Loss 29.92379987013802 | Test Loss 21.43426526031631
epoch 2/10 | Train Loss 21.43426526031631 | Test Loss 2.7671093594338876
epoch 3/10 | Train Loss 2.7671093594338876 | Test Loss 1.1337505894001068
epoch 4/10 | Train Loss 1.1337505894001068 | Test Loss 0.9966317265374365
epoch 5/10 | Train Loss 0.9966317265374365 | Test Loss 0.877089545073934
epoch 6/10 | Train Loss 0.877089545073934 | Test Loss 0.7739913087844714
epoch 7/10 | Train Loss 0.7739913087844714 | Test Loss 0.6864732465206014
epoch 8/10 | Train Loss 0.6864732465206014 | Test Loss 0.612491545949527
epoch 9/10 | Train Loss 0.612491545949527 | Test Loss 0.5502463608151078
epoch 10/10 | Train Loss 0.5502463608151078 | Test Loss 0.49813874870251657

Note that `X_train, y_train` is passed for both the training and the test arguments, so the reported Test Loss is the training-set loss measured after the epoch's update. This is why each Test Loss equals the Train Loss of the following epoch.


In [21]:
mini_batch_gd_weights = gradient_descent(
	X_train, y_train,
	X_train, y_train,
	batch_size=1024,
	epochs=10,
	learning_rate=0.8,
)

epoch 1/10 | Train Loss 71.03182866618727 | Test Loss 0.16451235623137758
epoch 2/10 | Train Loss 7.138745492583239 | Test Loss 0.1301894949651112
epoch 3/10 | Train Loss 6.096256921413664 | Test Loss 0.11716068197117482
epoch 4/10 | Train Loss 5.609873843532675 | Test Loss 0.10979591875931535
epoch 5/10 | Train Loss 5.310776406442182 | Test Loss 0.10488301723750358
epoch 6/10 | Train Loss 5.102031618321007 | Test Loss 0.10129662661759811
epoch 7/10 | Train Loss 4.945302670120703 | Test Loss 0.09852589141975114
epoch 8/10 | Train Loss 4.821845125282588 | Test Loss 0.09629930141649723
epoch 9/10 | Train Loss 4.721201172450327 | Test Loss 0.09445695578184324
epoch 10/10 | Train Loss 4.637002894178037 | Test Loss 0.09289777158757297

Here the Train Loss is the sum of the per-batch mean losses over the 49 mini-batches in an epoch (50,000 samples in batches of 1024), each measured before its update, while the Test Loss is a single mean over the same training data after the epoch. The two columns are therefore not on the same scale.


### Comparing Results

In [22]:
def metrics(preds, y_true):

	# Calculate True Positives, False Positives, False Negatives
	tp = np.sum((preds == 1) & (y_true == 1))
	fp = np.sum((preds == 1) & (y_true == 0))
	fn = np.sum((preds == 0) & (y_true == 1))
	
	# Calculate Accuracy
	accuracy = np.sum(preds == y_true) / len(y_true)
	
	# Calculate Precision
	precision = tp / (tp + fp) if (tp + fp) != 0 else 0
	
	# Calculate Recall
	recall = tp / (tp + fn) if (tp + fn) != 0 else 0
	
	# Calculate F1 Score
	f1score = 2 * (precision * recall) / (precision + recall) if (precision + recall) != 0 else 0
	
	return accuracy, precision, recall, f1score

def generate_report(preds, y_true):

	for i in range(preds.shape[0]):
		max_index = np.argmax(preds[i])
		preds[i] = np.where(preds[i] == preds[i][max_index], 1, 0)

	results = []
	for i in range(10):
		result = metrics(preds[:, i], y_true[:, i])
		results.append(result)
	return pd.DataFrame(
		results,
		columns=['Accuracy', 'Precision', 'Recall', 'F1 Score'],
		index=[f"Digit {i}" for i in range(10)]
	)
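
To see what the per-class numbers mean, here is a small worked example on a hand-made binary vector. The function `binary_metrics` is a hypothetical stand-in that mirrors the formulas in `metrics` above, and the data is invented for illustration.

```python
import numpy as np

def binary_metrics(preds, y_true):
    """Accuracy, precision, recall, and F1 for 0/1 predictions."""
    tp = np.sum((preds == 1) & (y_true == 1))
    fp = np.sum((preds == 1) & (y_true == 0))
    fn = np.sum((preds == 0) & (y_true == 1))
    accuracy = np.mean(preds == y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# Invented example: 2 true positives, 1 false positive, 1 false negative
preds = np.array([1, 1, 1, 0, 0])
y_true = np.array([1, 1, 0, 1, 0])

acc, prec, rec, f1 = binary_metrics(preds, y_true)
print(acc, prec, rec, f1)
```

In the one-vs-rest reports, each digit makes up only about one tenth of the samples, so per-digit accuracy can stay high even when recall is poor. Precision, recall, and F1 are the more informative columns.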

In [23]:
print("Train Set Result on Batch Gradient Descent")
preds = logistic_regression_vector_prediction(batch_gd_weights, X_train)
generate_report(preds, y_train.values)

Train Set Result on Batch Gradient Descent


| | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Digit 0 | 0.968 | 0.830032 | 0.849554 | 0.839679 |
| Digit 1 | 0.9612 | 0.780631 | 0.915639 | 0.842762 |
| Digit 2 | 0.93508 | 0.727537 | 0.554147 | 0.629113 |
| Digit 3 | 0.93002 | 0.633278 | 0.746128 | 0.685087 |
| Digit 4 | 0.92482 | 0.632148 | 0.541469 | 0.583306 |
| Digit 5 | 0.91398 | 0.712215 | 0.076343 | 0.137903 |
| Digit 6 | 0.9621 | 0.807074 | 0.811149 | 0.809106 |
| Digit 7 | 0.95732 | 0.79843 | 0.786087 | 0.79221 |
| Digit 8 | 0.89902 | 0.485587 | 0.720157 | 0.580055 |
| Digit 9 | 0.90082 | 0.502197 | 0.664595 | 0.572094 |


In [24]:
print("Test Set Result on Batch Gradient Descent")
preds = logistic_regression_vector_prediction(batch_gd_weights, X_test)
generate_report(preds, y_test.values)

Test Set Result on Batch Gradient Descent


| | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Digit 0 | 0.9704 | 0.848761 | 0.851344 | 0.850051 |
| Digit 1 | 0.9658 | 0.795322 | 0.927694 | 0.856423 |
| Digit 2 | 0.93435 | 0.731246 | 0.554402 | 0.630661 |
| Digit 3 | 0.9362 | 0.653537 | 0.797059 | 0.718198 |
| Digit 4 | 0.93055 | 0.680905 | 0.551654 | 0.609502 |
| Digit 5 | 0.9146 | 0.728111 | 0.087438 | 0.156126 |
| Digit 6 | 0.9661 | 0.825587 | 0.821299 | 0.823438 |
| Digit 7 | 0.9599 | 0.817261 | 0.800283 | 0.808683 |
| Digit 8 | 0.90375 | 0.509857 | 0.75643 | 0.609137 |
| Digit 9 | 0.90955 | 0.529848 | 0.725381 | 0.612385 |


In [25]:
print("Train Set Result on Mini-Batch Gradient Descent")
preds = logistic_regression_vector_prediction(mini_batch_gd_weights, X_train)
generate_report(preds, y_train.values)

Train Set Result on Mini-Batch Gradient Descent


| | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Digit 0 | 0.99058 | 0.937439 | 0.969181 | 0.953046 |
| Digit 1 | 0.98896 | 0.942966 | 0.960902 | 0.951849 |
| Digit 2 | 0.97728 | 0.899666 | 0.868156 | 0.88363 |
| Digit 3 | 0.97406 | 0.869895 | 0.876887 | 0.873377 |
| Digit 4 | 0.98128 | 0.914957 | 0.890101 | 0.902358 |
| Digit 5 | 0.97224 | 0.872254 | 0.810697 | 0.84035 |
| Digit 6 | 0.98814 | 0.933201 | 0.948091 | 0.940587 |
| Digit 7 | 0.983 | 0.922942 | 0.911884 | 0.917379 |
| Digit 8 | 0.96782 | 0.822462 | 0.851508 | 0.836733 |
| Digit 9 | 0.97168 | 0.849648 | 0.870088 | 0.859746 |


In [26]:
print("Test Set Result on Mini-Batch Gradient Descent")
preds = logistic_regression_vector_prediction(mini_batch_gd_weights, X_test)
generate_report(preds, y_test.values)

Test Set Result on Mini-Batch Gradient Descent


| | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Digit 0 | 0.9914 | 0.943759 | 0.970573 | 0.956978 |
| Digit 1 | 0.99205 | 0.960705 | 0.967258 | 0.96397 |
| Digit 2 | 0.98065 | 0.921175 | 0.884273 | 0.902347 |
| Digit 3 | 0.9761 | 0.871551 | 0.898039 | 0.884597 |
| Digit 4 | 0.984 | 0.929056 | 0.906361 | 0.917568 |
| Digit 5 | 0.9747 | 0.894961 | 0.815717 | 0.853503 |
| Digit 6 | 0.98795 | 0.928717 | 0.947532 | 0.93803 |
| Digit 7 | 0.98325 | 0.923918 | 0.917375 | 0.920635 |
| Digit 8 | 0.96925 | 0.830435 | 0.866868 | 0.848261 |
| Digit 9 | 0.97435 | 0.861538 | 0.881218 | 0.871267 |


# 4. Iris Dataset
In this section we apply the logistic regression functions implemented from scratch above to the Iris dataset.

In [27]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
import pandas as pd

# Load the Iris dataset
iris = fetch_openml('iris', version=1, cache=True, parser='auto')

# Pandas data frame with feature vectors
X = iris.data

# Scale
# X = X / 255.

# Labels
y = iris.target

# one hot encoding
y = pd.get_dummies(y).astype(int)

# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y)

# show shapes
print("X_train.shape:", X_train.shape)
print("y_train.shape:", y_train.shape)
print("X_test.shape:", X_test.shape)
print("y_test.shape:", y_test.shape)

X_train.shape: (120, 4)
y_train.shape: (120, 3)
X_test.shape: (30, 4)
y_test.shape: (30, 3)


### Applying Gradient Descent

In [28]:
batch_gd_weights = gradient_descent(
	X_train, y_train,
	X_train, y_train,
	batch_size=len(X_train),
	epochs=10,
	learning_rate=0.05,
)

epoch 1/10 | Train Loss 4.726550872712751 | Test Loss 3.413371548941273
epoch 2/10 | Train Loss 3.413371548941273 | Test Loss 2.1997866006835736
epoch 3/10 | Train Loss 2.1997866006835736 | Test Loss 1.3460407827657273
epoch 4/10 | Train Loss 1.3460407827657273 | Test Loss 0.9297036782711285
epoch 5/10 | Train Loss 0.9297036782711285 | Test Loss 0.7103046560803397
epoch 6/10 | Train Loss 0.7103046560803397 | Test Loss 0.6459606864066962
epoch 7/10 | Train Loss 0.6459606864066962 | Test Loss 0.630006859570538
epoch 8/10 | Train Loss 0.630006859570538 | Test Loss 0.6184016298020568
epoch 9/10 | Train Loss 0.6184016298020568 | Test Loss 0.6075446578667751
epoch 10/10 | Train Loss 0.6075446578667751 | Test Loss 0.597184384659209


In [29]:
mini_batch_gd_weights = gradient_descent(
	X_train, y_train,
	X_train, y_train,
	batch_size=1024,
	epochs=10,
	learning_rate=0.05,
)

epoch 1/10 | Train Loss 4.96391702489201 | Test Loss 3.749646611340176
epoch 2/10 | Train Loss 3.749646611340176 | Test Loss 2.746347980902355
epoch 3/10 | Train Loss 2.746347980902355 | Test Loss 1.9193740623534756
epoch 4/10 | Train Loss 1.9193740623534756 | Test Loss 1.2705278029138203
epoch 5/10 | Train Loss 1.2705278029138203 | Test Loss 0.841395547844468
epoch 6/10 | Train Loss 0.841395547844468 | Test Loss 0.6754446330317418
epoch 7/10 | Train Loss 0.6754446330317418 | Test Loss 0.6474395423501708
epoch 8/10 | Train Loss 0.6474395423501708 | Test Loss 0.6348320634575362
epoch 9/10 | Train Loss 0.6348320634575362 | Test Loss 0.6240264513240896
epoch 10/10 | Train Loss 0.6240264513240896 | Test Loss 0.6138045076813763

With only 120 training rows, `batch_size=1024` covers the whole training set in a single batch, so this run is also full-batch gradient descent. It differs from the previous run only in its random weight initialization.


### Comparing Results

In [30]:
def metrics(preds, y_true):

	# Calculate True Positives, False Positives, False Negatives
	tp = np.sum((preds == 1) & (y_true == 1))
	fp = np.sum((preds == 1) & (y_true == 0))
	fn = np.sum((preds == 0) & (y_true == 1))
	
	# Calculate Accuracy
	accuracy = np.sum(preds == y_true) / len(y_true)
	
	# Calculate Precision
	precision = tp / (tp + fp) if (tp + fp) != 0 else 0
	
	# Calculate Recall
	recall = tp / (tp + fn) if (tp + fn) != 0 else 0
	
	# Calculate F1 Score
	f1score = 2 * (precision * recall) / (precision + recall) if (precision + recall) != 0 else 0
	
	return accuracy, precision, recall, f1score

def generate_report(preds, y_true):

	for i in range(preds.shape[0]):
		max_index = np.argmax(preds[i])
		preds[i] = np.where(preds[i] == preds[i][max_index], 1, 0)

	results = []
	for i in range(3):
		result = metrics(preds[:, i], y_true[:, i])
		results.append(result)
	return pd.DataFrame(
		results,
		columns=['Accuracy', 'Precision', 'Recall', 'F1 Score'],
		index=[f"Category {i}" for i in range(3)]
	)

In [31]:
print("Train Set Result on Batch Gradient Descent")
preds = logistic_regression_vector_prediction(batch_gd_weights, X_train)
generate_report(preds, y_train.values)

Train Set Result on Batch Gradient Descent


| | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Category 0 | 0.666667 | 0.0 | 0.0 | 0.0 |
| Category 1 | 0.35 | 0.025 | 0.025 | 0.025 |
| Category 2 | 0.666667 | 0.5 | 1.0 | 0.666667 |


In [32]:
print("Test Set Result on Batch Gradient Descent")
preds = logistic_regression_vector_prediction(batch_gd_weights, X_test)
generate_report(preds, y_test.values)

Test Set Result on Batch Gradient Descent


| | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Category 0 | 0.666667 | 0.0 | 0.0 | 0.0 |
| Category 1 | 0.333333 | 0.0 | 0.0 | 0.0 |
| Category 2 | 0.666667 | 0.5 | 1.0 | 0.666667 |


In [33]:
print("Train Set Result on Mini-Batch Gradient Descent")
preds = logistic_regression_vector_prediction(mini_batch_gd_weights, X_train)
generate_report(preds, y_train.values)

Train Set Result on Mini-Batch Gradient Descent


| | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Category 0 | 0.666667 | 0.0 | 0.0 | 0.0 |
| Category 1 | 0.333333 | 0.0 | 0.0 | 0.0 |
| Category 2 | 0.65 | 0.4875 | 0.975 | 0.65 |


In [34]:
print("Test Set Result on Mini-Batch Gradient Descent")
preds = logistic_regression_vector_prediction(mini_batch_gd_weights, X_test)
generate_report(preds, y_test.values)

Test Set Result on Mini-Batch Gradient Descent


| | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Category 0 | 0.666667 | 0.0 | 0.0 | 0.0 |
| Category 1 | 0.366667 | 0.090909 | 0.1 | 0.095238 |
| Category 2 | 0.7 | 0.526316 | 1.0 | 0.689655 |


**Find More Labs**

This lab is from my Machine Learning course, which is part of my [Software Engineering](https://seecs.nust.edu.pk/program/bachelor-of-software-engineering-for-fall-2021-onward) degree at [NUST](https://nust.edu.pk).

The notebooks listed below cover a range of topics in **machine learning** and **data analysis**, implemented from scratch or using popular libraries like **NumPy**, **pandas**, **scikit-learn**, **seaborn**, and **matplotlib**. They include introductory material on NumPy showcasing its efficiency for mathematical operations, **linear regression**, **logistic regression**, **decision trees**, **K-nearest neighbors (KNN)**, **support vector machines (SVM)**, **Naive Bayes**, **K-means** clustering, principal component analysis (**PCA**), and **neural networks** with **backpropagation**. Each notebook demonstrates practical implementation and application of these algorithms on datasets such as the **California Housing** dataset, **MNIST** dataset, **Iris** dataset, **Auto-MPG** dataset, and the **UCI Adult Census Income** dataset. They also cover **gradient descent optimization**, model evaluation metrics (e.g., **accuracy, precision, recall, f1 score**), **regularization** techniques (e.g., **Lasso**, **Ridge**), and **data visualization**.

| Title                                                                                                                   | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| ----------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [01 - Intro to Numpy](https://www.kaggle.com/code/sacrum/ml-labs-01-intro-to-numpy)                                     | The notebook demonstrates NumPy's efficiency for mathematical operations like array `reshaping`, `sigmoid`, `softmax`, `dot` and `outer products`, `L1 and L2 losses`, and matrix operations. It highlights NumPy's superiority over standard Python lists in speed and convenience for scientific computing and machine learning tasks.                                                                                                                                                                                              |
| [02 - Linear Regression From Scratch](https://www.kaggle.com/code/sacrum/ml-labs-02-linear-regression-from-scratch)     | This notebook implements `linear regression` and `gradient descent` from scratch in Python using `NumPy`, focusing on predicting house prices with the `California Housing Dataset`. It defines functions for prediction, `MSE` calculation, and gradient computation. Batch gradient descent is used for optimization. The dataset is loaded, scaled, and split. `Batch, stochastic, and mini-batch gradient descents` are applied with varying hyperparameters. Finally, the MSEs of the predictions from each method are compared. |
| [03 - Logistic Regression from Scratch](https://www.kaggle.com/code/sacrum/ml-labs-03-logistic-regression-from-scratch) | This notebook outlines the implementation of `logistic regression` from scratch in Python using `NumPy`, including functions for prediction, loss calculation, gradient computation, and batch `gradient descent` optimization, applied to the `MNIST` dataset for handwritten digit recognition and to the `Iris` dataset. It also includes metrics such as `accuracy`, `precision`, `recall`, and `f1 score`. |
| [04 - Auto-MPG Regression](https://www.kaggle.com/code/sacrum/ml-labs-04-auto-mpg-regression)                           | The notebook uses `pandas` for data manipulation, `seaborn` and `matplotlib` for visualization, and `sklearn` for `linear regression` and `regularization` techniques (`Lasso` and `Ridge`). It includes data loading, processing, visualization, model training, and evaluation on the `Auto-MPG dataset`.                                                                                                                                                                                                                           |
| [05 - Desicion Trees from Scratch](https://www.kaggle.com/code/sacrum/ml-labs-05-desicion-trees-from-scratch)           | In this notebook, `DecisionTree` algorithm has been implmented from scratch and applied on dummy dataset                                                                                                                                                                                                                                                                                                                                                                                                                              |
| [06 - KNN from Scratch](https://www.kaggle.com/code/sacrum/ml-labs-06-knn-from-scratch)                                 | In this notebook, `K-Nearest Neighbour` algorithm has been implemented from scratch and compared with KNN provided in scikit-learn package                                                                                                                                                                                                                                                                                                                                                                                            |
| [07 - SVM](https://www.kaggle.com/code/sacrum/ml-labs-07-svm)                                                           | This notebook implements `SVM classifier` on `Iris Dataset`                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| [08 - Naive Bayes](https://www.kaggle.com/code/sacrum/ml-labs-08-naive-bayes)                                           | This notebook trains `Naive Bayes` and compares it with other algorithms `Decision Trees`, `SVM` and `Logistic Regression`                                                                                                                                                                                                                                                                                                                                                                                                            |
| [09 - K-means](https://www.kaggle.com/code/sacrum/ml-labs-09-k-means)                                                   | In this notebook `K-means` algorithm has been implemented using `scikit-learn` and different values of `k` are compared to understand the `elbow method` in `Calinski Harabasz Scores`                                                                                                                                                                                                                                                                                                                                                |
| [10 - UCI Adult Census Income](https://www.kaggle.com/code/sacrum/ml-labs-10-uci-adult-census-income)                   | Here I have used the UCI Adult Income dataset and applied different machine learning algorithms to find the best model configuration for predicting salary from the given information                                                                                                                                                                                                                                                                                                                                                 |
| [11 - PCA](https://www.kaggle.com/code/sacrum/ml-labs-11-pca)                                                           | `Principle Component Analysis` implemented from scratch                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
| [12 - Neural Networks](https://www.kaggle.com/code/sacrum/ml-labs-12-neural-networks)                                   | This code implements neural networks with back propagation from scratch                                                                                                                                                                                                                                                                                                                                                                                                                                                               |