<a href="https://www.kaggle.com/code/sacrum/ml-labs-02-linear-regression-from-scratch?scriptVersionId=178243428" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# 1. Linear Regression

In this section we implement the basic functions used to train a linear regression model.

### Setting Up

In [1]:
import numpy as np

In [2]:
# this function prints the vector with its shape

def print_vector(vector, vector_name):
	print(f""">> {vector_name}.shape\n{vector.shape}\n\n>> {vector_name}\n{vector}\n""")

In [3]:
# Initialize Dummy Data

n_dims = 3
n_samples = 10

weight = np.random.rand(n_dims)
feature_vector = np.random.rand(n_samples, n_dims)
y_true = np.random.rand(n_samples)

In [4]:
print_vector(weight, "weight")
print_vector(feature_vector, "feature_vector")
print_vector(y_true, "y_true")

>> weight.shape
(3,)

>> weight
[0.27390397 0.58489484 0.03643782]

>> feature_vector.shape
(10, 3)

>> feature_vector
[[0.01059579 0.6124475  0.29862076]
 [0.56801277 0.196372   0.7172178 ]
 [0.55751561 0.24821628 0.41610273]
 [0.30215765 0.32051184 0.35208816]
 [0.19319418 0.07730115 0.71617865]
 [0.39898015 0.25610437 0.43126924]
 [0.78716473 0.54406537 0.79419203]
 [0.92592103 0.07131411 0.76749022]
 [0.02865615 0.35722214 0.93128407]
 [0.06399781 0.58180371 0.57440601]]

>> y_true.shape
(10,)

>> y_true
[0.79012808 0.6416438  0.06313733 0.97930555 0.07768066 0.09060499
 0.00745351 0.4453443  0.61780283 0.52368578]



### Linear regression single prediction
Create a function that takes two vectors as input (weight and feature vector) and returns the linear regression prediction for that input.

In [5]:
def linear_regression_prediction(weight, feature_vector):
    return np.dot(weight, feature_vector)

In [6]:
single_predictions = linear_regression_prediction(weight, feature_vector[0])

In [7]:
print_vector(single_predictions, "single_predictions")

>> single_predictions.shape
()

>> single_predictions
0.37200069924579143
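
As a quick sanity check, the dot product can be compared against an explicit element-wise sum. The sketch below uses small made-up values (`w` and `x` are hypothetical, not the random data above) so the arithmetic can be followed by hand.

```python
import numpy as np

def linear_regression_prediction(weight, feature_vector):
    return np.dot(weight, feature_vector)

# Hypothetical values chosen so the result is easy to verify by hand
w = np.array([0.5, -1.0, 2.0])
x = np.array([2.0, 3.0, 1.0])

# 0.5*2 + (-1)*3 + 2*1 = 1 - 3 + 2 = 0
manual = sum(w_i * x_i for w_i, x_i in zip(w, x))
vectorized = linear_regression_prediction(w, x)

print(manual, vectorized)
```

Both values should agree, which is all `np.dot` does for two 1-D arrays.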



### Linear regression vector prediction
Create a function that takes a weight vector and a feature matrix as input and returns the vector of linear regression predictions for the whole training set, one prediction per row of the matrix.

In [8]:
def linear_regression_vector_prediction(weight_vector, feature_matrix):
    return np.dot(feature_matrix, weight_vector)

In [9]:
predictions = linear_regression_vector_prediction(weight, feature_vector)

In [10]:
print_vector(predictions, "predictions")

>> predictions.shape
(10,)

>> predictions
[0.3720007  0.29657177 0.31304803 0.28305723 0.12422569 0.27479088
 0.56276719 0.32329037 0.25072038 0.37875335]
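
The matrix version should give the same numbers as calling the single-prediction function on each row. A minimal sketch of that check, assuming freshly generated random data (`w`, `X_demo`, and the seed are illustrative choices, not the values printed above):

```python
import numpy as np

def linear_regression_prediction(weight, feature_vector):
    return np.dot(weight, feature_vector)

def linear_regression_vector_prediction(weight_vector, feature_matrix):
    return np.dot(feature_matrix, weight_vector)

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
w = rng.random(3)
X_demo = rng.random((10, 3))

# One prediction per row, computed two ways
row_by_row = np.array([linear_regression_prediction(w, row) for row in X_demo])
all_at_once = linear_regression_vector_prediction(w, X_demo)

print(np.allclose(row_by_row, all_at_once))  # True
```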



### Mean Squared Error
Now create a function that takes a vector of predictions and a vector of actual values as input and returns the Mean Squared Error.

In [11]:
def mean_squared_error(predictions, actual_values):
    return np.mean((predictions - actual_values) ** 2)

In [12]:
loss = mean_squared_error(predictions, y_true)

In [13]:
print_vector(loss, "loss")

>> loss.shape
()

>> loss
0.13562385653777925
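
To make the computation concrete, here is a small worked example with made-up numbers (`preds_demo` and `actual_demo` are hypothetical). The errors are 0, -2, and 3, so the squared errors are 0, 4, and 9, and their mean is 13/3.

```python
import numpy as np

def mean_squared_error(predictions, actual_values):
    return np.mean((predictions - actual_values) ** 2)

# Hypothetical values: errors 0, -2, 3 -> squares 0, 4, 9 -> mean 13/3
preds_demo = np.array([1.0, 2.0, 3.0])
actual_demo = np.array([1.0, 4.0, 0.0])

mse_demo = mean_squared_error(preds_demo, actual_demo)
print(mse_demo)  # roughly 4.33
```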



### MSE Gradient
Now create a function that takes the feature matrix, a vector of predictions, and a vector of actual values as input and returns the gradient of the Mean Squared Error with respect to the weights.
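
For reference, the gradient follows from writing the loss in matrix form. The symbols follow the comments in the code: $X$ is the feature matrix with $n$ rows, $w$ the weight vector, $P = Xw$ the predictions, and $Y$ the actual values.

```latex
\begin{aligned}
L(w) &= \frac{1}{n} \sum_{i=1}^{n} (P_i - Y_i)^2
      = \frac{1}{n} (Xw - Y)^\top (Xw - Y), \\
\nabla_w L(w) &= \frac{2}{n} X^\top (Xw - Y)
               = \frac{2}{n} X^\top (P - Y).
\end{aligned}
```

The transpose matters: $X^\top (P - Y)$ has one entry per weight, which matches the `(3,)` shape of the gradient.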

In [14]:
def mse_gradient(feature_vector, predictions, actual_values):
    # grad = (2/n) * X.T @ (P - Y)

    # (P - Y)
    diff = predictions - actual_values

    # X.T @ diff
    dot = np.dot(feature_vector.T, diff)

    n = len(feature_vector)
    grad = (2/n) * dot

    return grad

In [15]:
grad = mse_gradient(feature_vector, predictions, y_true)

In [16]:
print_vector(grad, "grad")

>> grad.shape
(3,)

>> grad
[ 0.02306238 -0.07124562 -0.09569898]
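
One common way to gain confidence in a hand-written gradient is to compare it with a finite-difference estimate. The sketch below is an illustration, not part of the lab: `numerical_gradient` is a hypothetical helper, and the data is freshly generated rather than the arrays printed above.

```python
import numpy as np

def mean_squared_error(predictions, actual_values):
    return np.mean((predictions - actual_values) ** 2)

def mse_gradient(feature_vector, predictions, actual_values):
    diff = predictions - actual_values
    return (2 / len(feature_vector)) * np.dot(feature_vector.T, diff)

def numerical_gradient(w, X, y, eps=1e-6):
    """Central-difference estimate of d(MSE)/dw (hypothetical helper)."""
    grad = np.zeros_like(w)
    for i in range(len(w)):
        step = np.zeros_like(w)
        step[i] = eps
        loss_plus = mean_squared_error(X @ (w + step), y)
        loss_minus = mean_squared_error(X @ (w - step), y)
        grad[i] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)  # arbitrary seed
X_demo = rng.random((10, 3))
y_demo = rng.random(10)
w = rng.random(3)

analytic = mse_gradient(X_demo, X_demo @ w, y_demo)
numeric = numerical_gradient(w, X_demo, y_demo)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```

Because the MSE is quadratic in the weights, the central difference matches the analytic gradient up to floating-point rounding.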



### Gradient Descent Algorithm
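This function optimizes a linear regression model with gradient descent. The `batch_size` argument selects the variant (batch, stochastic, or mini-batch). Note that the loss reported for each epoch is the sum of the per-batch losses, so its magnitude grows with the number of batches.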

In [17]:

"""
Batch gradient descent: batch_size=len(X)
Stochastic gradient descent: batch_size=1
Mini-batch gradient descent: batch_size=32
"""

def gradient_descent(X, y, batch_size=32, epochs=10, learning_rate=0.01):

	# n_samples, n_dims
	n, m = X.shape

	# initialize random weights
	weights = np.random.rand(m)

	losses = []
	for epoch in range(epochs):

		loss = 0
		for iteration in range(0, n, batch_size):

			batch_start = iteration
			batch_end = iteration + batch_size

			x_batch = X[batch_start:batch_end]
			y_batch = y[batch_start:batch_end]

			predictions = linear_regression_vector_prediction(weights, x_batch)

			gradient = mse_gradient(x_batch, predictions, y_batch)
			weights -= learning_rate * gradient
			
			batch_loss = mean_squared_error(predictions, y_batch)
			loss += batch_loss

		losses.append(loss)
		print(f"epoch {epoch+1}/{epochs} | loss {loss}")

	return weights, losses

In [18]:
weights, losses = gradient_descent(feature_vector, y_true, learning_rate=0.1)

epoch 1/10 | loss 0.33818799746331574
epoch 2/10 | loss 0.28736461232730826
epoch 3/10 | loss 0.24897924512567643
epoch 4/10 | loss 0.21995183951632158
epoch 5/10 | loss 0.1979662194841185
epoch 6/10 | loss 0.18128060240383342
epoch 7/10 | loss 0.16858511719998276
epoch 8/10 | loss 0.15889466723676263
epoch 9/10 | loss 0.15146837011329442
epoch 10/10 | loss 0.14574898153977262
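
Since `gradient_descent` adds up the batch losses within an epoch, its reported loss is not directly comparable across batch sizes. The sketch below shows one possible variant under stated assumptions: `gradient_descent_mean_loss` is a hypothetical name, it reports the mean of the per-batch losses, and it also shuffles the samples each epoch (a common practice that the function above does not do). It is tested on synthetic, noise-free data with known weights.

```python
import numpy as np

def gradient_descent_mean_loss(X, y, batch_size=32, epochs=10,
                               learning_rate=0.01, seed=0):
    """Hypothetical variant: shuffles every epoch and records the
    mean (not the sum) of the per-batch MSE values."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    weights = rng.random(m)

    losses = []
    for _ in range(epochs):
        order = rng.permutation(n)  # new sample order each epoch
        batch_losses = []
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            x_batch, y_batch = X[idx], y[idx]

            diff = x_batch @ weights - y_batch
            batch_losses.append(np.mean(diff ** 2))  # loss before the update

            gradient = (2 / len(x_batch)) * (x_batch.T @ diff)
            weights -= learning_rate * gradient

        losses.append(float(np.mean(batch_losses)))
    return weights, losses

# Synthetic data generated from known weights (no noise)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y_demo = X_demo @ true_w

w_fit, losses = gradient_descent_mean_loss(
    X_demo, y_demo, batch_size=20, epochs=50, learning_rate=0.05
)
print(losses[0], losses[-1])
print(w_fit)
```

With the mean, the per-epoch loss stays on the same scale as the final MSE regardless of `batch_size`.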


# 2. California Housing
In this section we apply the linear regression functions implemented from scratch above to the California Housing dataset to predict house prices.

### Loading and Scaling Data
- Data Source
	- [Kaggle](https://www.kaggle.com/datasets/camnugent/california-housing-prices)
	- [Google Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course/california-housing-data-description)

In [19]:
# The dataset can be loaded from scikit-learn's datasets module

from sklearn import datasets

california_housing = datasets.fetch_california_housing()

X = california_housing.data  # Feature matrix
y = california_housing.target  # Target values (median house values)

# Standardize features and target to zero mean and unit variance
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (y - y.mean()) / y.std()

X.shape, y.shape

((20640, 8), (20640,))
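
The scaling step subtracts each column's mean and divides by its standard deviation, so every feature ends up with mean 0 and standard deviation 1. A minimal sketch of that effect, assuming synthetic data on deliberately different scales (`raw` is hypothetical and unrelated to the housing data):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# Hypothetical raw features: one column around 10, the other around 1000
raw = rng.normal(loc=[10.0, 1000.0], scale=[2.0, 300.0], size=(500, 2))

# Same column-wise standardization as in the cell above
scaled = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(scaled.mean(axis=0).round(6))
print(scaled.std(axis=0).round(6))
```

Putting all features on a comparable scale is what allows a single learning rate to work reasonably well for every weight.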

### Applying Gradient Descent Variants

In [20]:
# Apply Batch (Vanilla) Gradient Descent

batch_gd_weights, batch_gd_losses = gradient_descent(
	X, y,
	batch_size=len(X),
	epochs=100,
	learning_rate=0.4,
)

epoch 1/100 | loss 2.7378412203269975
epoch 2/100 | loss 1.0363077675021954
epoch 3/100 | loss 0.671534991760565
epoch 4/100 | loss 0.555747856043864
epoch 5/100 | loss 0.5111275941682364
epoch 6/100 | loss 0.48953331587963117
epoch 7/100 | loss 0.47623067953369225
epoch 8/100 | loss 0.4664110866545011
epoch 9/100 | loss 0.45838965163542933
epoch 10/100 | loss 0.4515148358425929
epoch 11/100 | loss 0.4454955903489219
epoch 12/100 | loss 0.4401749881710063
epoch 13/100 | loss 0.43545073861582617
epoch 14/100 | loss 0.4312460019306337
epoch 15/100 | loss 0.427498058706507
epoch 16/100 | loss 0.4241535302888601
epoch 17/100 | loss 0.4211660897559262
epoch 18/100 | loss 0.41849516685767146
epoch 19/100 | loss 0.41610508183949463
epoch 20/100 | loss 0.4139643885975785
epoch 21/100 | loss 0.4120453380167856
epoch 22/100 | loss 0.4103234224386257
epoch 23/100 | loss 0.40877698192813544
epoch 24/100 | loss 0.4073868611052471
epoch 25/100 | loss 0.4061361088918584
epoch 26/100 | loss 0.40500971

In [21]:
# Apply Stochastic Gradient Descent

stochastic_gd_weights, stochastic_gd_losses = gradient_descent(
	X, y,
	batch_size=1,
	epochs=15,
	learning_rate=0.0002,
)

epoch 1/15 | loss 16437.312375524853
epoch 2/15 | loss 9370.945585916368
epoch 3/15 | loss 8538.964737954915
epoch 4/15 | loss 8222.168607181642
epoch 5/15 | loss 8083.914501931886
epoch 6/15 | loss 8015.746799781206
epoch 7/15 | loss 7978.504369434242
epoch 8/15 | loss 7956.586074310669
epoch 9/15 | loss 7943.00539590955
epoch 10/15 | loss 7934.285151695873
epoch 11/15 | loss 7928.54395166849
epoch 12/15 | loss 7924.696994318178
epoch 13/15 | loss 7922.087492679448
epoch 14/15 | loss 7920.302441537773
epoch 15/15 | loss 7919.074436194764


In [22]:
# Apply Mini-Batch Gradient Descent

mini_batch_gd_weights, mini_batch_gd_losses = gradient_descent(
	X, y,
	batch_size=64,
	epochs=15,
	learning_rate=0.008
)

epoch 1/15 | loss 256.77767811135914
epoch 2/15 | loss 151.49911960179924
epoch 3/15 | loss 143.31823009347804
epoch 4/15 | loss 138.36544539165948
epoch 5/15 | loss 135.4234512648322
epoch 6/15 | loss 133.58684238072743
epoch 7/15 | loss 132.39577460187846
epoch 8/15 | loss 131.59758017590138
epoch 9/15 | loss 131.04798112436706
epoch 10/15 | loss 130.66088432022943
epoch 11/15 | loss 130.38299378329182
epoch 12/15 | loss 130.18025686264042
epoch 13/15 | loss 130.03031858414033
epoch 14/15 | loss 129.91815088286893
epoch 15/15 | loss 129.83343459931228


### Comparing Mean Squared Error Across the 3 Gradient Descent Variants

In [23]:
preds = linear_regression_vector_prediction(batch_gd_weights, X)
mse = mean_squared_error(preds, y)

print("Mean Squared Error on Batch Gradient Descent")
print(mse)

Mean Squared Error on Batch Gradient Descent
0.39379558679584864


In [24]:
preds = linear_regression_vector_prediction(stochastic_gd_weights, X)
mse = mean_squared_error(preds, y)

print("Mean Squared Error on Stochastic Gradient Descent")
print(mse)

Mean Squared Error on Stochastic Gradient Descent
0.40076754475946436


In [25]:
preds = linear_regression_vector_prediction(mini_batch_gd_weights, X)
mse = mean_squared_error(preds, y)

print("Mean Squared Error on Mini Batch Gradient Descent")
print(mse)

Mean Squared Error on Mini Batch Gradient Descent
0.39718631227489
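
The three runs land close together (about 0.394, 0.401, and 0.397). One way to judge how close they are to the optimum is to compare against the closed-form least-squares solution, which gives the lowest training MSE this model can reach. The sketch below is an assumption-laden illustration: `least_squares_weights` is a hypothetical helper built on `np.linalg.lstsq`, and it runs on synthetic data so it stays self-contained. In the notebook one could pass the scaled `X` and `y` instead.

```python
import numpy as np

def least_squares_weights(X, y):
    """Closed-form MSE minimizer without an intercept, matching the
    model used in this notebook (hypothetical helper)."""
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

# Synthetic stand-in for the scaled housing data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 8))
y_demo = X_demo @ rng.normal(size=8) + 0.1 * rng.normal(size=1000)

w_star = least_squares_weights(X_demo, y_demo)
best_mse = np.mean((X_demo @ w_star - y_demo) ** 2)

# Any other weight vector cannot have a lower training MSE
w_other = w_star + 0.01
other_mse = np.mean((X_demo @ w_other - y_demo) ** 2)

print(best_mse <= other_mse)  # True
```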


**Find More Labs**

This lab is from my Machine Learning course, which is part of my [Software Engineering](https://seecs.nust.edu.pk/program/bachelor-of-software-engineering-for-fall-2021-onward) degree at [NUST](https://nust.edu.pk).

The notebooks listed below cover a range of topics in **machine learning** and **data analysis**, implemented from scratch or using popular libraries like **NumPy**, **pandas**, **scikit-learn**, **seaborn**, and **matplotlib**. They include introductory material on NumPy showcasing its efficiency for mathematical operations, **linear regression**, **logistic regression**, **decision trees**, **K-nearest neighbors (KNN)**, **support vector machines (SVM)**, **Naive Bayes**, **K-means** clustering, principal component analysis (**PCA**), and **neural networks** with **backpropagation**. Each notebook demonstrates practical implementation and application of these algorithms on various datasets such as the **California Housing** dataset, **MNIST** dataset, **Iris** dataset, **Auto-MPG** dataset, and the **UCI Adult Census Income** dataset. The notebooks also cover **gradient descent optimization**, model evaluation metrics (e.g., **accuracy, precision, recall, f1 score**), **regularization** techniques (e.g., **Lasso**, **Ridge**), and **data visualization**.

| Title                                                                                                                   | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| ----------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [01 - Intro to Numpy](https://www.kaggle.com/code/sacrum/ml-labs-01-intro-to-numpy)                                     | The notebook demonstrates NumPy's efficiency for mathematical operations like array `reshaping`, `sigmoid`, `softmax`, `dot` and `outer products`, `L1 and L2 losses`, and matrix operations. It highlights NumPy's superiority over standard Python lists in speed and convenience for scientific computing and machine learning tasks.                                                                                                                                                                                              |
| [02 - Linear Regression From Scratch](https://www.kaggle.com/code/sacrum/ml-labs-02-linear-regression-from-scratch)     | This notebook implements `linear regression` and `gradient descent` from scratch in Python using `NumPy`, focusing on predicting house prices with the `California Housing Dataset`. It defines functions for prediction, `MSE` calculation, and gradient computation, plus a single gradient descent routine for optimization. The dataset is loaded and scaled. `Batch, stochastic, and mini-batch gradient descents` are applied with varying hyperparameters. Finally, the MSEs of the predictions from each method are compared. |
| [03 - Logistic Regression from Scratch](https://www.kaggle.com/code/sacrum/ml-labs-03-logistic-regression-from-scratch) | This notebook outlines the implementation of `logistic regression` from scratch in Python using `NumPy`, including functions for prediction, loss calculation, gradient computation, and batch `gradient descent` optimization, applied to the `MNIST` dataset for handwritten digit recognition and the `Iris` dataset. It also includes metrics such as `accuracy`, `precision`, `recall`, and `f1 score`. |
| [04 - Auto-MPG Regression](https://www.kaggle.com/code/sacrum/ml-labs-04-auto-mpg-regression)                           | The notebook uses `pandas` for data manipulation, `seaborn` and `matplotlib` for visualization, and `sklearn` for `linear regression` and `regularization` techniques (`Lasso` and `Ridge`). It includes data loading, processing, visualization, model training, and evaluation on the `Auto-MPG dataset`.                                                                                                                                                                                                                           |
| [05 - Decision Trees from Scratch](https://www.kaggle.com/code/sacrum/ml-labs-05-desicion-trees-from-scratch)           | In this notebook, the `DecisionTree` algorithm is implemented from scratch and applied to a dummy dataset. |
| [06 - KNN from Scratch](https://www.kaggle.com/code/sacrum/ml-labs-06-knn-from-scratch)                                 | In this notebook, the `K-Nearest Neighbour` algorithm is implemented from scratch and compared with the KNN provided in the scikit-learn package. |
| [07 - SVM](https://www.kaggle.com/code/sacrum/ml-labs-07-svm)                                                           | This notebook implements `SVM classifier` on `Iris Dataset`                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
| [08 - Naive Bayes](https://www.kaggle.com/code/sacrum/ml-labs-08-naive-bayes)                                           | This notebook trains `Naive Bayes` and compares it with other algorithms `Decision Trees`, `SVM` and `Logistic Regression`                                                                                                                                                                                                                                                                                                                                                                                                            |
| [09 - K-means](https://www.kaggle.com/code/sacrum/ml-labs-09-k-means)                                                   | In this notebook `K-means` algorithm has been implemented using `scikit-learn` and different values of `k` are compared to understand the `elbow method` in `Calinski Harabasz Scores`                                                                                                                                                                                                                                                                                                                                                |
| [10 - UCI Adult Census Income](https://www.kaggle.com/code/sacrum/ml-labs-10-uci-adult-census-income)                   | Here I have used the UCI Adult Income dataset and applied different machine learning algorithms to find the best model configuration for predicting salary from the given information                                                                                                                                                                                                                                                                                                                                                 |
| [11 - PCA](https://www.kaggle.com/code/sacrum/ml-labs-11-pca)                                                           | `Principal Component Analysis` implemented from scratch |
| [12 - Neural Networks](https://www.kaggle.com/code/sacrum/ml-labs-12-neural-networks)                                   | This code implements neural networks with back propagation from scratch                                                                                                                                                                                                                                                                                                                                                                                                                                                               |