
Importing Libraries

In [6]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split,KFold
from sklearn.preprocessing import StandardScaler


The same toy dataset is used for all the algorithms in Task 1.

In [7]:
data = {
    'x': [1, 2, 3, 4, 5],
    'y': [2, 4, 5, 4, 5]
}
df = pd.DataFrame(data)

X = df['x']
y = df['y']
n = len(X)

Task 1

1. Linear regression

Cost function (mean squared error):

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

The best-fit line is $\hat{y} = m x + c$.

* $n$: number of data points
* $y_i$: actual value (ground truth)
* $\hat{y}_i$: value predicted by the model

Ordinary Least Squares

In [8]:
# Setting the partial derivatives of the MSE with respect to m and c to zero
# gives the closed-form solution for the slope m and the intercept c:
m=(n*(X*y).sum()-(X.sum()*y.sum()))/((n*(X**2).sum())-(X.sum()**2))
c=(y.sum()-m*X.sum())/n
df['y_pred'] = m * X + c
mse_ols = ((df['y'] - df['y_pred'])**2).mean()
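As a sanity check on the closed-form expressions above, the sketch below recomputes the slope and intercept on the same five points with plain NumPy and compares them with `np.polyfit`, which also fits a least-squares line. The names `x_arr`, `y_arr`, `n_pts`, `m_check`, `c_check`, and `mse_check` are illustrative and not part of the original notebook.

```python
import numpy as np

# Same toy data as the notebook, as plain NumPy arrays
x_arr = np.array([1, 2, 3, 4, 5], dtype=float)
y_arr = np.array([2, 4, 5, 4, 5], dtype=float)
n_pts = len(x_arr)

# Closed-form OLS slope and intercept
m_check = (n_pts * (x_arr * y_arr).sum() - x_arr.sum() * y_arr.sum()) / (
    n_pts * (x_arr ** 2).sum() - x_arr.sum() ** 2
)
c_check = (y_arr.sum() - m_check * x_arr.sum()) / n_pts

# np.polyfit with deg=1 returns [slope, intercept]
slope_np, intercept_np = np.polyfit(x_arr, y_arr, deg=1)

mse_check = ((y_arr - (m_check * x_arr + c_check)) ** 2).mean()
print(m_check, c_check, mse_check)
```

For this data the slope works out to 0.6, the intercept to 2.2, and the MSE to 0.48, up to floating-point rounding.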

Gradient Descent

Useful terms: learning rate = step size of each parameter update.

Epochs: number of iterations over the data.
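For reference, the update rule follows from differentiating the MSE with $\hat{y}_i = m x_i + c$. The symbol $\alpha$ for the learning rate is introduced here only for illustration.

$$
\frac{\partial\,\text{MSE}}{\partial m} = -\frac{2}{n}\sum_{i=1}^{n} x_i\,(y_i - \hat{y}_i), \qquad
\frac{\partial\,\text{MSE}}{\partial c} = -\frac{2}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)
$$

$$
m \leftarrow m - \alpha\,\frac{\partial\,\text{MSE}}{\partial m}, \qquad
c \leftarrow c - \alpha\,\frac{\partial\,\text{MSE}}{\partial c}
$$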

In [11]:
m_gd, c_gd = 0.0, 0.0
lr = 0.01
epochs = 1000
for i in range(epochs):
  y_pred = m_gd * X + c_gd
  error = y - y_pred

  # Gradients of the MSE with respect to m and c
  d_m = (-2/n) * (X * error).sum()
  d_c = (-2/n) * error.sum()

  m_gd -= lr * d_m
  c_gd -= lr * d_c

# Evaluate once, after training has finished
df['y_pred_gd'] = m_gd * X + c_gd
mse_gd = ((df['y'] - df['y_pred_gd'])**2).mean()
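To see how the number of epochs affects the result, here is a minimal sketch that wraps the same update in a function and records the MSE at every epoch. The function name `gradient_descent_line` and the 10000-epoch run are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np

def gradient_descent_line(x, y, lr=0.01, epochs=1000):
    """Fit y ≈ m*x + c by batch gradient descent; return m, c and the MSE per epoch."""
    m, c = 0.0, 0.0
    n = len(x)
    history = []
    for _ in range(epochs):
        error = y - (m * x + c)
        history.append((error ** 2).mean())
        # Both updates use the error computed before either parameter changed
        m -= lr * (-2 / n) * (x * error).sum()
        c -= lr * (-2 / n) * error.sum()
    return m, c, history

x_arr = np.array([1, 2, 3, 4, 5], dtype=float)
y_arr = np.array([2, 4, 5, 4, 5], dtype=float)

m_short, c_short, hist_short = gradient_descent_line(x_arr, y_arr, epochs=1000)
m_long, c_long, hist_long = gradient_descent_line(x_arr, y_arr, epochs=10000)
print(m_short, c_short)
print(m_long, c_long)
```

With 1000 epochs the parameters are close to, but not exactly at, the OLS solution; the longer run moves them closer because the intercept converges slowly at this learning rate.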



Lasso Regression

### 📉 Lasso Regression Loss Function

L1 Regularization

$$
{\text{Lasso}}(\beta) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
$$

---



### 🔍 Where:

- $\text{Lasso}(\beta)$: total cost function for Lasso regression
- $n$: number of data points
- $y_i$: actual value of the target variable for the $i$-th sample
- $\hat{y}_i$: predicted value from the model for the $i$-th sample
- $\beta_j$: model coefficient for the $j$-th feature
- $\lambda$: regularization parameter (controls penalty strength)
- $p$: number of features (independent variables)
  


In [14]:
m_lasso, c_lasso = 0, 0
lr = 0.01
epochs = 1000
lmbda = 0.1

for _ in range(epochs):
    y_pred = m_lasso * X + c_lasso
    error = y - y_pred


    # The L1 penalty contributes the subgradient lmbda * sign(m);
    # the intercept c is not penalized.
    dm = (-2/n) * (X * error).sum() + lmbda * np.sign(m_lasso)
    dc = (-2/n) * error.sum()

    m_lasso -= lr * dm
    c_lasso -= lr * dc

df['y_pred_lasso'] = m_lasso * X + c_lasso
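The effect of the penalty strength can be illustrated by rerunning the same subgradient update for a few values of $\lambda$. This is a sketch under stated assumptions: the helper `lasso_line`, the dictionary `fits`, the longer run of 20000 epochs, and the chosen $\lambda$ values are all introduced here for illustration.

```python
import numpy as np

def lasso_line(x, y, lam, lr=0.01, epochs=20000):
    """Subgradient descent on MSE + lam * |m|; the intercept c is not penalized."""
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = y - (m * x + c)
        dm = (-2 / n) * (x * error).sum() + lam * np.sign(m)
        dc = (-2 / n) * error.sum()
        m -= lr * dm
        c -= lr * dc
    return m, c

x_arr = np.array([1, 2, 3, 4, 5], dtype=float)
y_arr = np.array([2, 4, 5, 4, 5], dtype=float)

# Fit the line for increasing penalty strengths
fits = {lam: lasso_line(x_arr, y_arr, lam) for lam in (0.0, 0.1, 1.0)}
for lam, (m_fit, c_fit) in fits.items():
    print(f"lambda={lam}: m={m_fit:.3f}, c={c_fit:.3f}")
```

On this dataset the fitted slope shrinks as $\lambda$ grows, while $\lambda = 0$ gives the unpenalized least-squares slope.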

## 📘 Ridge Regression Cost Function

The cost function for Ridge Regression is:

$$
J(\beta) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \|\beta\|^2
$$

---
Here $\lambda \ge 0$ for both Lasso and Ridge regression; setting $\lambda = 0$ recovers ordinary least squares.



In [15]:

m_ridge, c_ridge = 0.0, 0.0
lr = 0.01
epochs = 1000
lmbda = 0.1

for _ in range(epochs):
    y_pred = m_ridge * X + c_ridge
    error = y - y_pred


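    # The L2 penalty lmbda * m**2 contributes 2 * lmbda * m to the slope gradient;
    # the intercept c is not penalized.
    dm = (-2/n) * (X * error).sum() + 2 * lmbda * m_ridge
    dc = (-2/n) * error.sum()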

    m_ridge -= lr * dm
    c_ridge -= lr * dc

df['y_pred_ridge'] = m_ridge * X + c_ridge
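Because the ridge penalty is differentiable, the single-feature case has a closed form that gradient descent can be checked against. The sketch below assumes, as in the code above, that only the slope is penalized; the names `s_xy`, `s_xx`, `m_closed`, `c_closed`, `m_iter`, and `c_iter`, and the 20000-epoch run, are illustrative.

```python
import numpy as np

x_arr = np.array([1, 2, 3, 4, 5], dtype=float)
y_arr = np.array([2, 4, 5, 4, 5], dtype=float)
n_pts = len(x_arr)
lam = 0.1

# Setting both gradients to zero gives
#   m = S_xy / (S_xx + n * lam),  c = mean(y) - m * mean(x)
s_xy = ((x_arr - x_arr.mean()) * (y_arr - y_arr.mean())).sum()
s_xx = ((x_arr - x_arr.mean()) ** 2).sum()
m_closed = s_xy / (s_xx + n_pts * lam)
c_closed = y_arr.mean() - m_closed * x_arr.mean()

# Same gradient updates as the notebook cell, run for more epochs
m_iter, c_iter = 0.0, 0.0
for _ in range(20000):
    error = y_arr - (m_iter * x_arr + c_iter)
    dm = (-2 / n_pts) * (x_arr * error).sum() + 2 * lam * m_iter
    dc = (-2 / n_pts) * error.sum()
    m_iter -= 0.01 * dm
    c_iter -= 0.01 * dc

print(m_closed, c_closed)
print(m_iter, c_iter)
```

The two pairs should agree closely once gradient descent has run long enough; the ridge slope is slightly smaller than the OLS slope of 0.6.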




## 🌳 Decision Tree Regressor: Loss Function

### 🔹 Mean Squared Error (MSE):

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2
$$

---

### 📌 Definitions:
- $n$: number of data points in the node
- $y_i$: actual value of the $i$-th data point
- $\bar{y}$: mean of all target values in the node
- Lower weighted MSE of the two child nodes → better split


In [17]:
def mse(y):
    return ((y - y.mean()) ** 2).mean()

def best_split_regression(X, y):
    best_mse = float('inf')
    best_split = None

    for val in sorted(X.unique()):
        left = y[X < val]
        right = y[X >= val]
        if len(left) == 0 or len(right) == 0:
            continue

        score = (len(left) * mse(left) + len(right) * mse(right)) / len(y)

        if score < best_mse:
            best_mse = score
            best_split = val

    return best_split, best_mse

split, loss = best_split_regression(X,y)
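Once a threshold has been chosen, a one-level tree (a "stump") predicts the mean target on each side of it. The following is a minimal NumPy sketch of that idea on the same five points; `weighted_mse`, `stump_predict`, `scores`, and `best_threshold` are hypothetical names introduced here.

```python
import numpy as np

x_arr = np.array([1, 2, 3, 4, 5], dtype=float)
y_arr = np.array([2, 4, 5, 4, 5], dtype=float)

def weighted_mse(x, y, threshold):
    """Size-weighted average of the within-node MSE on each side of the threshold."""
    left, right = y[x < threshold], y[x >= threshold]
    # ndarray.var() with the default ddof=0 is the mean squared deviation from the mean
    return (len(left) * left.var() + len(right) * right.var()) / len(y)

def stump_predict(x, y, threshold):
    """Predict the mean target of whichever side of the threshold each x falls on."""
    left_mean = y[x < threshold].mean()
    right_mean = y[x >= threshold].mean()
    return np.where(x < threshold, left_mean, right_mean)

# Skip the smallest value: splitting there would leave the left node empty
candidates = np.unique(x_arr)[1:]
scores = {float(t): weighted_mse(x_arr, y_arr, t) for t in candidates}
best_threshold = min(scores, key=scores.get)
preds = stump_predict(x_arr, y_arr, best_threshold)
print(best_threshold, scores[best_threshold], preds)
```

On this data the lowest weighted MSE is obtained by separating the first point from the other four.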

## 🌳 Decision Tree Classifier: Loss Functions

### 🔹 Gini Impurity:

$$
G = 1 - \sum_{i=1}^{C} p_i^2
$$

### 🔹 Entropy:

$$
H = -\sum_{i=1}^{C} p_i \log_2(p_i)
$$

---

### 📌 Definitions:
- $C$: number of classes
- $p_i$: proportion of class $i$ in a node
- Lower values of Gini or entropy indicate purer (better) splits


In [18]:
def gini(y):
    classes = y.unique()
    g = 1.0
    for c in classes:
        p = (y == c).mean()
        g -= p ** 2
    return g

def best_split_classification(X, y):
    best_gini = float('inf')
    best_split = None

    for val in sorted(X.unique()):
        left = y[X < val]
        right = y[X >= val]
        if len(left) == 0 or len(right) == 0:
            continue

        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)

        if score < best_gini:
            best_gini = score
            best_split = val

    return best_split, best_gini

split_c, loss_c = best_split_classification(X, y)
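The cell above only implements the Gini criterion. As a companion, here is a small sketch of the entropy formula next to Gini, written with NumPy; the function names `entropy` and `gini_impurity` and the two example label arrays are assumptions for illustration.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of the class proportions in a node."""
    # np.unique only returns classes that are present, so every p is > 0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

balanced = np.array([0, 0, 1, 1])  # two classes, evenly mixed
pure = np.array([1, 1, 1, 1])      # a single class

print(entropy(balanced), gini_impurity(balanced))
print(entropy(pure), gini_impurity(pure))
```

Both measures are zero for a pure node and reach their two-class maximum (1 for entropy, 0.5 for Gini) when the classes are evenly mixed, so either could replace `gini` in the weighted split score.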


## 📊 K-Means Clustering: Loss Function

The objective (loss) function for K-Means clustering is:

$$
J = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2
$$

---

### 📌 Definitions:

- $k$: number of clusters
- $C_i$: the set of data points assigned to cluster $i$
- $\mu_i$: centroid (mean) of cluster $i$
- $x$: a data point
- $\| x - \mu_i \|^2$: squared Euclidean distance between the point and its cluster center


In [39]:
class KMeans:
    def __init__(self, k_clusters, max_iters, tol=1e-4):
        self.k_clusters = k_clusters
        self.max_iters = max_iters
        self.tol = tol

    def fit(self, X):
        if isinstance(X, pd.DataFrame):
            X = X.values
        self.X = X
        m, n = X.shape
        random_indices = np.random.choice(m, self.k_clusters, replace=False)
        self.centroids = X[random_indices]
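        for _ in range(self.max_iters):
            self.labels = self.assign_clusters(X)
            # Recompute each centroid; keep the old one if a cluster ends up empty
            new_centroids = np.array([
                X[self.labels == j].mean(axis=0) if np.any(self.labels == j)
                else self.centroids[j]
                for j in range(self.k_clusters)
            ])
            # Stop once no centroid has moved by more than tol
            shift = np.linalg.norm(self.centroids - new_centroids, axis=1)
            self.centroids = new_centroids
            if np.all(shift < self.tol):
                break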
        return self

    def assign_clusters(self, X):
        distances = np.linalg.norm(X[:, np.newaxis] - self.centroids, axis=2)
        return np.argmin(distances, axis=1)

    def predict(self, X):
        return self.assign_clusters(X)

    def wcss(self):
        return np.sum([
            np.sum((self.X[self.labels == i] - self.centroids[i]) ** 2)

            for i in range(self.k_clusters)
        ])
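To connect the class back to the objective $J$, the sketch below evaluates the within-cluster sum of squares on four hand-picked points, once with the cluster means as centroids and once with the centroids shifted. The data, the fixed labels, and the names `kmeans_objective`, `mean_centroids`, and `shifted_centroids` are made up for illustration.

```python
import numpy as np

def kmeans_objective(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    diffs = points - centroids[labels]
    return float((diffs ** 2).sum())

# Two obvious groups: one near x=0 and one near x=10
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])

# Centroids at the cluster means, and the same centroids shifted by 1 along x
mean_centroids = np.array([points[labels == j].mean(axis=0) for j in range(2)])
shifted_centroids = mean_centroids + np.array([1.0, 0.0])

j_mean = kmeans_objective(points, labels, mean_centroids)
j_shifted = kmeans_objective(points, labels, shifted_centroids)
print(j_mean, j_shifted)
```

For a fixed assignment, the cluster mean gives the smallest sum of squared distances, which is why the update step in `fit` recomputes each centroid as a mean.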