**3.1 Implementation from Scratch: A Step-by-Step Guide**

---



**3.1.1 Step 1: Data Understanding, Analysis and Preparation**

**• To - Do - 1:**
1. Read and inspect the dataset.
2. Print the first 5 and last 5 rows of the dataset. {Hint: df.head() and df.tail()}
3. Print the summary information of the dataset. {Hint: df.info()}
4. Gather descriptive statistics about the dataset. {Hint: df.describe()}
5. Split the data into features (X) and label (Y).

---



In [None]:
import pandas as pd
import numpy as np

In [None]:
# Read the dataset
df = pd.read_csv("/content/drive/MyDrive/Concepts and technologies of AI/student.csv")

# Print the first 5 rows
print("Top 5 rows:\n", df.head())

# Print the last 5 rows
print("\nBottom 5 rows:\n", df.tail())

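# Print the dataset information
# (df.info() prints directly and returns None, so print() adds a trailing "None")
print("\nDataset Info:")
print(df.info())

# Print descriptive statistics of the dataset
print("\nDescriptive Statistics:")
print(df.describe())

# Split the data into features (X) and label (Y)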

X = df[['Math', 'Reading']]
Y = df['Writing']

print("\nFeatures (X):\n", X.head())
print("\nLabel (Y):\n", Y.head())

Top 5 rows:
    Math  Reading  Writing
0    48       68       63
1    62       81       72
2    79       80       78
3    76       83       79
4    59       64       62

Bottom 5 rows:
      Math  Reading  Writing
995    72       74       70
996    73       86       90
997    89       87       94
998    83       82       78
999    66       66       72

Dataset Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Math     1000 non-null   int64
 1   Reading  1000 non-null   int64
 2   Writing  1000 non-null   int64
dtypes: int64(3)
memory usage: 23.6 KB
None

Descriptive Statistics:
              Math      Reading      Writing
count  1000.000000  1000.000000  1000.000000
mean     67.290000    69.872000    68.616000
std      15.085008    14.657027    15.241287
min      13.000000    19.000000    14.000000
25%      58.000000    60.750000    58.000000
50%    

**• To - Do - 2:**
1. To keep the task simple, assume there is no bias (intercept) term.
2. Create the matrices given in the question.
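
The original question's matrices are not reproduced in this notebook, so the following is only a sketch of the layout the code below appears to assume: samples are stacked as columns, with d = 2 features (Math, Reading) and n = 1000 students. The superscript notation for features is introduced here for illustration.

$$
X = \begin{bmatrix} x^{(1)}_1 & x^{(1)}_2 & \cdots & x^{(1)}_n \\ x^{(2)}_1 & x^{(2)}_2 & \cdots & x^{(2)}_n \end{bmatrix} \in \mathbb{R}^{d \times n},
\qquad
W = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \in \mathbb{R}^{d \times 1},
\qquad
Y = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} \in \mathbb{R}^{1 \times n}
$$

$$
\hat{Y} = W^{\top} X, \qquad \hat{y}_i = w_1 x^{(1)}_i + w_2 x^{(2)}_i
$$

With no intercept, every prediction is a weighted sum of the two scores, so the fitted line is forced through the origin.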

---



In [None]:
# Features (Math, Reading)
X = df[['Math', 'Reading']].values.T   # Transpose to get shape (d x n)

# Weight vector (initialize with zeros)
W = np.zeros((X.shape[0], 1))          # Shape (d x 1)

# Labels (Writing marks)
Y = df['Writing'].values.reshape(1, -1)  # Shape (1 x n)

# Prediction using matrix multiplication
Y_pred = W.T @ X  # Y = W^T * X

# Print shapes to verify
print("X shape:", X.shape)
print("W shape:", W.shape)
print("Y shape:", Y.shape)
print("Y_pred shape:", Y_pred.shape)

X shape: (2, 1000)
W shape: (2, 1)
Y shape: (1, 1000)
Y_pred shape: (1, 1000)


**• To - Do - 3:**
1. Split the dataset into training and test sets.
2. You can use an 80-20 or 70-30 split, with 80% (or 70%) of the data used for training and the rest
for testing.

---



In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# Features and label
X = df[['Math', 'Reading']].values    # Feature matrix (n x d)
Y = df['Writing'].values              # Label vector (n,)

# Split dataset into training and test sets (80% train, 20% test)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)

# Print shapes to verify
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("Y_train shape:", Y_train.shape)
print("Y_test shape:", Y_test.shape)

X_train shape: (800, 2)
X_test shape: (200, 2)
Y_train shape: (800,)
Y_test shape: (200,)
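
Since this section builds the model from scratch, the same kind of split can also be sketched with NumPy alone. The helper name `manual_train_test_split` and its `seed` parameter are hypothetical, and the shuffle will not reproduce the exact rows chosen by `train_test_split(random_state=42)`.

```python
import numpy as np

def manual_train_test_split(X, Y, test_size=0.2, seed=42):
    """Shuffle row indices, then split (n x d) features and (n,) labels."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n)           # random order of the n sample rows
    n_test = int(round(n * test_size))     # number of rows held out for testing
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]

# Toy data standing in for the Math/Reading features and Writing labels
X_demo = np.arange(20).reshape(10, 2)
Y_demo = np.arange(10)

X_tr, X_te, Y_tr, Y_te = manual_train_test_split(X_demo, Y_demo, test_size=0.2)
print(X_tr.shape, X_te.shape, Y_tr.shape, Y_te.shape)
```

Indexing X and Y with the same index arrays keeps each feature row paired with its label, which is the property that matters most in any hand-written split.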


**3.1.2 Step 2: Build a Cost Function**

---
**To - Do - 4:**


---



In [None]:
# Define the cost function (half the Mean Squared Error)
def cost_function(X, Y, W):
    """
    Parameters:
    X : Feature matrix (d x n)
    Y : Target matrix (1 x n)
    W : Weight matrix (d x 1)

    Returns:
    cost : Half the Mean Squared Error, (1 / 2n) * sum of squared errors (scalar)
    """
    n = X.shape[1]               # number of samples
    Y_pred = W.T @ X             # Prediction: Y_pred = W^T X
    error = Y_pred - Y           # Difference between predicted and actual
    cost = (1 / (2 * n)) * np.sum(error ** 2)  # Half-MSE: the 1/2 cancels in the gradient
    return cost

# Example usage:
# cost = cost_function(X_matrix, Y_matrix, W)
# print("Initial Cost:", cost)
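
Written out, the function above computes the following quantity, assuming the (d × n) layout from its docstring. The notation x_i for the i-th column of X is introduced here for illustration.

$$
J(W) = \frac{1}{2n} \sum_{i=1}^{n} \left( W^{\top} x_i - y_i \right)^2 = \frac{1}{2n} \left\lVert W^{\top} X - Y \right\rVert_2^2
$$

The factor 1/2 is a convention that cancels when the square is differentiated, so J is half the mean squared error rather than the MSE itself.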

**To - Do - 5:**

Make sure your code from To - Do - 4 passes the following test case:

---



In [None]:
# Test case
X_test = np.array([[1, 3, 5],
                   [2, 4, 6]])   # Feature matrix (2 features, 3 samples)

Y_test = np.array([[3, 7, 11]])   # True labels (1 x 3)

W_test = np.array([[1],
                   [1]])          # Weight vector (2 x 1)

# Calculate cost
cost = cost_function(X_test, Y_test, W_test)

# Check output (np.isclose avoids an exact floating-point comparison)
if np.isclose(cost, 0):
    print("Proceed Further")
else:
    print("Something went wrong: Reimplement cost function")
    print("Cost function output:", cost)

Proceed Further
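
A zero-cost case alone cannot catch a wrong scaling factor, so a second check with a non-zero expected value may help. The sketch below redefines the cost function so that it runs standalone and uses hypothetical names (`X_check`, `Y_check`, `W_zero`). With all-zero weights every prediction is 0, so the cost should be (3² + 7² + 11²) / (2 · 3) = 179/6.

```python
import numpy as np

def cost_function(X, Y, W):
    """Half mean squared error for X (d x n), Y (1 x n), W (d x 1)."""
    n = X.shape[1]
    error = W.T @ X - Y
    return (1 / (2 * n)) * np.sum(error ** 2)

# Same data as the test case, but with all-zero weights
X_check = np.array([[1, 3, 5],
                    [2, 4, 6]])
Y_check = np.array([[3, 7, 11]])
W_zero = np.zeros((2, 1))

cost_zero = cost_function(X_check, Y_check, W_zero)
print(cost_zero)  # 179 / 6, about 29.8333
```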


**To - Do - 6:**

Implement gradient descent: either complete the following code or write your own.

---



In [None]:
def gradient_descent(X, Y, W, alpha, iterations):
    """
    Perform gradient descent to optimize the parameters of a linear regression model.

    Parameters:
    X (numpy.ndarray): Feature matrix (d x n).
    Y (numpy.ndarray): Target vector (1 x n).
    W (numpy.ndarray): Initial guess for parameters (d x 1).
    alpha (float): Learning rate.
    iterations (int): Number of iterations for gradient descent.

    Returns:
    tuple: (W_update, cost_history)
        W_update (numpy.ndarray): Updated parameters (d x 1).
        cost_history (list): Cost values over iterations.
    """
    cost_history = [0] * iterations
    n = X.shape[1]  # number of samples

    W_update = W.copy()

    for iteration in range(iterations):
        # Step 1: Hypothesis values
        Y_pred = W_update.T @ X  # shape (1 x n)

        # Step 2: Difference between hypothesis and actual
        loss = Y_pred - Y        # shape (1 x n)

        # Step 3: Gradient calculation
        dw = (1 / n) * (X @ loss.T)  # shape (d x 1)

        # Step 4: Update weights
        W_update = W_update - alpha * dw

        # Step 5: Compute new cost
        cost = cost_function(X, Y, W_update)
        cost_history[iteration] = cost

    return W_update, cost_history
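
The update in Steps 3 and 4 follows from differentiating the cost J(W) = (1/2n)‖WᵀX − Y‖² with respect to W. In the same (d × n) notation as the docstring, a short sketch of the result is:

$$
\nabla_W J(W) = \frac{1}{n} X \left( W^{\top} X - Y \right)^{\top} \in \mathbb{R}^{d \times 1}
$$

$$
W \leftarrow W - \alpha \, \nabla_W J(W)
$$

The first line corresponds to `dw = (1 / n) * (X @ loss.T)` and the second to `W_update = W_update - alpha * dw`.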

**To - Do - 7:**

Make sure the following test case is passed by your code from To - Do - 6 or by your own gradient descent
implementation:

---



In [None]:
# Generate random test data
np.random.seed(0)
X_test = np.random.rand(3, 100)       # 3 features x 100 samples
Y_test = np.random.rand(1, 100)       # Labels
W_test = np.random.rand(3, 1)         # Initial weights

# Hyperparameters
alpha = 0.01
iterations = 1000

# Run Gradient Descent
final_params, cost_history = gradient_descent(X_test, Y_test, W_test, alpha, iterations)

# Print results
print("Final Parameters:\n", final_params)
print("First 10 Cost Values:\n", cost_history[:10])
print("Last Cost Value:", cost_history[-1])

Final Parameters:
 [[0.27126876]
 [0.47953137]
 [0.08853683]]
First 10 Cost Values:
 [np.float64(0.11501437366403591), np.float64(0.11417721076243244), np.float64(0.11335358952199538), np.float64(0.11254328596468793), np.float64(0.11174607982644355), np.float64(0.110961754495564), np.float64(0.11019009695213929), np.float64(0.10943089770847206), np.float64(0.10868395075049048), np.float64(0.10794905348013303)]
Last Cost Value: 0.05532065979560406
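
The test case above prints values but does not state what a correct result looks like. One hedged way to sanity-check an implementation is to compare it against the closed-form least-squares solution from `np.linalg.lstsq` on noise-free synthetic data. The sketch below re-implements the update inline so that it runs on its own; the data, `true_W`, and the hyperparameters are illustrative assumptions, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)
X_syn = rng.random((3, 100))                 # 3 features x 100 samples
true_W = np.array([[0.5], [-1.0], [2.0]])    # hypothetical ground-truth weights
Y_syn = true_W.T @ X_syn                     # noise-free labels, shape (1 x 100)

# Plain batch gradient descent on the half-MSE cost
W_gd = np.zeros((3, 1))
alpha, iterations = 0.1, 5000
n = X_syn.shape[1]
for _ in range(iterations):
    grad = (1 / n) * (X_syn @ (W_gd.T @ X_syn - Y_syn).T)
    W_gd = W_gd - alpha * grad

# Closed-form reference: lstsq expects samples as rows, hence the transposes
W_ls, *_ = np.linalg.lstsq(X_syn.T, Y_syn.T, rcond=None)

print("Max abs difference:", np.max(np.abs(W_gd - W_ls)))
```

If the two weight vectors disagree noticeably after enough iterations, the gradient or the update step is the first place to look.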


**To - Do - 8:**

Implement RMSE: complete the following code or write your own:

---



In [None]:
# Model Evaluation - RMSE
def rmse(Y, Y_pred):
    """
    This function calculates the Root Mean Squared Error (RMSE).

    Input Arguments:
    Y: Array of actual (target) dependent variables.
    Y_pred: Array of predicted dependent variables.

    Output Arguments:
    rmse: Root Mean Squared Error.
    """
    n = Y.shape[1] if Y.ndim > 1 else len(Y)  # number of samples
    return np.sqrt(np.sum((Y_pred - Y) ** 2) / n)
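
In formula form, the function above computes the quantity below. The link to the cost J from To - Do - 4 holds because that cost carries the factor 1/(2n).

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2} = \sqrt{2 \, J(W)}
$$

Unlike J, the RMSE is expressed in the same units as the target, here marks.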

**To - Do - 9 - Implementation in the Code:**

Complete the following code or write your own for the R² score:

---




In [None]:
# Model Evaluation - R2
def r2(Y, Y_pred):
    """
    This function calculates the R-squared score (coefficient of determination).

    Input Arguments:
    Y: Array of actual (target) dependent variables.
    Y_pred: Array of predicted dependent variables.

    Output Arguments:
    r2: R-squared score (1 is a perfect fit; 0 matches always predicting the mean).
    """
    mean_y = np.mean(Y)
    ss_tot = np.sum((Y - mean_y) ** 2)      # Total Sum of Squares
    ss_res = np.sum((Y - Y_pred) ** 2)      # Sum of Squared Residuals
    return 1 - (ss_res / ss_tot)            # R-squared
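
Both metrics can be checked by hand on a tiny example. The sketch below redefines the two functions compactly so that it runs standalone; the arrays `Y_true` and `Y_hat` are hypothetical. The errors are −1, 0 and 2, so the squared errors sum to 5, giving RMSE = √(5/3), and with a total sum of squares of 8, R² = 1 − 5/8.

```python
import numpy as np

def rmse(Y, Y_pred):
    """Root mean squared error for (1 x n) or (n,) arrays."""
    n = Y.shape[1] if Y.ndim > 1 else len(Y)
    return np.sqrt(np.sum((Y_pred - Y) ** 2) / n)

def r2(Y, Y_pred):
    """Coefficient of determination."""
    ss_tot = np.sum((Y - np.mean(Y)) ** 2)
    ss_res = np.sum((Y - Y_pred) ** 2)
    return 1 - ss_res / ss_tot

Y_true = np.array([[3.0, 5.0, 7.0]])
Y_hat = np.array([[2.0, 5.0, 9.0]])

print(rmse(Y_true, Y_hat))  # sqrt(5/3), about 1.2910
print(r2(Y_true, Y_hat))    # 1 - 5/8 = 0.375
```

Two boundary cases are also worth remembering: perfect predictions give R² = 1, and always predicting the mean of Y gives R² = 0.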

**• To - Do - 10:**

We will define a function that:
1. Loads the data and splits it into training and test sets.
2. Prepares the feature matrix (X) and target vector (Y).
3. Defines the weight matrix (W) and initializes the learning rate and number of iterations.
4. Calls the gradient descent function to learn the parameters.
5. Evaluates the model using RMSE and R².

---



In [None]:
def main():
    # Load dataset
    data = pd.read_csv('/content/drive/MyDrive/Concepts and technologies of AI/student.csv')

    # Features (n x d) and target (n x 1)
    X = data[['Math', 'Reading']].values
    Y = data['Writing'].values.reshape(-1, 1)

    # Train-test split on sample rows, then transpose to the (d x n) and (1 x n) layout
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.2, random_state=42
    )
    X_train, X_test = X_train.T, X_test.T
    Y_train, Y_test = Y_train.T, Y_test.T

    # Initialize weights and hyperparameters
    W = np.zeros((X_train.shape[0], 1))
    alpha = 0.00001
    iterations = 1000

    # Train model
    W_optimal, cost_history = gradient_descent(X_train, Y_train, W, alpha, iterations)

    # Predict and evaluate
    Y_pred = W_optimal.T @ X_test
    print("Final Weights:\n", W_optimal)
    print("Cost History (First 10):", cost_history[:10])
    print("RMSE on Test Set:", rmse(Y_test, Y_pred))
    print("R-Squared on Test Set:", r2(Y_test, Y_pred))

# Execute main
if __name__ == "__main__":
    main()

Final Weights:
 [[0.34811659]
 [0.64614558]]
Cost History (First 10): [np.float64(2013.165570783755), np.float64(1640.286832599692), np.float64(1337.0619994901588), np.float64(1090.479489285058), np.float64(889.9583270083237), np.float64(726.8940993009545), np.float64(594.2897260808595), np.float64(486.4552052951634), np.float64(398.7634463599483), np.float64(327.45171473246876)]
RMSE on Test Set: 5.2798239764188635
R-Squared on Test Set: 0.8886354462786421


**To - Do - 11 - Present your findings:**


---


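**1. Did your model overfit, underfit, or is its performance acceptable?**

->*The model’s performance is acceptable. The test RMSE is about 5.28 marks, well below the standard deviation of the Writing scores (≈15.2), and the R² value is high (≈0.89), which means the model explains most of the variation in writing scores. The test results show no sign of underfitting, and a linear model with only two weights has little capacity to overfit; comparing these figures with the training-set RMSE and R² would confirm that the model generalizes well to unseen data.*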
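
A claim about overfitting is easier to support when training and test metrics are reported side by side. The sketch below illustrates the comparison on synthetic scores, since `student.csv` is not reproduced here. The data-generating weights, the noise level, and the closed-form fit via `np.linalg.lstsq` are assumptions made for illustration, not the method or data used above.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical Math/Reading scores and a noisy, hypothetical Writing score
X_all = rng.uniform(40, 100, size=(n, 2))
Y_all = X_all @ np.array([0.3, 0.7]) + rng.normal(0, 5, size=n)

# 80/20 split on shuffled row indices
idx = rng.permutation(n)
train_idx, test_idx = idx[:800], idx[800:]

# Fit a no-intercept linear model on the training rows only
W_fit, *_ = np.linalg.lstsq(X_all[train_idx], Y_all[train_idx], rcond=None)

def rmse_1d(y, y_pred):
    """Root mean squared error for 1-D arrays."""
    return np.sqrt(np.mean((y_pred - y) ** 2))

rmse_train = rmse_1d(Y_all[train_idx], X_all[train_idx] @ W_fit)
rmse_test = rmse_1d(Y_all[test_idx], X_all[test_idx] @ W_fit)
print("Train RMSE:", rmse_train)
print("Test RMSE:", rmse_test)
```

A test RMSE far above the training RMSE would point to overfitting, while both being high relative to the spread of the target would point to underfitting.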


---


**2. Experiment with different values of the learning rate, making it higher and lower, and observe the result.**

->*I experimented with different learning rates:*

* *Current learning rate (α = 0.00001): The model converges smoothly, but slowly.*

* *Higher learning rate (e.g., 0.0001 or 0.001): The model learns faster, but if the rate is too high, the cost starts to fluctuate or even diverge.*

* *Lower learning rate (e.g., 0.000001): The model is very stable, but convergence is much slower and requires more iterations.*

**Observation:** *A moderate learning rate gives the best balance between speed and stability.*
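
For this quadratic cost, the stability limit can be made concrete: batch gradient descent converges only when α < 2/λ_max, where λ_max is the largest eigenvalue of (1/n)·X Xᵀ. The sketch below checks this on synthetic features of a similar scale to exam scores. The data and the helper `final_cost` are hypothetical, and no results from `student.csv` are implied.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X_lr = rng.uniform(40, 100, size=(2, n))     # hypothetical scores, shape (d x n)
Y_lr = np.array([[0.3, 0.7]]) @ X_lr         # noise-free labels, shape (1 x n)

# Largest eigenvalue of the (symmetric) matrix (1/n) X X^T sets the stability limit
lam_max = np.linalg.eigvalsh((X_lr @ X_lr.T) / n).max()
print("Stability limit 2 / lambda_max:", 2 / lam_max)

def final_cost(alpha, iterations=50):
    """Run gradient descent from zero weights and return the final half-MSE cost."""
    W = np.zeros((2, 1))
    for _ in range(iterations):
        error = W.T @ X_lr - Y_lr
        W = W - alpha * (X_lr @ error.T) / n
    return np.sum((W.T @ X_lr - Y_lr) ** 2) / (2 * n)

initial_cost = np.sum(Y_lr ** 2) / (2 * n)   # cost at W = 0
print("Initial cost:", initial_cost)

# Learning rates below and above the limit, as multiples of 1 / lambda_max
for factor in (0.1, 1.0, 2.5):
    alpha = factor / lam_max
    print(f"alpha = {alpha:.2e}: cost after 50 steps = {final_cost(alpha):.4g}")
```

Because raw scores are in the tens, λ_max is in the thousands, which is why only very small learning rates are stable here; scaling the features would allow much larger ones.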

---

