Full Name: Nguyễn Trọng Nhân

Student ID: 22521005

**Experiment Requirements (for all datasets)**
- Train - test splitting ratio = 7:3.
- Random_state = 'Your Student ID'.
- Experiment with **Gradient Descent** (*max_iter = 200*, *learning_rate (eta)* is set by yourself) and **Normal Equation**.
- Visualize the cost trend (only for Gradient Descent)
- Evaluation Metric: **R2 score**.
- After completing all requirements, click on ‘Runtime’ and choose ‘Restart and run all’.
- If the code runs without any errors (and the results are the same as before re-running), download the .ipynb file and submit it. Otherwise, check the code carefully.
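
The split, Normal Equation, and R2 requirements above can be sketched as follows. This is a minimal illustration on synthetic data, not on any of the assignment datasets; the names `SEED`, `A_train`, and `A_test` and the generated data are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

SEED = 22521005  # the Student ID above, reused as random_state

# Synthetic stand-in data (not one of the assignment datasets)
rng = np.random.RandomState(SEED)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0 + rng.normal(scale=0.1, size=100)

# 7:3 split, reproducible because random_state is fixed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=SEED
)

# Normal Equation: solve (A^T A) w = A^T y, where A has a leading column of ones
A_train = np.c_[np.ones(len(X_train)), X_train]
w = np.linalg.solve(A_train.T @ A_train, A_train.T @ y_train)

# Evaluate with the R2 score on the held-out 30%
A_test = np.c_[np.ones(len(X_test)), X_test]
r2 = r2_score(y_test, A_test @ w)
print(X_train.shape, X_test.shape, round(r2, 3))
```

Fixing `random_state` is what makes the results identical after 'Restart and run all'.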


# Fish Dataset

🐟🐟 **Dataset Information (fish.csv)** 🐟🐟
- Weight: Weight of fish in grams
- Length1: Vertical length in cm
- Length2: Diagonal length in cm
- Length3: Cross length in cm
- Height: Height in cm
- Width: Diagonal width in cm

📚📚 **Problem Requirement** 📚📚

Use all remaining attributes to estimate the **weight** of fish.


**Note**: Try to make Gradient Descent achieve performance comparable to the Normal Equation model. ✌✌

**Hint**💡: Apply (suitable) techniques from the Data Preprocessing tutorial file.
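
One preprocessing technique that usually matters for Gradient Descent is standardization. The sketch below compares the same learning rate on raw and standardized features; the data is synthetic, and the helper `gradient_descent` is a hypothetical function written for this example, not the tutorial's code.

```python
import numpy as np

def gradient_descent(X, y, eta, max_iter=200):
    """Batch gradient descent on the cost MSE/2; returns weights and cost history."""
    A = np.c_[np.ones(len(X)), X]          # prepend a bias column
    w = np.zeros(A.shape[1])
    costs = []
    for _ in range(max_iter):
        diff = y - A @ w
        costs.append(np.mean(diff ** 2) / 2)
        w += eta * A.T @ diff / len(y)     # step along the negative gradient
    return w, costs

# Synthetic stand-in for length-like features measured in cm (hypothetical values)
rng = np.random.RandomState(0)
X = rng.uniform(10, 60, size=(120, 2))
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=1.0, size=120)

# Standardize: zero mean and unit variance per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# The raw run is expected to blow up, so silence the overflow warnings
with np.errstate(all="ignore"):
    _, costs_raw = gradient_descent(X, y, eta=0.1)
_, costs_std = gradient_descent(X_std, y, eta=0.1)

print("raw cost decreased:   ", costs_raw[-1] < costs_raw[0])
print("scaled cost decreased:", costs_std[-1] < costs_std[0])
```

With raw features, a learning rate small enough to stay stable is typically too small to converge within 200 iterations, which is why scaling comes before tuning `eta`.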

In [49]:
# Code here
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score as R2
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

SEED = 22521005  # Student ID, used as random_state everywhere


class LinearRegression_GD:
    def __init__(self, eta = 0.001, max_iter = 200, random_state = 42):
        self.eta = eta
        self.max_iter = max_iter
        self.random_state = random_state
        self.w = None
        self.costs = []

    def predict(self, X):
        # w[0] is the bias, w[1:] are the feature weights
        return np.dot(X, self.w[1:]) + self.w[0]

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w = rgen.normal(loc = 0.0, scale = 0.01, size = 1 + X.shape[1])
        self.w = self.w.reshape(1 + X.shape[1], 1)
        self.costs = []
        n = X.shape[0]
        for n_iters in range(self.max_iter):
            y_pred = self.predict(X)
            diff = y - y_pred
            # Average the gradient over the samples so eta does not depend on n
            self.w[0] += self.eta * np.sum(diff) / n
            self.w[1:] += self.eta * np.dot(X.T, diff) / n
            # Track the cost (half the mean squared error), not the R2 score
            cost = np.sum(diff ** 2) / (2 * n)
            self.costs.append(cost)
        return self


fish_data = pd.read_csv("fish.csv")
# The target is Weight; all remaining attributes are the features
feature_cols = ["Length1", "Length2", "Length3", "Height", "Width"]
X = fish_data[feature_cols].to_numpy(dtype = float)
y = fish_data[["Weight"]].to_numpy(dtype = float)

# Train - test split with ratio 7:3
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = SEED)

# Standardize the features (fit on the training set only) so Gradient Descent does not diverge
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

reg_GD = LinearRegression_GD(eta = 0.1, max_iter = 200, random_state = SEED)
reg_GD.fit(X_train, y_train)

reg_NE = LinearRegression()
reg_NE.fit(X_train, y_train)
w_NE = np.append(reg_NE.intercept_, reg_NE.coef_)

print("w (Gradient Descent):", reg_GD.w.ravel())
print("w (Normal Equation): ", w_NE)
print("R2 (Gradient Descent):", R2(y_test, reg_GD.predict(X_test)))
print("R2 (Normal Equation): ", R2(y_test, reg_NE.predict(X_test)))

# Visualize the cost trend (Gradient Descent only)
plt.plot(range(1, len(reg_GD.costs) + 1), reg_GD.costs)
plt.xlabel("Iteration")
plt.ylabel("Cost (MSE / 2)")
plt.title("Gradient Descent cost trend (fish.csv)")
plt.show()

# Insurance Dataset

💰💰 **Dataset Information (insurance.csv)** 💰💰
- age: Age of primary beneficiary
- sex: Insurance contractor gender, female / male
- bmi: Body mass index, a measure of body weight relative to height
- children: Number of children covered by health insurance / Number of dependents
- smoker: Smoker / Non-smoker
- region: The beneficiary's residential area in the US, northeast, southeast, southwest, northwest
- charges: Individual medical costs billed by health insurance

📚📚 **Problem Requirement** 📚📚

Use all remaining attributes to estimate the individual medical costs billed by health insurance ('**charges**')


**Note**: Try to make Gradient Descent achieve performance comparable to the Normal Equation model. ✌✌

**Hint**💡: Apply (suitable) techniques from the Data Preprocessing tutorial file.
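
Because `sex`, `smoker`, and `region` are categorical, they need a numeric encoding before either model can use them. The following is a minimal sketch of one-hot encoding with `pd.get_dummies`; the four rows are made up for illustration, and the category labels (for example `"yes"` / `"no"` for `smoker`) are assumptions about how insurance.csv stores them.

```python
import pandas as pd

# Tiny made-up sample with the columns listed above (not real rows from insurance.csv)
df = pd.DataFrame({
    "age": [19, 45, 33, 52],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 30.1, 22.7, 35.2],
    "children": [0, 2, 1, 3],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "northeast", "southeast", "northwest"],
    "charges": [16000.0, 8000.0, 4500.0, 40000.0],
})

# One-hot encode the categorical columns; drop_first removes one redundant
# (collinear) column per category
encoded = pd.get_dummies(
    df, columns=["sex", "smoker", "region"], drop_first=True, dtype=float
)

X = encoded.drop(columns=["charges"]).to_numpy()
y = encoded[["charges"]].to_numpy()
print(encoded.columns.tolist())
print(X.shape, y.shape)
```

After encoding, the numeric columns can be split and standardized in the same way as any other feature matrix.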

In [2]:
# Code here


# Tesla Stock Dataset

📈📈 **Dataset Information (tesla_stock.csv)** 📈📈
- Open: The opening price of the stock
- High: The high price of that day
- Low: The low price of that day
- Close: The closing price of that day
- Volume: The amount of stocks traded during that day

📚📚 **Problem Requirement** 📚📚

Use all remaining attributes to predict the closing price of Tesla stock ('**Close**')


**Note**: Try to make Gradient Descent achieve performance comparable to the Normal Equation model. ✌✌

**Hint**💡: Apply (suitable) techniques from the Data Preprocessing tutorial file.
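
A volume column is likely to be several orders of magnitude larger than the price columns, and that scale gap alone makes the problem badly conditioned for Gradient Descent. The sketch below measures this on made-up numbers, not on tesla_stock.csv; `condition_number` is a hypothetical helper written for the example.

```python
import numpy as np

# Made-up stand-in: three price-like columns and one volume-like column
rng = np.random.RandomState(0)
prices = rng.uniform(200, 300, size=(100, 3))
volume = rng.uniform(1e6, 1e7, size=(100, 1))
X = np.hstack([prices, volume])

def condition_number(X):
    """Condition number of A^T A, where A is X with a bias column prepended."""
    A = np.c_[np.ones(len(X)), X]
    return np.linalg.cond(A.T @ A)

# Standardize each column to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

cond_raw = condition_number(X)
cond_std = condition_number(X_std)
print("raw:         ", cond_raw)
print("standardized:", cond_std)
```

A smaller condition number means a single learning rate works reasonably well for every weight. Real price columns are strongly correlated with each other, so the improvement on the actual dataset may be smaller than in this independent-column example.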

In [3]:
# Code here
