# Linear Regression Test

## Part 1: Testing against the sklearn library

In [1]:
from sklearn.datasets import fetch_california_housing

In [2]:
print(fetch_california_housing()['DESCR'])

.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

    :Number of Instances: 20640

    :Number of Attributes: 8 numeric, predictive attributes and the target

    :Attribute Information:
        - MedInc        median income in block group
        - HouseAge      median house age in block group
        - AveRooms      average number of rooms per household
        - AveBedrms     average number of bedrooms per household
        - Population    block group population
        - AveOccup      average number of household members
        - Latitude      block group latitude
        - Longitude     block group longitude

    :Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per census block group.

In [3]:
housing = fetch_california_housing()  # load the dataset once and reuse it
X = housing['data']
y = housing['target']

In [4]:
from sklearn.model_selection import train_test_split

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
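The split above is random and unseeded, so the RMSE and R² reported below will shift slightly from run to run, and Part 2 draws its own split as well. A minimal NumPy sketch of a shuffled split with a fixed seed follows; `shuffled_split` is a hypothetical helper, not part of sklearn or `data_science`. Like sklearn with a float `test_size`, it rounds the test-set size up, so 0.33 of 20640 rows gives 6812 test rows. With sklearn itself, passing `random_state=` to `train_test_split` pins the split.

```python
import numpy as np

def shuffled_split(X, y, test_size=0.33, seed=0):
    """Hypothetical helper: shuffle row indices once, then cut off a test block."""
    n = len(X)
    n_test = int(np.ceil(test_size * n))  # round up, as sklearn does for a float test_size
    rng = np.random.default_rng(seed)     # fixed seed -> reproducible split
    idx = rng.permutation(n)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Same return order as sklearn: X_train, X_test, y_train, y_test
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Dummy arrays with the California Housing shape (20640 rows, 8 features).
X_demo = np.zeros((20640, 8))
y_demo = np.arange(20640, dtype=float)
X_tr, X_te, y_tr, y_te = shuffled_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

Calling it twice with the same `seed` returns identical partitions, which makes metrics comparable across runs.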

In [6]:
from sklearn.linear_model import LinearRegression

In [7]:
linreg = LinearRegression()

linreg.fit(X_train, y_train)

LinearRegression()

In [8]:
print("Beta:", linreg.coef_)
print("Intercept:", linreg.intercept_)

Beta: [ 4.36558720e-01  9.31540311e-03 -1.04367081e-01  6.13168787e-01
 -6.68355342e-07 -3.23110109e-03 -4.24158639e-01 -4.38966638e-01]
Intercept: -37.35570783483359
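`LinearRegression` fits ordinary least squares with an intercept, so `coef_` and `intercept_` should match, up to numerical precision, a direct least-squares solve on the design matrix with a column of ones prepended. A minimal NumPy sketch of that solve, on synthetic data with known coefficients rather than the housing data (`ols_fit` is a hypothetical helper name):

```python
import numpy as np

def ols_fit(X, y):
    """Hypothetical helper: least squares with an intercept.

    A column of ones is prepended, so beta[0] is the intercept and
    beta[1:] holds one coefficient per feature (sklearn's intercept_ and coef_).
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[0], beta[1:]

rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 3))
true_coef = np.array([2.0, -1.0, 0.5])
y_syn = 3.0 + X_syn @ true_coef  # noiseless, so the fit should recover 3.0 and true_coef

intercept, coef = ols_fit(X_syn, y_syn)
print("Intercept:", intercept)
print("Beta:", coef)
```

The ones column is what puts an intercept in `beta[0]`; a design matrix without it has no intercept term at all, and `beta[0]` is then just the first feature's coefficient.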


In [9]:
y_test_pred = linreg.predict(X_test)

In [10]:
from sklearn.metrics import mean_squared_error, r2_score
from numpy import sqrt
mse = mean_squared_error(y_test, y_test_pred)
rmse = sqrt(mse)
r2 = r2_score(y_test, y_test_pred)
print("RMSE:", rmse)
print("R²:", r2)

RMSE: 0.7300197003825873
R²: 0.5999041801167233
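For reference, both metrics follow directly from their definitions, $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_i (y_i - \hat y_i)^2}$ and $R^2 = 1 - \sum_i (y_i - \hat y_i)^2 / \sum_i (y_i - \bar y)^2$, which is what `mean_squared_error` (followed by `sqrt`) and `r2_score` compute for a single target. A small NumPy sketch (the helper names are hypothetical), useful for checking a hand-written library's metrics against sklearn's on the same `y_test` and predictions:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred))  # sqrt(1/4) = 0.5
print(r2(y_true, y_pred))    # 1 - 1/5 = 0.8
```

Because $R^2$ depends on which rows it is computed over, training-set and test-set values are not directly comparable.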


## Part 2: Testing against the native `data_science` library

In [11]:
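from data_science.machine_learning import train_test_split
# This draws a new split, so the test rows differ from Part 1's and the
# metrics below are not computed on the same data.
X_train, X_test, y_train, y_test = train_test_split(X, y, 0.33)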

In [12]:
from data_science import Linear_Regression
lr = Linear_Regression()

In [16]:
lr.fit(X_train, y_train, learning_rate=1e-7, num_steps=500)

Ridge Loss Fit: 100%|████████████████████████████████████████████████████████████████| 500/500 [02:37<00:00,  3.17it/s]
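The very small learning rate is consistent with gradient descent on unscaled features: `Population` is in the thousands while `AveBedrms` is around 1, so the usable step size is limited by the largest-scale feature, and 500 steps may stop well short of the least-squares solution. That may also account for some of the gap in R² reported below. A common remedy is to standardize each feature first. The sketch below is a generic batch gradient descent on a ridge-penalized squared loss with standardized features and an explicit, unpenalized intercept. It illustrates that technique under those assumptions and is not the `data_science` implementation; `gd_ridge_fit`, `alpha`, and the default values are hypothetical.

```python
import numpy as np

def gd_ridge_fit(X, y, learning_rate=0.1, num_steps=500, alpha=0.0):
    """Hypothetical batch gradient descent for
        loss(b0, b) = mean((y - b0 - Xs @ b)^2) + alpha * ||b||^2
    on standardized features Xs (assumes no constant columns).
    The intercept b0 is not penalized. Returns (intercept, coef) on the
    ORIGINAL feature scale.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sigma                     # each column: mean 0, std 1
    n, p = Xs.shape
    b0, b = 0.0, np.zeros(p)
    for _ in range(num_steps):
        resid = b0 + Xs @ b - y               # prediction minus target
        grad_b0 = 2.0 * resid.mean()
        grad_b = 2.0 * Xs.T @ resid / n + 2.0 * alpha * b
        b0 -= learning_rate * grad_b0
        b -= learning_rate * grad_b
    # Undo the scaling: y ~ b0 + ((X - mu) / sigma) @ b
    coef = b / sigma
    intercept = b0 - mu @ coef
    return intercept, coef

rng = np.random.default_rng(0)
# Two features on very different scales (loosely like MedInc vs Population).
X_syn = rng.normal(size=(500, 2)) * np.array([1.0, 1000.0]) + np.array([4.0, 1400.0])
y_syn = 1.5 + X_syn @ np.array([0.4, -0.002])   # noiseless target

intercept, coef = gd_ridge_fit(X_syn, y_syn, learning_rate=0.1, num_steps=500)
print(intercept, coef)
```

On the raw columns of this toy data, the same step size of 0.1 would diverge, because the curvature along the 1000-scale feature is millions of times larger than along a standardized one; standardizing is what lets a step size near 0.1 converge in a few hundred steps.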


In [17]:
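# lr.beta has 8 entries in total (beta[0] plus the 7 printed below), i.e. one per
# feature and no separate intercept. beta[0] (~0.436) also sits close to sklearn's
# MedInc coefficient (~0.437), so it is most likely that coefficient, not an intercept.
print("Beta:", lr.beta[1:])
print("Intercept:", lr.beta[0])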

Beta: [0.012795467440709538, -0.04688082446820239, 0.21702422051702747, -0.00021427691149083033, -0.006640697776360565, -0.11668155352214246, -0.035741057524696834]
Intercept: 0.4360839943078677


In [18]:
y_test_pred = lr.predict(X_test)

In [19]:
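# lr.r2() is called without y_test or y_test_pred, so it may score the fit on the
# training data rather than the test set; r2_score(y_test, y_test_pred) from
# Part 1 would score this model's test-set predictions the same way Part 1 does.
#mse2 = lr.mse(y_test, y_test_pred)
#rmse2 = sqrt(mse2)
r2_2 = lr.r2()
#print("RMSE:", rmse2)
print("R²:", r2_2)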

R²: 0.4426256252733832
