
**The dataset contains the following columns:**

**R&D Spend: Investment in Research and Development.**

**Administration: Expenses for administrative activities.**

**Marketing Spend: Expenditure on marketing.**

**State: Location of the startup (categorical data).**

**Profit: The target variable that the models predict.**

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

**I'll proceed with the following steps:**

**Preprocess the data (e.g., encode categorical variables).**

In [2]:
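# Load the dataset into a DataFrame
df = pd.read_csv('/content/50_Startups.csv')
df.sample(5)  # not displayed: a cell only shows its last expression
# One-hot encode the categorical State column
df = pd.get_dummies(df, columns=["State"])
df.columns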

Index(['R&D Spend', 'Administration', 'Marketing Spend', 'Profit',
       'State_California', 'State_Florida', 'State_New York'],
      dtype='object')
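
The encoding step above can be illustrated on a tiny made-up frame. This is only a sketch: the `toy` data and the `full` / `reduced` names are hypothetical and not part of the notebook. It also shows the `drop_first=True` option, which some people prefer because the full set of indicator columns always sums to 1 and therefore duplicates the intercept; predictions are not affected, but the individual State coefficients are then not uniquely determined.

```python
import pandas as pd

# Tiny made-up frame, only to illustrate the encoding (not the real dataset)
toy = pd.DataFrame({
    "Spend": [1.0, 2.0, 3.0],
    "State": ["California", "Florida", "New York"],
})

# One indicator column per category, as in the cell above
full = pd.get_dummies(toy, columns=["State"], dtype=float)

# drop_first=True drops the first category, which then acts as the baseline
reduced = pd.get_dummies(toy, columns=["State"], drop_first=True, dtype=float)

print(list(full.columns))
print(list(reduced.columns))
```

The notebook keeps all three indicator columns, which is why `X` later has six columns.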

**Separating the inputs and the target column as X and y**

In [10]:
# Features: every column except the target; target: Profit
X = df.drop("Profit", axis=1).values
y = df["Profit"].values

In [21]:
X.shape, y.shape

((50, 6), (50,))

In [22]:
y = y.reshape(-1,1)

In [24]:
lr = LinearRegression()
lr.fit(X, y)

In [25]:
y_pred = lr.predict(X)

In [26]:
# Calculate in-sample metrics (the model is scored on the same data it was fitted on)
print("MAE", mean_absolute_error(y, y_pred))
print("MSE", mean_squared_error(y, y_pred))
print("RMSE", np.sqrt(mean_squared_error(y, y_pred)))
print("R2 Score", r2_score(y, y_pred))

MAE 6475.500708609337
MSE 78406792.88803764
RMSE 8854.761029414494
R2 Score 0.9507524843355148
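
For reference, the four metrics printed above can be reproduced from the residuals with plain NumPy. The sketch below uses made-up values (`y_true` and `y_hat` are hypothetical, not taken from the dataset) and compares the hand-written formulas with the scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up values, only to check the formulas
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.0, 9.0])

residuals = y_true - y_hat
mae = np.mean(np.abs(residuals))   # mean absolute error
mse = np.mean(residuals ** 2)      # mean squared error
rmse = np.sqrt(mse)                # root of MSE, in the units of the target

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(mae, mean_absolute_error(y_true, y_hat))
print(mse, mean_squared_error(y_true, y_hat))
print(r2, r2_score(y_true, y_hat))
```

Because RMSE is in the same units as Profit, it is usually the easiest of the four to compare against typical target values.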


In [27]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [28]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((40, 6), (10, 6), (40, 1), (10, 1))
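
As a small sketch of what the split does, the example below runs `train_test_split` on synthetic arrays with the same shapes as `X` and `y` (the `X_demo` / `y_demo` names and values are made up). With 50 rows and `test_size=0.2`, 10 rows are held out, and fixing `random_state` makes the partition repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins with the same shapes as the notebook's arrays
X_demo = np.arange(300, dtype=float).reshape(50, 6)
y_demo = np.arange(50, dtype=float).reshape(-1, 1)

a_train, a_test, b_train, b_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Same seed, same partition
a_train2, a_test2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(a_train.shape, a_test.shape, b_train.shape, b_test.shape)
print(np.array_equal(a_test, a_test2))
```

With only 10 test rows, the test metrics that follow depend noticeably on which rows happen to be held out.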

**Build a multiple linear regression model (without regularization).**

In [30]:
# 1. Multiple Linear Regression (No Regularization)
lr = LinearRegression()
lr.fit(X_train, y_train)

In [31]:
y_pred_lr = lr.predict(X_test)

In [32]:
# Calculate metrics
print("MAE", mean_absolute_error(y_test, y_pred_lr))
print("MSE", mean_squared_error(y_test, y_pred_lr))
print("RMSE", np.sqrt(mean_squared_error(y_test, y_pred_lr)))
print("R2 Score", r2_score(y_test, y_pred_lr))

MAE 6961.477813250247
MSE 82010363.04423548
RMSE 9055.957323454848
R2 Score 0.8987266414329447


In [33]:
# 2. L1 Regularization (Lasso)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
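
Lasso is fitted by coordinate descent, and its L1 penalty acts on the raw coefficient sizes, so features measured in very different units are penalised unevenly and the solver can need more iterations. One common option, shown here only as a sketch on synthetic data (the arrays and the `scaled_lasso` name are assumptions, not part of the notebook), is to standardize the features in a pipeline before the Lasso step.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic features on very different scales (stand-ins, not the startup data)
X_syn = rng.normal(size=(60, 3)) * np.array([1.0, 1e3, 1e5])
y_syn = (
    2.0 * X_syn[:, 0]
    + 0.003 * X_syn[:, 1]
    + 0.00004 * X_syn[:, 2]
    + rng.normal(scale=0.1, size=60)
)

# Standardize each feature to zero mean and unit variance, then fit Lasso
scaled_lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
scaled_lasso.fit(X_syn, y_syn)

print(scaled_lasso.score(X_syn, y_syn))
print(scaled_lasso.named_steps["lasso"].coef_)
```

Note that after scaling, `alpha` means something different, so the metrics would not be directly comparable with the ones reported in this notebook.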


In [34]:
y_pred_lasso = lasso.predict(X_train)

In [35]:
# Calculate metrics on training data
print("MAE", mean_absolute_error(y_train, y_pred_lasso))
print("MSE", mean_squared_error(y_train, y_pred_lasso))
print("RMSE", np.sqrt(mean_squared_error(y_train, y_pred_lasso)))
print("R2 Score", r2_score(y_train, y_pred_lasso))

MAE 6662.606863700654
MSE 79700060.17158064
RMSE 8927.489018283957
R2 Score 0.9537019994731595


In [36]:
y_pred_lasso2 = lasso.predict(X_test)

In [37]:
# Calculate metrics on test data
print("MAE", mean_absolute_error(y_test, y_pred_lasso2))
print("MSE", mean_squared_error(y_test, y_pred_lasso2))
print("RMSE", np.sqrt(mean_squared_error(y_test, y_pred_lasso2)))
print("R2 Score", r2_score(y_test, y_pred_lasso2))

MAE 6961.489359153961
MSE 82009534.96788244
RMSE 9055.911603360672
R2 Score 0.898727664011925


In [38]:
# R^2 of the linear regression (fitted on the training split) over the full dataset
lr.score(X, y)

0.9496499582724438

In [39]:
# 3. L2 Regularization (Ridge)
ridge = Ridge(alpha=0.1)
ridge.fit(X_train, y_train)

In [40]:
y_pred_ridge = ridge.predict(X_train)

In [41]:
# Calculate metrics on training data
print("MAE", mean_absolute_error(y_train, y_pred_ridge))
print("MSE", mean_squared_error(y_train, y_pred_ridge))
print("RMSE", np.sqrt(mean_squared_error(y_train, y_pred_ridge)))
print("R2 Score", r2_score(y_train, y_pred_ridge))

MAE 6662.094665899252
MSE 79700071.2305449
RMSE 8927.48963766102
R2 Score 0.9537019930489744


In [42]:
y_pred_ridge2 = ridge.predict(X_test)

In [43]:
# Calculate metrics on test data
print("MAE", mean_absolute_error(y_test, y_pred_ridge2))
print("MSE", mean_squared_error(y_test, y_pred_ridge2))
print("RMSE", np.sqrt(mean_squared_error(y_test, y_pred_ridge2)))
print("R2 Score", r2_score(y_test, y_pred_ridge2))

MAE 6961.614185693829
MSE 82000822.36490287
RMSE 9055.430545529178
R2 Score 0.8987384230737417


In [44]:
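# By default, the score method of scikit-learn regressors returns the R^2 score.
# Here it is called on the unregularized model lr (use ridge.score for the Ridge model).
lr.score(X_test, y_test)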

0.8987266414329447
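
The value above matches the R2 Score printed earlier for the linear regression on the test set. A quick check of that equivalence, sketched on synthetic data (the arrays are made up for illustration), compares `score` with `r2_score` for both a plain and a Ridge model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
X_syn = rng.normal(size=(30, 2))
y_syn = X_syn @ np.array([1.5, -0.5]) + rng.normal(scale=0.1, size=30)

for model in (LinearRegression(), Ridge(alpha=0.1)):
    model.fit(X_syn, y_syn)
    via_score = model.score(X_syn, y_syn)                  # built-in R^2
    via_metric = r2_score(y_syn, model.predict(X_syn))     # explicit R^2
    print(type(model).__name__, via_score, via_metric)
```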