## Install the Kaggle API client with the command below

In [None]:
!pip install kaggle
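The `kaggle` command used in the next step needs an API token before it can download anything. The following is a minimal sketch. It assumes you have downloaded `kaggle.json` from your Kaggle account settings into the working directory. The client reads the token from `~/.kaggle/` by default and warns if the file is readable by other users, so the helper copies it there with owner-only permissions. The helper name `install_kaggle_token` is hypothetical.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_token(src="kaggle.json", config_dir=None):
    """Copy a Kaggle API token to the directory the kaggle CLI reads by default.

    Hypothetical helper: assumes `src` is the kaggle.json file downloaded from
    the Kaggle account page. `config_dir` defaults to ~/.kaggle.
    """
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    dest = config_dir / "kaggle.json"
    shutil.copy(src, dest)
    # Owner read/write only, so the credentials are not readable by other users
    os.chmod(dest, 0o600)
    return dest
```

In Colab you would upload `kaggle.json`, call `install_kaggle_token("kaggle.json")`, and then run the download cell.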



## Download the 50 Startups dataset from Kaggle

In [None]:
!kaggle datasets download -d farhanmd29/50-startups

Dataset URL: https://www.kaggle.com/datasets/farhanmd29/50-startups
License(s): other
Downloading 50-startups.zip to /content
  0% 0.00/1.30k [00:00<?, ?B/s]
100% 1.30k/1.30k [00:00<00:00, 2.22MB/s]


## Extract the zip file

In [None]:
from zipfile import ZipFile

dataset = "/content/50-startups.zip"

# Extract every file in the archive into the current working directory (/content in Colab)
with ZipFile(dataset, 'r') as zf:
    zf.extractall()
    print('done')

done
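You may want to inspect the archive before or after extracting it, for example to confirm the name and size of the CSV file inside. Here is a small sketch that uses the standard library's `ZipFile.infolist()`. The helper name `list_archive` is hypothetical. In this notebook you would call it as `list_archive("/content/50-startups.zip")`.

```python
from zipfile import ZipFile


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each file in a zip archive."""
    with ZipFile(path, "r") as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]
```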


# **Import libraries**

In [None]:
import pandas as pd
import numpy as np
import sklearn.datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
import seaborn as sns
from sklearn.metrics import r2_score

### Initialization

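The code reads the dataset with pd.read_csv(), which already returns a DataFrame, so no separate pd.DataFrame() conversion is needed. Passing the path of the downloaded zip works because pandas infers zip compression from the .zip extension and reads the single CSV file inside the archive. Next, the code creates two sets of independent variables. X_multiple contains 'R&D Spend', 'Administration', and 'Marketing Spend'. X_single contains only 'R&D Spend'. The categorical 'State' column is not used. The dependent variable 'Profit' is stored in y. Finally, df.head() prints the first five rows of the DataFrame.

In [None]:
# pandas infers zip compression from the .zip extension and reads the CSV inside
df = pd.read_csv(dataset)

# Feature matrices (2-D arrays) for multiple and single linear regression
X_multiple = df[['R&D Spend', 'Administration', 'Marketing Spend']].values
X_single = df[['R&D Spend']].values
# Target vector (1-D array)
y = df['Profit'].values
print(df.head())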

   R&D Spend  Administration  Marketing Spend       State     Profit
0  165349.20       136897.80        471784.10    New York  192261.83
1  162597.70       151377.59        443898.53  California  191792.06
2  153441.51       101145.55        407934.54     Florida  191050.39
3  144372.41       118671.85        383199.62    New York  182901.99
4  142107.34        91391.77        366168.42     Florida  166187.94
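The double brackets in `df[['R&D Spend']]` matter. They select a one-column DataFrame, so `.values` returns a 2-D array of shape (n_samples, 1). That is the layout scikit-learn expects for features; `LinearRegression.fit` raises a `ValueError` when it is given a 1-D feature array. Single brackets return a 1-D array, which suits the target. The quick check below uses the first three rows printed above as a stand-in for the full dataset. It uses `_demo` names so it does not overwrite the notebook's variables.

```python
import pandas as pd

# First three rows from the df.head() output above (stand-in for the full CSV)
head_df = pd.DataFrame({
    "R&D Spend": [165349.20, 162597.70, 153441.51],
    "Administration": [136897.80, 151377.59, 101145.55],
    "Marketing Spend": [471784.10, 443898.53, 407934.54],
    "State": ["New York", "California", "Florida"],
    "Profit": [192261.83, 191792.06, 191050.39],
})

X_m_demo = head_df[["R&D Spend", "Administration", "Marketing Spend"]].values
X_s_demo = head_df[["R&D Spend"]].values     # double brackets -> 2-D array
X_flat_demo = head_df["R&D Spend"].values    # single brackets -> 1-D array
y_demo = head_df["Profit"].values

print(X_m_demo.shape)     # (3, 3)
print(X_s_demo.shape)     # (3, 1)
print(X_flat_demo.shape)  # (3,)
print(y_demo.shape)       # (3,)
```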


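### Standardization

Standardization transforms each input feature to have a mean of 0 and a standard deviation of 1. The code fits one StandardScaler to the multiple-regression features (X_multiple) and another to the single-regression feature (X_single), before the data is split. It does not append a column of ones for the intercept, because LinearRegression fits the intercept itself (fit_intercept=True by default). For an ordinary least-squares model with an intercept, standardization rescales the coefficients but leaves the predictions and the R-squared score unchanged. Fitting the scaler on the full dataset does let the test rows influence the mean and standard deviation used for scaling. Fitting it on the training split only avoids this leakage.

In [None]:
# Standardize each feature column to mean 0 and standard deviation 1
scaler_m = StandardScaler()
X_multiple = scaler_m.fit_transform(X_multiple)

scaler_s = StandardScaler()
X_single = scaler_s.fit_transform(X_single)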
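StandardScaler computes `z = (x - mean) / std` for each column, using the population standard deviation (ddof=0). The short check below compares the manual NumPy formula with `StandardScaler`. As illustrative input, it uses the R&D Spend values from the five rows shown earlier.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# R&D Spend from the first five rows of the dataset (illustrative input)
x_demo = np.array([[165349.20], [162597.70], [153441.51], [144372.41], [142107.34]])

# Manual standardization: subtract the column mean, divide by the population std (ddof=0)
z_manual = (x_demo - x_demo.mean(axis=0)) / x_demo.std(axis=0)

z_sklearn = StandardScaler().fit_transform(x_demo)

print(np.allclose(z_manual, z_sklearn))  # True
# Mean close to 0 and standard deviation close to 1 (up to rounding)
print(z_sklearn.mean(axis=0), z_sklearn.std(axis=0))
```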


### Splitting

The code splits the data into training and testing sets so that each model is evaluated on rows it did not see during training. The split is done twice: once for multiple regression (X_multiple and y) and once for single regression (X_single and y). In both cases, 20% of the rows are held out for testing (test_size=0.2), and the shuffle is fixed by a seed (random_state=0). Both calls shuffle the same number of rows with the same seed, so they select the same test rows and the two models are compared on identical data.

In [None]:
X_train_m, X_test_m, Y_train_m, Y_test_m = train_test_split(X_multiple, y, test_size=0.2, random_state=0)
X_train_s, X_test_s, Y_train_s, Y_test_s = train_test_split(X_single, y, test_size=0.2, random_state=0)
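To confirm that the two splits select the same rows, you can split an array of row indices alongside each feature matrix. The sketch below uses random stand-in features with 50 rows, matching the size of the 50 Startups dataset. It uses the same `test_size` and `random_state` as the cell above. The shuffle depends only on the row count and the seed, so the feature values do not matter.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 50                      # the 50 Startups dataset has 50 rows
rows = np.arange(n_rows)         # row indices to track through the split
rng = np.random.default_rng(0)
X_a = rng.random((n_rows, 3))    # stand-in with the shape of X_multiple
X_b = rng.random((n_rows, 1))    # stand-in with the shape of X_single

# Same settings as above; the extra `rows` array is split with the same shuffle
_, _, train_idx_m, test_idx_m = train_test_split(X_a, rows, test_size=0.2, random_state=0)
_, _, train_idx_s, test_idx_s = train_test_split(X_b, rows, test_size=0.2, random_state=0)

print(np.array_equal(test_idx_m, test_idx_s))  # True
print(len(test_idx_m))                         # 10
```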

In [None]:
# Sanity check: standardized test features should have a standard deviation close to 1
print(X_test_m.std())
print(X_test_s.std())

1.071097610672191
0.8941313431851632
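Standardization avoids leakage only when the scaler's mean and standard deviation come from the training split alone. Otherwise the test rows influence the preprocessing. scikit-learn's `make_pipeline` handles this automatically: `fit` fits the scaler and the regression on the training data, and `predict` applies the already-fitted scaler to new data. The sketch below uses synthetic data so it runs on its own; these are not the 50 Startups values. It is one possible alternative, not the method this notebook uses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 50 rows, 3 features, linear target plus noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 200_000, size=(50, 3))
y_demo = 50_000 + X_demo @ np.array([0.8, -0.02, 0.03]) + rng.normal(0, 5_000, size=50)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# The scaler's mean and standard deviation are learned from X_tr only
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_tr, y_tr)

fitted_scaler = model.named_steps["standardscaler"]
print(np.allclose(fitted_scaler.mean_, X_tr.mean(axis=0)))  # True
print(r2_score(y_te, model.predict(X_te)))
```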



### Training the regression models

Two models are trained. reg_multiple is a multiple linear regression on the standardized training features X_train_m with target Y_train_m. reg_single is a single-feature regression on X_train_s with target Y_train_s. The fit method estimates each model's intercept and coefficients by ordinary least squares, that is, by minimizing the sum of squared differences between the predicted and actual profit. For reg_single the fitted model is a line; for reg_multiple it is a hyperplane over the three features.


In [None]:
# LinearRegression is already imported above; it fits an intercept by default
reg_multiple = LinearRegression()
reg_multiple.fit(X_train_m, Y_train_m)

reg_single = LinearRegression()
reg_single.fit(X_train_s, Y_train_s)
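`LinearRegression.fit` solves an ordinary least-squares problem. The sketch below shows what that means and why no bias column is needed. It solves the same problem with NumPy by appending a column of ones for the intercept, then compares the result with scikit-learn's `intercept_` and `coef_`. The data is synthetic, for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 40 rows, 3 features (illustrative only)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 3))
y_demo = 2.0 + X_demo @ np.array([1.5, -0.5, 0.25]) + rng.normal(scale=0.1, size=40)

# Least squares with an explicit column of ones for the intercept
A = np.c_[np.ones(len(X_demo)), X_demo]
beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

# scikit-learn fits the intercept itself (fit_intercept=True by default)
reg = LinearRegression().fit(X_demo, y_demo)

print(np.allclose(beta[0], reg.intercept_))  # True
print(np.allclose(beta[1:], reg.coef_))      # True
```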

### Predicting the test set results for both multiple and single linear regression



In [None]:

y_pred_multiple = reg_multiple.predict(X_test_m)
y_pred_single = reg_single.predict(X_test_s)
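`LinearRegression` fits an intercept, so standardizing the features rescales the coefficients but leaves the predictions unchanged, up to floating-point rounding. This is why the R-squared values do not depend on whether the features were standardized. The check below compares predictions from raw and standardized features on synthetic data, used here only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only): 30 training rows, 5 new rows to predict
rng = np.random.default_rng(2)
X_raw = rng.uniform(0, 200_000, size=(30, 3))
y_demo = 40_000 + X_raw @ np.array([0.7, 0.05, 0.02]) + rng.normal(0, 3_000, size=30)
X_new = rng.uniform(0, 200_000, size=(5, 3))

# Model on raw features
pred_raw = LinearRegression().fit(X_raw, y_demo).predict(X_new)

# Model on standardized features; new data is transformed with the same fitted scaler
demo_scaler = StandardScaler().fit(X_raw)
reg_std = LinearRegression().fit(demo_scaler.transform(X_raw), y_demo)
pred_std = reg_std.predict(demo_scaler.transform(X_new))

print(np.allclose(pred_raw, pred_std))  # True
```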


## Computing R-squared for both multiple and single linear regression

In [None]:

# Coefficient of determination (R-squared) on the held-out test set
r2_m = r2_score(Y_test_m, y_pred_multiple)
print("R-squared value for multiple linear regression:", r2_m)
r2_s = r2_score(Y_test_s, y_pred_single)
print("R-squared value for single linear regression:", r2_s)

# R-squared expressed as a percentage; this is not a classification accuracy
pct_m = r2_m * 100
print("R-squared (%) for multiple linear regression:", pct_m)
pct_s = r2_s * 100
print("R-squared (%) for single linear regression:", pct_s)

R-squared value for multiple linear regression: 0.9393955917820571
R-squared value for single linear regression: 0.9464587607787219
R-squared (%) for multiple linear regression: 93.93955917820571
R-squared (%) for single linear regression: 94.64587607787219
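`r2_score` computes the coefficient of determination, R-squared = 1 - SS_res / SS_tot. Here SS_res is the sum of squared residuals, and SS_tot is the sum of squared deviations of the true values from their mean. A value of 1 means perfect predictions. The score can be negative for a model that does worse than always predicting the mean. The sketch below checks the formula against scikit-learn using made-up numbers.

```python
import numpy as np
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]
print(r_squared(y_true, y_pred))  # approximately 0.925 (1 - 1.5 / 20)
print(np.isclose(r_squared(y_true, y_pred), r2_score(y_true, y_pred)))  # True
```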
