# **Linear Regression**

https://realpython.com/linear-regression-in-python/ <br>
https://www.sfu.ca/~mjbrydon/tutorials/BAinPy/09_regression.html <br>
https://www.xlstat.com/en/solutions/features/linear-regression <br>
https://www.datacamp.com/tutorial/simple-linear-regression <br>
https://www.w3schools.com/python/python_ml_multiple_regression.asp <br>
https://www.w3schools.com/python/python_ml_polynomial_regression.asp


### *Simple Linear Regression With statsmodels*

*Simple linear regression models the relationship between one independent variable and one dependent variable, so you can both quantify that relationship and make predictions. For example, you might want to know how a tree's height (independent variable) relates to the number of leaves it has (dependent variable). By collecting data and fitting a simple linear regression model, you could predict the number of leaves from the tree's height.*

Note: A distinction is usually made between simple regression (with only one explanatory variable) and multiple regression (several explanatory variables), although the overall concept and calculation methods are identical.
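
To make the idea of "fitting a line" concrete, here is a minimal sketch of the least-squares calculation for one predictor. The tree heights and leaf counts are invented for illustration and are not taken from any dataset in this notebook.

```python
import numpy as np

# Hypothetical data, invented for illustration: tree heights and leaf counts.
height = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
leaves = np.array([3.0, 5.0, 7.0, 9.0, 11.0])  # exactly leaves = 1 + 2 * height

# Ordinary least squares with one predictor:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
#   intercept = mean_y - slope * mean_x
x_mean, y_mean = height.mean(), leaves.mean()
slope = np.sum((height - x_mean) * (leaves - y_mean)) / np.sum((height - x_mean) ** 2)
intercept = y_mean - slope * x_mean

# Use the fitted line to predict the response for a new height.
predicted = intercept + slope * 6.0

print(f"slope: {slope:.2f}, intercept: {intercept:.2f}")
print(f"predicted leaves at height 6: {predicted:.2f}")
```

Because the made-up data lie exactly on a line, the recovered slope and intercept match the values used to generate them. Real data will scatter around the fitted line instead.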

In [2]:
#Import libraries

import statsmodels.api as sm
import pandas as pd

In [4]:
con = pd.read_csv('concrete_data.csv')
con.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [8]:
Y = con['Strength']
X = con['Fly Ash']
X.head()

| | Fly Ash |
|---|---|
| 0 | 0.0 |
| 1 | 0.0 |
| 2 | 0.0 |
| 3 | 0.0 |
| 4 | 0.0 |


In [9]:
#Adding a column for the constant

X = sm.add_constant(X)
X.head(3)

| | const | Fly Ash |
|---|---|---|
| 0 | 1.0 | 0.0 |
| 1 | 1.0 | 0.0 |
| 2 | 1.0 | 0.0 |
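
The constant column matters because `sm.OLS` does not add an intercept on its own. The sketch below uses plain NumPy and made-up numbers (not the concrete data) to show what the column of ones does: with it, the fit recovers both an intercept and a slope; without it, the line is forced through the origin.

```python
import numpy as np

# Made-up data for illustration: y = 1 + 2 * x exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Design matrix with a leading column of ones, analogous to what
# sm.add_constant produces for a single predictor.
X_design = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(f"with constant:    intercept={beta[0]:.3f}, slope={beta[1]:.3f}")

# Without the column of ones the model is y = slope * x, so the line
# must pass through the origin and the slope is distorted.
beta_no_const, *_ = np.linalg.lstsq(x.reshape(-1, 1), y, rcond=None)
print(f"without constant: slope={beta_no_const[0]:.3f}")
```

In this toy case the fit with the constant returns the intercept 1 and slope 2 used to build the data, while the fit without it returns a steeper slope to compensate for the missing intercept.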


In [10]:
#Running the Model

model = sm.OLS(Y, X, missing='drop')
model_result = model.fit()
model_result.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | Strength | R-squared: | 0.011 |
| Model: | OLS | Adj. R-squared: | 0.01 |
| Method: | Least Squares | F-statistic: | 11.63 |
| Date: | Mon, 10 Feb 2025 | Prob (F-statistic): | 0.000675 |
| Time: | 21:01:39 | Log-Likelihood: | -4355.4 |
| No. Observations: | 1030 | AIC: | 8715.0 |
| Df Residuals: | 1028 | BIC: | 8725.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 37.3139 | 0.679 | 54.978 | 0.000 | 35.982 | 38.646 |
| Fly Ash | -0.0276 | 0.008 | -3.410 | 0.001 | -0.043 | -0.012 |

| | | | |
|---|---|---|---|
| Omnibus: | 29.013 | Durbin-Watson: | 0.848 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 27.218 |
| Skew: | 0.351 | Prob(JB): | 1.23e-06 |
| Kurtosis: | 2.625 | Cond. No. | 110.0 |
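
The `coef` column of the summary above defines the fitted line: Strength is estimated as `const` plus the `Fly Ash` coefficient times the amount of fly ash. As a rough sketch, the helper below (`predict_strength` is a hypothetical name, not part of statsmodels) plugs the rounded coefficients from the table into that equation.

```python
# Coefficients copied from the summary table (rounded as reported there).
const = 37.3139
fly_ash_coef = -0.0276


def predict_strength(fly_ash):
    """Hypothetical helper: fitted-line estimate of Strength for a Fly Ash value."""
    return const + fly_ash_coef * fly_ash


# Evaluate the fitted line at a few illustrative Fly Ash amounts.
for amount in (0.0, 100.0, 200.0):
    print(f"Fly Ash = {amount:5.1f} -> predicted Strength = {predict_strength(amount):.2f}")
```

The negative coefficient means the estimate drops by about 0.0276 for each additional unit of fly ash. Since the reported R-squared is only 0.011, fly ash alone explains roughly 1% of the variation in strength, so these estimates should be read as a weak trend rather than a precise prediction.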


### *Multiple Linear Regression With scikit-learn*

*Multiple regression is like simple linear regression, but with more than one independent variable: we predict a value based on two or more variables.*

In [135]:
car = pd.read_csv('car_data.csv')
car.head(3)

| | Car | Model | Volume | Weight | CO2 |
|---|---|---|---|---|---|
| 0 | Toyoty | Aygo | 1000 | 790 | 99 |
| 1 | Mitsubishi | Space Star | 1200 | 1160 | 95 |
| 2 | Skoda | Citigo | 1000 | 929 | 95 |


In [136]:
X_car = car[['Weight', 'Volume']]
y_car = car['CO2']

In [137]:
#Import Library

from sklearn import linear_model

In [138]:
regr = linear_model.LinearRegression()
regr.fit(X_car, y_car)

In [139]:
#Predict CO2 for Weight = 3300 and Volume = 1300 (same column order as X_car)
predictedCO2 = regr.predict([[3300, 1300]])
print(predictedCO2)

[114.75968007]




In [140]:
#Get results

car_r_sq = regr.score(X_car, y_car)
car_intercept, car_coefficients = regr.intercept_, regr.coef_

In [141]:
print(f"coefficient of determination: {car_r_sq}")

print(f"intercept: {car_intercept}")

print(f"coefficients: {car_coefficients}")

coefficient of determination: 0.3765564043619989
intercept: 79.69471929115939
coefficients: [0.00755095 0.00780526]
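
The intercept and the two coefficients printed above are all that `regr.predict` needs: a prediction is the intercept plus each coefficient times its feature value. The following sketch reproduces the earlier prediction for Weight = 3300 and Volume = 1300 by hand, using the printed (rounded) values, so it only agrees with the model to a few decimal places.

```python
import numpy as np

# Values copied from the printed output; the coefficients are rounded there.
intercept = 79.69471929115939
coefficients = np.array([0.00755095, 0.00780526])  # order: Weight, Volume

# Same inputs, in the same order, as the earlier regr.predict call.
features = np.array([3300, 1300])

# Linear model: intercept + sum(coefficient_i * feature_i)
manual_prediction = intercept + coefficients @ features
print(manual_prediction)
```

The result is approximately 114.76, in line with the value returned by `regr.predict`. The order of the coefficients follows the column order of `X_car`, which is why Weight comes first.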


### *Polynomial Regression With scikit-learn*

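*Polynomial regression, like linear regression, models the relationship between the variables x and y, but it fits a curve rather than a straight line through the data points. It does this by adding powers and products of the inputs as extra features and then fitting an ordinary linear model to them.*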

In [116]:
#Import relevant libraries

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

In [157]:
#Provide data

poly_X = [
   [0, 1], [5, 1], [15, 2], [25, 5], [35, 11], [45, 15], [55, 34], [60, 35]
 ]
poly_y = [4, 5, 20, 14, 32, 22, 38, 43]

poly_x, poly_y = np.array(poly_X), np.array(poly_y)

In [158]:
#Transform input data

poly_x_ = PolynomialFeatures(degree=2, include_bias=False).fit_transform(poly_x)
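
To see what the transformation step produces, here is a small sketch that applies the same `PolynomialFeatures` settings to a single made-up row. The row `[2, 3]` is an arbitrary example, chosen so that each generated term is easy to recognize.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One made-up row with two features: x1 = 2 and x2 = 3.
sample = np.array([[2, 3]])

# Same settings as in the notebook: degree 2, no constant (bias) column.
transformer = PolynomialFeatures(degree=2, include_bias=False)
expanded = transformer.fit_transform(sample)

# For two inputs at degree 2 the columns are: x1, x2, x1^2, x1*x2, x2^2
print(expanded)
```

With two input columns the transformer returns five columns, which is why the fitted model later reports five coefficients. `include_bias=False` leaves out the column of ones because `LinearRegression` fits its own intercept.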

In [159]:
#Create a model and fit it

poly_model = LinearRegression().fit(poly_x_, poly_y)

In [160]:
#Get results

r_sq = poly_model.score(poly_x_, poly_y)
intercept_, coefficients_ = poly_model.intercept_, poly_model.coef_

In [162]:
#Predict response

y_pred = poly_model.predict(poly_x_)

In [163]:
print(f"coefficient of determination: {r_sq}")

print(f"intercept: {intercept_}")

print(f"coefficients: {coefficients_}")

print(f"predicted response: {y_pred}")

coefficient of determination: 0.9453701449127822
intercept: 0.8430556452395876
coefficients: [ 2.44828275  0.16160353 -0.15259677  0.47928683 -0.4641851 ]
predicted response: [ 0.54047408 11.36340283 16.07809622 15.79139    29.73858619 23.50834636
 39.05631386 41.92339046]
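
The printed numbers can be checked by hand. The sketch below rebuilds the first two predictions from the printed intercept and the five coefficient values, assuming the degree-2 column order x1, x2, x1^2, x1*x2, x2^2. The `expand` helper is a hypothetical stand-in for the `PolynomialFeatures` step, and the coefficients are the rounded values shown in the output.

```python
import numpy as np

# Values copied from the printed output (coefficients are rounded there).
intercept = 0.8430556452395876
coefficients = np.array([2.44828275, 0.16160353, -0.15259677, 0.47928683, -0.4641851])


def expand(x1, x2):
    """Hypothetical helper: degree-2 terms in the order x1, x2, x1^2, x1*x2, x2^2."""
    return np.array([x1, x2, x1 ** 2, x1 * x2, x2 ** 2])


# First two input rows from the notebook's data: [0, 1] and [5, 1].
first = intercept + coefficients @ expand(0, 1)
second = intercept + coefficients @ expand(5, 1)
print(first, second)
```

Both values agree with the first two entries of the predicted response (about 0.5405 and 11.3634) up to the rounding of the printed coefficients, which confirms that the model is an ordinary linear fit on the expanded features.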
