<a href="https://colab.research.google.com/github/rosh4github/eportfolio/blob/main/RK_Unit03_Ex3_multiple_linear_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Source: Unit 3 - Machine Learning module - University of Essex Online (2024)

In [None]:
# pandas - library for manipulating and analysing tabular data (tables, spreadsheets)
import pandas
import numpy as np

# scikit-learn - ML library
# linear_model module - contains tools for building linear regression models
from sklearn import linear_model

# Analogy: a chef preparing a dish. pandas is the set of knives and cutting boards used to prepare the ingredients (the data);
# linear_model is the recipe (linear regression) that says how to combine the ingredients into the final dish (a predictive model)


In [None]:
# common practice to use df as a short form for DataFrame

# pandas.read_csv - function from the pandas library that reads data from a CSV file (comma-separated values)
# "cars.csv" - argument passed to the function; because no path is given, the file is looked up in the current working directory
#df = pandas.read_csv("cars.csv")

# manual data entry, currently in string format
data = "Car,Model,Volume,Weight,CO2; Toyota,Aygo,1000,790,99; Mitsubishi,Space Star,1200,1160,95; Skoda,Citigo,1000,929,95; Fiat,500,900,865,90; Mini,Cooper,1500,1140,105; VW,Up!,1000,929,105; Skoda,Fabia,1400,1109,90; Mercedes,A-Class,1500,1365,92; Ford,Fiesta,1500,1112,98; Audi,A1,1600,1150,99; Hyundai,I20,1100,980,99; Suzuki,Swift,1300,990,101; Ford,Fiesta,1000,1112,99; Honda,Civic,1600,1252,94; Hundai,I30,1600,1326,97; Opel,Astra,1600,1330,97; BMW,1,1600,1365,99; Mazda,3,2200,1280,104; Skoda,Rapid,1600,1119,104; Ford,Focus,2000,1328,105; Ford,Mondeo,1600,1584,94; Opel,Insignia,2000,1428,99; Mercedes,C-Class,2100,1365,99; Skoda,Octavia,1600,1415,99; Volvo,S60,2000,1415,99; Mercedes,CLA,1500,1465,102; Audi,A4,2000,1490,104; Audi,A6,2000,1725,114; Volvo,V70,1600,1523,109; BMW,5,2000,1705,114; Mercedes,E-Class,2100,1605,115; Volvo,XC70,2000,1746,117; Ford,B-Max,1600,1235,104; BMW,2,1600,1390,108; Opel,Zafira,1600,1405,109; Mercedes,SLK,2500,1395,120"

# 1. Splitting data into rows using '; ' (a semicolon followed by a space) as the delimiter
rows = data.split('; ')

# 2. Further splitting each row into elements using comma as a delimiter
# creates a list of lists representing the data
data_list = [row.split(',') for row in rows]

# 3. Creating a NumPy array
# access elements using indexing, data_array[0,0] for the first element
data_array = np.array(data_list)

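# 4. Creating a DataFrame
# data_array[1:, :] - selects rows from index 1 (the 2nd row) to the end, skipping the header row; : selects all columns
# columns=data_array[0, :] - takes the column names from the first (header) row
df = pandas.DataFrame(data_array[1:, :], columns=data_array[0, :])

# every value is still text at this point, so convert the numeric columns to integers
numeric_cols = ['Volume', 'Weight', 'CO2']
df[numeric_cols] = df[numeric_cols].astype(int)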
df

Unnamed: 0,Car,Model,Volume,Weight,CO2
0,Toyota,Aygo,1000,790,99
1,Mitsubishi,Space Star,1200,1160,95
2,Skoda,Citigo,1000,929,95
3,Fiat,500,900,865,90
4,Mini,Cooper,1500,1140,105
5,VW,Up!,1000,929,105
6,Skoda,Fabia,1400,1109,90
7,Mercedes,A-Class,1500,1365,92
8,Ford,Fiesta,1500,1112,98
9,Audi,A1,1600,1150,99
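
As an alternative to the manual split-and-array steps above, the same string can be handed to `pandas.read_csv` through an in-memory buffer. This is a minimal sketch, not part of the original exercise: `sample` is a shortened copy of the first rows of `data`, and `csv_text` and `df_sample` are names assumed here for illustration. A side effect is that `read_csv` infers numeric dtypes for `Volume`, `Weight` and `CO2`.

```python
import io
import pandas

# Shortened copy of the first rows of `data` (illustration only)
sample = "Car,Model,Volume,Weight,CO2; Toyota,Aygo,1000,790,99; Mitsubishi,Space Star,1200,1160,95; Skoda,Citigo,1000,929,95; Fiat,500,900,865,90"

# Turn the "; " row separator into newlines so the text looks like a CSV file
csv_text = sample.replace("; ", "\n")

# io.StringIO wraps the string in a file-like object that read_csv accepts
df_sample = pandas.read_csv(io.StringIO(csv_text))

# Show the dtype that read_csv inferred for each column
print(df_sample.dtypes)
```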


In [None]:
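# X holds the independent variables (features); y holds the dependent variable (target) we want to predict
X = df[['Weight', 'Volume']]
y = df['CO2']

# create the linear regression model and fit it to the data
regr = linear_model.LinearRegression()
regr.fit(X, y)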
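
A fitted `LinearRegression` model describes a plane of the form CO2 = intercept + b_weight × Weight + b_volume × Volume. The sketch below, which is not part of the original exercise, reads the fitted `intercept_` and `coef_` attributes and reproduces a prediction by hand. The five rows are copied from the start of the data set purely for illustration, so the fitted numbers differ from those of the full model; `X_demo`, `y_demo` and `demo` are names assumed here.

```python
import numpy as np
from sklearn import linear_model

# First five rows of the data set: [Weight, Volume] -> CO2 (illustration only)
X_demo = np.array([[790, 1000], [1160, 1200], [929, 1000], [865, 900], [1140, 1500]])
y_demo = np.array([99, 95, 95, 90, 105])

demo = linear_model.LinearRegression()
demo.fit(X_demo, y_demo)

# coef_ follows the column order of X_demo: weight first, then volume
b_weight, b_volume = demo.coef_

# Rebuild the prediction for one car from the fitted equation
weight, volume = 1000, 1100
by_hand = demo.intercept_ + b_weight * weight + b_volume * volume
by_model = demo.predict([[weight, volume]])[0]

# The two numbers should agree up to floating-point rounding
print(by_hand, by_model)
```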

Predict the CO2 emission of a car that weighs 2300 kg and has an engine volume of 1300 cm³:

In [None]:
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)
# 107.2087328

### Coefficient
A coefficient is a factor that describes how the predicted value changes with one of the independent variables. In this case, we can ask for the coefficient of weight against CO2 and of volume against CO2. The values we get tell us what happens to the predicted CO2 if we increase or decrease one of the independent variables.

In [None]:
print(regr.coef_)

[0.00755095 0.00780526]


The result array represents the coefficient values of weight and volume.

Weight: 0.00755095 Volume: 0.00780526

These values tell us that if the weight increases by 1 kg while the engine size stays the same, the CO2 emission increases by 0.00755095 g.

And if the engine size (Volume) increases by 1 cm³ while the weight stays the same, the CO2 emission increases by 0.00780526 g.

We have already predicted that a car with a 1300 cm³ engine that weighs 2300 kg will emit approximately 107 g of CO2.

What will the CO2 emission be if we increase the weight by 1000 kg (from 2300 to 3300)?

Ans: 107.2087328 + (1000 * 0.00755095) = 114.75968

In [None]:
predictedCO2 = regr.predict([[3300, 1300]])
print(predictedCO2)
# 114.75968007
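
The hand calculation above can be replayed in a few lines. This is a sketch that uses only the rounded figures printed in this notebook; the variable names are assumptions for illustration. Because the printed coefficient is rounded to eight decimal places, the hand result and the output of `regr.predict` agree only to about five decimal places.

```python
# Values as printed earlier in the notebook (rounded)
base_prediction = 107.2087328   # predicted CO2 for 2300 kg, 1300 cm3
weight_coef = 0.00755095        # change in CO2 per extra kg
extra_weight = 1000             # kg added (2300 -> 3300)

# Linear model: the prediction shifts by coefficient * change in weight
by_hand = base_prediction + extra_weight * weight_coef

# Value printed by regr.predict([[3300, 1300]]) in the cell above
model_output = 114.75968007

print(round(by_hand, 5))
print(abs(by_hand - model_output) < 1e-4)
```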