## Linear Regression for Car CO2 Emissions

### Objective
A hands-on exercise covering the following:
- Use **scikit-learn** to implement simple linear regression
- Create, train, and test a **linear regression** model on real data

### Use Case

Using simple linear regression on a fuel consumption dataset that contains model-specific fuel consumption ratings, we will estimate carbon dioxide (CO2) emissions for new light-duty vehicles for retail sale in Canada. We will use the open dataset [FuelConsumptionsCo2.csv](files/FuelConsumptionsCo2.csv) from https://open.canada.ca/data/en/dataset/98f1a129-f628-4ce4-b24d-6f16bf24dd64

### Import
Check and install the following packages:
- NumPy
- Matplotlib
- Pandas
- Scikit-learn

In [None]:
!pip install numpy
!pip install pandas
!pip install scikit-learn
!pip install matplotlib
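To confirm that the installation succeeded, the installed versions can be checked with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and only used for illustration.

```python
from importlib import metadata

# Distribution names as published on PyPI (note: "scikit-learn", not "sklearn").
packages = ["numpy", "pandas", "scikit-learn", "matplotlib"]


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# Map each package to its version (or None when it is not installed).
versions = {name: installed_version(name) for name in packages}
for name, version in versions.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` needs the corresponding `pip install` command from the cell above.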

Import the installed libraries:

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

### Load the Data
The dataset is stored in the file [FuelConsumptionsCo2.csv](files/FuelConsumptionsCo2.csv). We read it into a DataFrame with pandas:

In [5]:
df = pd.read_csv("./FuelConsumptionCo2.csv")

In [6]:
# Verify that the file loaded correctly by displaying ten random rows
df.sample(10)

Unnamed: 0,MODELYEAR,MAKE,MODEL,VEHICLECLASS,ENGINESIZE,CYLINDERS,TRANSMISSION,FUELTYPE,FUELCONSUMPTION_CITY,FUELCONSUMPTION_HWY,FUELCONSUMPTION_COMB,FUELCONSUMPTION_COMB_MPG,CO2EMISSIONS
376,2014,FORD,F150 FFV,PICKUP TRUCK - STANDARD,3.7,6,A6,E,18.8,13.7,16.5,17,264
69,2014,BENTLEY,CONTINENTAL GT,SUBCOMPACT,6.0,12,AS8,Z,18.8,11.5,15.5,18,356
462,2014,GMC,SIERRA 4WD,PICKUP TRUCK - STANDARD,6.2,8,A6,Z,16.4,11.7,14.3,20,329
763,2014,MERCEDES-BENZ,ML 350 4MATIC FFV,SUV - STANDARD,3.5,6,AS7,E,17.8,13.8,16.0,18,256
479,2014,GMC,YUKON XL 4WD,SUV - STANDARD,5.3,8,A6,X,16.0,11.1,13.8,20,317
593,2014,JEEP,PATRIOT 4X4,SUV - SMALL,2.4,4,A6,X,11.4,8.7,10.2,28,235
408,2014,FORD,FOCUS SFE FFV,COMPACT,2.0,4,A6,E,11.6,8.3,10.1,28,162
398,2014,FORD,FLEX AWD,SUV - STANDARD,3.5,6,AS6,X,13.7,10.2,12.1,23,278
1064,2014,VOLVO,XC70 AWD,SUV - SMALL,3.0,6,AS6,X,13.4,9.8,11.8,24,271
120,2014,BMW,M6,COMPACT,4.4,8,AM7,Z,17.3,11.5,14.7,19,338
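Beyond eyeballing sample rows, a few quick checks on shape, column types, and missing values help confirm that the file was parsed as expected. The sketch below is self-contained: it parses three rows copied from the sample output above through `io.StringIO` as a stand-in for the full file, and the name `demo` is an assumption used only here. On the real data, the same calls would be applied to `df`.

```python
import io

import pandas as pd

# Three rows copied from the sample output above, standing in for the full file.
csv_text = """MODELYEAR,MAKE,MODEL,VEHICLECLASS,ENGINESIZE,CYLINDERS,TRANSMISSION,FUELTYPE,FUELCONSUMPTION_CITY,FUELCONSUMPTION_HWY,FUELCONSUMPTION_COMB,FUELCONSUMPTION_COMB_MPG,CO2EMISSIONS
2014,FORD,F150 FFV,PICKUP TRUCK - STANDARD,3.7,6,A6,E,18.8,13.7,16.5,17,264
2014,JEEP,PATRIOT 4X4,SUV - SMALL,2.4,4,A6,X,11.4,8.7,10.2,28,235
2014,BMW,M6,COMPACT,4.4,8,AM7,Z,17.3,11.5,14.7,19,338
"""
demo = pd.read_csv(io.StringIO(csv_text))

# Number of rows and columns that were parsed.
print(demo.shape)

# Column types: numeric columns should not show up as "object".
print(demo.dtypes)

# Count of missing values per column.
missing = demo.isna().sum()
print(missing)
```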


### Understand the data
According to the dataset page on [https://open.canada.ca](https://open.canada.ca/data/en/dataset/98f1a129-f628-4ce4-b24d-6f16bf24dd64),
the fuel consumption data has the following fields:
 - MODEL YEAR e.g. 2014
 - MAKE e.g. VOLVO
 - MODEL e.g. S60 AWD
 - VEHICLE CLASS e.g. COMPACT
 - ENGINE SIZE e.g. 3.0
 - CYLINDERS e.g. 6
 - TRANSMISSION e.g. AS6
 - FUEL TYPE e.g. Z
 - FUEL CONSUMPTION in CITY (L/100 km) e.g. 13.2
 - FUEL CONSUMPTION in HWY (L/100 km) e.g. 9.5
 - FUEL CONSUMPTION COMBINED (L/100 km) e.g. 11.5
 - FUEL CONSUMPTION COMBINED MPG (MPG) e.g. 25
 - CO2 EMISSIONS (g/km) e.g. 182

We will create a simple linear regression model that uses one of these features to predict the CO2 emissions of cars the model has not seen.
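Simple linear regression fits a straight line, CO2 = intercept + slope × feature, to the data. As a rough illustration of the idea, the sketch below fits scikit-learn's `LinearRegression` to the ten rows shown in the sample output above, using ENGINESIZE as the single feature. That choice of feature is an assumption made for this example, and ten rows with no train/test split are far too few for a reliable model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# ENGINESIZE and CO2EMISSIONS values copied from the ten sample rows above.
engine_size = np.array([3.7, 6.0, 6.2, 3.5, 5.3, 2.4, 2.0, 3.5, 3.0, 4.4])
co2 = np.array([264, 356, 329, 256, 317, 235, 162, 278, 271, 338])

# scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features).
X = engine_size.reshape(-1, 1)

model = LinearRegression()
model.fit(X, co2)

slope = model.coef_[0]        # change in CO2 (g/km) per unit of engine size
intercept = model.intercept_  # fitted CO2 (g/km) at engine size 0
print(f"CO2 = {intercept:.1f} + {slope:.1f} * ENGINESIZE")

# Predict the emission for a hypothetical engine size of 3.0.
predicted = model.predict(np.array([[3.0]]))[0]
print(f"Predicted CO2 for engine size 3.0: {predicted:.1f} g/km")
```

The prediction is simply the fitted line evaluated at the new value, that is, `intercept + slope * 3.0`.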

### Explore the data
A statistical summary of the numeric columns:

In [8]:
df.describe()

Unnamed: 0,MODELYEAR,ENGINESIZE,CYLINDERS,FUELCONSUMPTION_CITY,FUELCONSUMPTION_HWY,FUELCONSUMPTION_COMB,FUELCONSUMPTION_COMB_MPG,CO2EMISSIONS
count,1067.0,1067.0,1067.0,1067.0,1067.0,1067.0,1067.0,1067.0
mean,2014.0,3.346298,5.794752,13.296532,9.474602,11.580881,26.441425,256.228679
std,0.0,1.415895,1.797447,4.101253,2.79451,3.485595,7.468702,63.372304
min,2014.0,1.0,3.0,4.6,4.9,4.7,11.0,108.0
25%,2014.0,2.0,4.0,10.25,7.5,9.0,21.0,207.0
50%,2014.0,3.4,6.0,12.6,8.8,10.9,26.0,251.0
75%,2014.0,4.3,8.0,15.55,10.85,13.35,31.0,294.0
max,2014.0,8.4,12.0,30.2,20.5,25.8,60.0,488.0
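The summary shows that MODELYEAR has a standard deviation of 0, so it carries no information for predicting emissions. One way to compare the remaining numeric candidates is their correlation with CO2EMISSIONS. The sketch below is an assumption-laden illustration: it uses only the ten sample rows shown earlier rather than the full dataset, and the names `demo` and `corr_with_target` are hypothetical.

```python
import pandas as pd

# Numeric columns copied from the ten sample rows shown earlier.
demo = pd.DataFrame(
    {
        "ENGINESIZE": [3.7, 6.0, 6.2, 3.5, 5.3, 2.4, 2.0, 3.5, 3.0, 4.4],
        "CYLINDERS": [6, 12, 8, 6, 8, 4, 4, 6, 6, 8],
        "FUELCONSUMPTION_COMB": [16.5, 15.5, 14.3, 16.0, 13.8,
                                 10.2, 10.1, 12.1, 11.8, 14.7],
        "CO2EMISSIONS": [264, 356, 329, 256, 317, 235, 162, 278, 271, 338],
    }
)

# Pearson correlation of each candidate feature with the target,
# dropping the target's trivial correlation with itself.
corr_with_target = demo.corr()["CO2EMISSIONS"].drop("CO2EMISSIONS")
print(corr_with_target.sort_values(ascending=False))
```

On the full dataset, the same calculation could be run on `df` after selecting the numeric columns of interest.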
