# CO2 Emission by Vehicles

## <font color='red'> Business Objective </font>
- ***The fundamental goal here is to model CO2 emissions as a function of several car engine features.***

## <font color='red'> Data Set Details </font>
 
#### ***The file contains the data for this example. It has 12 variables (columns) and 7385 instances (rows). The 12 variables are:***
- ***make, car brand under study.***
- ***model, the specific model of the car.***
- ***vehicle_class, car body type of the car.***
- ***engine_size, size of the car engine, in Litres.***
- ***cylinders, number of cylinders.***
- ***transmission, "A" for 'Automatic', "AM" for 'Automated manual', "AS" for 'Automatic with select shift', "AV" for 'Continuously variable', "M" for 'Manual'.***
- ***fuel_type, "X" for 'Regular gasoline', "Z" for 'Premium gasoline', "D" for 'Diesel', "E" for 'Ethanol (E85)', "N" for 'Natural gas'.***
- ***fuel_consumption_city, City fuel consumption ratings, in litres per 100 kilometres.***
- ***fuel_consumption_hwy, Highway fuel consumption ratings, in litres per 100 kilometres.***
- ***fuel_consumption_comb(l/100km), the combined fuel consumption rating (55% city, 45% highway), in L/100 km.***
- ***fuel_consumption_comb(mpg), the combined fuel consumption rating (55% city, 45% highway), in miles per gallon (mpg).***
- ***co2_emissions, the tailpipe emissions of carbon dioxide for combined city and highway driving, in grams per kilometer.***
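
The combined rating described in the list above is a weighted average of the city and highway ratings. A minimal sketch of that arithmetic follows; the function name `combined_consumption` and the sample values are illustrative and not part of the data set.

```python
def combined_consumption(city_l_per_100km, hwy_l_per_100km):
    """Weighted average: 55% city driving, 45% highway driving."""
    return 0.55 * city_l_per_100km + 0.45 * hwy_l_per_100km

# Hypothetical car rated 10.0 L/100 km in the city and 8.0 L/100 km on the highway
print(round(combined_consumption(10.0, 8.0), 2))
```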


### <font color='green'> Importing some important libraries </font>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

## <font color='green'> EDA (Exploratory Data Analysis)</font>

In [None]:
# storing the data into a variable
df = pd.read_csv('co2 Emissions.csv')

In [None]:
#check first 10 records
df.head(10)

In [None]:
#check last 10 records
df.tail(10)

In [None]:
#checking the null values
df.isnull().sum()

##### ***There are no null values in our data set.***

In [None]:
#collecting the information about the data set
df.info()

In [None]:
#checking the shape of the data set
df.shape

In [None]:
df.describe().T

### <font color='green'> Brands of Cars </font>

In [None]:
print("We have total",len(df['Make'].unique()),"Car Companies Data")
df_brand = df['Make'].value_counts().reset_index().rename(columns={'count':'Count'})
df_brand.head(20)

In [None]:
plt.figure(figsize=(20,6))
figure1 = sns.barplot(data = df_brand, x = "Make",  y= "Count")
plt.xticks(rotation = 75)
plt.title("All Car Companies and their Cars")
plt.xlabel("Companies")
plt.ylabel("Cars")
plt.bar_label(figure1.containers[0])
plt.show()

### <font color=green> Models of cars </font>

In [None]:
print("We have total",len(df['Model'].unique()),"Car Models")
df_model = df['Model'].value_counts().reset_index().rename(columns={'count':'Count'})[:25]
df_model.head(20)

In [None]:
plt.figure(figsize=(20,6))
figure2 = sns.barplot(data = df_model, x = "Model",  y= "Count")
plt.xticks(rotation = 75)
plt.title("Top 25 Car Models")
plt.xlabel("Models")
plt.ylabel("Cars")
plt.bar_label(figure2.containers[0])
plt.show()

### <font color=green> Vehicle Class </font>

In [None]:
print("We have total",len(df['Vehicle Class'].unique()),"Vehicle Class")
df_vehicle_class = df['Vehicle Class'].value_counts().reset_index().rename(columns={'count':'Count'})
df_vehicle_class

In [None]:
plt.figure(figsize=(20,5))
figure4 = sns.barplot(data = df_vehicle_class, x = "Vehicle Class",  y= "Count")
plt.xticks(rotation = 75)
plt.title("All Vehicle Class")
plt.xlabel("Vehicle Class")
plt.ylabel("Cars")
plt.bar_label(figure4.containers[0])
plt.show()

### <font color='green'> Engine Sizes of cars</font>

In [None]:
print("We have total",len(df['Engine Size(L)'].unique()),"Types of Engine Size")
df_engine_size = df['Engine Size(L)'].value_counts().reset_index().rename(columns={'count':'Count'})
df_engine_size.head(20)

In [None]:
plt.figure(figsize=(20,6))
figure5 = sns.barplot(data = df_engine_size, x = "Engine Size(L)",  y= "Count")
plt.xticks(rotation = 90)
plt.title("All Engine Sizes")
plt.xlabel("Engine Size(L)")
plt.ylabel("Cars")
plt.bar_label(figure5.containers[0])
plt.show()

### <font color='green'> Cylinders </font>

In [None]:
print("We have total",len(df['Cylinders'].unique()),"Types of Cylinders")
df_cylinders = df['Cylinders'].value_counts().reset_index().rename(columns={'count':'Count'})
df_cylinders.head(20)

In [None]:
plt.figure(figsize=(20,6))
figure6 = sns.barplot(data = df_cylinders, x = "Cylinders",  y= "Count")
plt.xticks(rotation = 90)
plt.title("All Cylinders")
plt.xlabel("Cylinders")
plt.ylabel("Cars")
plt.bar_label(figure6.containers[0])
plt.show()

### <font color='green'> Transmission of Cars </font>

In [None]:
df['Transmission'].unique()

### ***Here we have to map similar labels into a single label for our Transmission column*** 

In [None]:
df["Transmission"] = np.where(df["Transmission"].isin(["A4", "A5", "A6", "A7", "A8", "A9", "A10"]), "Automatic", df["Transmission"])
df["Transmission"] = np.where(df["Transmission"].isin(["AM5", "AM6", "AM7", "AM8", "AM9"]), "Automated Manual", df["Transmission"])
df["Transmission"] = np.where(df["Transmission"].isin(["AS4", "AS5", "AS6", "AS7", "AS8", "AS9", "AS10"]), "Automatic with Select Shift", df["Transmission"])
df["Transmission"] = np.where(df["Transmission"].isin(["AV", "AV6", "AV7", "AV8", "AV10"]), "Continuously Variable", df["Transmission"])
df["Transmission"] = np.where(df["Transmission"].isin(["M5", "M6", "M7"]), "Manual", df["Transmission"])
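
Because every code in this column is a letter prefix followed by an optional gear count, the grouping above can also be written as a prefix lookup. The sketch below assumes that pattern holds for all values; `TRANSMISSION_GROUPS` and `group_transmission` are illustrative names, not part of the original notebook.

```python
TRANSMISSION_GROUPS = {
    "A": "Automatic",
    "AM": "Automated Manual",
    "AS": "Automatic with Select Shift",
    "AV": "Continuously Variable",
    "M": "Manual",
}

def group_transmission(code):
    """Strip the trailing gear count (e.g. 'AS10' -> 'AS') and look up the group."""
    prefix = code.rstrip("0123456789")
    # Unknown prefixes are returned unchanged so they stay visible in value_counts()
    return TRANSMISSION_GROUPS.get(prefix, code)

print([group_transmission(c) for c in ["A6", "AM7", "AS10", "AV", "M5"]])
```

With pandas this could be applied as `df["Transmission"].map(group_transmission)`. Values that are already grouped pass through unchanged.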

In [None]:
print("We have total",len(df['Transmission'].unique()),"Transmissions")
df_transmission = df['Transmission'].value_counts().reset_index().rename(columns={'count':'Count'})
df_transmission

In [None]:
plt.figure(figsize=(20,5))
figure7 = sns.barplot(data = df_transmission, x = "Transmission",  y= "Count")
plt.title("All Transmissions")
plt.xlabel("Transmissions")
plt.ylabel("Cars")
plt.bar_label(figure7.containers[0])
plt.show()

### <font color='green'> Fuel Type of Cars </font>

In [None]:
df['Fuel Type'].unique()

### ***Here we replace the single-letter codes in the Fuel Type column with descriptive labels***

In [None]:
df["Fuel Type"] = np.where(df["Fuel Type"]=="Z", "Premium Gasoline", df["Fuel Type"])
df["Fuel Type"] = np.where(df["Fuel Type"]=="X", "Regular Gasoline", df["Fuel Type"])
df["Fuel Type"] = np.where(df["Fuel Type"]=="D", "Diesel", df["Fuel Type"])
df["Fuel Type"] = np.where(df["Fuel Type"]=="E", "Ethanol(E85)", df["Fuel Type"])
df["Fuel Type"] = np.where(df["Fuel Type"]=="N", "Natural Gas", df["Fuel Type"])

In [None]:
print("We have total",len(df['Fuel Type'].unique()),"Fuel Types")
df_fuel_type = df['Fuel Type'].value_counts().reset_index().rename(columns={'count':'Count'})
df_fuel_type

In [None]:
plt.figure(figsize=(20,5))
figure8 = sns.barplot(data = df_fuel_type, x = "Fuel Type",  y= "Count")
plt.title("All Fuel Types")
plt.xlabel("Fuel Types")
plt.ylabel("Cars")
plt.bar_label(figure8.containers[0])
plt.show()

## ***Variation in CO2 emissions with different features***

### <font color='green'> CO2 Emission with Brand </font>

In [None]:
df_co2_make = df.groupby(['Make'])['CO2 Emissions(g/km)'].mean().sort_values().reset_index()

In [None]:
plt.figure(figsize=(20,5))
figure9 = sns.barplot(data = df_co2_make, x = "Make",  y= "CO2 Emissions(g/km)")
plt.xticks(rotation = 90)
plt.title("CO2 Emissions variation with Brand")
plt.xlabel("Brands")
plt.ylabel("CO2 Emissions(g/km)")
plt.bar_label(figure9.containers[0], fontsize=7, fmt='%.1f')
plt.show()

In [None]:
plt.figure(figsize=(20,7))
order = df.groupby("Make")["CO2 Emissions(g/km)"].median().sort_values(ascending=True).index
sns.boxplot(x="Make", y="CO2 Emissions(g/km)", data=df, order=order, width=0.5)
plt.title("Distribution of CO2 Emissions in relation to Make", fontsize=15)
plt.xticks(rotation=90, horizontalalignment='center')
plt.xlabel("Make", fontsize=12)
plt.ylabel("CO2 Emissions(g/km)", fontsize=12)
plt.axhline(df["CO2 Emissions(g/km)"].median(),color='r',linestyle='dashed',linewidth=1)
plt.tight_layout()
plt.show()

### <font color='green'> CO2 Emissions variation with Vehicle Class </font>

In [None]:
df_co2_vehicle_class = df.groupby(['Vehicle Class'])['CO2 Emissions(g/km)'].mean().sort_values().reset_index()

In [None]:
plt.figure(figsize=(23,5))
figure10 = sns.barplot(data = df_co2_vehicle_class, x = "Vehicle Class",  y= "CO2 Emissions(g/km)")
plt.xticks(rotation = 90)
plt.title("CO2 Emissions variation with Vehicle Class")
plt.xlabel("Vehicle Class")
plt.ylabel("CO2 Emissions(g/km)")
plt.bar_label(figure10.containers[0], fontsize=7)
plt.show()

In [None]:
plt.figure(figsize=(20,7))
order = df.groupby("Vehicle Class")["CO2 Emissions(g/km)"].median().sort_values(ascending=True).index
sns.boxplot(x="Vehicle Class", y="CO2 Emissions(g/km)", data=df, order=order, width=0.5)
plt.title("Distribution of CO2 Emissions in relation to Vehicle Class", fontsize=15)
plt.xticks(rotation=90, horizontalalignment='center')
plt.xlabel("Vehicle Class", fontsize=12)
plt.ylabel("CO2 Emissions(g/km)", fontsize=12)
plt.axhline(df["CO2 Emissions(g/km)"].median(),color='r',linestyle='dashed',linewidth=1)
plt.tight_layout()
plt.show()

### <font color='green'> CO2 Emissions variation with Transmission </font>

In [None]:
df_co2_transmission = df.groupby(['Transmission'])['CO2 Emissions(g/km)'].mean().sort_values().reset_index()

In [None]:
plt.figure(figsize=(23,5))
figure11 = sns.barplot(data = df_co2_transmission, x = "Transmission",  y= "CO2 Emissions(g/km)")
plt.xticks(rotation = 90)
plt.title("CO2 Emissions variation with Transmission")
plt.xlabel("Transmission")
plt.ylabel("CO2 Emissions(g/km)")
plt.bar_label(figure11.containers[0], fontsize=7)
plt.show()

In [None]:
plt.figure(figsize=(20,7))
order = df.groupby("Transmission")["CO2 Emissions(g/km)"].median().sort_values(ascending=True).index
sns.boxplot(x="Transmission", y="CO2 Emissions(g/km)", data=df, order=order, width=0.5)
plt.title("Distribution of CO2 Emissions in relation to Transmission", fontsize=15)
plt.xticks(rotation=90, horizontalalignment='center')
plt.xlabel("Transmission", fontsize=12)
plt.ylabel("CO2 Emissions(g/km)", fontsize=12)
plt.axhline(df["CO2 Emissions(g/km)"].median(),color='r',linestyle='dashed',linewidth=1)
plt.tight_layout()
plt.show()

### <font color='green'> CO2 Emissions variation with Fuel Type </font>

In [None]:
df_co2_fuel_type = df.groupby(['Fuel Type'])['CO2 Emissions(g/km)'].mean().sort_values().reset_index()

In [None]:
plt.figure(figsize=(23,5))
figure12 = sns.barplot(data = df_co2_fuel_type, x = "Fuel Type",  y= "CO2 Emissions(g/km)")
plt.xticks(rotation = 90)
plt.title("CO2 Emissions variation with Fuel Type")
plt.xlabel("Fuel Type")
plt.ylabel("CO2 Emissions(g/km)")
plt.bar_label(figure12.containers[0], fontsize=7)
plt.show()

In [None]:
plt.figure(figsize=(20,7))
order = df.groupby("Fuel Type")["CO2 Emissions(g/km)"].median().sort_values(ascending=True).index
sns.boxplot(x="Fuel Type", y="CO2 Emissions(g/km)", data=df, order=order, width=0.5)
plt.title("Distribution of CO2 Emissions in relation to Fuel Type", fontsize=15)
plt.xticks(rotation=90, horizontalalignment='center')
plt.xlabel("Fuel Type", fontsize=12)
plt.ylabel("CO2 Emissions(g/km)", fontsize=12)
plt.axhline(df["CO2 Emissions(g/km)"].median(),color='r',linestyle='dashed',linewidth=1)
plt.tight_layout()
plt.show()

### <font color=red>Conclusion Of the EDA </font>
#### ***1. There are 42 car brands in total.***
#### ***2. There are 2053 unique car models. This column has too many distinct values to encode as dummy variables or to use in the analysis, so we can drop it.***
#### ***3. There are 16 vehicle classes, based on gross vehicle weight rating (GVWR) and volume index. The exact GVWR and volume index values are not available in the data, so we cannot merge similar vehicles into broader groups.***
#### ***4. The 27 transmission types have been grouped into 5 categories without taking the number of gears into account, on the assumption that it does not materially affect CO2 emissions.***
#### ***5. The 5 fuel type codes have been renamed to descriptive labels.***
#### ***6. There is too little data for Natural Gas, so we remove the Natural Gas rows from our data set.***

## <font color='red'> DATA CLEANING </font>

### <font color=green> Correlation </font>

### ***We have to remove the Natural Gas data from our data set, because we cannot predict anything from only one record.***

In [None]:
df_natural=df[df["Fuel Type"]=="Natural Gas"]
natural=df_natural.index
df_natural

In [None]:
# We have to remove Natural Gas from our data set
df.drop(natural, axis = 0, inplace = True)

In [None]:
df[df["Fuel Type"]=="Natural Gas"]

In [None]:
df_check = df['Fuel Type'].value_counts().reset_index().rename(columns={'count':'Count'})
df_check

In [None]:
df.head()

### ***To check the correlation between the numeric features, we drop "Make", "Model", "Vehicle Class", "Transmission", the city and highway fuel consumption columns, and "Fuel Consumption Comb (mpg)"***

In [None]:
df.drop(['Make','Model','Vehicle Class','Fuel Consumption City (L/100 km)','Fuel Consumption Hwy (L/100 km)','Transmission','Fuel Consumption Comb (mpg)'],inplace=True,axis=1)

In [None]:
df_correlation = df[['Engine Size(L)','Cylinders','Fuel Consumption Comb (L/100 km)','CO2 Emissions(g/km)']]
df_correlation

In [None]:
df_check = df['Fuel Type'].value_counts().reset_index().rename(columns={'count':'Count'})
df_check

In [None]:
df_correlation.corr().T

In [None]:
plt.figure(figsize = (23,10))
sns.heatmap(df_correlation.corr(), annot = True)
plt.show()

In [None]:
sns.pairplot(df_correlation)

In [None]:
plt.figure(figsize = (20,10))
for i in enumerate(df_correlation):
    plt.subplot(2,4,i[0]+1)
    plt.title(i[1])
    plt.boxplot(df_correlation[i[1]])

In [None]:
#removing the outliers
df_new = df_correlation[(np.abs(stats.zscore(df_correlation)) < 1.9).all(axis=1)]
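
The filter above keeps a row only if every column lies within 1.9 standard deviations of its column mean. Below is a small sketch of the same idea with plain NumPy on made-up numbers. It assumes the population standard deviation (`ddof=0`), which is what `scipy.stats.zscore` uses by default.

```python
import numpy as np

# Made-up data: the last row has an extreme value in the first column
data = np.array([
    [1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0], [5.0, 14.0],
    [6.0, 15.0], [7.0, 16.0], [8.0, 17.0], [9.0, 18.0], [100.0, 19.0],
])

# z-score per column: distance from the column mean in units of standard deviation
z = (data - data.mean(axis=0)) / data.std(axis=0)

# keep a row only when all of its columns have |z| < 1.9
keep = (np.abs(z) < 1.9).all(axis=1)
filtered = data[keep]
print(len(data), "->", len(filtered))
```

A threshold of 1.9 is fairly strict (3 is a more common choice), so it is worth checking how many rows each setting removes.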

In [None]:
print("The length of the original : " , len(df))
print("The length after removing the outliers : " , len(df_new))
print("We just Removed",len(df)-len(df_new),"Outliers")

In [None]:
plt.figure(figsize = (20,10))
for i in enumerate(df_new):
    plt.subplot(2,4,i[0]+1)
    plt.title(i[1])
    plt.boxplot(df_new[i[1]])

In [None]:
# with outliers
df.describe().T

In [None]:
# without outliers
df_new.describe().T

### <font color='green'> Sample Frame </font>

In [None]:
sample_df=df_new.sample(n=200,random_state=35)
sample_df

In [None]:
indexs=sample_df.index
indexs

In [None]:
# we have to drop the sampled rows from the remaining data
df_new = df_new.drop(indexs)

In [None]:
df_new

In [None]:
sample_df_Xtest=sample_df.drop(['CO2 Emissions(g/km)'],axis=1)
sample_df_ytest=sample_df["CO2 Emissions(g/km)"]

In [None]:
new=sample_df_Xtest.astype(np.float32)
# column-wise min-max scaling (np.min on a DataFrame can reduce over all columns at once in pandas 2.x)
sample_df_Xtest = (new - new.min()) / (new.max() - new.min())
sample_df_Xtest["Engine Size(L)"]=sample_df_Xtest["Engine Size(L)"].map(lambda x:round(x,2))
sample_df_Xtest["Cylinders"]=sample_df_Xtest["Cylinders"].map(lambda x:round(x,2))
sample_df_Xtest["Fuel Consumption Comb (L/100 km)"]=sample_df_Xtest["Fuel Consumption Comb (L/100 km)"].map(lambda x:round(x,2))
sample_df_Xtest

### <font color='green'> Normalize </font>

In [None]:
X = df_new.drop(['CO2 Emissions(g/km)'], axis= 1).astype(np.float32)
y = df_new["CO2 Emissions(g/km)"].astype(np.float32)

In [None]:
# Normalize (column-wise min-max scaling)
X = (X - X.min()) / (X.max() - X.min())
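
Min-max scaling maps each column to the range [0, 1] via (x - min) / (max - min). In this notebook the hold-out sample and the training data are scaled separately, each with its own minimum and maximum, so the same raw value can map to different scaled values. The sketch below shows one common alternative: compute the column minima and maxima on the training frame and reuse them for new rows. The frames and the helper names `fit_min_max` and `apply_min_max` are hypothetical.

```python
import pandas as pd

def fit_min_max(frame):
    """Return per-column minimum and maximum of the reference (training) data."""
    return frame.min(), frame.max()

def apply_min_max(frame, col_min, col_max):
    """Scale each column with previously computed statistics."""
    return (frame - col_min) / (col_max - col_min)

train = pd.DataFrame({"engine": [1.0, 2.0, 5.0], "cylinders": [4.0, 6.0, 8.0]})
new_rows = pd.DataFrame({"engine": [3.0], "cylinders": [6.0]})

col_min, col_max = fit_min_max(train)
print(apply_min_max(train, col_min, col_max))
print(apply_min_max(new_rows, col_min, col_max))
```

Rows scaled this way can fall slightly outside [0, 1] when a new value lies beyond the training range, which is expected.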


In [None]:
X["Engine Size(L)"]=X["Engine Size(L)"].map(lambda x:round(x,2))
X["Cylinders"]=X["Cylinders"].map(lambda x:round(x,2))
X["Fuel Consumption Comb (L/100 km)"]=X["Fuel Consumption Comb (L/100 km)"].map(lambda x:round(x,2))
X

In [None]:
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.25, random_state=42)

print("X_train", X_train.shape)
print("y_train",y_train.shape)
print("X_test",X_test.shape)
print("y_test",y_test.shape)

### <font color='Green'>Linear Regression</font>

In [None]:
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
model = lm.fit(X_train, y_train)
model.intercept_
model.coef_
from sklearn.metrics import mean_squared_error, r2_score
np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
model.score(X_train, y_train)
lin_r2_score = cross_val_score(model, X_train, y_train, cv = 10, scoring = "r2").mean()
print("R2 score of Linear Regression is : ",lin_r2_score)
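
The two metrics used in this cell can be written out directly. RMSE is the square root of the mean squared residual, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares. The values below are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
print(rmse(y_true, y_pred), r2(y_true, y_pred))
```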

In [None]:
pred=model.predict(sample_df_Xtest)
frames = [pred, sample_df_ytest.values]
result_pred = pd.DataFrame(data=frames)
result_pred=result_pred.T

result_pred_Lin=result_pred.rename(columns={0:'Pred_Linear',1:'Real_Value'})
result_pred_Lin["Pred_Linear"]=result_pred_Lin["Pred_Linear"].map(lambda x:round(x,2))
result_pred_Lin["Diff"]=result_pred_Lin["Pred_Linear"]-result_pred_Lin["Real_Value"]
print("Mean Diff: ",abs(result_pred_Lin["Diff"]).mean())
result_pred_Lin.head(20)


### <font color='green'> KNN </font>

In [None]:
from sklearn.neighbors import KNeighborsRegressor
knn_model = KNeighborsRegressor().fit(X_train, y_train)
y_pred = knn_model.predict(X_test)
np.sqrt(mean_squared_error(y_test, y_pred))
knn_model.score(X_train, y_train)
knn_model.score(sample_df_Xtest, sample_df_ytest)
from sklearn.model_selection import GridSearchCV
knn_params = {'n_neighbors': np.arange(1,30,1)}
knn = KNeighborsRegressor()
knn_cv_model = GridSearchCV(knn, knn_params, cv = 10)
knn_cv_model.fit(X_train, y_train)
knn_cv_model.best_params_["n_neighbors"]
knn_tuned = KNeighborsRegressor(n_neighbors = knn_cv_model.best_params_["n_neighbors"])
knn_tuned.fit(X_train, y_train)
np.sqrt(mean_squared_error(y_test, knn_tuned.predict(X_test)))
knn_r2_score = knn_tuned.score(sample_df_Xtest, sample_df_ytest)
print("R2 score of KNN is : ",knn_r2_score)
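
To make the role of `n_neighbors` concrete, here is a simplified sketch of what a k-nearest-neighbours regressor computes: the prediction for a query point is the average target of its k closest training points. This is an illustration with Euclidean distance and uniform weights on made-up data, not scikit-learn's implementation.

```python
import math

def knn_predict(train_X, train_y, query, k):
    """Average the targets of the k training points closest to `query`."""
    distances = [
        (math.dist(point, query), target)
        for point, target in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Made-up one-feature training set
train_X = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [100.0, 110.0, 150.0, 300.0]

print(knn_predict(train_X, train_y, (1.2,), k=1))
print(knn_predict(train_X, train_y, (1.2,), k=3))
```

A small k follows individual training points closely, while a larger k smooths the prediction, which is why the grid search tries a range of values.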

In [None]:
pred=knn_tuned.predict(sample_df_Xtest)
frames = [pred, sample_df_ytest.values]
result_pred = pd.DataFrame(data=frames)
result_pred=result_pred.T
result_pred_Knn=result_pred.rename(columns={0:'Pred_KNN',1:'Real'})
result_pred_Knn["Pred_KNN"]=result_pred_Knn["Pred_KNN"].map(lambda x:round(x,2))
result_pred_Knn["Diff"]=result_pred_Knn["Pred_KNN"]-result_pred_Knn["Real"]
print("Mean Diff: ",abs(result_pred_Knn["Diff"]).mean())
result_pred_Knn.head(20)

### <font color='green'> SVR Model</font>

In [None]:
from sklearn.svm import SVR
svr_model = SVR(kernel= 'rbf', C= 1e3, gamma= 0.01)
svr_model.fit(X_train, y_train)
y_pred = svr_model.predict(X_test)
np.sqrt(mean_squared_error(y_test, y_pred))
svr_params = {"C": [0.01, 0.1,0.4,5,10,20,30,40,50]}
svr_cv_model = GridSearchCV(svr_model,svr_params, cv = 10)
svr_cv_model.fit(X_train, y_train)
svr_cv_model.best_params_["C"]
svr_tuned = SVR(kernel= 'rbf', C= svr_cv_model.best_params_["C"], gamma= 0.01)
svr_tuned.fit(X_train, y_train)
y_pred = svr_tuned.predict(X_test)
np.sqrt(mean_squared_error(y_test, y_pred))
svr_r2_score = svr_tuned.score(sample_df_Xtest, sample_df_ytest)
print("R2 score of SVR is : ",svr_r2_score)

In [None]:
pred=svr_tuned.predict(sample_df_Xtest)
frames = [pred, sample_df_ytest.values]
result_pred = pd.DataFrame(data=frames)
result_pred=result_pred.T

result_pred_Svr = result_pred.rename(columns={0:'Pred_SVR',1:'Real'})
result_pred_Svr["Pred_SVR"]=result_pred_Svr["Pred_SVR"].map(lambda x:round(x,2))
result_pred_Svr["Diff"]=result_pred_Svr["Pred_SVR"]-result_pred_Svr["Real"]
print("Mean Diff: ",abs(result_pred_Svr["Diff"]).mean())
result_pred_Svr.head(20)

### <font color='green'> Random Forest </font>

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf_model = RandomForestRegressor(random_state = 42)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)
np.sqrt(mean_squared_error(y_test, y_pred))
# X has only three feature columns, so max_features should not exceed 3
rf_params = {'max_depth': list(range(1,10)), 'max_features': [1,2,3], 'n_estimators' : [100, 200, 500, 750]}
rf_model = RandomForestRegressor(random_state = 42)
rf_cv_model = GridSearchCV(rf_model, rf_params, cv = 10, n_jobs = -1, verbose = 2)
rf_cv_model.fit(X_train, y_train)
rf_cv_model.best_params_
# refit with the parameters selected by the grid search instead of hard-coded values
rf_tuned = RandomForestRegressor(random_state = 42, **rf_cv_model.best_params_)
rf_tuned.fit(X_train, y_train)
y_pred = rf_tuned.predict(X_test)
np.sqrt(mean_squared_error(y_test, y_pred))
rf_r2_score = rf_tuned.score(sample_df_Xtest, sample_df_ytest)
print("R2 score of Random Forest is : ",rf_r2_score)
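
An exhaustive grid search fits one model per parameter combination per fold, so the cost grows multiplicatively with every added parameter value. The sketch below counts the fits for a small hypothetical grid (not the one used above), assuming 10-fold cross-validation.

```python
from itertools import product

# Hypothetical grid, chosen only to keep the arithmetic easy to follow
param_grid = {"max_depth": [3, 6, 9], "n_estimators": [100, 200]}
n_folds = 10

# Every combination of parameter values is one candidate
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
total_fits = len(candidates) * n_folds

print(len(candidates), "candidates x", n_folds, "folds =", total_fits, "fits")
```

Running the same count on a larger grid before starting a search gives a rough idea of how long it will take.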

In [None]:
pred=rf_tuned.predict(sample_df_Xtest)
frames = [pred, sample_df_ytest.values]
result_pred = pd.DataFrame(data=frames)
result_pred=result_pred.T

result_pred_Rf=result_pred.rename(columns={0:'Pred_RF',1:'Real'})
result_pred_Rf["Pred_RF"]=result_pred_Rf["Pred_RF"].map(lambda x:round(x,2))
result_pred_Rf["Diff"]=result_pred_Rf["Pred_RF"]-result_pred_Rf["Real"]
print("Mean Diff: ",abs(result_pred_Rf["Diff"]).mean())
result_pred_Rf.head(20)

## <font color='red'> Models R2 Score Comparison Table </font>

In [None]:
data = {"Model": ["Lin", "KNN","SVR", "Random Forest"], "R2 Score": [lin_r2_score,knn_r2_score,svr_r2_score,rf_r2_score]}
# use a separate name so the cleaned data set in df is not overwritten
df_scores=pd.DataFrame(data)
df_scores

### ***<font color='green'>As the comparison table shows, the Random Forest model gives the highest R2 score, so we use the Random Forest model for deployment.</font>***

## <font color='red'> Values Comparison between the Real Data Values and our Models Predicted Values </font>

In [None]:
result = pd.concat([result_pred_Lin,result_pred_Knn, result_pred_Svr, result_pred_Rf], axis=1,sort=False)
final_result=result["Real_Value"]
final_result=pd.DataFrame(final_result)
result.drop(['Diff',"Real"],inplace=True,axis=1)
final_result=pd.concat([final_result,result],axis=1)

In [None]:
import plotly.graph_objects as go
colors=['lightpink','lightgreen','yellow','lightgreen','yellow']
fig = go.Figure(data=[go.Table(header=dict(values=['Values from our Data Set', 'Linear Model Predicted Values','KNN Model Predicted Values','SVR Model Predicted Values','RF Model Predicted Values'],
line_color='black', fill_color='LightSlateGray',
align='center',font=dict(color='white', size=12)),
cells=dict( values=[final_result['Real_Value'],final_result['Pred_Linear'], final_result['Pred_KNN'], final_result['Pred_SVR'], final_result['Pred_RF'],], line_color=colors, fill_color=colors, align='center', font=dict(color='#660033', size=13)))])
fig.show()