# Car Fuel Consumption Prediction

## Introduction
In this exercise, we'll train a regression tree to predict the fuel consumption of cars in the auto-mpg dataset, measured in mpg (miles per gallon), using all six available features.

![car](https://img.freepik.com/premium-vector/character-filling-car-gas-station-pouring-fuel-into-vehicle-petroleum-refueling-automobile-transport-gasoline-service-drivers-benzene-ring-formula-cartoon-people-vector-illustration_87771-11191.jpg?w=2000)

In [None]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

from sklearn.metrics import mean_squared_error as MSE

from sklearn.model_selection import cross_val_score

## Dataset

In [None]:
df_car = pd.read_csv('auto.csv')

In [None]:
print(df_car.head())

    mpg  displ   hp  weight  accel  origin  size
0  18.0  250.0   88    3139   14.5      US  15.0
1   9.0  304.0  193    4732   18.5      US  20.0
2  36.1   91.0   60    1800   16.4    Asia  10.0
3  18.5  250.0   98    3525   19.0      US  15.0
4  34.3   97.0   78    2188   15.8  Europe  10.0


In [None]:
# Create dummy variables for the 'origin' column
origin_dummy = pd.get_dummies(df_car['origin'])
df_car = pd.concat([df_car, origin_dummy], axis=1)
df_car = df_car.drop(['origin'], axis=1)

In [None]:
print(df_car.head())


    mpg  displ   hp  weight  accel  size  Asia  Europe  US
0  18.0  250.0   88    3139   14.5  15.0     0       0   1
1   9.0  304.0  193    4732   18.5  20.0     0       0   1
2  36.1   91.0   60    1800   16.4  10.0     1       0   0
3  18.5  250.0   98    3525   19.0  15.0     0       0   1
4  34.3   97.0   78    2188   15.8  10.0     0       1   0
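
To make the encoding step easier to follow, here is a minimal sketch of `pd.get_dummies` on a toy frame. The values are hypothetical; only the column name `origin` comes from the notebook. Recent pandas versions return boolean indicator columns by default, so `dtype=int` is passed here to get the 0/1 integers shown in the output above.

```python
import pandas as pd

# Toy frame with hypothetical values; only the column name 'origin' is from the notebook
toy = pd.DataFrame({"origin": ["US", "Asia", "Europe", "US"]})

# One indicator column per category, stored as integers rather than booleans
dummies = pd.get_dummies(toy["origin"], dtype=int)

# Each row has exactly one indicator set to 1
print(dummies)
```

Because a regression tree splits on one feature at a time, keeping all three indicator columns is harmless here; for linear models one column is often dropped (`drop_first=True`) to avoid redundancy.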


In [None]:
X = df_car.drop(['mpg'], axis = 1)
y = df_car[['mpg']]

## Train/Test split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

## Train and fit model

In [None]:
# Instantiate model; a float min_samples_leaf is read as a fraction of the training samples
dt = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26, random_state=3)

# Fit to training data
dt.fit(X_train, y_train)

## Evaluation

In [None]:
# Compute y_pred
y_pred = dt.predict(X_test)

# Compute mse
mse_dt = MSE(y_test, y_pred)

# Compute rmse
rmse_dt = mse_dt ** (1/2)

# Print rmse_dt
print("Test set RMSE of dt: {:.2f}".format(rmse_dt))

Test set RMSE of dt: 4.56
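
The RMSE is the square root of the mean squared difference between true and predicted values, so it is expressed in the same unit as the target (mpg). The sketch below recomputes it by hand on a few made-up numbers (the arrays `y_true` and `y_hat` are hypothetical, not taken from `auto.csv`) and checks the result against `mean_squared_error`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted mpg values, for illustration only
y_true = np.array([18.0, 9.0, 36.1, 18.5])
y_hat = np.array([20.0, 12.0, 33.1, 18.5])

# MSE: average of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# RMSE: square root of the MSE, back in the unit of the target
rmse_manual = np.sqrt(mse_manual)

# Same quantity via scikit-learn
rmse_sklearn = mean_squared_error(y_true, y_hat) ** 0.5

print("manual: {:.3f}, sklearn: {:.3f}".format(rmse_manual, rmse_sklearn))
```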


## Evaluate 10-fold CV error

In [None]:
# Compute the array containing the 10-fold CV MSEs
MSE_CV_scores = - cross_val_score(dt, X_train, y_train, cv=10, scoring = 'neg_mean_squared_error', n_jobs=-1)

# Compute the 10-fold CV RMSE
RMSE_CV = (MSE_CV_scores.mean()) ** (1/2)

# Print RMSE_CV
print('CV RMSE: {:.2f}'.format(RMSE_CV))

CV RMSE: 4.90
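
The leading minus sign in the CV cell is needed because scikit-learn scorers follow a "higher is better" convention, so `'neg_mean_squared_error'` returns negated MSEs. The following sketch illustrates this on synthetic data from `make_regression` (an assumption made so the example runs without `auto.csv`; the names `X_demo`, `y_demo`, and `tree` are hypothetical) and also shows the per-fold RMSEs behind the single summary number.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the training data (not the auto-mpg dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Same hyperparameters as the notebook's tree
tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=0.26, random_state=3)

# One negated MSE per fold; every value is <= 0
neg_mse = cross_val_score(tree, X_demo, y_demo, cv=10, scoring="neg_mean_squared_error")

# RMSE of each fold, useful for seeing the spread across folds
fold_rmse = np.sqrt(-neg_mse)

# Summary used in the notebook: square root of the mean fold MSE
cv_rmse = np.sqrt((-neg_mse).mean())

print("Per-fold RMSE:", np.round(fold_rmse, 2))
print("CV RMSE: {:.2f}".format(cv_rmse))
```

Note that the square root of the mean MSE is never smaller than the mean of the per-fold RMSEs, so the two summaries can differ slightly.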
