Predicting fuel efficiency is a regression problem: we aim to predict a continuous value, such as a price or a probability, rather than choose a class from a fixed list. In this programming lab, I will take you through how to predict fuel efficiency with machine learning.

## Predict Fuel Efficiency
Here we will use the Auto MPG dataset, one of the best-known datasets among machine learning practitioners, to build a model that predicts the fuel efficiency of vehicles from the late 1970s and early 1980s. To do this, we will provide the model with a description of many automobiles from this period, including attributes such as cylinders, displacement, horsepower and weight.

Let’s import the necessary libraries to get started with this task:

In [38]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

Next, we need the data. Assuming a copy of the dataset has already been downloaded to `auto.csv` in the working directory, we can load it with pandas:

In [39]:
dataset = pd.read_csv('./auto.csv')
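The original Auto MPG data marks a handful of missing horsepower values with `?`. A single non-numeric entry is enough for pandas to store the whole column as text, so that column cannot be used for arithmetic until it is converted. A minimal sketch on a tiny in-memory CSV (the rows are illustrative, not the real file) shows the effect and the `na_values` option of `pd.read_csv`:

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype

# Illustrative CSV text: the second row has a missing horsepower value marked '?'.
csv_text = "mpg,horsepower\n18.0,130\n25.0,?\n15.0,165\n"

raw = pd.read_csv(io.StringIO(csv_text))
print(is_numeric_dtype(raw['horsepower']))    # False: the column was read as text

clean = pd.read_csv(io.StringIO(csv_text), na_values='?')
print(is_numeric_dtype(clean['horsepower']))  # True: '?' became NaN, the rest are floats
print(len(clean.dropna()))                    # 2: the incomplete row was dropped
```

Converting after loading with `pd.to_numeric(column, errors='coerce')` has the same effect on the `?` entries.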

Let’s look at the first few rows. Note that the “origin” column is categorical (1 = USA, 2 = Europe, 3 = Japan), so before moving forward we need to one-hot encode it:

In [40]:
dataset.head()

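|   | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
|---|-----|-----------|--------------|------------|--------|--------------|------------|--------|----------|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |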


In [41]:
origin = dataset.pop('origin')
dataset['USA'] = (origin == 1)*1.0
dataset['Europe'] = (origin == 2)*1.0
dataset['Japan'] = (origin == 3)*1.0
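The same one-hot encoding can also be produced with `pandas.get_dummies`. A minimal sketch on a toy Series of origin codes, assuming the same 1 = USA, 2 = Europe, 3 = Japan mapping used above:

```python
import pandas as pd

# Toy origin codes (illustrative values, not taken from the dataset).
origin = pd.Series([1, 3, 2, 1], name='origin')

# Map the codes to region names, then build one float indicator column per region.
onehot = pd.get_dummies(origin.map({1: 'USA', 2: 'Europe', 3: 'Japan'}), dtype=float)
print(onehot)
```

`get_dummies` orders the new columns alphabetically (Europe, Japan, USA), so the column order differs from the manual version, but each row still has a single 1.0 in the column for its region.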

In [42]:
# 'car name' is a free-text identifier, not a numeric feature, so we drop it
dataset.pop('car name')

0      chevrolet chevelle malibu
1              buick skylark 320
2             plymouth satellite
3                  amc rebel sst
4                    ford torino
                 ...            
393              ford mustang gl
394                    vw pickup
395                dodge rampage
396                  ford ranger
397                   chevy s-10
Name: car name, Length: 398, dtype: object

Now, let’s split the data into a training set (80% of the rows) and a test set (the remaining 20%):

In [43]:
train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)
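`sample(frac=0.8, random_state=0)` draws 80% of the rows at random (reproducibly, because of the fixed seed), and `drop(train_dataset.index)` keeps every row that was not drawn. A minimal sketch on a toy 10-row frame confirms that the two splits are disjoint and together cover the whole dataset:

```python
import pandas as pd

# Toy frame with 10 rows (illustrative values).
toy = pd.DataFrame({'x': range(10)})

train = toy.sample(frac=0.8, random_state=0)  # 8 rows drawn at random
test = toy.drop(train.index)                  # the 2 rows that were not drawn

print(len(train), len(test))                        # 8 2
print(train.index.intersection(test.index).empty)   # True: no row is in both splits
```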

Next, we separate the target value, or label, from the features. The label, `mpg`, is the value we will train the model to predict:

In [44]:
train_labels = train_dataset.pop('mpg')
test_labels = test_dataset.pop('mpg')

## Normalize Data

It is good practice to standardize features that use different scales and ranges. Although the model might converge without standardization, it makes training more difficult and makes the resulting model dependent on the units chosen for the inputs. The test set must be standardized with the mean and standard deviation of the training set, so that it is projected into the same distribution the model was trained on. First, let’s look at the training features:

In [45]:
train_dataset.head()

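|     | cylinders | displacement | horsepower | weight | acceleration | model year | USA | Europe | Japan |
|-----|-----------|--------------|------------|--------|--------------|------------|-----|--------|-------|
| 65  | 8 | 351.0 | 153 | 4129 | 13.0 | 72 | 1.0 | 0.0 | 0.0 |
| 132 | 4 | 140.0 | 75  | 2542 | 17.0 | 74 | 1.0 | 0.0 | 0.0 |
| 74  | 8 | 302.0 | 140 | 4294 | 16.0 | 72 | 1.0 | 0.0 | 0.0 |
| 78  | 4 | 120.0 | 87  | 2979 | 19.5 | 72 | 0.0 | 1.0 | 0.0 |
| 37  | 6 | 232.0 | 100 | 3288 | 15.5 | 71 | 1.0 | 0.0 | 0.0 |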
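Standardization maps each feature value x to z = (x − μ) / σ, where μ and σ are the mean and standard deviation of that feature over the training set. A minimal sketch with toy numbers (not from the Auto MPG data) shows why the test split should reuse the training statistics rather than its own:

```python
import pandas as pd

# Toy feature values (illustrative only).
train = pd.DataFrame({'weight': [2000.0, 3000.0, 4000.0]})
test = pd.DataFrame({'weight': [3000.0, 5000.0]})

mean, std = train.mean(), train.std()    # statistics from the training split only
normed_train = (train - mean) / std
normed_test = (test - mean) / std        # same transform applied to the test split
print(normed_train['weight'].tolist())   # [-1.0, 0.0, 1.0]
print(normed_test['weight'].tolist())    # [0.0, 2.0]

# Using the test split's own statistics would map the same 3000 kg car to a
# different value than in training, so the model would see inconsistent inputs.
own = (test - test.mean()) / test.std()
print(round(own['weight'].iloc[0], 2))   # -0.71
```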


In [46]:
# 'horsepower' was read as text because missing values are marked '?'.
# Coerce it to numbers ('?' becomes NaN), then drop those rows and their labels.
for df in (train_dataset, test_dataset):
    df['horsepower'] = pd.to_numeric(df['horsepower'], errors='coerce')
train_dataset, test_dataset = train_dataset.dropna(), test_dataset.dropna()
train_labels = train_labels.loc[train_dataset.index]
test_labels = test_labels.loc[test_dataset.index]

# Standardize with the training mean and std so both splits share one scale.
train_mean, train_std = train_dataset.mean(), train_dataset.std()
def norm(df):
    return (df - train_mean) / train_std


In [47]:
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)

## Build the Model

In [48]:
def build_model():
  model = keras.Sequential([
    layers.Dense(64, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
    layers.Dense(64, activation=tf.nn.relu),
    layers.Dense(1)
  ])

  optimizer = tf.keras.optimizers.RMSprop(0.001)

  model.compile(loss='mean_squared_error',
                optimizer=optimizer,
                metrics=['mean_absolute_error', 'mean_squared_error'])
  return model
model = build_model()
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_3 (Dense)             (None, 64)                640       
                                                                 
 dense_4 (Dense)             (None, 64)                4160      
                                                                 
 dense_5 (Dense)             (None, 1)                 65        
                                                                 
Total params: 4,865
Trainable params: 4,865
Non-trainable params: 0
_________________________________________________________________
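Each `Dense` layer holds one weight per input/unit pair plus one bias per unit, i.e. (inputs + 1) × units parameters. With 9 input features (the six numeric columns plus the three one-hot origin columns), a quick calculation reproduces the parameter counts in the summary above:

```python
def dense_params(n_inputs, units):
    # one weight per (input, unit) pair plus one bias per unit
    return n_inputs * units + units

# 9 features: cylinders, displacement, horsepower, weight, acceleration,
# model year, USA, Europe, Japan
counts = [dense_params(9, 64), dense_params(64, 64), dense_params(64, 1)]
print(counts, sum(counts))  # [640, 4160, 65] 4865
```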


Before training, let’s try the untrained model on the first 10 examples of the normalized training data to check that it returns one prediction per example:

In [49]:
example_batch = normed_train_data[:10]
example_result = model.predict(example_batch)
example_result
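Keras needs purely numeric input: a DataFrame column that pandas stored as text, such as a horsepower column containing `?` entries, cannot be converted to a tensor. A minimal sketch of a check to run before calling `predict` or `fit`; the helper name `non_numeric_columns` is hypothetical, and the toy frame is illustrative:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_columns(df):
    """Hypothetical helper: names of columns that are not stored as numbers."""
    return [col for col in df.columns if not is_numeric_dtype(df[col])]


frame = pd.DataFrame({'weight': [3504, 3693], 'horsepower': ['130', '?']})
print(non_numeric_columns(frame))  # ['horsepower']

# After coercion the frame is fully numeric ('?' becomes NaN, which still needs handling).
fixed = frame.assign(horsepower=pd.to_numeric(frame['horsepower'], errors='coerce'))
print(non_numeric_columns(fixed))  # []
```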