# Predicting MPG with SGD

**Author:** Richard Hemphill<br>
**ID:** 903877709<br>
**Class:** ECE5268 Theory of Neural Networks<br>
**Instructor:** Dr. Georgios C. Anagnostopoulos<br>
**Description:** Use characteristics of various cars to predict fuel economy in miles per gallon (MPG). The prediction equation is fitted by minimizing the error with stochastic gradient descent (SGD).

In [402]:
# CONSTANTS
DATASET_FILE = 'autompg_dataset.csv'
NUMBER_FOR_TRAINING = 200
NUMBER_FOR_VALIDATION = 100
OUTPUT_FEATURE = 'mpg'
INPUT_FEATURES = ['horsepower', 'weight']
EPOCHS = 1000

In [403]:
# LIBRARIES
import numpy as np                  # matrix manipulation
import random                       # shuffle data
import matplotlib.pyplot as plt     # surface plot

In [404]:
# FUNCTIONS
# Create Augmented Design Matrix
def DesignMatrix(dataSet, features):
    # Create the design matrix.
    adm = dataSet[features[0]]
    for feature in features[1:]:
        adm = np.column_stack((adm,dataSet[feature]))
    # Augment the design matrix to accommodate the bias term.
    adm = np.column_stack((adm,np.ones(len(adm))))
    return adm
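
To make the layout of the augmented design matrix concrete, here is a minimal sketch on a tiny hand-made structured array. The sample values are invented for illustration and are not taken from the data set; only the field names match `INPUT_FEATURES`.

```python
import numpy as np

# Tiny hypothetical sample with the same field names as INPUT_FEATURES.
sample = np.array(
    [(100.0, 2000.0), (150.0, 3000.0), (200.0, 4000.0)],
    dtype=[('horsepower', 'f8'), ('weight', 'f8')],
)

# One column per feature, followed by a column of ones for the bias term.
features = ['horsepower', 'weight']
X = np.column_stack([sample[f] for f in features] + [np.ones(len(sample))])
print(X)
```

Each row is one car, and the trailing 1 lets the last weight act as the intercept.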

In [405]:
# Calculate Mean Squared Error
def MSE(actual, predicted):
    return np.square(np.subtract(actual, predicted)).mean()
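
As a quick check of the formula, the following sketch computes the mean squared error by hand for three made-up values (hypothetical numbers, chosen so the arithmetic is easy to follow).

```python
import numpy as np

actual = np.array([18.0, 15.0, 16.0])
predicted = np.array([20.0, 15.0, 13.0])

# Errors are -2, 0 and 3; their squares are 4, 0 and 9; the mean is 13/3.
mse = np.square(np.subtract(actual, predicted)).mean()
print(mse)
```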

In [406]:
# Format the prediction equation as a string
def PredictionEquation(y, xs, w):
    eq = '{} = '.format(y)
    wfmat = lambda i: ('+' if i > 0 else '') + '{:0.6}'.format(i)
    for idx, x in enumerate(xs):
        eq = eq + '{}*{}'.format(wfmat(w[idx]), x)
    eq = eq + wfmat(w[-1])
    return eq

In [407]:
# Load data file
with open(DATASET_FILE, 'r') as csvFile:
    dataSet = np.genfromtxt(csvFile, delimiter=',', names=True, case_sensitive=True)
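
With `names=True`, `np.genfromtxt` reads the header row and returns a structured array whose fields can be indexed by column name. The sketch below shows this on a small in-memory CSV. The rows are invented, and the real file may contain more columns than the three assumed here.

```python
import io
import numpy as np

# Hypothetical two-row CSV using the column names referenced in the constants.
csv_text = "mpg,horsepower,weight\n20.0,100,2000\n15.0,150,3000\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     names=True, case_sensitive=True)

print(data.dtype.names)   # field names taken from the header row
print(data['weight'])     # one column, selected by name
```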

In [408]:
# Shuffle the data so that training does not use the same sets every time.
# np.random.shuffle is used because random.shuffle can duplicate rows of a structured array.
np.random.shuffle(dataSet)
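
Python's `random.shuffle` swaps elements with tuple assignment, which can duplicate rows when the elements are views into the array, as is the case for NumPy structured arrays. A minimal sketch of an index-based shuffle that avoids this is shown below. The array contents and the seed are arbitrary choices for illustration.

```python
import numpy as np

# Hypothetical structured array in which field b is always 10 * a.
rows = np.array([(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)],
                dtype=[('a', 'f8'), ('b', 'f8')])

rng = np.random.default_rng(0)        # fixed seed only for repeatability
order = rng.permutation(len(rows))    # random ordering of the row indices
shuffled = rows[order]                # fancy indexing returns a reordered copy

print(shuffled)
```

Because the reordering goes through an index array, every original row appears exactly once in the result.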

In [409]:
# Split the data set into groups for training, validation and test.
trainData = dataSet[:NUMBER_FOR_TRAINING]
valData = dataSet[NUMBER_FOR_TRAINING:NUMBER_FOR_TRAINING+NUMBER_FOR_VALIDATION]
testData = dataSet[NUMBER_FOR_TRAINING+NUMBER_FOR_VALIDATION:]
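
Python slices exclude their end index, so consecutive slices can reuse the same boundary value without overlapping or skipping a sample. The sketch below illustrates this with a hypothetical helper, `split_three_way`, and a stand-in array whose length (392) is an assumption and not taken from the data file.

```python
import numpy as np

def split_three_way(data, n_train, n_val):
    """Split data into consecutive train / validation / test slices with no gaps."""
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

data = np.arange(392)                       # stand-in for the shuffled data set
train, val, test = split_three_way(data, 200, 100)
print(len(train), len(val), len(test))
```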

## Part (a): Batch Size 1
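Train with a batch size of 1, so the weights are updated after every training sample, using a learning rate that decays with the iteration index.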
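
Written out, the update applied in the training loop of this part is a gradient step on the mean squared error of a batch of $N$ samples (here $N = 1$). The notation is chosen for illustration: $X$ is the augmented design matrix of the batch, $\mathbf{w}$ corresponds to `Wa`, $\eta_{\max}$ to `LRmax`, $d$ to `decay`, and $i$ to the iteration index.

$$
\hat{\mathbf{y}} = X\mathbf{w}, \qquad
E(\mathbf{w}) = \frac{1}{N}\,\lVert X\mathbf{w} - \mathbf{y} \rVert^{2}, \qquad
\nabla_{\mathbf{w}} E = \frac{2}{N}\, X^{\top}\,(X\mathbf{w} - \mathbf{y})
$$

$$
\eta_i = \frac{\eta_{\max}}{1 + d\,i}, \qquad
\mathbf{w} \leftarrow \mathbf{w} - \eta_i\,\frac{2}{N}\, X^{\top}\,(X\mathbf{w} - \mathbf{y})
$$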

In [455]:
EPOCHS = 1
LRmax = 0.00001
decay = 10
batchSize = 1
iterations = round(len(trainData)/batchSize)
Wa = np.zeros(len(INPUT_FEATURES)+1)
for e in range(EPOCHS):
    #print('epoch({}):Y({}) X({}) Wa({})'.format(e,Y,X,Wa))
    for i in range(iterations):
        batch = trainData[batchSize*i:batchSize*(i+1)]
        Y = batch[OUTPUT_FEATURE]
        X = DesignMatrix(dataSet=batch,features=INPUT_FEATURES)
        Ypred = np.dot(X,Wa)
        LR = LRmax / (1 + (decay * i))
        N = len(X)
        Wa = Wa - LR * (2/N) * X.T.dot(Ypred-Y)
        print('epoch({}) iteration({}): Y({}) X({}) Wa({}) N({})'.format(e,i,Y,X,Wa,N))

00e+00]]) Wa([83667.12194909 -3027.27610174  -543.77962663]) N(1)
epoch(0) iteration(42): Y([13.]) X([[1.700e+02 4.746e+03 1.000e+00]]) Wa([83668.28972736 -2994.67448011  -543.77275734]) N(1)
epoch(0) iteration(43): Y([18.]) X([[1.000e+02 3.288e+03 1.000e+00]]) Wa([83675.15850922 -2768.82893259  -543.70406952]) N(1)
epoch(0) iteration(44): Y([15.]) X([[1.90e+02 3.85e+03 1.00e+00]]) Wa([83630.02614168 -3683.35322218  -543.9416083 ]) N(1)
epoch(0) iteration(45): Y([11.]) X([[2.100e+02 4.382e+03 1.000e+00]]) Wa([83616.79011893 -3959.54489688  -544.00463698]) N(1)
epoch(0) iteration(46): Y([15.]) X([[1.700e+02 3.563e+03 1.000e+00]]) Wa([83616.00511832 -3975.99758623  -544.00925463]) N(1)
epoch(0) iteration(47): Y([15.]) X([[1.500e+02 3.761e+03 1.000e+00]]) Wa([83631.3674442  -3590.81286864  -543.90683912]) N(1)
epoch(0) iteration(48): Y([18.]) X([[1.300e+02 3.504e+03 1.000e+00]]) Wa([83640.61443031 -3341.57102758  -543.83570846]) N(1)
epoch(0) iteration(49): Y([18.]) X([[9.700e+01 2.774e+0
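
The weights logged above reach magnitudes of roughly 8e4 within the first 50 iterations. One possible contributor, stated here as an assumption and not established by the source, is the scale of the raw inputs (horsepower near 1e2, weight near 1e3), which makes each gradient step large relative to the learning rate. The sketch below runs the same update rule and decay form on synthetic features that are already standardized. The function name `sgd_linear`, the data, and the hyperparameters are hypothetical and are not the notebook's method.

```python
import numpy as np

def sgd_linear(X, y, lr_max=0.02, decay=0.01, batch_size=1, epochs=20):
    """Mini-batch SGD for least-squares linear regression (illustrative sketch)."""
    Xa = np.column_stack((X, np.ones(len(X))))   # append the bias column
    w = np.zeros(Xa.shape[1])
    for _ in range(epochs):
        for i, start in enumerate(range(0, len(Xa), batch_size)):
            Xb = Xa[start:start + batch_size]
            yb = y[start:start + batch_size]
            lr = lr_max / (1 + decay * i)        # same decay form as the loop above
            w = w - lr * (2 / len(Xb)) * Xb.T.dot(Xb.dot(w) - yb)
    return w

# Synthetic, noise-free data with zero-mean, unit-variance features.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
true_w = np.array([-3.0, -5.0, 23.0])            # two slopes and a bias
y = X.dot(true_w[:2]) + true_w[2]

w = sgd_linear(X, y)
print(np.round(w, 3))
```

On this toy problem the recovered weights match the generating ones closely. Whether standardizing `horsepower` and `weight` is enough for the real data would need to be checked against the validation set.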

In [411]:
#print(PredictionEquation(y=OUTPUT_FEATURE, xs=INPUT_FEATURES, w=Wa))