# Predicting MPG with OLS

**Author:** Richard Hemphill<br>
**ID:** 903877709<br>
**Class:** ECE5268 Theory of Neural Networks<br>
**Instructor:** Dr. Georgios C. Anagnostopoulos<br>
**Description:** Use characteristics of various cars to predict fuel economy in miles per gallon (MPG). The prediction equation is determined using Ordinary Least Squares (OLS) regression.

In [85]:
# CONSTANTS
DATASET_FILE = 'autompg_dataset.csv'
NUMBER_FOR_TRAINING = 200
NUMBER_FOR_VALIDATION = 100

In [86]:
# LIBRARIES
import numpy as np                  # matrix manipulation
import random                       # shuffle data
import matplotlib.pyplot as plt     # surface plot

In [87]:
# FUNCTIONS
def MSE(actual, predicted):
    # Mean squared error: average of the squared element-wise differences.
    return np.square(np.subtract(actual, predicted)).mean()
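
As a quick sanity check of `MSE`, the sketch below evaluates it on a small hand-made pair of vectors. The values are illustrative only and are not taken from the dataset.

```python
import numpy as np

def MSE(actual, predicted):
    # Mean of the squared element-wise differences
    return np.square(np.subtract(actual, predicted)).mean()

# Illustrative values only (not from the dataset)
actual = np.array([20.0, 30.0, 25.0])
predicted = np.array([22.0, 29.0, 25.0])

# Squared errors are 4, 1 and 0, so the mean is 5/3
mse_value = MSE(actual, predicted)
print(mse_value)
```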

In [88]:
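def PredictionEquation(y, xs, w):
    # Build a readable string 'y = w0*x0+w1*x1+...+bias' from the weights.
    eq = '{} = '.format(y)
    # Format a weight to 6 significant digits, with an explicit '+' for positives.
    wfmat = lambda i: ('+' if i > 0 else '') + '{:0.6}'.format(i)
    for idx, x in enumerate(xs):
        eq = eq + '{}*{}'.format(wfmat(w[idx]), x)
    # The last entry of w is the bias term.
    eq = eq + wfmat(w[-1])
    return eq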

In [89]:
# Load the data file; the header row supplies the field names.
with open(DATASET_FILE, 'r') as csvFile:
    dataSet = np.genfromtxt(csvFile, delimiter=',', names=True, case_sensitive=True)

In [90]:
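# Shuffle the rows so that each run trains on a different subset.
# np.random.shuffle is used because random.shuffle can duplicate records
# when applied to a NumPy structured array.
np.random.shuffle(dataSet)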
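
Shuffling a structured array is safest with NumPy's own routines: records indexed out of such an array are views, so a generic in-place swap (as done by Python's `random.shuffle`) can duplicate some records and lose others. A minimal sketch of a reproducible shuffle, assuming a seeded NumPy generator is acceptable; the toy array and the seed are made up for illustration:

```python
import numpy as np

# Toy structured array standing in for the loaded dataset (values are made up)
toy = np.array([(18.0, 130.0), (15.0, 165.0), (26.0, 70.0), (32.0, 65.0)],
               dtype=[('mpg', 'f8'), ('horsepower', 'f8')])

# A seeded generator makes the shuffle reproducible; the seed is arbitrary
rng = np.random.default_rng(0)

# Indexing with a random permutation returns a shuffled copy
shuffled = toy[rng.permutation(len(toy))]

# Check that every record still appears exactly once
print(sorted(shuffled['mpg'].tolist()) == sorted(toy['mpg'].tolist()))
```

Indexing with a permutation leaves the original array untouched, which can be convenient when the unshuffled order is needed later.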

In [91]:
# Split the data set into disjoint groups for training, validation and test.
trainData = dataSet[:NUMBER_FOR_TRAINING]
valData = dataSet[NUMBER_FOR_TRAINING:NUMBER_FOR_TRAINING+NUMBER_FOR_VALIDATION]
testData = dataSet[NUMBER_FOR_TRAINING+NUMBER_FOR_VALIDATION:]
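
The three subsets are meant to be disjoint and, together, to cover the whole data set. A minimal sketch of that slicing on a stand-in array of row indices; the total of 350 rows is a hypothetical size chosen for illustration, not read from the file:

```python
import numpy as np

NUMBER_FOR_TRAINING = 200
NUMBER_FOR_VALIDATION = 100

# Stand-in for the shuffled data set: row indices 0..349 (size is hypothetical)
rows = np.arange(350)

train_end = NUMBER_FOR_TRAINING
val_end = NUMBER_FOR_TRAINING + NUMBER_FOR_VALIDATION

train = rows[:train_end]          # first 200 rows
val = rows[train_end:val_end]     # next 100 rows
test = rows[val_end:]             # everything that remains

print(len(train), len(val), len(test))
```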

## Part (a):
Use OLS regression on the training data to predict _mpg_ based on _horsepower_ and _weight_.
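
Written out, the model for this part is a linear function of the two inputs plus a bias, and OLS picks the weights that minimize the sum of squared errors over the training samples. The symbols $w_1$, $w_2$, $b$ and $N$ are notation introduced here for illustration:

$$
\widehat{\text{mpg}} = w_1 \cdot \text{horsepower} + w_2 \cdot \text{weight} + b
$$

$$
\mathbf{w}^{*} = \arg\min_{\mathbf{w}} \sum_{n=1}^{N} \left( y_n - \mathbf{x}_n^{T}\mathbf{w} \right)^2 = \left( X^{T}X \right)^{-1} X^{T}\mathbf{y}
$$

Here $\mathbf{x}_n = [\text{horsepower}_n,\ \text{weight}_n,\ 1]^{T}$ is the augmented input for sample $n$, $\mathbf{w} = [w_1,\ w_2,\ b]^{T}$, $X$ stacks the $\mathbf{x}_n^{T}$ as rows, and the closed form assumes $X^{T}X$ is invertible.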

In [92]:
# Specify the features to be used.
outputFeature='mpg'
inputFeatures=['horsepower', 'weight']

In [93]:
# Create the output vector
Y = trainData[outputFeature]

In [94]:
# Create the design matrix: one column per input feature.
designMatrix = np.column_stack([trainData[inputFeature] for inputFeature in inputFeatures])

In [95]:
# Augment the design matrix to accommodate the bias term.
X = np.column_stack((designMatrix,np.ones(len(designMatrix))))

In [96]:
# Initialize the augmented model parameter vector (placeholder; overwritten by the OLS solution).
W = np.ones(len(inputFeatures)+1)

In [97]:
# Calculate the augmented model parameter vector using OLS: W = (X^T X)^-1 X^T Y
R = np.dot(X.T, X)
Rinv = np.linalg.inv(R)
W = np.dot(np.dot(Rinv, X.T), Y)
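
Inverting `X.T @ X` explicitly works for a small problem like this one, but a least-squares solver is generally the more numerically robust route. The sketch below uses synthetic data (not the notebook's `X` and `Y`, and with made-up weights) to check that `np.linalg.lstsq` gives essentially the same answer as the normal-equation formula:

```python
import numpy as np

# Synthetic stand-in data (not the real data set): 50 samples, 2 features
rng = np.random.default_rng(1)
features = rng.uniform(low=[50.0, 1500.0], high=[200.0, 5000.0], size=(50, 2))
true_w = np.array([-0.02, -0.005, 40.0])   # made-up weights and bias

# Augment with a column of ones for the bias term
X = np.column_stack((features, np.ones(len(features))))
Y = X @ true_w + rng.normal(scale=0.5, size=len(X))

# Normal-equation solution via an explicit inverse
W_normal = np.linalg.inv(X.T @ X) @ X.T @ Y

# Least-squares solver; rcond=None selects NumPy's default cutoff
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Largest absolute difference between the two solutions
print(np.max(np.abs(W_normal - W_lstsq)))
```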

### i Prediction Equation

In [99]:
print(PredictionEquation(y=outputFeature, xs=inputFeatures, w=W))

mpg = -0.0164966*horsepower-0.00508234*weight+37.66
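
To make the printed equation concrete, the sketch below plugs one car into it. The coefficients are copied from the output above as printed; the horsepower and weight values are hypothetical inputs chosen for illustration. Because the data is shuffled randomly, the coefficients will differ from run to run.

```python
import numpy as np

# Coefficients as printed above: [horsepower, weight, bias]
W = np.array([-0.0164966, -0.00508234, 37.66])

# Hypothetical car: horsepower 130, weight 3504 (illustrative values);
# the trailing 1 multiplies the bias term
x = np.array([130.0, 3504.0, 1.0])

# The prediction is the dot product of the augmented input and the weights
predicted_mpg = x @ W
print(round(float(predicted_mpg), 2))
```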
