# Exercises

The Auto MPG dataset concerns city-cycle fuel consumption in miles per gallon (`mpg`). It has 3 multi-valued discrete and 5 continuous attributes (counting `mpg` itself), so `mpg` is predicted from 3 discrete and 4 continuous variables. There are 398 instances.

Attribute information:
1. mpg:           continuous
2. cylinders:     multi-valued discrete
3. displacement:  continuous
4. horsepower:    continuous
5. weight:        continuous
6. acceleration:  continuous
7. model year:    multi-valued discrete
8. origin:        multi-valued discrete
9. car name:      string (unique for each instance)


There are 6 missing entries for horsepower, indicated in the file as `?`.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Neural network regression model
from sklearn.neural_network import MLPRegressor

# for linear regression
import statsmodels.api as sm

# Model validation 
from sklearn.model_selection import KFold,RepeatedKFold,GridSearchCV
from sklearn.metrics import mean_squared_error
from sklearn.base import clone

# for generating combinations
from itertools import product

%matplotlib inline

In [2]:
plt.rcParams['figure.dpi'] = 150

In [3]:
auto = pd.read_csv(
    '../data/auto-mpg.csv',
    na_values = '?'
)

# drop missing values
# axis=0 drops every row that contains a missing value
auto = auto.dropna(axis=0)

# car name is unique for each instance
# deleting the column
del auto['car name']

# print a few rows of data frame
auto

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,model year,origin
0,18.0,8,307.0,130.0,3504,12.0,70,1
1,15.0,8,350.0,165.0,3693,11.5,70,1
2,18.0,8,318.0,150.0,3436,11.0,70,1
3,16.0,8,304.0,150.0,3433,12.0,70,1
4,17.0,8,302.0,140.0,3449,10.5,70,1
...,...,...,...,...,...,...,...,...
393,27.0,4,140.0,86.0,2790,15.6,82,1
394,44.0,4,97.0,52.0,2130,24.6,82,2
395,32.0,4,135.0,84.0,2295,11.6,82,1
396,28.0,4,120.0,79.0,2625,18.6,82,1


In [4]:
# standardize predictors
X = auto.drop('mpg',axis=1).values # extract as numpy array
X_mean,X_std = X.mean(axis=0),X.std(axis=0)
X = (X-X_mean)/X_std

# standardize response
y = auto['mpg'].values
y_mean,y_std = y.mean(),y.std()
y = (y-y_mean)/y_std
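
Because the response is standardized, any prediction or error a model produces is in standard-deviation units rather than mpg. The sketch below shows one way to convert back. The array `y_raw` and the value `rmse_std` are made-up stand-ins, not values from the dataset. Note that NumPy's `.std()` uses the population formula (`ddof=0`) by default.

```python
import numpy as np

# Hypothetical mpg values standing in for auto['mpg'].values
y_raw = np.array([18.0, 15.0, 27.0, 44.0, 32.0])

# Same standardization as in the cell above
y_mean, y_std = y_raw.mean(), y_raw.std()
y = (y_raw - y_mean) / y_std

# Undo the transformation to get values back in mpg
y_back = y * y_std + y_mean

# An RMSE measured on the standardized scale converts to mpg
# by multiplying by y_std (the mean shift cancels in differences)
rmse_std = 0.35  # hypothetical value, for illustration only
rmse_mpg = rmse_std * y_std
print(y_back, rmse_mpg)
```

A mean squared error on the standardized scale would be multiplied by `y_std**2` instead.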

## Exercise 1

Train and tune a neural network model to predict `mpg` as a function of the other variables. Use 5-fold CV.
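
As a possible starting point, the sketch below tunes an `MLPRegressor` with `GridSearchCV` over a 5-fold partition. It is not a reference solution: the data are synthetic stand-ins for the standardized `X` and `y`, and the grid values, the `lbfgs` solver, and `max_iter=500` are arbitrary assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in for the standardized X and y (assumption, not the real data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 7))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=120)

# A fixed random_state makes the partition reproducible,
# so the same folds can be reused for another model later
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Illustrative grid only; the exercise leaves the choice of values open
param_grid = {
    'hidden_layer_sizes': [(5,), (10,)],
    'alpha': [1e-3, 1e-1],
}

search = GridSearchCV(
    MLPRegressor(solver='lbfgs', max_iter=500, random_state=0),
    param_grid,
    scoring='neg_mean_squared_error',  # scikit-learn maximizes scores, hence the sign
    cv=kf,
)
search.fit(X_demo, y_demo)

# Flip the sign to recover the cross-validated mean squared error
best_mse = -search.best_score_
print(search.best_params_, best_mse)
```

With the real data, `X_demo` and `y_demo` would be replaced by `X` and `y`, and the grid widened to cover the layer sizes and regularization strengths of interest.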

In [None]:
#### YOUR CODE GOES HERE ####

## Exercise 2

Train and tune a support vector machine model (with radial basis kernel) to predict `mpg` as a function of the other variables. Use the same 5-fold partition as in Exercise 1.


`SVR` from the module `sklearn.svm` implements support vector machine regression. For the default radial basis kernel, it has two hyperparameters to tune: `C` and `gamma`. Test the following values:

1. `C`: `[1e-2,0.1,1,10,100]`
2. `gamma`: `[1e-3,1e-2,0.1,1,10]`

This gives 5 × 5 = 25 combinations of `C` and `gamma` values.
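
One way to walk through the 25 combinations by hand is to pair `itertools.product` with `clone`, both of which are imported at the top of the notebook. The sketch below is a hedged illustration on synthetic data, not the intended solution. The fold list is materialized once so that every combination is scored on identical splits, and building `KFold` with the same `random_state` as for the neural network reproduces that partition.

```python
from itertools import product

import numpy as np
from sklearn.base import clone
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.svm import SVR

# Synthetic stand-in for the standardized X and y (assumption, not the real data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 7))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=120)

C_values = [1e-2, 0.1, 1, 10, 100]
gamma_values = [1e-3, 1e-2, 0.1, 1, 10]

# Materialize the folds once so every combination sees the same splits
kf = KFold(n_splits=5, shuffle=True, random_state=0)
folds = list(kf.split(X_demo))

base = SVR(kernel='rbf')
results = {}
for C, gamma in product(C_values, gamma_values):
    fold_mse = []
    for train_idx, test_idx in folds:
        # clone gives a fresh, unfitted copy for each fold
        model = clone(base).set_params(C=C, gamma=gamma)
        model.fit(X_demo[train_idx], y_demo[train_idx])
        pred = model.predict(X_demo[test_idx])
        fold_mse.append(mean_squared_error(y_demo[test_idx], pred))
    results[(C, gamma)] = np.mean(fold_mse)

# Combination with the lowest cross-validated mean squared error
best_C, best_gamma = min(results, key=results.get)
print(len(results), best_C, best_gamma, results[(best_C, best_gamma)])
```

Passing the same `kf` object to `GridSearchCV` via `cv=kf` would be an equivalent, shorter route.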

In [31]:
from sklearn.svm import SVR

In [None]:
#### YOUR CODE GOES HERE ####

## Exercise 3

Compare both models using (a) a single K-fold partition and (b) multiple K-fold partitions.
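
A comparison along these lines could be sketched as follows. The hyperparameters shown are placeholders rather than tuned values, the data are synthetic, and `cross_val_score` is a scikit-learn helper that the notebook does not import, so its use here is an assumption. `RepeatedKFold` yields the folds of the first repetition, then the second, and so on, which is what the reshape relies on.

```python
import numpy as np
from sklearn.model_selection import KFold, RepeatedKFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the standardized X and y (assumption, not the real data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 7))
y_demo = X_demo[:, 0] - 0.5 * X_demo[:, 1] + 0.1 * rng.normal(size=120)

# Placeholder hyperparameters; in practice use the tuned values from Exercises 1 and 2
models = {
    'mlp': MLPRegressor(hidden_layer_sizes=(10,), alpha=1e-2,
                        solver='lbfgs', max_iter=500, random_state=0),
    'svr': SVR(kernel='rbf', C=10, gamma=0.01),
}

n_splits, n_repeats = 5, 10
single = KFold(n_splits=n_splits, shuffle=True, random_state=0)
repeated = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=0)

summary = {}
for name, model in models.items():
    # (a) one partition: a single estimate of the test MSE
    mse_single = -cross_val_score(model, X_demo, y_demo, cv=single,
                                  scoring='neg_mean_squared_error')
    # (b) many partitions: one estimate per repetition
    mse_repeated = -cross_val_score(model, X_demo, y_demo, cv=repeated,
                                    scoring='neg_mean_squared_error')
    per_repeat = mse_repeated.reshape(n_repeats, n_splits).mean(axis=1)
    summary[name] = (mse_single.mean(), per_repeat.mean(), per_repeat.std())

for name, (one, mean_rep, sd_rep) in summary.items():
    print(f'{name}: single={one:.4f}, repeated={mean_rep:.4f} (sd {sd_rep:.4f})')
```

The spread across repetitions indicates how much of the gap seen on a single partition could be due to the particular split rather than to the models themselves.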

In [None]:
#### YOUR CODE GOES HERE ####