# Homework

#### Explain the bias-variance tradeoff.

#Read ISLR p33-37 (extra credit)
A: Bias is the error introduced by approximating a complex real-world relationship with a simpler model, such as linear regression.  Variance is the amount by which $\hat f$ would change if we estimated it using a different training data set.  We want both to be low, but both depend on the method selected, and they often move in opposite directions: more flexible methods tend to lower bias and raise variance.  Knowing $f$ in advance would make method selection easier, but the goal is to solve the problem without knowing $f$.  We therefore keep an open mind about $\hat f$ and try both simple and complex methods to identify the best model.
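
The tradeoff can be written down using the decomposition of the expected test error at a point $x_0$ given in the ISLR pages cited above. The symbols $x_0$, $y_0$ and $\varepsilon$ follow that text and are not defined elsewhere in this notebook:

$$ E\left[\left(y_0 - \hat f(x_0)\right)^2\right] = \mathrm{Var}\left(\hat f(x_0)\right) + \left[\mathrm{Bias}\left(\hat f(x_0)\right)\right]^2 + \mathrm{Var}(\varepsilon) $$

The last term is the irreducible error, so a method can only lower the expected test error by reducing variance and squared bias together.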

#### Discuss the pros and cons of using the BIC to select a model.

A: The main advantage of using the BIC to select a model is that it gives a single value for comparing models: the smaller the value the better, and negative values are allowed.  The BIC function is also readily available, so the value is easy to calculate for each model.
The disadvantages are that the BIC does not give you a good understanding of the underlying data and may leave you with several plausible models.  It should mainly be used to eliminate bad models so you can focus your attention on the good ones.

# Model Selection on a Classification Model

In [None]:
#Try this from Brian
wheat.seeds.data = read.csv('~/uclax-data-science/UCI-ML-Seeds/data/seeds_dataset.csv',
        header=FALSE, sep="")

colnames(wheat.seeds.data) = c("area", "perimeter", "compactness", "kernel_length", "kernel_width",
                             "asymmetry_coeff", "kernel_groove_length", "target")

In [None]:
#Get and load the data
SEEDS_DATA_URL = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00236/seeds_dataset.txt'
#The file is whitespace-delimited with no header row; sep="" splits on any run of whitespace
seeds_df.data = read.csv(SEEDS_DATA_URL, sep="", header=FALSE)
#colnames(seeds_df.data) = c("Area","Perimeter","Compactness","KernelLength","KernelWidth","AsymmetryCoefficient","KernelGrooveLength","VarietyType")

In [None]:
seeds_df.data = read.csv("data/Seeds.csv")

In [None]:
iris.data = read.csv("data/iris.csv", row.names='X')

In [None]:
iris.glm = glm("label ~ 1 + sepal_length + sepal_width + petal_length + petal_width", data = iris.data)
summary(iris.glm)

In [None]:
#A coefficient is conventionally called significant when its Pr(>|t|) value is < .05
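
The p-values can also be read off the fitted model directly. The following is a minimal sketch, assuming the built-in `iris` data as a stand-in for `data/iris.csv` and a hypothetical numeric `label` column coded from `Species`; neither is confirmed to match the file used in this notebook.

```r
# Assumption: the built-in `iris` data stands in for data/iris.csv, and
# `label` is a hypothetical numeric coding of Species (1, 2, 3).
iris_df <- setNames(iris, c("sepal_length", "sepal_width",
                            "petal_length", "petal_width", "species"))
iris_df$label <- as.numeric(iris_df$species)

fit <- glm(label ~ sepal_length + sepal_width + petal_length + petal_width,
           data = iris_df)

# The coefficient table is a matrix; for the default gaussian family
# the p-value column is named "Pr(>|t|)".
coef_table <- coef(summary(fit))
p_values <- coef_table[, "Pr(>|t|)"]

# TRUE where the coefficient is significant at the 5% level.
significant <- p_values < 0.05
print(round(p_values, 4))
print(significant)
```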

## The Log-Likelihood

Without going too far into the math, the **likelihood function** tells us how probable the observed data are under a fitted model, and the log-likelihood is its natural logarithm. Higher values mean the model explains the data better.

The value is hard to interpret on its own, but it is useful for comparing models fitted to the same data.

In [None]:
logLik(iris.glm)
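
To make the number less opaque, the log-likelihood of a default (gaussian) `glm` can be reproduced by summing normal log-densities of the observations around the fitted values. This is a sketch under assumptions: the built-in `iris` data and a hypothetical numeric `label` stand in for the notebook's CSV, and `sigma_mle` is an illustrative name.

```r
# Assumption: built-in `iris` stands in for data/iris.csv; `label` is a
# hypothetical numeric coding of Species.
iris_df <- setNames(iris, c("sepal_length", "sepal_width",
                            "petal_length", "petal_width", "species"))
iris_df$label <- as.numeric(iris_df$species)

fit <- glm(label ~ sepal_length + sepal_width + petal_length + petal_width,
           data = iris_df)

n <- nrow(iris_df)
res <- residuals(fit, type = "response")

# The maximum-likelihood estimate of the error standard deviation
# divides the residual sum of squares by n, not by n - p.
sigma_mle <- sqrt(sum(res^2) / n)

# Sum of log-densities of each observation under its fitted mean.
manual_loglik <- sum(dnorm(iris_df$label, mean = fitted(fit),
                           sd = sigma_mle, log = TRUE))

print(manual_loglik)
print(as.numeric(logLik(fit)))
```

The two printed values should agree up to floating-point error, which shows that `logLik` is the sum of log-densities evaluated at the fitted parameters.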

"All models are wrong, but some are useful." - George Box

We might be concerned with one additional property - the **complexity** of the model. 

##### William of Occam

[**Occam's razor**](https://en.wikipedia.org/wiki/Occam's_razor) is the problem-solving principle that, when presented with competing hypothetical answers to a problem, one should select the one that makes the fewest assumptions.

<img src="https://upload.wikimedia.org/wikipedia/commons/a/ab/William_of_Ockham_-_Logica_1341.jpg" width=400px>

We can represent this idea of complexity in terms of both the number of features we use and the amount of data.

## Bayesian Information Criterion

https://en.wikipedia.org/wiki/Bayesian_information_criterion

The BIC is formally defined as

$$ \mathrm{BIC} = {\ln(n)k - 2\ln({\widehat L})}. $$

where

- $\widehat L$ = the maximized value of the likelihood function of the model $M$;
- $x$ = the observed data;
- $n$ = the number of data points in $x$, that is, the number of observations or the sample size;
- $k$ = the number of parameters estimated by the model. For example, in multiple linear regression, the estimated parameters are the intercept, the $q$ slope parameters, and the constant variance of the errors; thus, $k = q + 2$.
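
The count $k = q + 2$ can be checked against the parameter count R attaches to a fitted model. This is a small sketch, assuming the built-in `iris` data and a hypothetical numeric `label` in place of the notebook's CSV:

```r
# Assumption: built-in `iris` stands in for data/iris.csv; `label` is a
# hypothetical numeric coding of Species.
iris_df <- setNames(iris, c("sepal_length", "sepal_width",
                            "petal_length", "petal_width", "species"))
iris_df$label <- as.numeric(iris_df$species)

fit <- glm(label ~ sepal_length + sepal_width + petal_length + petal_width,
           data = iris_df)

# q: number of slope parameters (all coefficients except the intercept).
q <- length(coef(fit)) - 1

# k: the degrees of freedom stored on the logLik object, which BIC() uses.
# For a gaussian glm it also counts the error variance.
k <- attr(logLik(fit), "df")

print(c(q = q, k = k))
```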


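It might help us to think of it as a complexity penalty, $\ln(n)k$, minus a goodness-of-fit term, $2\ln(\widehat L)$, so that lower values are better: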

$$ \mathrm{BIC} = \text{complexity}-\text{likelihood}$$

In [None]:
BIC(iris.glm)

In [None]:
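n = length(iris.glm$fitted.values)   #number of observations
p = length(coefficients(iris.glm))   #intercept plus slope coefficients

likelihood = 2 * logLik(iris.glm)    #goodness-of-fit term, 2*ln(L)
complexity = log(n)*(p+1)            #the +1 counts the estimated error variance

bic = complexity - likelihood
bic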

In [None]:
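#BIC computed by hand; the p+1 parameter count assumes a gaussian model,
#where the error variance is estimated in addition to the coefficients
BIC_of_model = function (model) {
    n = length(model$fitted.values)      #number of observations
    p = length(coefficients(model))      #intercept plus slope coefficients

    likelihood = 2 * logLik(model)       #goodness-of-fit term, 2*ln(L)
    complexity = log(n)*(p+1)            #penalty term, ln(n)*k

    bic = complexity - likelihood
    return(as.numeric(bic))              #drop the logLik class so a plain number is returned
}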

In [None]:
BIC_of_model(iris.glm)
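
As a sanity check, the hand-rolled calculation can be compared with R's built-in `BIC`. The sketch below is self-contained, so it repeats the same arithmetic under the hypothetical name `BIC_by_hand` and assumes the built-in `iris` data with a made-up numeric `label` in place of the notebook's CSV.

```r
# Assumption: built-in `iris` stands in for data/iris.csv; `label` is a
# hypothetical numeric coding of Species.
iris_df <- setNames(iris, c("sepal_length", "sepal_width",
                            "petal_length", "petal_width", "species"))
iris_df$label <- as.numeric(iris_df$species)

fit <- glm(label ~ sepal_length + sepal_width + petal_length + petal_width,
           data = iris_df)

# Same arithmetic as the notebook's BIC_of_model: ln(n) * k - 2 * ln(L),
# with k = p + 1 because a gaussian model also estimates the error variance.
BIC_by_hand <- function(model) {
  n <- length(model$fitted.values)
  p <- length(coefficients(model))
  as.numeric(log(n) * (p + 1) - 2 * logLik(model))
}

print(BIC_by_hand(fit))
print(BIC(fit))

# all.equal allows for floating-point rounding.
print(isTRUE(all.equal(BIC_by_hand(fit), BIC(fit))))
```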

## Model Selection

Here, we use backward selection: starting from the full model, we remove features one at a time and keep the candidate with the lowest BIC.
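
The same search can be automated with `step` from the `stats` package, whose penalty argument `k` gives a BIC-style criterion when set to `log(n)`. This is a sketch rather than the notebook's method, and it assumes the built-in `iris` data with a hypothetical numeric `label` in place of the CSV used here.

```r
# Assumption: built-in `iris` stands in for data/iris.csv; `label` is a
# hypothetical numeric coding of Species.
iris_df <- setNames(iris, c("sepal_length", "sepal_width",
                            "petal_length", "petal_width", "species"))
iris_df$label <- as.numeric(iris_df$species)

# Start from the full model with all four features.
full <- glm(label ~ sepal_length + sepal_width + petal_length + petal_width,
            data = iris_df)

n <- nrow(iris_df)

# direction = "backward" only drops terms; k = log(n) swaps the default
# AIC penalty (k = 2) for the BIC penalty; trace = 0 silences the log.
selected <- step(full, direction = "backward", k = log(n), trace = 0)

print(formula(selected))
print(BIC(selected))
```

The criterion `step` reports can differ from `BIC()` by a constant, because it counts parameters slightly differently for a gaussian model, but the ranking of the candidate models is the same.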

In [None]:
#Model backward selection for GLM

In [None]:
model_1  = "label ~ 1 + sepal_length + sepal_width + petal_length + petal_width"
model_2a = "label ~ 1 + sepal_length + sepal_width + petal_length"
model_2b = "label ~ 1 + sepal_length + sepal_width                + petal_width"
model_2c = "label ~ 1 + sepal_length               + petal_length + petal_width"
model_2d = "label ~ 1                + sepal_width + petal_length + petal_width"

In [None]:
iris.glm.1 = glm(model_1, data=iris.data)
iris.glm.2a = glm(model_2a, data=iris.data)
iris.glm.2b = glm(model_2b, data=iris.data)
iris.glm.2c = glm(model_2c, data=iris.data)
iris.glm.2d = glm(model_2d, data=iris.data)

In [None]:
#iris.glm.1

In [None]:
print(c('model_1', BIC_of_model(iris.glm.1)))
print(c('model_2a', BIC_of_model(iris.glm.2a )))
print(c('model_2b', BIC_of_model(iris.glm.2b )))
print(c('model_2c', BIC_of_model(iris.glm.2c )))
print(c('model_2d', BIC_of_model(iris.glm.2d )))

In [None]:
print(c('model_1', BIC(iris.glm.1)))
print(c('model_2a', BIC(iris.glm.2a )))
print(c('model_2b', BIC(iris.glm.2b )))
print(c('model_2c', BIC(iris.glm.2c )))
print(c('model_2d', BIC(iris.glm.2d )))

In [None]:
model_1  = "label ~ 1 + sepal_length + sepal_width + petal_length + petal_width"
model_2c = "label ~ 1 + sepal_length               + petal_length + petal_width"
model_3a = "label ~ 1 + sepal_length               + petal_length "
model_3b = "label ~ 1 + sepal_length                              + petal_width"
model_3c = "label ~ 1                              + petal_length + petal_width"

In [None]:
iris.glm.3a = glm(model_3a, data=iris.data)
iris.glm.3b = glm(model_3b, data=iris.data)
iris.glm.3c = glm(model_3c, data=iris.data)

In [None]:
print(c('model_1', BIC(iris.glm.1)))
print(c('model_2c', BIC(iris.glm.2c )))
print(c('model_3a', BIC(iris.glm.3a )))
print(c('model_3b', BIC(iris.glm.3b )))
print(c('model_3c', BIC(iris.glm.3c )))
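
The repeated `print` calls can be collapsed into one named vector, which makes it easy to sort the candidates and pick the one with the lowest BIC. This is a sketch under assumptions: the built-in `iris` data and a hypothetical numeric `label` replace the notebook's CSV, and only three of the candidate formulas are shown.

```r
# Assumption: built-in `iris` stands in for data/iris.csv; `label` is a
# hypothetical numeric coding of Species.
iris_df <- setNames(iris, c("sepal_length", "sepal_width",
                            "petal_length", "petal_width", "species"))
iris_df$label <- as.numeric(iris_df$species)

# A named character vector of candidate formulas.
candidates <- c(
  model_1  = "label ~ sepal_length + sepal_width + petal_length + petal_width",
  model_2c = "label ~ sepal_length + petal_length + petal_width",
  model_3c = "label ~ petal_length + petal_width"
)

# Fit each candidate and collect its BIC; sapply keeps the names.
bic_values <- sapply(candidates, function(f) {
  BIC(glm(as.formula(f), data = iris_df))
})

# Lowest BIC first.
print(sort(bic_values))

# Name of the preferred model.
best <- names(which.min(bic_values))
print(best)
```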