**NAME**: Eric Graham

# In Class Assignment 1
In this assignment you will fill in Python code and derivations for several problems.
**Please fill in the code and answer all the questions asked.** Please read all instructions carefully and turn in the rendered notebook after class.

### Loading the Data
Diabetes Dataset Details: https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset

Please run the following code to read in the diabetes dataset using the sklearn data loading module.

This will load the data into the variable `ds`. Note that `ds` is a dictionary-like object with fields like `ds.data`, which is a matrix of the continuous features in the dataset. The object is not a pandas dataframe; `ds.data` is a NumPy array. Each row is one observed instance and each column is a different feature. It also has a field called `ds.target` that holds the continuous value we are trying to predict. Each entry in `ds.target` is the label for the corresponding row of the `ds.data` matrix.

In [6]:
import numpy as np # linear algebra
import pandas as pd
from pprint import pprint #for "pretty" printing

from sklearn.datasets import load_diabetes # sklearn data
from sklearn.linear_model import LinearRegression #sklearn linear regression model

In [2]:
ds = load_diabetes()

# this holds the data which consists of continuous features
# because ds.data is a matrix, there are some special properties we can access, like shape (see below)
print('type is: ', type(ds.data))
print('dataset shape:', ds.data.shape, 'format is:', ('rows','columns')) # there are 442 instances with 10 features each
print('range of target:', np.min(ds.target),np.max(ds.target))

type is:  <class 'numpy.ndarray'>
dataset shape: (442, 10) format is: ('rows', 'columns')
range of target: 25.0 346.0
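
As a small aside, the object returned by `load_diabetes()` is a scikit-learn `Bunch`, which supports both attribute access and dictionary-style key access. The sketch below illustrates this and checks that rows line up with targets; the names `demo_ds`, `same_array`, `n_rows`, and `n_cols` are illustrative and not part of the assignment.

```python
from sklearn.datasets import load_diabetes

# load into a separate, illustrative variable
demo_ds = load_diabetes()

# attribute access and key access refer to the same underlying array
same_array = demo_ds.data is demo_ds["data"]
print('same array:', same_array)

# rows are instances, columns are features
n_rows, n_cols = demo_ds.data.shape
print('instances:', n_rows, 'features:', n_cols)

# there is one target value per row of the feature matrix
print('one target per row:', len(demo_ds.target) == n_rows)

# the column names, in the same order as the columns of demo_ds.data
print(demo_ds.feature_names)
```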


In [3]:
# the fields inside of ds can be printed directly or assigned to new variables in python
pprint(ds.data) # prints out elements of the matrix
pprint(ds.target) # prints the vector (all 442 items)

array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
         0.01990749, -0.01764613],
       [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
        -0.06833155, -0.09220405],
       [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
         0.00286131, -0.02593034],
       ...,
       [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
        -0.04688253,  0.01549073],
       [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
         0.04452873, -0.02593034],
       [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
        -0.00422151,  0.00306441]], shape=(442, 10))
array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
        69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
        68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
        87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
       259.,  53., 190., 142.,  75., 142., 155., 225.,  59., 104., 182.,
     

### Using Linear Regression
In the videos, we derived the formula for calculating the optimal values of the regression weights as

$$ w = (X^TX)^{-1}X^Ty, $$

where $X$ is the matrix of values with a bias column of ones appended onto it. For the diabetes dataset one could construct this $X$ matrix by stacking a column of ones onto the `ds.data` matrix.

$$ X=\begin{bmatrix}
         & \vdots &        &  1 \\
        \dotsb & \text{ds.data} & \dotsb &  \vdots\\
         & \vdots &         &  1\\
     \end{bmatrix}
$$
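
As a brief sketch of where this formula comes from, assuming the usual sum-of-squared-errors objective (the symbol $J(w)$ is introduced here for illustration and may not match the notation in the videos), set the gradient to zero and solve for $w$:

$$
\begin{aligned}
J(w) &= (y - Xw)^T(y - Xw) \\
\nabla_w J(w) &= -2X^T(y - Xw) = 0 \\
X^TXw &= X^Ty \\
w &= (X^TX)^{-1}X^Ty
\end{aligned}
$$

The last step assumes $X^TX$ is invertible.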

# Question 1:
For the diabetes dataset, how many elements will the vector $w$ contain?

In [10]:
ds.data.shape

(442, 10)

## Answer:

$w$ will contain one element per feature plus one element for the bias term. Since our dataset has 10 features, $w$ will contain 11 elements.


# Question 2:

In the following empty cell, use the above equation and numpy matrix operations to find the values of the vector $w$. You will need to be sure $X$ and $y$ are created like the instructor talked about in the video. Don't forget to include any modifications to $X$ to account for the bias term in $w$. You might be interested in the following functions:

- `np.hstack((mat1,mat2))` stack two matrices horizontally, to create a new matrix
- `np.ones((rows,cols))` create a matrix full of ones
- `my_mat.T` takes transpose of numpy matrix named `my_mat`
- `np.dot(mat1,mat2)` is matrix multiplication for two matrices
- `np.linalg.inv(mat)` gets the inverse of the variable `mat`
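
To see how these functions fit together, here is a minimal sketch on a hypothetical toy problem (not the diabetes data). The targets are generated from known weights, so the recovered vector can be checked by eye. All names (`toy_features`, `toy_y`, `toy_X`, `toy_w`) are made up for illustration.

```python
import numpy as np

# hypothetical toy data: 4 instances, 2 features
toy_features = np.array([[1.0, 2.0],
                         [2.0, 0.0],
                         [3.0, 1.0],
                         [0.0, 4.0]])

# targets generated from known weights: y = 2*x1 - 1*x2 + 3
toy_y = 2.0 * toy_features[:, 0] - 1.0 * toy_features[:, 1] + 3.0

# append a column of ones so the last weight acts as the bias
toy_X = np.hstack((toy_features, np.ones((toy_features.shape[0], 1))))
print('X shape:', toy_X.shape)

# w = (X^T X)^{-1} X^T y
XtX_inv = np.linalg.inv(np.dot(toy_X.T, toy_X))
toy_w = np.dot(np.dot(XtX_inv, toy_X.T), toy_y)
print('recovered weights:', toy_w)  # approximately [2, -1, 3]

# cross-check with NumPy's least-squares solver
toy_w_lstsq, *_ = np.linalg.lstsq(toy_X, toy_y, rcond=None)
print('matches lstsq:', np.allclose(toy_w, toy_w_lstsq))
```

The bias ends up as the last element of the weight vector because the ones column was appended on the right.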

## Answer:

In [11]:
# Write your code here, print the values of the regression weights using the 'print()' function in python
X = np.hstack((ds.data, np.ones((ds.data.shape[0], 1))))
y = ds.target
w = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)), X.T), y)
print(w)

[ -10.0098663  -239.81564367  519.84592005  324.3846455  -792.17563855
  476.73902101  101.04326794  177.06323767  751.27369956   67.62669218
  152.13348416]


# Question 3:

Scikit-learn also has a linear regression fitting implementation. Look at the scikit learn API and learn to use the linear regression method. The API is here:

- API Reference: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

Use the sklearn `LinearRegression` class to fit a model and check your results from the previous question. Did you get the same parameters?

## Answer:

In [39]:
from sklearn.linear_model import LinearRegression

# write your code here, print the values of model by accessing
#    its properties that you looked up from the API

fit = LinearRegression().fit(X, y)


print('model coefficients are:', fit.coef_)
print('model intercept is', fit.intercept_)
print('Answer to question is', 'Yes they are the same!')

model coefficients are: [ -10.0098663  -239.81564367  519.84592005  324.3846455  -792.17563855
  476.73902101  101.04326794  177.06323767  751.27369956   67.62669218
    0.        ]
model intercept is 152.13348416289597
Answer to question is Yes they are the same!
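
A note on the `0.` coefficient in the output above: `LinearRegression` fits its own intercept by default (`fit_intercept=True`), so a ones column passed in as a feature receives a zero weight and the bias appears in `intercept_` instead. The sketch below, on hypothetical synthetic data, shows two ways to line the sklearn result up with the closed-form weights; all variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical synthetic data: 50 instances, 3 features, small noise
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 3))
targets = np.dot(feats, np.array([1.5, -2.0, 0.5])) + 4.0 + rng.normal(scale=0.1, size=50)

# closed-form weights with the bias column appended last
X_aug = np.hstack((feats, np.ones((feats.shape[0], 1))))
w_closed = np.dot(np.dot(np.linalg.inv(np.dot(X_aug.T, X_aug)), X_aug.T), targets)

# option 1: pass the raw features and let sklearn fit the intercept
model_raw = LinearRegression().fit(feats, targets)
print(np.allclose(model_raw.coef_, w_closed[:-1]),
      np.isclose(model_raw.intercept_, w_closed[-1]))

# option 2: pass the augmented matrix and turn off sklearn's intercept
model_aug = LinearRegression(fit_intercept=False).fit(X_aug, targets)
print(np.allclose(model_aug.coef_, w_closed))
```

Either option gives the same fitted model; the difference is only where the bias is stored.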


# Question 4:

Recall that to predict the output from our model, $\hat{y}$, from $w$ and $X$ we need to use the following formula
\begin{align}
    \hat{y}=w^TX^T,
\end{align}

where $X$ is a matrix with example instances in *each row* of the matrix.


- **Part A:** Compute $\hat{y}$ in two ways: 1) By matrix multiplication using numpy and the above equation (denote this with $\hat{y}_{numpy}$) and 2)  by using the sklearn regression model from Question 3 (denote this $\hat{y}_{sklearn}$).
        
    Note: you may need to make the regression weights a column vector using the following code: `w = w.reshape((len(w),1))` This assumes your weights vector is assigned to the variable named `w`.
- **Part B:** Calculate the mean squared error between your prediction from numpy and the target: $\frac{1}{N}\sum_i\left(y_i-\hat{y}_{numpy,i}\right)^2$, where $N$ is the number of instances.
- **Part C:** Calculate the mean squared error between your sklearn prediction and the target: $\frac{1}{N}\sum_i\left(y_i-\hat{y}_{sklearn,i}\right)^2$.
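
The mean squared error averages the squared residuals, while a bare sum gives the sum of squared errors; the two differ by a factor of $N$. The sketch below uses hypothetical numbers (not the diabetes data) to show both quantities, along with a shape pitfall that can appear when the prediction is a 2-D array. All names are illustrative.

```python
import numpy as np

# hypothetical targets and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat_row = np.array([[2.0, 5.0, 8.0, 9.0]])  # shape (1, 4), like the result of w^T X^T

# flatten the prediction so it has the same shape as the target
y_hat = y_hat_row.ravel()

residuals = y_true - y_hat
sse = np.sum(residuals ** 2)   # sum of squared errors
mse = np.mean(residuals ** 2)  # mean squared error = sse / N
print('SSE:', sse, 'MSE:', mse)

# pitfall: a (4,) target minus a (4, 1) column broadcasts to a (4, 4) matrix
bad_shape = (y_true - y_hat.reshape(-1, 1)).shape
print('shape after accidental broadcasting:', bad_shape)
```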

## Answer:

In [38]:
# Use this block to answer the questions
# Part A
w = w.reshape((len(w),1)) # make w a column vector
y_numpy = np.dot(w.T, X.T)
y_sklearn = fit.predict(X)

# print(w.shape)
# print(X.shape)

print('Predictions from sklearn:', y_sklearn)
print('Predictions from numpy:', y_numpy)

# Part B: mean squared error of the numpy prediction
diff_numpy = ds.target - y_numpy
squared_diff_numpy = diff_numpy ** 2
mse_numpy = np.mean(squared_diff_numpy)
print('MSE Numpy is:', mse_numpy)

# Part C: mean squared error of the sklearn prediction
diff_sklearn = ds.target - y_sklearn
squared_diff_sklearn = diff_sklearn ** 2
mse_sklearn = np.mean(squared_diff_sklearn)
print('MSE Sklearn is:', mse_sklearn)

Predictions from sklearn: [206.11667725  68.07103297 176.88279035 166.91445843 128.46225834
 106.35191443  73.89134662 118.85423042 158.80889721 213.58462442
  97.07481511  95.10108423 115.06915952 164.67656842 103.07814257
 177.17487964 211.7570922  182.84134823 148.00326937 124.01754066
 120.33362197  85.80068961 113.1134589  252.45225837 165.48779206
 147.71997564  97.12871541 179.09358468 129.05345958 184.7811403
 158.71516713  69.47575778 261.50385365 112.82234716  78.37318279
  87.66360785 207.92114668 157.87641942 240.84708073 136.93257456
 153.48044608  74.15426666 145.62742227  77.82978811 221.07832768
 125.21957584 142.6029986  109.49562511  73.14181818 189.87117754
 157.9350104  169.55699526 134.1851441  157.72539008 139.11104979
  72.73116856 207.82676612  80.11171342 104.08335958 134.57871054
 114.23552012 180.67628279  61.12935368  98.72404613 113.79577026
 189.95771575 148.98351571 124.34152283 114.8395504  121.99957578
  73.91017087 236.71054289 142.31126791 124.51672384 150.8407

# Question 5: Using Linear Classification
Now let's use the code you created to make a classifier with linear boundaries. Run the following code in order to load the iris dataset.

In [40]:
from sklearn.datasets import load_iris

# load iris into its own variable (the diabetes data in `ds` is left untouched)
iris_ds = load_iris()
print('features shape:', iris_ds.data.shape) # there are 150 instances and 4 features per instance
print('original number of classes:', len(np.unique(iris_ds.target)))

# now let's make this a binary classification task
iris_ds.target = iris_ds.target>1
print('new number of classes:', len(np.unique(iris_ds.target)))

features shape: (150, 4)
original number of classes: 3
new number of classes: 2


**Question 5:** Now use linear regression to come up with a set of weights, `w`, that predict the class value. This is exactly like you did before for the *diabetes* dataset. However, instead of regressing to continuous values, you are just regressing to the integer value of the class (0 or 1), like we talked about in the video. Remember to account for the bias term when constructing the feature matrix, `X`. Print the weights of the linear classifier.

## Answer:

In [41]:
# write your code here and print the values of the weights
X_iris = np.hstack((iris_ds.data, np.ones((iris_ds.data.shape[0], 1))))
y_iris = iris_ds.target.astype(int) 
w_iris = np.dot(np.dot(np.linalg.inv(np.dot(X_iris.T, X_iris)), X_iris.T), y_iris)

print(w_iris)

[-0.04587608  0.20276839  0.00398791  0.55177932 -0.69528186]


# Question 6:

Finally, use a hard decision function on the output of the linear regression to make this a binary classifier. This is just like we talked about in the video, where the output of the linear regression passes through a function $g$ in the form:

$$\hat{y}=g(w^TX^T), \text{where} $$
 - $g(w^TX^T)$ for $w^TX^T < \alpha$ maps the predicted class to `0`,
 - $g(w^TX^T)$ for $w^TX^T \geq \alpha$ maps the predicted class to `1`.

$\alpha$ is a threshold for deciding the class.

What value for $\alpha$ makes the most sense? What is the accuracy of the classifier given the $\alpha$ you chose?

Note: You can calculate the accuracy as follows: `accuracy = float(sum(yhat==y)) / len(y)` where `y` and `yhat` denote the true targets and the predicted targets, respectively.
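
As a minimal sketch of the decision function $g$ and the accuracy calculation, the example below applies a threshold to hypothetical regression outputs (not the iris data). The helper name `hard_threshold` and all variable names are made up for illustration. Since the class labels are 0 and 1, the midpoint 0.5 is a natural first guess for $\alpha$.

```python
import numpy as np

# hypothetical regression outputs and true binary labels
scores = np.array([-0.2, 0.1, 0.5, 0.9, 0.45])
labels = np.array([0, 0, 1, 0, 1])

def hard_threshold(values, alpha):
    """Map values >= alpha to class 1 and values < alpha to class 0."""
    return (values >= alpha).astype(int)

# apply the threshold halfway between the two label values
yhat_demo = hard_threshold(scores, alpha=0.5)
print('predicted classes:', yhat_demo)

# accuracy is the fraction of predictions that match the true labels
accuracy_demo = float(np.sum(yhat_demo == labels)) / len(labels)
print('accuracy:', accuracy_demo)
```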

## Answer:

In [49]:
# use this box to predict the classification output

predictions_iris = np.dot(w_iris.T, X_iris.T)

# let's try 0.5 first

alpha = 0.5
predictions_iris_a05 = (predictions_iris >= alpha).astype(int)

accuracy_a05 = float(sum(predictions_iris_a05 == y_iris)) / len(y_iris)
print('Accuracy with Alpha 0.5:', accuracy_a05)

# alpha = 0.0
alpha = 0.0
predictions_iris_a0 = (predictions_iris >= alpha).astype(int)

accuracy_a00 = float(sum(predictions_iris_a0 == y_iris)) / len(y_iris)
print('Accuracy with Alpha 0.0:', accuracy_a00)

# alpha = 0.25
alpha = 0.25
predictions_iris_a025 = (predictions_iris >= alpha).astype(int)

accuracy_a025 = float(sum(predictions_iris_a025 == y_iris)) / len(y_iris)
print('Accuracy with Alpha 0.25:', accuracy_a025)

# alpha = 0.75
alpha = 0.75
predictions_iris_a075 = (predictions_iris >= alpha).astype(int)

accuracy_a075 = float(sum(predictions_iris_a075 == y_iris)) / len(y_iris)
print('Accuracy with Alpha 0.75:', accuracy_a075)

# alpha = 0.95
alpha = 0.95
predictions_iris_a095 = (predictions_iris >= alpha).astype(int)

accuracy_a095 = float(sum(predictions_iris_a095 == y_iris)) / len(y_iris)
print('Accuracy with Alpha 0.95:', accuracy_a095)

print('The best accuracy comes from Alpha 0.5:', accuracy_a05)

Accuracy with Alpha 0.5: 0.9266666666666666
Accuracy with Alpha 0.0: 0.6066666666666667
Accuracy with Alpha 0.25: 0.7466666666666667
Accuracy with Alpha 0.75: 0.8266666666666667
Accuracy with Alpha 0.95: 0.7266666666666667
The best accuracy comes from Alpha 0.5: 0.9266666666666666


The End. Please **save (make sure you saved!!!) and upload your rendered notebook to the Digital Campus**

**Grading Schema**


Question 1: 10

Question 2: 10

Question 3: 10

Question 4: 40

Question 5: 10

Question 6: 20