# Part 1: Polynomial Regression

### A) Use the Auto dataset, find the test $R^2$ score of a linear regression model that predicts the miles per gallon (mpg) from the horsepower.

### B) Use polynomial regression to include both the horsepower feature and $(horsepower)^2$ in the regression model. Find the $R^2$ metric. 

Hint: You can use [numpy.concatenate](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.concatenate.html). For example, to append a column vector $W^2$ to an array U, use X = np.concatenate((U, W**2), axis=1)
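A minimal sketch of the hint above, using made-up toy values rather than the Auto data. The names `U` and `X` follow the hint; the numbers are assumptions chosen only to make the shapes easy to check.

```python
import numpy as np

# Hypothetical toy data: three horsepower values as a column vector (shape (3, 1)).
U = np.array([[100.0], [150.0], [200.0]])

# Append the squared values as a second column; axis=1 joins along columns.
X = np.concatenate((U, U**2), axis=1)

print(X.shape)  # (3, 2)
print(X)
```

Both arrays must be 2-D with the same number of rows, which is why the solution cell reshapes the horsepower values with reshape(-1,1) before concatenating.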

In [6]:

from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import pandas as pd
import numpy as np
AutoData=read_csv('Auto_modify.csv') # read the data
#print(type(AutoData))
#print(AutoData)
X_auto_hp=AutoData.horsepower.values.reshape(-1,1) # define features: horsepower 
Y_auto_mpg=AutoData.mpg.values.reshape(-1,1) # define label: miles per gallon

# Part A: linear regression on horsepower only
X_train, X_test, Y_train, Y_test = train_test_split(X_auto_hp, Y_auto_mpg, random_state=0)

model = LinearRegression().fit(X_train, Y_train)

print('The R2 score of this model without horsepower^2 term is ' + str(model.score(X_test, Y_test)))

# Part B: append horsepower^2 as a second feature column
X_auto_hp_2 = np.concatenate((X_auto_hp, np.square(X_auto_hp)), axis=1)

X_train, X_test, Y_train, Y_test = train_test_split(X_auto_hp_2, Y_auto_mpg, random_state=0)

model = LinearRegression().fit(X_train, Y_train)

print('The R2 score of this model with horsepower^2 term is ' + str(model.score(X_test, Y_test)))


The R2 score of this model without horsepower^2 term is 0.6217658811398383
The R2 score of this model with horsepower^2 term is 0.7271031504642004
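For reference, the .score method of a scikit-learn regressor returns the coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$, computed on the data passed to it. The sketch below recomputes that quantity by hand; the helper name `r2_score_manual` and the toy values are assumptions for illustration and are not taken from the Auto dataset.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical mpg values, chosen only for illustration.
y_true = [20.0, 25.0, 30.0, 35.0]

perfect = r2_score_manual(y_true, y_true)                     # exact predictions
baseline = r2_score_manual(y_true, [27.5] * 4)                # always predict the mean
partial = r2_score_manual(y_true, [22.0, 25.0, 30.0, 33.0])   # small errors

print(perfect, baseline, partial)
```

A model that always predicts the mean of the targets scores 0, and exact predictions score 1, so the scores reported above can be read as the share of test-set variance in mpg that each model explains.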


### C) Use KNN regression to predict the miles per gallon (mpg) with K=7, and find the $R^2$ metric in the following cases

- One feature: Horsepower only

- Two features: horsepower and $(horsepower)^2$ 

Hint: 

    Create KNN regression object using neighbors.KNeighborsRegressor:

    knnRegression = neighbors.KNeighborsRegressor(n_neighbors=7)

    Use the .fit and .score methods as before
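To make the hint concrete, here is a rough from-scratch sketch of what a KNN regressor computes with uniform weights and Euclidean distance: the prediction is the average target of the k closest training points. The function `knn_predict` and the toy data are hypothetical; this is an illustration of the idea, not scikit-learn's implementation.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k):
    """Predict by averaging the targets of the k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float).ravel()
    # Euclidean distance from the query to every training row.
    dists = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]   # indices of the k closest rows
    return y_train[nearest].mean()

# Hypothetical toy data: horsepower -> mpg.
X_toy = [[60.0], [80.0], [100.0], [150.0], [200.0]]
y_toy = [38.0, 32.0, 26.0, 18.0, 12.0]

# The three closest points to 90 hp are 80, 100 and 60 hp.
pred = knn_predict(X_toy, y_toy, [90.0], k=3)
print(pred)  # 32.0
```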



In [9]:
from pandas import read_csv
from sklearn.model_selection import train_test_split
import numpy as np
from sklearn import neighbors
from sklearn import preprocessing

AutoData=read_csv('Auto_modify.csv')

X_auto_hp=AutoData.horsepower.values.reshape(-1,1) # define features: horsepower
Y_new = AutoData.mpg.values.reshape(-1,1)

X_train, X_test, Y_train, Y_test = train_test_split(X_auto_hp, Y_new, random_state=0)

model = neighbors.KNeighborsRegressor(n_neighbors=7)
model.fit(X_train, Y_train)

print('The R2 score of this model without horsepower^2 term is ' + str(model.score(X_test, Y_test)))

# Two-feature case: horsepower and horsepower^2 (not run in this cell)
# X_auto_hp_2 = np.concatenate((X_auto_hp, np.square(X_auto_hp)), axis=1)

# X_train, X_test, Y_train, Y_test = train_test_split(X_auto_hp_2, Y_new, random_state=0)

# model = neighbors.KNeighborsRegressor(n_neighbors=7)
# model.fit(X_train, Y_train)

# print('The R2 score of this model with horsepower^2 term is ' + str(model.score(X_test, Y_test)))


The R2 score of this model without horsepower^2 term is 0.6674777441714226


#### COMMENT on your results on (E) and (F): which model performs better? How does performance change when adding the quadratic feature?

1. The second one is slightly better, but the difference is not meaningful. For a KNN model, adding the quadratic feature does not help distinguish points by distance, because $(horsepower)^2$ is a function of horsepower alone; it only adds computation.
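A small sketch of what the squared column does to KNN distances, on hypothetical horsepower values. Because the column is unscaled, it dominates the Euclidean distance and can reorder neighbors, which re-weights the same underlying feature rather than supplying a new one. The helper `nearest_index` is an assumption introduced only for this illustration.

```python
import numpy as np

def nearest_index(X_train, x_query):
    """Index of the training row closest to the query (Euclidean distance)."""
    diffs = np.asarray(X_train, dtype=float) - np.asarray(x_query, dtype=float)
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))

hp_train = np.array([[89.0], [110.0]])   # hypothetical horsepower values
hp_query = np.array([100.0])

# One feature: the distances are 11 and 10, so 110 hp is the nearest neighbor.
idx_1d = nearest_index(hp_train, hp_query)

# Two features: the squared column dominates the distance
# (|100^2 - 89^2| = 2079 versus |110^2 - 100^2| = 2100), so 89 hp is nearest.
train_2d = np.concatenate((hp_train, hp_train**2), axis=1)
query_2d = np.concatenate((hp_query, hp_query**2))
idx_2d = nearest_index(train_2d, query_2d)

print(idx_1d, idx_2d)  # 1 0
```

Small shifts of this kind in the neighbor sets would be consistent with a small change in the test score, without the quadratic term carrying independent information.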

# Part 2: Regularization

### A) Use the Boston dataset, and use a Ridge regression model with the tuning parameter set to 100 (alpha = 100). Find the $R^2$ score and the number of nonzero coefficients.

### B) Use Lasso regression instead of Ridge regression, also with the tuning parameter set to 100. Find the $R^2$ score and the number of nonzero coefficients.

### C) Change the tuning parameter of the Lasso model to a very low value (alpha = 0.001). What is the $R^2$ score?



In [17]:
# Note: load_boston was removed in scikit-learn 1.2; this cell needs an older version.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
import numpy as np

dataset = load_boston()
X=dataset.data
Y=dataset.target

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state= 0)

model = Ridge(alpha= 100)
model.fit(X_train, Y_train)


print( 'The R2 score of Ridge with alpha =100 is '+ str( model.score(X_test,Y_test)))
print('The number of non zero coefficients is ' + str(np.count_nonzero(model.coef_)))
print(' ')

model2 = Lasso(alpha=100)
model2.fit(X_train, Y_train)

print( 'The R2 score of Lasso with alpha =100 is '+ str( model2.score(X_test,Y_test)))
print('The number of non zero coefficients is ' + str(np.count_nonzero(model2.coef_)))
print(' ')

model3 = Lasso(alpha=0.001)
model3.fit(X_train, Y_train)

print( 'The R2 score of Lasso with alpha =0.001 is '+ str( model3.score(X_test,Y_test)))


The R2 score of Ridge with alpha =100 is 0.5925358036157627
The number of non zero coefficients is 13
 
The R2 score of Lasso with alpha =100 is 0.11866916175527807
The number of non zero coefficients is 2
 
The R2 score of Lasso with alpha =0.001 is 0.6350353125168686


### D) Comment on your result. In this problem, do all features seem important in making predictions?
1. The choice of alpha is very important because it strongly affects the performance of these models. If alpha is too large, the penalty in the loss function shrinks the large coefficients so strongly that many features lose their predictive power and the model underfits: Lasso with alpha = 100 keeps only 2 nonzero coefficients and reaches an $R^2$ of about 0.12, while Ridge with the same alpha keeps all 13 and reaches about 0.59. If alpha is very small, the model approaches ordinary linear regression: Lasso with alpha = 0.001 reaches an $R^2$ of about 0.64. In this problem, not all features appear to be important; only the features with nonzero coefficients can be regarded as useful.
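The difference in nonzero counts between Ridge and Lasso can be illustrated with the textbook special case of an orthonormal design ($X^T X = I$). Assuming the objectives $\frac{1}{2}\|y - Xb\|^2 + \frac{\alpha}{2}\|b\|^2$ for ridge and $\frac{1}{2}\|y - Xb\|^2 + \alpha\|b\|_1$ for lasso, ridge rescales every least-squares coefficient while lasso soft-thresholds it. This scaling differs from scikit-learn's, so the alpha values here are not comparable to the ones used above, and the coefficients are hypothetical.

```python
import numpy as np

def ridge_shrink(b_ols, alpha):
    """Ridge solution for an orthonormal design: uniform scaling toward zero."""
    return np.asarray(b_ols, dtype=float) / (1.0 + alpha)

def lasso_shrink(b_ols, alpha):
    """Lasso solution for an orthonormal design: soft-thresholding at alpha."""
    b = np.asarray(b_ols, dtype=float)
    return np.sign(b) * np.maximum(np.abs(b) - alpha, 0.0)

b_ols = np.array([5.0, -2.0, 0.5, -0.1])   # hypothetical least-squares coefficients
alpha = 1.0

ridge_coef = ridge_shrink(b_ols, alpha)    # every entry shrinks, none reaches zero
lasso_coef = lasso_shrink(b_ols, alpha)    # entries with |b| <= alpha become zero

print(np.count_nonzero(ridge_coef))  # 4
print(np.count_nonzero(lasso_coef))  # 2
```

In this simplified setting ridge never sets a coefficient exactly to zero, whereas lasso removes every coefficient whose magnitude is at most alpha, which matches the pattern of 13 versus 2 nonzero coefficients reported above.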