# <font color="blue">Lesson 7 Performance Metrics & Hyperparameters</font>

## Gradient Descent

The most common optimization algorithm used in machine learning is stochastic gradient descent.

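Remember that gradient descent is an optimization algorithm, not a cost function and not a machine learning model. It iteratively adjusts a model's parameters to minimize a cost function, and its settings (such as the learning rate and the number of iterations) are hyperparameters you can tune to improve your model.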
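
To make the distinction concrete, here is a minimal sketch of plain (full-batch) gradient descent minimizing a mean squared error cost for a straight line. The toy data, the `mse` helper, and the learning rate are assumptions chosen for illustration; the stochastic variant would compute each update from a single sample or a small batch rather than from all of the data.

```python
import numpy as np

# Toy data (an assumption for illustration): y = 3x + 2 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 0.05, size=200)

def mse(w, b):
    """Cost function: mean squared error of the line w*x + b."""
    return np.mean((w * x + b - y) ** 2)

w, b = 0.0, 0.0        # initial parameter guesses
learning_rate = 0.5    # step size, itself a hyperparameter
for step in range(2000):
    error = w * x + b - y
    grad_w = 2 * np.mean(error * x)   # d(cost)/dw
    grad_b = 2 * np.mean(error)       # d(cost)/db
    w -= learning_rate * grad_w       # step against the gradient
    b -= learning_rate * grad_b

print(w, b, mse(w, b))
```

The cost function (`mse`) defines what "good" means, while gradient descent is only the procedure that searches for parameters that make the cost small.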

For this lab, we'll use the [wine data set](https://canvas.uw.edu/courses/1188730/files/47572077) that we used for HW1. We'll use a linear classifier trained with SGD (logistic regression, since we use the log loss) to see if we can use the wine features to predict whether a wine is white (0) or red (1).

In [None]:
import pandas as pd
wine_df = pd.read_csv("https://library.startlearninglabs.uw.edu/DATASCI420/Datasets/RedWhiteWine.csv")

In [None]:
# Display the first 5 rows of the dataframe


In [None]:
# the class column has already been encoded to 0 and 1 for you
wine_df["Class"].value_counts()

## Pre-processing
Before we can use SGD with a linear model on this data set, we must ensure that each attribute is on the same scale. We can do this by normalizing each attribute to a range between 0 and 1.
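
As a rough sketch of what this normalization does, min-max scaling maps each column with `(x - min) / (max - min)`. The small matrix `X` below is a hypothetical example, not the wine data; the last line checks that the manual formula agrees with sklearn's `MinMaxScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical two-column feature matrix with very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [4.0, 1000.0]])

# Manual min-max scaling, column by column: (x - min) / (max - min).
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_manual = (X - col_min) / (col_max - col_min)

# The same transformation using sklearn.
X_sklearn = MinMaxScaler().fit_transform(X)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))
```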


In [None]:
# use sklearn's preprocessing module to scale your data between 0 and 1
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()

In [None]:
# fit and transform the numerical columns in your dataset
scaled_wine = min_max_scaler.fit_transform(wine_df)

In [None]:
# convert the scaled arrays to a pandas dataframe and provide column names
scaled_df = pd.DataFrame(scaled_wine, columns = wine_df.columns)

In [None]:
# Display the first 5 rows of the scaled dataframe. How does it compare to the original?


## Pull out features and targets

In [None]:
# first pull out the features from the scaled dataset
wine_features = scaled_df.drop("Class", axis=1)
wine_features.head()

In [None]:
# now pull out the targets from the scaled_df
wine_targets = scaled_df["Class"]

### Consider this
What should `wine_targets` look like? (Note that selecting a single column gives a pandas Series rather than a dataframe.) Display the first 5 rows in the cell below.

## Split dataset into training and test sets
Now that we've normalized our data set to a scale between 0 and 1, we can split our data into training and test sets.

In [None]:
from sklearn.model_selection import train_test_split
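# split features and targets in a single call so that each feature row
# stays paired with its own target; two separate calls would shuffle
# them independently and break the pairing
feature_train, feature_test, target_train, target_test = train_test_split(
    wine_features, wine_targets, test_size=0.20)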
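
One way to check that features and targets stay aligned after a split is to pass both to a single `train_test_split` call and compare the resulting indices. The sketch below uses hypothetical toy data in which the target equals the feature's row id, so any mismatch would be obvious; `random_state=42` is an arbitrary choice to make the split repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: the target is just the feature's row id,
# so any misalignment between the two is easy to see.
features = pd.DataFrame({"row_id": np.arange(10)})
targets = pd.Series(np.arange(10), name="label")

# Passing both objects to one call applies the same shuffle to each.
X_train, X_test, y_train, y_test = train_test_split(
    features, targets, test_size=0.20, random_state=42
)

# The indices of the feature and target splits should be identical.
print(X_test.index.tolist(), y_test.index.tolist())
```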

## Implement Stochastic Gradient Descent (SGD)
We can run SGD just like any other model using the following parameters: 
- loss: The loss function to be used. Defaults to ‘hinge’, which gives a linear SVM. The possible options are ‘hinge’, ‘log’ (named ‘log_loss’ in scikit-learn 1.1 and later), ‘modified_huber’, ‘squared_hinge’, ‘perceptron’, or a regression loss: ‘squared_loss’ (named ‘squared_error’ in scikit-learn 1.0 and later), ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’.  

- alpha: Constant that multiplies the regularization term. Defaults to 0.0001. Also used to compute learning_rate when set to ‘optimal’.  

- max_iter: The maximum number of passes over the training data (epochs).

- penalty: The penalty (aka regularization term) to be used; ‘none’, ‘l2’, ‘l1’, or ‘elasticnet’. Defaults to ‘l2’, which is the standard regularizer for linear SVM models. ‘l1’ and ‘elasticnet’ might bring sparsity to the model (feature selection) not achievable with ‘l2’.

See [sklearn's website](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html) for additional parameters. 
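
To get a feel for how these parameters interact, the following sketch fits `SGDClassifier` twice on synthetic data (an assumption for illustration, generated with `make_classification`, not the wine data) and counts how many coefficients each penalty drives to exactly zero. The `alpha` and `max_iter` values are arbitrary choices, and the `zero_counts` dictionary is only a helper for this example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic data (an assumption for illustration), not the wine data set.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

zero_counts = {}
for penalty in ("l2", "l1"):
    clf = SGDClassifier(loss="hinge", penalty=penalty, alpha=0.01,
                        max_iter=1000, random_state=0)
    clf.fit(X, y)
    # Count coefficients that are exactly zero (dropped features).
    zero_counts[penalty] = int((clf.coef_ == 0).sum())

print(zero_counts)
```

Comparing the two counts is a quick way to see whether the ‘l1’ penalty is acting as feature selection on a given data set.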

In [None]:
# instantiate model and fit the training data
from sklearn.linear_model import SGDClassifier
# "log_loss" gives logistic regression; scikit-learn versions before 1.1 call it "log"
clf = SGDClassifier(loss="log_loss", penalty="l2", max_iter=5)
clf.fit(feature_train, target_train)

In [None]:
# run predictions with our testing set: 
pred = clf.predict(feature_test)

In [None]:
pred

## Assess Model Accuracy
Now that we have made predictions, we can use sklearn's accuracy_score function and the target test set we held aside to assess how accurate our model is.

In [None]:
from sklearn.metrics import accuracy_score
# accuracy_score expects the true labels first, then the predictions
print(accuracy_score(target_test, pred))
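
Accuracy is simply the fraction of predictions that match the true labels. The sketch below, using hypothetical labels and predictions rather than the wine results, computes it by hand and also prints a confusion matrix, which shows where the errors fall and can be more informative than a single number when the classes are imbalanced.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true labels and predictions for eight samples.
y_true = np.array([0, 0, 0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 0, 0])

# Accuracy by hand: the share of positions where the two arrays agree.
manual_accuracy = np.mean(y_true == y_pred)
print(manual_accuracy, accuracy_score(y_true, y_pred))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```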

### Consider this
Is our model overfit? Is that accuracy score acceptable? What else could be done to improve the accuracy?
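
One common way to start investigating overfitting is to compare accuracy on the training set with accuracy on the test set; a training score far above the test score is a warning sign. The sketch below is only an illustration on synthetic data (an assumption, generated with `make_classification`), using default `SGDClassifier` settings apart from `max_iter` and `random_state`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration), not the wine data set.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

model = SGDClassifier(max_iter=1000, random_state=0).fit(X_train, y_train)

# Score the same model on data it has seen and on data it has not.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(train_acc, test_acc)
```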