# Phase 3 Review

![review guy](https://media.giphy.com/media/3krrjoL0vHRaWqwU3k/giphy.gif)

# TOC 

1. [Gradient Descent](#grad_desc)
2. [Logistic Regression](#logistic)
3. [Confusion Matrix](#con_mat)
4. [Accuracy/Precision/Recall/F1](#more_metric)
5. [auc_roc](#auc_roc)
6. [Algos](#algos)

<a id='grad_desc'></a>

## Gradient Descent

### Question
What is a loss function? (Explain it in terms of the relationship between true and predicted values) 


### Question: 

What loss functions do we know and what types of data work best with each?

With a parametric model, such as linear regression, describe how the parameters (betas) influence the loss.  

Below, you will see a set of predictors created with NumPy's `random.normal` function. There is also a dependent variable (the target, `y`) created by adding some noise to `feature_1`. We will use this contrived data to practice and think about gradient descent.

In [6]:
# Run no changes
import numpy as np
import pandas as pd 

np.random.seed(42)

feature_1 = np.random.normal(0,1,1000)
feature_2 = np.random.normal(1,2,1000)
feature_3 = np.random.normal(2,3,1000)

X = pd.DataFrame()

X['f_1'] = feature_1
X['f_2'] = feature_2
X['f_3'] = feature_3

y = feature_1 + np.random.normal(0,.5,1000)

Let's start with the following set of guesses for our betas. 

In [None]:
# initial guesses
beta_f1 = 0
beta_f2 = 0
beta_f3 = 0
intercept = 0

In [7]:
# create an array, y_hat, of the predictions based on the initial guesses.

In [8]:
# calculate the loss with the initial guesses

In [9]:
# nudge the beta for f1 up by .1 and create new predictions

In [None]:
# calculate the new loss

In [10]:
# now calculate what the loss would be if the f1 beta was nudged .01 in the opposite direction

### Question
Which direction should we nudge the f_1 beta?
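
The trial-and-error steps above can be sketched as follows. This is only one possible approach: it assumes mean squared error as the loss, and the helper names `predict` and `mse` are hypothetical, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Recreate the contrived data from the setup cell (same seed, same draw order)
np.random.seed(42)
X = pd.DataFrame({
    'f_1': np.random.normal(0, 1, 1000),
    'f_2': np.random.normal(1, 2, 1000),
    'f_3': np.random.normal(2, 3, 1000),
})
y = X['f_1'] + np.random.normal(0, .5, 1000)


def predict(X, betas, intercept=0.0):
    """Hypothetical helper: intercept + sum of beta_j * feature_j."""
    return intercept + X.to_numpy() @ np.asarray(betas, dtype=float)


def mse(y_true, y_pred):
    """Hypothetical helper: mean squared error (an assumed choice of loss)."""
    return np.mean((y_true - y_pred) ** 2)


loss_initial = mse(y, predict(X, [0, 0, 0]))    # all betas at 0
loss_up = mse(y, predict(X, [0.1, 0, 0]))       # f_1 beta nudged up by .1
loss_down = mse(y, predict(X, [-0.01, 0, 0]))   # f_1 beta nudged down by .01

print(f"initial: {loss_initial:.4f}  up: {loss_up:.4f}  down: {loss_down:.4f}")
```

Comparing the three losses shows which direction of nudge lowers the loss.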

### Question: 
The above is trial and error.  Instead of trial and error, we can use calculus.  How do we use the partial derivative of the loss function to update the parameters?
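
As a minimal sketch of the calculus-based alternative, the loop below computes the partial derivatives of the mean squared error with respect to each beta and the intercept, then moves every parameter a small amount against its gradient. The loss, the learning rate of 0.01, and the 2000 iterations are all assumed values chosen for illustration.

```python
import numpy as np

# Recreate the contrived data (same seed and draw order as the setup cell)
np.random.seed(42)
f1 = np.random.normal(0, 1, 1000)
f2 = np.random.normal(1, 2, 1000)
f3 = np.random.normal(2, 3, 1000)
X = np.column_stack([f1, f2, f3])
y = f1 + np.random.normal(0, .5, 1000)

betas = np.zeros(3)      # initial guesses, as in the notebook
intercept = 0.0
learning_rate = 0.01     # assumed value
losses = []

for _ in range(2000):    # assumed number of iterations
    residuals = y - (intercept + X @ betas)
    losses.append(np.mean(residuals ** 2))

    # Partial derivatives of MSE with respect to each parameter
    grad_betas = -2 * X.T @ residuals / len(y)
    grad_intercept = -2 * residuals.mean()

    # Update rule: parameter = parameter - learning_rate * gradient
    betas -= learning_rate * grad_betas
    intercept -= learning_rate * grad_intercept

print("betas:", betas.round(3), "intercept:", round(intercept, 3))
```

Because `y` was built from `feature_1` plus noise, the fitted beta for `f_1` should land near 1 and the others near 0.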

### Question:
What is a step size when talking about gradient descent?

### Question

Why does step size decrease as we approach minimum loss?

### Question

How does learning rate regulate step size?
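
One way to explore the last two questions numerically is a one-parameter version of the problem. The sketch below assumes mean squared error and a learning rate of 0.1, and records the size of each step, where a step is the learning rate multiplied by the gradient. It draws its own noise, so the numbers differ slightly from the three-feature setup above.

```python
import numpy as np

np.random.seed(42)
f1 = np.random.normal(0, 1, 1000)
y = f1 + np.random.normal(0, .5, 1000)

beta = 0.0
learning_rate = 0.1   # assumed value
step_sizes = []

for _ in range(10):
    # Slope of the MSE loss at the current beta
    gradient = -2 * np.mean(f1 * (y - beta * f1))
    # The step is the gradient scaled by the learning rate
    step = learning_rate * gradient
    beta -= step
    step_sizes.append(abs(step))

print([round(s, 4) for s in step_sizes])
```

The learning rate stays fixed throughout. The steps shrink because the gradient flattens as beta approaches the minimum of the loss.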

<a id='logistic'></a>

# Logistic Regression and Modeling

What type of target do we feed the logistic regression model?

Is logistic regression a parametric or non-parametric model?

Describe the journey from betas to 0/1 predictions.
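
The journey can be sketched in three steps: a linear combination gives the log-odds, the sigmoid turns log-odds into a probability, and a threshold turns the probability into a class. The betas, intercept, and rows below are made-up values for illustration only, and the 0.5 threshold is the conventional default.

```python
import numpy as np

# Hypothetical fitted parameters and three hypothetical observations
betas = np.array([1.5, -2.0])
intercept = 0.25
X = np.array([[0.5, 0.1],
              [-1.0, 0.8],
              [2.0, 1.0]])

# Step 1: linear combination of features and betas gives the log-odds
log_odds = intercept + X @ betas

# Step 2: the sigmoid squeezes log-odds into a probability between 0 and 1
probabilities = 1 / (1 + np.exp(-log_odds))

# Step 3: a threshold converts probabilities into 0/1 predictions
predictions = (probabilities >= 0.5).astype(int)

print(log_odds, probabilities.round(3), predictions)
```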

Is logistic regression a black box model?

What hyperparameters are important when tuning logistic regression models?

Your dataset is highly imbalanced, and your logistic regression has a poor recall score. What is one way you might boost the number of positive predictions?
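
One possible answer is to lower the decision threshold applied to the predicted probabilities. The sketch below uses a synthetic imbalanced dataset from `make_classification` as a stand-in (it is not the notebook's data), and the threshold of 0.3 is an arbitrary assumed value.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: roughly 95% negatives, 5% positives
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X, y)
proba_positive = lr.predict_proba(X)[:, 1]

# .predict() is equivalent to thresholding the probability at 0.5
default_positives = int((proba_positive >= 0.5).sum())

# A lower threshold labels more observations as positive
lowered_positives = int((proba_positive >= 0.3).sum())

print(default_positives, lowered_positives)
```

Lowering the threshold tends to raise recall at the cost of precision. Another option worth trying is `LogisticRegression(class_weight='balanced')`, which reweights the classes during fitting.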

# Now let's code

In [None]:
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer(as_frame=True)
X = data['data']
y = data['target']
X.head()

In [None]:
# Perform a train-test split

### Question: 
What is the purpose of train/test split?  


### Question: 
Why should we never fit to the test portion of our dataset?
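
A train/test split might look like the sketch below. The `test_size`, `random_state`, and `stratify` settings are assumed choices, not values given in the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer(as_frame=True)
X = data['data']
y = data['target']

# Hold out a quarter of the rows; stratify keeps the class balance similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

The test rows are set aside and only used at the end, to estimate how the model performs on data it has not seen.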

In [None]:
# Scale the training set using a standard scaler (keep the result as a DataFrame so .head() works below)
ss = None
X_train_scaled = None

In [None]:
X_train_scaled.head()

### Question: 
Why is scaling our data important? As part of your answer, relate scaling to one of the advantages of logistic regression over another classifier.
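
As a sketch of the scaling step, the scaler below is fit on the training rows only and then reused to transform the test rows. Wrapping the result in a DataFrame is an assumed convenience so that `.head()` and column names keep working, and the split settings are assumed as well.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], random_state=42, stratify=data['target']
)

ss = StandardScaler()

# fit_transform learns each column's mean and std from the training set only
X_train_scaled = pd.DataFrame(
    ss.fit_transform(X_train), columns=X_train.columns, index=X_train.index
)

# The test set is transformed with the training-set statistics, never refit
X_test_scaled = pd.DataFrame(
    ss.transform(X_test), columns=X_test.columns, index=X_test.index
)

print(X_train_scaled.head())
```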

In [None]:
# fit model with logistic regression to the appropriate portion of our dataset

Now that we have fit our classifier, the object `lr` has been filled up with information about the best fit parameters.  Take a look at the coefficients held in the `lr` object.  Interpret what their magnitudes mean.

In [None]:
# Inspect the .coef_ attribute of lr and interpret

Logistic regression has a predict method just like linear regression.  Use the predict method to generate a set of predictions (y_hat_train) for the training set.
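
Fitting, inspecting coefficients, and predicting could be sketched as below. The split settings and `max_iter=1000` are assumptions. Sorting the coefficients by absolute value is just one convenient way to read them.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], random_state=42, stratify=data['target']
)
ss = StandardScaler()
X_train_scaled = pd.DataFrame(ss.fit_transform(X_train), columns=X_train.columns)

# Fit on the scaled training data only
lr = LogisticRegression(max_iter=1000)
lr.fit(X_train_scaled, y_train)

# coef_ has shape (1, n_features) for a binary target
coefs = pd.Series(lr.coef_[0], index=X_train.columns)
print(coefs.sort_values(key=abs, ascending=False).head())

# Hard 0/1 predictions for the training set
y_hat_train = lr.predict(X_train_scaled)
```

Because the features are on the same scale, a larger absolute coefficient means a one-standard-deviation change in that feature shifts the log-odds more.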

In [None]:
# use predict to generate a set of predictions
y_hat_train = None

<a id='con_mat'></a>

## Confusion Matrix

Confusion matrices are a great way to visualize the performance of our classifiers. 

### Question: 
What does a good confusion matrix look like?
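
A confusion matrix for the training predictions could be built as in this sketch, which assumes the same split and model settings as an earlier plausible solution.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], random_state=42, stratify=data['target']
)
X_train_scaled = StandardScaler().fit_transform(X_train)
lr = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)
y_hat_train = lr.predict(X_train_scaled)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_train, y_hat_train)

# For a binary target, ravel() unpacks the counts in this order
tn, fp, fn, tp = cm.ravel()
print(cm)
```

Correct predictions sit on the diagonal (`tn` and `tp`), and errors sit off the diagonal (`fp` and `fn`).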

In [None]:
# create a confusion matrix for our logistic regression model fit on the scaled training data

<a id='more_metric'></a>

## Accuracy/Precision/Recall/F1 Score

We have a bunch of additional metrics, most of which we can figure out from the confusion matrix.
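
As a sketch of how these metrics fall out of the confusion-matrix counts, the block below computes each one by hand and compares it with the matching sklearn function. The split and model settings are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], random_state=42, stratify=data['target']
)
X_train_scaled = StandardScaler().fit_transform(X_train)
lr = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)
y_hat_train = lr.predict(X_train_scaled)

tn, fp, fn, tp = confusion_matrix(y_train, y_hat_train).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
recall = tp / (tp + fn)                      # share of actual positives that were found
precision = tp / (tp + fp)                   # share of predicted positives that are correct
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, accuracy_score(y_train, y_hat_train))
print(recall, recall_score(y_train, y_hat_train))
print(precision, precision_score(y_train, y_hat_train))
print(f1, f1_score(y_train, y_hat_train))
```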

### Question:
Define accuracy. What is the accuracy score of our classifier?

In [None]:
# Confirm accuracy in code

### Question:
Why might accuracy fail to be a good representation of the quality of a classifier?

### Question:
Define recall. What is the recall score of our classifier?

In [None]:
# Confirm recall in code

### Question:
Define precision. What is the precision score of our classifier?

In [None]:
# Confirm precision in code

### Question:
Define F1 score. What is the F1 score of our classifier?

In [None]:
# Confirm f1 score in code

<a id='auc_roc'></a>

## AUC_ROC

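The AUC_ROC curve can't be deduced from a single confusion matrix, since each confusion matrix corresponds to one classification threshold and the curve sweeps across all of them. Describe what the AUC_ROC curve shows.
Look [here](https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5) for some nice visualizations of AUC_ROC.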

What does a good AUC_ROC curve look like? What is a good AUC_ROC score?
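
The points on the curve and the area under it could be computed as in this sketch, which works from predicted probabilities rather than hard 0/1 predictions. The split and model settings are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], random_state=42, stratify=data['target']
)
X_train_scaled = StandardScaler().fit_transform(X_train)
lr = LogisticRegression(max_iter=1000).fit(X_train_scaled, y_train)

# The ROC curve needs probabilities for the positive class
proba_train = lr.predict_proba(X_train_scaled)[:, 1]

# One (false positive rate, true positive rate) pair per threshold
fpr, tpr, thresholds = roc_curve(y_train, proba_train)

# Area under the curve: 0.5 is chance level, 1.0 is perfect ranking
score = roc_auc_score(y_train, proba_train)
print(round(score, 4))
```

To draw the curve, plot `tpr` against `fpr`, or use `RocCurveDisplay.from_estimator(lr, X_train_scaled, y_train)` in recent sklearn versions.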

In [None]:
# Plot the AUC_ROC curve for our classifier

<a id='algos'></a>

# More Algorithms

Much of the sklearn syntax is shared across classifiers and regressors. Fit, predict, and score are methods associated with all sklearn classifiers, but they work differently under the hood. KNN's `.fit()` method simply stores the training set in memory. Logistic regression's `.fit()`, by contrast, does the hard work of calculating coefficients.

![lazy_george](https://media.giphy.com/media/8TJK6prvRXF6g/giphy.gif)

However, each algo also has specific parameters and methods associated with it. For example, decision trees have feature importances and logistic regression has coefficients. KNN has `n_neighbors` and decision trees have `max_depth`.


Getting to know the algos and their associated properties is an important area of study.

That being said, you are now getting to the point where, no matter which algorithm you choose, you can run the code to create a model as long as you have the data in the correct shape. Most importantly, the target must be in the appropriate form (continuous for regression, categorical for classification) and be isolated from the predictors.
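
The shared syntax can be sketched with a loop that treats three classifiers identically, followed by a look at attributes that only some of them have. The dataset, split settings, and hyperparameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], random_state=42, stratify=data['target']
)
ss = StandardScaler()
X_train_scaled = ss.fit_transform(X_train)
X_test_scaled = ss.transform(X_test)

models = {
    'logistic': LogisticRegression(max_iter=1000),
    'knn': KNeighborsClassifier(n_neighbors=5),
    'tree': DecisionTreeClassifier(max_depth=3, random_state=42),
}

# The same three calls work for every classifier
scores = {}
for name, model in models.items():
    model.fit(X_train_scaled, y_train)
    scores[name] = model.score(X_test_scaled, y_test)   # accuracy on the test set
print(scores)

# Model-specific attributes only exist on the relevant estimator
print(models['logistic'].coef_.shape)
print(models['tree'].feature_importances_.sum())
```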

Here are the algos we know so far. 
 - Linear Regression
 - Lasso/Ridge Regression
 - Logistic Regression
 - Naive-Bayes
 - KNN
 - Decision Trees
 
> Note that KNN and decision trees also have regression classes in sklearn.


Here is a dataset from seaborn. Let's work through the process of creating a simple Decision Tree model for it.

In [1]:
import seaborn as sns
penguins = sns.load_dataset('penguins')
penguins.head()

|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |


## Decision Trees

In [None]:
# split target from predictors

For the first simple model, let's just use the numeric predictors.

In [None]:
# isolate numeric predictors

In [None]:
# Scale appropriately

In [None]:
# instantiate appropriate model and fit to appropriate part of data.

In [None]:
# Create a set of predictions

y_hat_train = None
y_hat_test = None


In [None]:
# Create and analyze appropriate metrics
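
The decision tree steps above could be wrapped up as in the sketch below. Several things here are assumptions: the notebook does not name the target, so `species` is assumed. The function name `fit_penguin_tree`, the `max_depth` of 3, and dropping rows with missing values are illustrative choices too. Scaling is skipped because a tree's splits do not depend on feature scale.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

NUMERIC_COLS = ['bill_length_mm', 'bill_depth_mm',
                'flipper_length_mm', 'body_mass_g']


def fit_penguin_tree(penguins, max_depth=3, random_state=42):
    """Hypothetical helper: fit a tree on the numeric predictors.

    Returns the fitted tree plus train and test accuracy.
    """
    # Drop rows with missing values in the columns we use (e.g. row 3 above)
    df = penguins.dropna(subset=NUMERIC_COLS + ['species'])

    # Split the target (assumed to be species) from the numeric predictors
    X = df[NUMERIC_COLS]
    y = df['species']

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=random_state, stratify=y
    )

    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=random_state)
    tree.fit(X_train, y_train)

    y_hat_train = tree.predict(X_train)
    y_hat_test = tree.predict(X_test)
    return (tree,
            accuracy_score(y_train, y_hat_train),
            accuracy_score(y_test, y_hat_test))
```

With the `penguins` DataFrame loaded above, calling `fit_penguin_tree(penguins)` returns the model and both accuracy scores. A large gap between train and test accuracy is a sign of overfitting.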

## kNN

In [7]:
# Using the previously scaled data, create a kNN Classifier

In [8]:
# Compare the results with the Decision Tree
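
A kNN comparison might look like the sketch below. It again assumes `species` is the target, and the function name `compare_knn_and_tree`, `n_neighbors=5`, and `max_depth=3` are illustrative choices. kNN relies on distances, so the scaler is placed in a pipeline with it. The tree is fit on the unscaled columns since scaling does not change its splits.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

NUMERIC_COLS = ['bill_length_mm', 'bill_depth_mm',
                'flipper_length_mm', 'body_mass_g']


def compare_knn_and_tree(penguins, n_neighbors=5, random_state=42):
    """Hypothetical helper: test accuracy of kNN and a tree on one split."""
    df = penguins.dropna(subset=NUMERIC_COLS + ['species'])
    X_train, X_test, y_train, y_test = train_test_split(
        df[NUMERIC_COLS], df['species'],
        random_state=random_state, stratify=df['species']
    )

    models = {
        # The pipeline fits the scaler on the training rows only
        'knn': make_pipeline(StandardScaler(),
                             KNeighborsClassifier(n_neighbors=n_neighbors)),
        'tree': DecisionTreeClassifier(max_depth=3, random_state=random_state),
    }

    # fit() returns the estimator, so score() can be chained
    return {name: model.fit(X_train, y_train).score(X_test, y_test)
            for name, model in models.items()}
```

Calling `compare_knn_and_tree(penguins)` returns a dictionary of test accuracies keyed by model name, which makes the two models easy to compare side by side.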