<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">
 
# Introduction to Decision Trees
 
_Author: B Rhodes (DC)_


---

This lesson is all about **_decision trees_**, a non-parametric method that can be used for regression or classification. We'll discuss both approaches but spend most of our time on decision trees for classification. Decision trees are rule-based classifiers that have been in use for decades. The method has lasted because it is simple and effective for routine classification tasks, often performing on par with more sophisticated approaches. Decision trees and their variants are common in practice because they offer decent effectiveness without sacrificing explainability.

##### Learning Objectives
Students will be able to:

- Describe what a decision tree is and what it is used for.
- Explain how a decision tree is built.
- Build a decision tree model in scikit-learn.
- Tune a decision tree model and explain how tuning impacts the model.
- Describe the key differences between regression and classification trees.
- Determine whether or not a decision tree is an appropriate model for a given problem.


##### Lesson Guide

- [Introduction to Decision Trees](#introduction)
- [Part 1: Decision Tree Classifiers](#part-one)
    - [Entropy & Information Gain](#entropy)
    - [Guided Exercise: Compute Information Gain](#group-exercise)
    - [Build a Classifier in `scikit-learn`](#computer-build)
    - [Tuning a Classification Tree](#tuning-tree)
    - [Making Predictions for the Testing Data](#testing-preds)

- [Part 2: Regression Trees](#part-two)
    - [Comparing Regression Trees and Classification Trees](#comparing-trees)
    - [Cut Points for Regression Trees](#cutpoint-demo)
    - [Building a Regression Tree in `scikit-learn`](#sklearn-ctree)

- [Recap](#recap)
    


In [6]:
# standard imports
import pandas as pd 
import numpy as np 

# sklearn imports
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier 
from sklearn.metrics import accuracy_score, roc_curve, auc, confusion_matrix, classification_report
from sklearn import tree 
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
#from sklearn.externals.six import StringIO  
from sklearn.tree import export_graphviz

import matplotlib.pyplot as plt
%matplotlib inline

from IPython.display import Image  
#import pydotplus


# Decision Tree Overview

Recall that logistic regression generates a parametric model that can be represented by a function:

$$f(x) = \frac{1}{1 + \text{e}^{-(\beta_0 + \beta_1 x)}}$$

Decision trees are a non-parametric type of classifier that works by performing a **recursive partition of the sample space**. There is no single function that describes the partition. A decision tree is a **directed acyclic graph** consisting of **nodes** connected by **edges**. The tree starts at the root node, which is the node that has no incoming edges. All other nodes have one (and only one) incoming edge. Nodes with outgoing edges are called **internal** nodes. Nodes with an incoming edge, but no outgoing edges, are called **leaves** or **terminal nodes**. 

<img src="./assets/dt_model.png" width="400">

Consider the decision tree above. Each internal node partitions the sample space into two (or more) sub-spaces. At each node we ask a question, and the answer divides the space according to some discrete function that takes the features of the sample space as input.

The simplest form is where each node considers a single feature and the space is partitioned according to the value of that feature or attribute. In general, each internal node checks for a condition and makes a decision, and every leaf node represents a discrete class. In essence, a decision tree is just a series of IF-ELSE statements (rules). Each path from the root of a decision tree to one of its leaves can be transformed into a rule simply by combining the decisions along the path to form the antecedent, and taking the leaf's class prediction as the consequent.
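To make the "series of IF-ELSE statements" idea concrete, here is a minimal sketch of a small tree written by hand. The function name `predict_walk` and the specific rules are hypothetical and chosen only for illustration; they are not a tree learned from any dataset.

```python
def predict_walk(weather: str, windy: str) -> str:
    """A hypothetical two-level decision tree written as plain IF-ELSE rules."""
    # Root node: test the 'weather' feature.
    if weather == "rain":
        return "no"    # leaf node
    # Internal node: test the 'windy' feature.
    if windy == "Y":
        return "no"    # leaf node
    return "yes"       # leaf node


# One root-to-leaf path read as a rule:
# IF weather is not rain AND windy is N THEN walk = yes
print(predict_walk("sunny", "N"))
print(predict_walk("rain", "N"))
```

Each `return` corresponds to a leaf, and the chain of conditions leading to it is the antecedent of that leaf's rule.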

## Process of Building Decision Trees

Decision trees are a supervised method, which means we provide it labeled data. Building decision trees follows a relatively simple process (**recursive binary splitting**) described below:

1. Input a dataset of training samples consisting of features (predictors) and a target (labels). 

2. The decision tree is trained by making splits for the target using the values of features. Feature selection occurs by using metrics that we'll define below, such as "__information gain__" and "__Gini Index__".

3. The tree is *grown* until we reach a predefined __stopping criterion__. Stopping criteria can include the max depth of the tree, minimum samples per leaf, or other similar measures.

4. Make inferences on unseen data. When we present new (unlabeled) data to the tree, it propagates through the nodes of the trained tree. The class prediction is determined by the resulting leaf node. 
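The effect of stopping criteria can be sketched with scikit-learn. The example below assumes synthetic data from `make_classification` (not a dataset from this lesson) and arbitrary illustrative values for `max_depth` and `min_samples_leaf`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=42)

# No stopping criteria: the tree grows until leaves are pure or cannot be split further
full_tree = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# Two stopping criteria: limit the depth and require at least 5 samples per leaf
small_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=42
).fit(X_demo, y_demo)

print("unconstrained depth/leaves:", full_tree.get_depth(), full_tree.get_n_leaves())
print("constrained depth/leaves:  ", small_tree.get_depth(), small_tree.get_n_leaves())
```

A tree limited to depth 3 can have at most 8 leaves, so the constrained model is much easier to read, usually at some cost in training accuracy.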

### Splitting Criteria (Classification)

Splits are made according to a **cost function** and the split with the lowest cost is selected. The two primary metrics used are **entropy** and **Gini Index** (these will be described later in this notebook). These two metrics are embodied in two different tree building algorithms.

* __ID3 (Iterative Dichotomiser 3)__ uses the entropy function and information gain.
* __CART (Classification and Regression Trees)__ uses the Gini Index.
 
Decision trees use a top-down *greedy search* method. At each node, the algorithm determines which feature best classifies the training data; at the top of the tree, that feature defines the root. It then repeats the search for the next best split within each resulting subset, and so on. For the ID3 method above, the best feature is the one that provides the most **information gain**. Because the algorithm maximizes information gain at every step, the first split (the root node) uses the feature with the highest information gain on the full training set.
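As a rough sketch of the Gini side of this comparison, the code below uses the commonly cited definition of Gini impurity, one minus the sum of squared class proportions. The helper names `gini` and `weighted_gini` and the example counts are assumptions made for illustration.

```python
def gini(counts):
    """Gini impurity of a node, given the class counts in that node."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Cost of a split: each child's impurity weighted by its share of the samples."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * gini(child) for child in children)


print(gini([5, 5]))    # evenly mixed two-class node
print(gini([10, 0]))   # pure node
print(weighted_gini([[4, 0], [1, 7], [0, 4]]))  # cost of a three-way split
```

A greedy builder would evaluate `weighted_gini` for every candidate split and keep the one with the lowest cost.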

# Decision Tree Classifiers

## Entropy & Information Gain

**_Information gain_** is calculated using a statistical measure called **_Entropy_**. You may be familiar with entropy from science, mathematics, or computer science (information theory). 

> __Entropy is a measure of disorder or uncertainty.__

Without getting into the details, we can loosely describe entropy as an indicator of how messy the data is. High entropy reflects "messed-up" data with little or no information content: our uncertainty about a sample's class after seeing the data is the same (or almost the same) as it was before the data was available. 

> Claude Shannon’s entropy quantifies the amount of information in a variable, thus providing the foundation for a theory around the notion of information.

For our purposes, higher entropy means less predictive power for doing data science with that data. 

Consider that for a given dataset the initial entropy will be high. Decision trees essentially work to reduce the entropy by separating the data and re-grouping it into their respective classes.

## Decision Trees & Entropy
Decision trees use a supervised learning approach, meaning we know the target variable for our data. We build the tree by maximizing the *purity* (decreasing the entropy) of the nodes as much as possible while making splits, aiming to have clearly defined leaf nodes. In practice, it's not generally possible to remove all the uncertainty i.e., to fully clean up the data. 

<img src="./assets/split_fs.png" width="200">

As you can see, the split has not fully classified the data above: the leaves are not *pure*. However, the resulting data is a lot neater than it was before the split. Using splits that focus on different features, we're able to separate the data as much as possible in the leaf nodes. At each step, we want to decrease the entropy, which requires computing entropy before and after the split. If entropy decreases, the split is retained and we proceed to the next step; otherwise, we try to split on a different feature or stop growing this branch. When no remaining split reduces entropy, we stop, and the resulting tree is the best solution this greedy procedure can find.


### An Example: Entropy & Information Gain

First, let's give a mathematical definition of entropy and information gain that you can use. Assume we have a dataset with two or more classes and $P(x_i)$ represents the probability of the class $x_i$ out of all possible classes. Then the total entropy is given by:

$$H(X) = -\sum_{i} P(x_i) \cdot \log_2 P(x_i)$$
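
As a quick worked example (the counts are chosen purely for illustration), consider a node holding 2 samples of one class and 10 of another, so that $P(x_1) = 2/12$ and $P(x_2) = 10/12$:

$$H = -\left(\tfrac{2}{12}\log_2\tfrac{2}{12} + \tfrac{10}{12}\log_2\tfrac{10}{12}\right) \approx 0.431 + 0.219 = 0.650$$

For comparison, an evenly mixed two-class node ($P = 1/2$ each) gives $H = 1$, the maximum for two classes, while a pure node gives $H = 0$ (using the usual convention that $0 \cdot \log_2 0 = 0$).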

#### Information Gain
When we measure information gain, we're really measuring the difference in entropy from before the split (an untidy sock drawer) to after the split (a group of white socks and underwear, and a group of non-white socks and underwear). Information gain allows us to put a number to exactly how much we've reduced our _uncertainty_ after splitting a dataset $S$ on some attribute, $F$.  The equation for information gain is:

$$ IG(S, F) = H(S) - \sum_{x_i \in X} P(x_i) H(x_i) $$

Where:

* $H(S)$ is the entropy of set $S$
* $x_i$ is one of the subsets of $S$ produced by splitting on attribute $F$ (the set of all such subsets is denoted $X$)
* $P(x_i)$ is the proportion of the number of elements in $x_i$ to the number of elements in $S$
* $H(x_i)$ is the entropy of a given subset $x_i$ 

Entropy is the metric used in the ID3 algorithm. So we use entropy to compute information gain, and then pick the attribute with the largest possible information gain to split our data on at each iteration. 
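
To make the formula concrete, here is a worked example with illustrative counts. Suppose a node holds 8 "yes" and 8 "no" samples, so its entropy is 1, and an attribute splits it into three subsets with (yes, no) counts of (4, 0), (1, 7), and (0, 4). The two pure subsets have entropy 0, and the mixed subset has entropy

$$-\left(\tfrac{1}{8}\log_2\tfrac{1}{8} + \tfrac{7}{8}\log_2\tfrac{7}{8}\right) \approx 0.544$$

Weighting each subset by its share of the 16 samples gives:

$$IG = 1 - \left(\tfrac{4}{16}\cdot 0 + \tfrac{8}{16}\cdot 0.544 + \tfrac{4}{16}\cdot 0\right) \approx 0.728$$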


### Guided Exercise - Compute Entropy & Info Gain by hand

Let's revisit the example problem from earlier. We want to decide whether we should go for a walk or read, given the weather conditions. Here is the data in a slightly different form:

|  weather | temp | humidity | windy | walk |
|:--------:|:----:|:--------:|:-----:|:----:|
| **overcast** | cool |   high   |   Y   |  **yes** |
| **overcast** | mild |  normal  |   N   |  **yes** |
| **sunny**  | cool |  normal  |   N   |  **yes** |
| overcast |  hot |   high   |   Y   |  no  |
|   **sunny**  |  hot |  normal  |   Y   |  **yes** |
|   rain   | mild |   high   |   N   |  no  |
|   rain   | cool |  normal  |   N   |  no  |
|   **sunny**  | mild |   high   |   N   |  **yes** |
|   **sunny**  | cool |  normal  |   Y   |  **yes** |
|   **sunny**  | mild |  normal  |   Y   |  **yes** |
| **overcast** | cool |   high   |   N   |  **yes** |
|   rain   | cool |   high   |   Y   |  no  |
|   sunny  |  hot |  normal  |   Y   |  no  |
|   **sunny**  | mild |   high   |   N   |  **yes** |

**Exercise**: Write a function `entropy()` to calculate the total entropy for a given discrete probability distribution `Pi`.

- The function should take `Pi` as a list of class counts (frequencies), for example `[2, 10]`, and convert the counts to probabilities before computing the entropy.

In [7]:
from math import log
def entropy(Pi):
    """
    Return the entropy of a probability distribution:
    entropy(p) = - SUM( Pi * log2(Pi) )
    """
    
    # your code here
    
    pass # replace this with your function

# test the function with an assert statement.
assert entropy([1, 1]) == 1

# Then verify the function with the examples below. Expected results are listed at the bottom of this cell.
print(entropy([1, 1])) # Maximum Entropy e.g. a coin toss
print(entropy([2, 10])) # A random mix of classes
print(entropy([0, 7])) # No entropy (a pure node); prints -0.0 because the formula negates a zero sum
print(entropy([11,6])) # Another random mix of classes


# Expected outcomes
#1.0
#0.6500224216483541
#-0.0
#0.9366673818775626


**Exercise**: Write a function `IG(D,a)` to calculate the information gain.

- The function should take `D`, the class distribution (list of counts) of the target, and `a`, a list holding the class distribution of the target within each category of the attribute being tested.
- Using the `entropy()` function above, calculate the information gain as:

$$IG(D,A) = Entropy(D) - \sum_{i} \frac{|D_i|}{|D|} \cdot Entropy(D_i)$$

where `Di` represents the class distribution within each category of `a`.



In [8]:
def IG(D, a):
    '''
    return the information gain:
    gain(D, a) = entropy(D)− SUM( |Di| / |D| * entropy(Di) )
    '''
    
    #your code here

    pass # replace this with your function


     
    
# compare floats with a tolerance rather than exact equality
assert abs(IG([8, 8], [ [3,1], [2,6], [2,2] ]) - 0.1415414066556504) < 1e-9
# example class distribution for the target
test_dist = [8, 8] # Yes, No
# attribute, number of members (feature)
test_attr = [ [4,0], [1,7], [0,4] ] # class1, class2, class3 of attr1 according to YES/NO classes in test_dist

print(IG(test_dist, test_attr))

# Expected value
# 0.7282177784002017



### Pick the Root Node
Use the above functions to determine the root node. 
1. Determine the class distribution for each target class (a list of frequencies).
2. Determine the class distribution for each target class for each feature.

In [9]:
# Your code here - fill in the distributions
will_walk = [Y,N] # Y, N

# feature categories
windy = [ [Y,N], [Y,N] ] # Y,N: Y, N
humidity = [ [4,3], [5,2] ] # high, normal: Y,N
temp = [ [1,2], [4,1], [4,2] ] # hot, mild, cool: Y,N
weather = [ [6,1], [3,1], [0,3] ]  # sunny, overcast, rain: Y,N


# define dictionary for each feature
conditions = {'windy':windy, 'humidity':humidity, 'temp':temp, 'weather':weather}
print("Information Gain by Condition:")

# loop thru the conditions & build dictionary to hold Information Gain for each feature
gain = {}
for condition, dist in conditions.items():
    gain[condition] = IG(will_walk, dist)
    print(condition+':', gain[condition])

# pick the feature with the largest information gain (computed once, after the loop)
max_gain_condition = max(gain, key=gain.get)

print("\nSplit on max gain condition: ", max_gain_condition)





### Pick the Next Node (sunny)
Repeat the process relative to the sunny node. 
1. Determine the class distribution for each target class (a list of frequencies).
2. Determine the class distribution for each target class for each feature.

In [5]:
sunny_walk = weather[0] # class distribution [yes, no] for the sunny branch

# condition:outcome
windy_sun = [ [3,1], [3,0] ] # y,n:y,n
temp_sun = [ [1,1],[3,0], [2,0]] # hot, mild, cool:y,n
humidity_sun = [[2,0],[4,1]] # hi,norm:y,n


# define dictionary for each remaining feature, using the sunny-only distributions
conditions_sunny = {'windy':windy_sun, 'humidity':humidity_sun, 'temp':temp_sun}

print("Information Gain by Condition:")

# loop thru the conditions & build dictionary to hold Information Gain for each feature
gain_sunny = {}
for condition, dist in conditions_sunny.items():
    gain_sunny[condition] = IG(sunny_walk, dist)
    print(condition+':', gain_sunny[condition])

# pick the feature with the largest information gain within the sunny branch
max_gain_sunny = max(gain_sunny, key=gain_sunny.get)

print("\nSplit on max gain for sunny condition: ", max_gain_sunny)





# Decision Trees with scikit-learn

We've had an overview of decision trees and how they are built using entropy. Here we show how to build decision trees (for classification) with scikit-learn and pandas. We'll walk through an example that covers the basics and how to interpret the resulting decision tree. As before, we'll use scikit-learn's consistent interface for running classifiers/regressors. Since this is a classification task, we use the same metrics as before (*e.g.*, confusion matrix, ROC, AUC, etc.). 

Let's analyze the dataset above.

## Guided Exercise

The "walk" dataset is available in the repo as `walk.csv`. 
1. Import the necessary modules.
2. Load the dataset into a dataframe.
3. Encode the data as numerical values (all our features are categorical).
    1. Our target variable is binary so we'll use ```LabelEncoder```.
    2. The features are multi-class so we'll use ```OneHotEncoder```.


**Instructions:** 
- Apply labels to target variable such that `yes=1` and `no=0`
- Apply one-hot encoding to the feature set, creating ten features (weather x 3, temp x 3, humidity x 2, windy x 2) 
- Print the resulting features and check shape

### Load the data

In [None]:
# # For reference so we don't have to scroll

# # sklearn imports
# from sklearn.model_selection import train_test_split
# from sklearn.tree import DecisionTreeClassifier 
# from sklearn.metrics import accuracy_score, roc_curve, auc, confusion_matrix, classification_report
# from sklearn import tree 
# from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# #from sklearn.externals.six import StringIO  
# from sklearn.tree import export_graphviz

# import matplotlib.pyplot as plt
# %matplotlib inline

# from IPython.display import Image  
# #import pydotplus



In [None]:
# Load the dataset
walk_df = pd.read_csv('./data/walk.csv')
walk_data = pd.read_csv('./data/walk.csv')
walk_df

### Encode the Data
As mentioned, all our features are categorical, so we need to encode them as numerical values. Previously we used ```pandas'``` ```get_dummies()``` method to create dummy variables; this is also known as one-hot encoding. Below we use ```scikit-learn's``` built-in encoders instead. We'll use ```LabelEncoder``` to convert each categorical column (including the target) to integer codes, and then ```OneHotEncoder``` to expand the integer-coded features into one binary column per category.
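
One detail worth checking is which integer each label receives. The sketch below uses a small made-up list of labels (not the lesson's dataframe) to show that `LabelEncoder` assigns codes in sorted order of the class names, which is why `no` maps to 0 and `yes` maps to 1.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels for illustration only
demo_encoder = LabelEncoder()
encoded = demo_encoder.fit_transform(["yes", "no", "no", "yes"])

print(list(demo_encoder.classes_))                    # class names in sorted order
print(list(encoded))                                  # integer code for each label
print(list(demo_encoder.inverse_transform([0, 1])))   # map codes back to names
```

The `classes_` attribute is the place to look whenever you need to confirm what a given code means.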

In [None]:
# Create label encoder instance
lb = LabelEncoder() 

# Create Numerical labels for classes
walk_df['walk_'] = lb.fit_transform(walk_df['walk'] ) 
walk_df['weather_'] = lb.fit_transform(walk_df['weather']) 
walk_df['temp_'] = lb.fit_transform(walk_df['temp'] ) 
walk_df['humidity_'] = lb.fit_transform(walk_df['humidity'] ) 
walk_df['windy_'] = lb.fit_transform(walk_df['windy'] ) 

# Split features and target variable
X_enc = walk_df[['weather_', 'temp_', 'humidity_', 'windy_']] 
y = walk_df['walk_']

# Instantiate a one hot encoder
one_enc = OneHotEncoder(categories='auto')

# Fit the feature set X
one_enc.fit(X_enc)

# Transform X to onehot array 
onehotX = one_enc.transform(X_enc).toarray()

print("OneHot Encoded Array")
print(onehotX)
print()
print("Shape of OneHotEncoded Array")
print(onehotX.shape)
print()
print("Shape of Original Feature Array")
print(X_enc.shape)

In [None]:
# Demo purposes only - let's look at the one-hot encoded matrix
cols = ['weather_0', 'weather_1','weather_2','temp_0','temp_1','temp_2', 'humidity_0','humidity_1', 'windy_0','windy_1']

onehotX_df = pd.DataFrame(onehotX, columns=cols)
onehotX_df.head()

In [None]:
# reset X for demo purposes
X = walk_data[['weather', 'temp', 'humidity', 'windy']] 


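To illustrate the equivalence between one-hot encoding in sklearn and `get_dummies()` in pandas, let's look at how we would do this in pandas. Note that `drop_first=True` drops one dummy column per feature, so the result has fewer columns than the `OneHotEncoder` output above.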

In [None]:
#X = X.astype('category')  # data is numerical, so we need to recast it as categorical
X_dummy = pd.get_dummies(X, drop_first=True) # get_dummies and drop the first
X_dummy.head()

In [None]:
# look at class imbalance
y.value_counts()

In [None]:
# Normalized to see ratios
y.value_counts(normalize=True)

### Make Train-Test Split

We've encoded our data; now we need to split it into training and test sets. Pass the encoded features and target to ```train_test_split``` using a 60/40 split. 

**Question:** Is train test split the best approach here? Why? What other approach might work?

In [None]:
# split the one-hot encoded data
X_train, X_test , y_train,y_test = train_test_split(onehotX, y, test_size = 0.4, random_state = 42) 

## Split the pandas dummy data
#X_train, X_test , y_train,y_test = train_test_split(X_dummy, y, test_size = 0.4, random_state = 42) 


### Build the Decision Tree 

No need to guess here: we use the same scikit-learn pattern to build the model. We first create an instance of the classifier with appropriate parameter values, then fit the model to the training data using `.fit()`, and finally make predictions on the test data (`X_test`) using `.predict()`. 

In [None]:
# Instantiate
clf= DecisionTreeClassifier(criterion='entropy')

# Fit the model
clf.fit(X_train,y_train) 

# Generate inferences
y_pred = clf.predict(X_test)

### Evaluate the Model Performance

Our model is trained and we've made some inferences so we need to determine how well our model performs. This is a classification problem so we use the standard confusion matrix and related metrics. Notice that this step is the same as when we used KNN or logistic regression. It doesn't matter which classifier you are using, the performance measures are the same.

In [None]:
# Look at feature importance
print(len(clf.feature_importances_))

clf.feature_importances_

The above array would be hard to sort out if we had many more features. Below is a simple construct to view the importance and associated feature name.

In [None]:
## put feature importance into a dataframe - uncomment only one statement below.

# use this if you used the one-hot encoded data
pd.DataFrame({'feature':onehotX_df.columns, 'importance':clf.feature_importances_})

# use this if you used the pandas dummy data
#pd.DataFrame({'feature':X_train.columns, 'importance':clf.feature_importances_})


###### Evaluate the Results
We can compute metrics now that we have a model. But how do we know if our model is worth it? First we compare it to a baseline model, which in this case means predicting the most frequent class every time.

So we define our baseline model using the most frequent class in the training data.

In [None]:
# look at the class balance again.
y.value_counts(normalize=True)


In [None]:
# Baseline model is defined by the most frequent class in our training data

y_baseline = y_train.value_counts().index[0]
baseline_acc = round(y_train.value_counts(normalize=True)[y_baseline]*100,2)
print(f'Most Frequent Category: {y_baseline}')
print(f'Percentage Most Frequent Category: {baseline_acc}%')
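
The same baseline idea can also be expressed with scikit-learn's `DummyClassifier`. This is a sketch on made-up toy labels (the `*_toy_*` names are hypothetical), not the walk data; it shows that the baseline's accuracy on the test labels can differ from the majority-class share in the training labels.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy labels for illustration only: class 1 is the majority in the training set
y_toy_train = np.array([1, 1, 1, 0, 0])
y_toy_test = np.array([1, 0, 1])

# The 'most_frequent' strategy ignores the features, so placeholders are enough
X_toy_train = np.zeros((len(y_toy_train), 1))
X_toy_test = np.zeros((len(y_toy_test), 1))

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_toy_train, y_toy_train)

toy_preds = baseline.predict(X_toy_test)              # always the majority class
toy_score = baseline.score(X_toy_test, y_toy_test)    # accuracy on the test labels

print(toy_preds)
print(toy_score)
```

Comparing the classifier's test accuracy against the baseline's test accuracy keeps both numbers on the same data.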


In [None]:
# Calculate Accuracy , AUC and Confusion matrix 
accuracy = accuracy_score(y_test, y_pred)

# get roc auc info
fpr, tpr, thresholds = roc_curve(y_test, y_pred)
roc_auc = auc(fpr, tpr)

print("Accuracy is : "+ str(round(accuracy,3)*100)+"%")
print("AUC is : "+str(round(roc_auc,3)))

# confusion matrix
print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))
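
Passing hard class predictions to `roc_curve` yields a curve with only one interior point. A common alternative, sketched here on synthetic data with hypothetical variable names, is to pass the predicted probability of the positive class from `predict_proba`, so that several thresholds can be evaluated.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=42
)

# A shallow tree, so leaves are impure and produce a range of probabilities
demo_clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
demo_clf.fit(Xd_train, yd_train)

# Probability of the positive class (column order follows demo_clf.classes_)
yd_score = demo_clf.predict_proba(Xd_test)[:, 1]

fpr_demo, tpr_demo, _ = roc_curve(yd_test, yd_score)
auc_demo = auc(fpr_demo, tpr_demo)

print("points on the ROC curve:", len(fpr_demo))
print("AUC from probabilities:", round(auc_demo, 3))
```

For a fully grown tree the leaves are pure, so the probabilities are all 0 or 1 and the two approaches give the same curve.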

#### Plot ROC

In [None]:
# Plot the ROC 
plt.figure(figsize=(10, 8))
lw = 2
plt.plot(fpr, tpr, color='darkorange',
         lw=lw, label='ROC curve')
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.yticks([i/20.0 for i in range(21)])
plt.xticks([i/20.0 for i in range(21)])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic (ROC) Curve for Test Set')
plt.legend(loc='lower right')
print('Test AUC: {}'.format(auc(fpr, tpr)))
plt.show()

In [None]:
## Compute other confusion matrix metrics

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

#precision tp/(tp+fp)

precision = tp/(tp+fp)

# Sensitivity (recall) tp/(tp+fn)
sensitivity = tp/(tp+fn)

# Specificity tn/(tn+fp)
specificity = tn/(tn+fp)

# false negative rate (miss rate) fn/(fn+tp)
fnr = fn/(fn+tp)

# false positive rate fp/(fp+tn); named fp_rate so it does not overwrite the fpr array from roc_curve
fp_rate = fp/(fp+tn)


print(f'precision: {precision}')
print(f'sensitivity: {sensitivity}')
print(f'specificity: {specificity}')

# f1 score - (2*tp)/(2*tp+fp+fn)
f1 = (2*tp)/(2*tp+fp+fn)
print(f'f1: {f1}')


print(f'fnr: {fnr}')
print(f'fpr: {fp_rate}')


In [None]:
clf_report =   classification_report(y_test, y_pred)
print(clf_report)

## Self-paced Exercise - Forgery or not Forgery
For this in-class exercise, we'll work with the "UCI Bank Note Authentication Dataset". This data identifies genuine and forged banknotes and is based on images of actual currency (forged and real). The notes were first digitized, followed by a numerical transformation using wavelet transform techniques. The resulting engineered features are all continuous, so there is no categorical data to worry about.

We have the following features and target in the dataset. 

1. __Variance__ of Wavelet Transformed image (continuous) 
2. __Skewness__ of Wavelet Transformed image (continuous) 
3. __Curtosis__ of Wavelet Transformed image (continuous) 
4. __Entropy__ of image (continuous) 
5. __Class__ (integer) - Target/Label 

We've already imported all the libraries we need so just start with loading the data.

### Import Data

Load the dataset in a DataFrame, perform some basic EDA, and generally get a feel for the data we'll be working with.

- The dataset is stored in `banknote.csv`; use the `path` variable defined in the cell below and load it as a pandas dataframe. Note that there is no header information in this dataset, so be sure to use `header=None`.
- Assign column names 'Variance', 'Skewness', 'Curtosis', 'Entropy', 'Class' to the dataset in the given order.
- View the shape and data types of the dataset, as well as summary statistics.
- Check the frequency of positive and negative examples in the target variable.

In [10]:
## Your code here

path = "../data/banknote.csv"

banknotes_df = None  # replace to read in the data

# Assign column names

# verify the data frame



In [None]:
# check the shape and data types


In [None]:
# check the summary statistics


In [None]:
# check for any imbalance in class labels


**Answer:** There are no major imbalances in the classes.

###  Assign Feature and Target Variables 

Next we create our feature set `X` and labels `y`. 
- Create `X` and `y` by selecting the appropriate columns from the dataset
- Create an 80/20 split on the dataset for training/testing. Use `random_state=42` for reproducibility.

In [None]:
## Your code here
feature_cols = None

X = None
y = None

In [None]:
# train test split 
# Create an 80/20 split on the dataset for training/testing. 
# Use `random_state=42` for reproducibility

X_train, X_test , y_train, y_test = None

### Train the Classifier and Make Predictions
Use the standard process to build a classification model. 

In [None]:
# build your model and make predictions
# Your code here



In [None]:
# create dataframe of feature importance
# Your code here



### Check Model Performance

We can now use different evaluation measures to check the predictive performance of the classifier. 
- State what the baseline model is and the baseline accuracy for this data.
- Check the accuracy of your classifier, AUC and create a confusion matrix 
- Plot the ROC
- Interpret the results 

In [None]:
# Baseline model

# Baseline model is defined by the most frequent class

# your code here print the model and the relevant metric.


In [None]:
# Calculate Accuracy , AUC and Confusion matrix 
accuracy = None

# get roc auc info
fpr, tpr, thresholds = None
roc_auc = None

print("Accuracy is : "+ str(round(accuracy,3)*100)+"%")
print("AUC is : "+str(round(roc_auc,3)))

cm = None
# confusion matrix
print('Confusion Matrix:')
print(cm)

In [None]:
# Do not change this code.

# Plot the ROC 
plt.figure(figsize=(10, 8))
lw = 2
plt.plot(fpr, tpr, color='darkorange',
         lw=lw, label='ROC curve')
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.yticks([i/20.0 for i in range(21)])
plt.xticks([i/20.0 for i in range(21)])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic (ROC) Curve for Test Set')
plt.legend(loc='lower right')
print('Test AUC: {}'.format(auc(fpr, tpr)))
plt.show()

In [None]:
# Calculate Accuracy , AUC and Confusion matrix 
accuracy = None

# get roc auc info
fpr, tpr, thresholds = None
roc_auc = None

print("Accuracy is : "+ str(round(accuracy,3)*100)+"%")
print("AUC is : "+str(round(roc_auc,3)))

cm = None
# confusion matrix
print('Confusion Matrix:')
print(cm)

print()
## Compute other confusion matrix metrics

tn, fp, fn, tp = None

#precision tp/(tp+fp)

precision = tp/(tp+fp)

# Sensitivity (recall) tp/(tp+fn)
sensitivity = tp/(tp+fn)

# Specificity tn/(tn+fp)
specificity = tn/(tn+fp)

# false negative rate (miss rate) fn/(fn+tp)
fnr = fn/(fn+tp)

# false positive rate fp/(fp+tn); named fp_rate so it does not overwrite the fpr array from roc_curve
fp_rate = fp/(fp+tn)


print(f'precision: {precision}')
print(f'sensitivity: {sensitivity}')
print(f'specificity: {specificity}')

# f1 score - (2*tp)/(2*tp+fp+fn)
f1 = (2*tp)/(2*tp+fp+fn)
print(f'f1: {f1}')


print(f'fnr: {fnr}')
print(f'fpr: {fp_rate}')


In [None]:
clf_report = None
print(clf_report)

# Conclusion
