# Working with Classification Trees in Python

## Learning Objectives
Decision Trees are one of the most popular approaches to supervised machine learning. Decision Trees use an inverted tree-like structure to model the relationship between independent variables and a dependent variable. A tree with a categorical dependent variable is known as a **Classification Tree**. By the end of this tutorial, you will have learned:

+ How to import, explore and prepare data
+ How to build a Classification Tree model
+ How to visualize the structure of a Classification Tree
+ How to prune a Classification Tree
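To make "model the relationship" more concrete: scikit-learn's `DecisionTreeClassifier` grows a tree by repeatedly choosing the split that most reduces an impurity measure. By default that measure is Gini impurity, $1 - \sum_k p_k^2$. The sketch below is a simplified, from-scratch illustration of a single split search on one numeric feature. It is not scikit-learn's implementation. The helper names `gini` and `best_split` and the toy income values are made up for illustration.

```python
import numpy as np

def gini(labels):
    """Gini impurity of an array of class labels: 1 - sum(p_k^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Return (threshold, weighted_impurity) of the best binary split on feature x.

    Candidate thresholds are midpoints between consecutive distinct values,
    similar in spirit to how CART-style trees search numeric splits.
    """
    values = np.unique(x)  # sorted distinct values
    best = (None, gini(y))  # "no split" baseline: impurity of the parent node
    for lo, hi in zip(values[:-1], values[1:]):
        threshold = (lo + hi) / 2
        left, right = y[x <= threshold], y[x > threshold]
        # Child impurities weighted by the share of samples each child receives
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Hypothetical toy data: income (in thousands) and whether the loan defaulted
income = np.array([20, 25, 30, 45, 60, 75, 80, 95])
default = np.array(["Yes", "Yes", "Yes", "No", "No", "No", "No", "No"])

threshold, impurity = best_split(income, default)
print(threshold, impurity)  # 37.5 0.0
```

Here the split at 37.5 separates the classes perfectly, so both children are pure. A real tree repeats this search over every feature at every node until a stopping rule is reached.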

## 1. Collect the Data

In [None]:
import pandas as pd
loan = pd.read_csv("loan.csv")
loan.head()

## 2. Explore the Data

In [None]:
loan.info()

In [None]:
loan.describe()
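Before modeling, it also helps to check how balanced the dependent variable is and whether any column has missing values. Because `loan.csv` is not reproduced here, the sketch below builds a hypothetical stand-in DataFrame with random values and the tutorial's column names. Only the two checks themselves carry over to the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for loan.csv: random values, same column names as the tutorial
rng = np.random.default_rng(1234)
loan = pd.DataFrame({
    "Income": rng.integers(10, 120, size=30),
    "Loan Amount": rng.integers(5, 100, size=30),
    "Default": rng.choice(["No", "Yes"], size=30, p=[0.6, 0.4]),
})

# Share of each class in the dependent variable (a strong imbalance makes raw accuracy misleading)
class_share = loan["Default"].value_counts(normalize=True)
print(class_share)

# Number of missing values per column; any non-zero count needs handling before modeling
missing = loan.isna().sum()
print(missing)
```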

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
import seaborn as sns

In [None]:
ax = sns.boxplot(data = loan, x = 'Default', y = 'Income')

In [None]:
ax = sns.boxplot(data = loan, x = 'Default', y = 'Loan Amount')

In [None]:
ax = sns.scatterplot(data = loan, 
                     x = 'Loan Amount', 
                     y = 'Income', 
                     hue = 'Default', 
                     style = 'Default', 
                     markers = ['^','o'], 
                     s = 150)
plt.legend(bbox_to_anchor = (1.02, 1), loc = 'upper left');  # move the legend outside the plot area

## 3. Prepare the Data

In [None]:
y = loan['Default']  # dependent variable as a 1-D Series, the shape scikit-learn expects for y

In [None]:
X = loan[['Income', 'Loan Amount']]

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    train_size = 0.8,
                                                    stratify = y,
                                                    random_state = 1234) 

In [None]:
X_train.shape, X_test.shape
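The `stratify = y` argument makes the training and test sets keep the same class proportions as the full dataset. The sketch below checks this on hypothetical data of 200 rows with exactly 25% "Yes" labels. That size is chosen so the 80/20 split divides each class evenly.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 rows, exactly 50 of them (25%) labeled "Yes"
rng = np.random.default_rng(1234)
n = 200
loan = pd.DataFrame({
    "Income": rng.integers(10, 120, size=n),
    "Loan Amount": rng.integers(5, 100, size=n),
    "Default": np.where(np.arange(n) < 50, "Yes", "No"),
})
X = loan[["Income", "Loan Amount"]]
y = loan["Default"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=1234)

# Both splits preserve the 25% share of "Yes"
train_yes = y_train.value_counts(normalize=True)["Yes"]
test_yes = y_test.value_counts(normalize=True)["Yes"]
print(train_yes, test_yes)  # 0.25 0.25
```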

## 4. Train and Evaluate the Classification Tree

In [None]:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(random_state = 1234)

In [None]:
model = classifier.fit(X_train, y_train)

In [None]:
model.score(X_test, y_test)
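For classifiers, `score()` returns mean accuracy on the data passed in. Accuracy alone hides which classes are being confused, so a confusion matrix is a useful companion. This sketch uses synthetic two-feature data from `make_classification` as a stand-in for `Income` and `Loan Amount`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with two informative features
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1234)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=1234)

model = DecisionTreeClassifier(random_state=1234).fit(X_train, y_train)
y_pred = model.predict(X_test)

# score() and accuracy_score() compute the same quantity
score = model.score(X_test, y_test)
acc = accuracy_score(y_test, y_pred)
print(score == acc)  # True

# Rows are true classes and columns are predicted classes, both in sorted label order
cm = confusion_matrix(y_test, y_pred)
print(cm)
```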

## 5. Visualize the Classification Tree

In [None]:
from sklearn import tree
plt.figure(figsize = (15,15))
tree.plot_tree(model,
               feature_names = list(X.columns),
               class_names = ['No','Yes'],
               filled = True);
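When a plotted tree is too large to read, `sklearn.tree.export_text` prints the same split rules as indented text. In the sketch below, the data is synthetic and the feature names are borrowed from the tutorial only as labels. `max_depth = 2` is set purely to keep the printout short.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data; the feature names below are labels only
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1234)
model = DecisionTreeClassifier(max_depth=2, random_state=1234).fit(X, y)

# One line per node: split conditions on internal nodes, predicted class on leaves
rules = export_text(model, feature_names=["Income", "Loan Amount"])
print(rules)
```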

In [None]:
importance = model.feature_importances_
feature_importance = pd.Series(importance, index = X.columns)
feature_importance.plot(kind = 'bar')
plt.ylabel('Importance');
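The values in `feature_importances_` are impurity-based. Each one is the total impurity reduction from splits on that feature, normalized so that all importances sum to 1. This sketch adds a hypothetical pure-noise column to synthetic data. A fully grown tree may still assign that column some nonzero importance, which is worth keeping in mind when reading the bar chart. `shuffle=False` keeps the two informative features in the first two columns.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# With shuffle=False the two informative features come first and the third column is noise
X, y = make_classification(n_samples=200, n_features=3, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=1234)
X = pd.DataFrame(X, columns=["Income", "Loan Amount", "Noise"])
model = DecisionTreeClassifier(random_state=1234).fit(X, y)

importance = pd.Series(model.feature_importances_, index=X.columns)
print(importance.sort_values(ascending=False))
print(np.isclose(importance.sum(), 1.0))  # True: importances are normalized
```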

## 6. Prune the Classification Tree

In [None]:
model.score(X_train, y_train)

In [None]:
model.score(X_test, y_test)
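Comparing the two scores above is how overfitting shows up. With default settings, a tree keeps splitting until its leaves are pure, so it can memorize the training set. The sketch below reproduces this on synthetic data with 10% of labels flipped (`flip_y=0.1`). It also reports the tree's size with `get_depth()` and `get_n_leaves()`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with some label noise
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=1234)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=1234)

# No max_depth or leaf-size limits: grown until leaves are pure
full = DecisionTreeClassifier(random_state=1234).fit(X_train, y_train)
train_acc = full.score(X_train, y_train)
test_acc = full.score(X_test, y_test)

print("depth:", full.get_depth(), "leaves:", full.get_n_leaves())
print("train:", train_acc)  # typically 1.0 for continuous features
print("test:", test_acc)    # noticeably lower when the tree overfits
```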

In [None]:
grid = {'max_depth': [2, 3, 4, 5],
         'min_samples_split': [2, 3, 4],
         'min_samples_leaf': range(1, 7)}

In [None]:
from sklearn.model_selection import GridSearchCV
classifier = DecisionTreeClassifier(random_state = 1234)
gcv = GridSearchCV(estimator = classifier, param_grid = grid)
gcv.fit(X_train, y_train)
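After fitting, `GridSearchCV` records the winning combination in `best_params_` and its mean cross-validated accuracy in `best_score_`. With an integer `cv` and a classifier, the folds are stratified, and `cv=5` is the default. This sketch reuses the tutorial's grid on synthetic data, so the printed parameters will differ from those for `loan.csv`. It also confirms that `best_estimator_` is already fitted.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted

# Synthetic stand-in data
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=1234)
grid = {"max_depth": [2, 3, 4, 5],
        "min_samples_split": [2, 3, 4],
        "min_samples_leaf": range(1, 7)}

gcv = GridSearchCV(estimator=DecisionTreeClassifier(random_state=1234),
                   param_grid=grid, cv=5)
gcv.fit(X, y)

print(gcv.best_params_)                 # winning combination
print(round(gcv.best_score_, 3))        # its mean cross-validated accuracy
print(len(gcv.cv_results_["params"]))   # 4 * 3 * 6 = 72 combinations tried

# refit=True (the default) leaves best_estimator_ fitted on all of X, y
check_is_fitted(gcv.best_estimator_)
```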

In [None]:
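# With refit=True (the default), GridSearchCV has already refit the best
# combination on all of X_train, so no further call to fit() is needed
model_ = gcv.best_estimator_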

In [None]:
model_.score(X_train, y_train)

In [None]:
model_.score(X_test, y_test)

In [None]:
plt.figure(figsize = (8,8))
tree.plot_tree(model_,
               feature_names = list(X.columns),
               class_names = ['No','Yes'],
               filled = True);
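The grid search above limits growth up front with `max_depth`, `min_samples_split` and `min_samples_leaf`. scikit-learn also offers a different technique: minimal cost-complexity post-pruning through the `ccp_alpha` parameter. The sketch below, on synthetic data, computes the candidate alphas with `cost_complexity_pruning_path` and picks one by 5-fold cross-validation. It then fits the pruned tree. This is an alternative shown for comparison, not the method used in this tutorial.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with some label noise
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=1234)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=1234)

# Effective alphas at which subtrees of the fully grown tree are pruned away
full = DecisionTreeClassifier(random_state=1234).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)
# Drop the last alpha (it prunes down to the root alone); clip guards against round-off below 0
alphas = np.clip(path.ccp_alphas[:-1], 0.0, None)

# Mean 5-fold cross-validated accuracy for each candidate alpha
cv_means = [cross_val_score(DecisionTreeClassifier(random_state=1234, ccp_alpha=a),
                            X_train, y_train, cv=5).mean()
            for a in alphas]
best_alpha = alphas[int(np.argmax(cv_means))]

pruned = DecisionTreeClassifier(random_state=1234, ccp_alpha=best_alpha).fit(X_train, y_train)
print("alpha:", best_alpha)
print("leaves:", full.get_n_leaves(), "->", pruned.get_n_leaves())
print("test accuracy:", pruned.score(X_test, y_test))
```

Larger alphas remove more of the tree. Cross-validation trades that simplicity against accuracy in the same spirit as the grid search above.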