# Decision Trees in Python

Reference: http://net-informations.com/ds/mla/dtree.htm, https://www.kaggle.com/code

Image credit: https://www.osmosis.org/learn/Lordosis,_kyphosis,_and_scoliosis

![](https://d16qt3wv6xm098.cloudfront.net/D8vzGbPOSmitZdUZkrleQYi-SZ6ZFOpZ/_.jpg)

# Install these packages before starting this lab

## Import Libraries

In [None]:
# For checking installed library
# %pip freeze

In [None]:
%pip install scikit-learn
%pip install numpy
%pip install pandas
%pip install matplotlib
%pip install seaborn

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Get the Data

In [None]:
!wget -O kyphosis.csv https://github.com/davidjohnnn/all_datasets/raw/master/bay/kyphosis.csv

In [None]:
df = pd.read_csv('kyphosis.csv')

In [None]:
df.head()

## EDA

A pairplot colored by the `Kyphosis` label gives a quick look at how the features relate to each other and to the target in this small dataset.

In [None]:
sns.pairplot(df,hue='Kyphosis',palette='Set1')

## Train Test Split

Let's split up the data into a training set and a test set!

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X = df.drop('Kyphosis',axis=1)
y = df['Kyphosis']

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.30, random_state=30)
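
The `stratify=y` argument above asks `train_test_split` to keep the class proportions roughly the same in both splits, which matters when one class is rare. Here is a minimal sketch of that effect, assuming made-up labels (80 `absent`, 20 `present`) rather than the real dataset; the names `labels`, `features`, and `present_share` are illustrative only.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Made-up labels for illustration only: 80 'absent' and 20 'present'.
labels = ['absent'] * 80 + ['present'] * 20
features = [[i] for i in range(len(labels))]  # placeholder feature column

f_train, f_test, l_train, l_test = train_test_split(
    features, labels, stratify=labels, test_size=0.30, random_state=30
)


def present_share(values):
    """Return the fraction of labels equal to 'present'."""
    return Counter(values)['present'] / len(values)


# Compare the share of the rare class before and after splitting.
print('full :', present_share(labels))
print('train:', present_share(l_train))
print('test :', present_share(l_test))
```

In this notebook, the same check could be run on `y`, `y_train`, and `y_test` with `value_counts(normalize=True)`.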

## Decision Trees

We'll start by training a single decision tree.

http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier

In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
dtree = DecisionTreeClassifier(min_samples_leaf=10, criterion='entropy')

In [None]:
dtree.fit(X_train,y_train)
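
With `criterion='entropy'`, the tree scores candidate splits by how much they reduce entropy (the information gain). The sketch below works through that arithmetic by hand on made-up labels; `entropy` and `information_gain` are hypothetical helper names written for this illustration, not scikit-learn functions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right`."""
    weight_left = len(left) / len(parent)
    weight_right = len(right) / len(parent)
    return entropy(parent) - (weight_left * entropy(left)
                              + weight_right * entropy(right))


# Made-up labels for illustration only.
parent = ['absent'] * 6 + ['present'] * 2
left = ['absent'] * 4                      # pure node
right = ['absent'] * 2 + ['present'] * 2   # evenly mixed node

print(round(entropy(parent), 4))
print(round(information_gain(parent, left, right), 4))
```

A pure node has zero entropy and an evenly mixed two-class node has one bit, so a split that isolates a pure group yields a positive gain.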

In [None]:
import pickle
filename = 'model.sav'
# Use a context manager so the file is flushed and closed after writing
with open(filename, 'wb') as f:
    pickle.dump(dtree, f)

## Prediction and Evaluation 

Let's evaluate our decision tree.

In [None]:
# Reload the saved model; the context manager closes the file afterwards
with open(filename, 'rb') as f:
    dtree = pickle.load(f)
dtree
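
A quick way to gain confidence in the save and load step is to check that the restored model predicts exactly what the original does. This is a minimal sketch on made-up data, using a temporary directory so it does not touch `model.sav`; the names `X_demo`, `y_demo`, `model`, and `restored` are illustrative only. Keep in mind that `pickle` can execute arbitrary code when loading, so only unpickle files you trust.

```python
import os
import pickle
import tempfile

from sklearn.tree import DecisionTreeClassifier

# Made-up training data for illustration only.
X_demo = [[0], [1], [2], [3], [4], [5]]
y_demo = ['absent', 'absent', 'absent', 'present', 'present', 'present']
model = DecisionTreeClassifier(criterion='entropy', random_state=0)
model.fit(X_demo, y_demo)

# Save to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'model.sav')
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# The restored model should predict exactly what the original does.
same = bool((model.predict(X_demo) == restored.predict(X_demo)).all())
print(same)
```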

In [None]:
predictions = dtree.predict(X_test)

In [None]:
from sklearn.metrics import classification_report,confusion_matrix

In [None]:
print(classification_report(y_test,predictions,digits=4))

In [None]:
print(confusion_matrix(y_test,predictions,labels=['absent','present']))
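
To read the matrix printed above: rows are the true classes and columns are the predicted classes, both in the order given by `labels`. The sketch below, on made-up labels rather than the notebook's predictions, shows how precision and recall for the `present` class fall out of the four cells.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only.
y_true = ['absent', 'absent', 'absent', 'present', 'present', 'absent']
y_pred = ['absent', 'present', 'absent', 'present', 'absent', 'absent']

cm = confusion_matrix(y_true, y_pred, labels=['absent', 'present'])
# Rows are true classes, columns are predicted classes, in `labels` order.
# Treating 'present' as the positive class, the cells unpack as:
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # of the predicted 'present', how many were right
recall = tp / (tp + fn)     # of the true 'present', how many were found
print(cm)
print(precision, recall)
```

These are the same quantities that `classification_report` lists on the `present` row.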

## Tree Visualization

Scikit-learn has built-in visualization for decision trees. `tree.plot_tree` draws the fitted tree with matplotlib, so no extra library such as pydot or Graphviz is needed. You won't use this often, but here is an example of what it looks like and the code to produce it:

In [None]:
from sklearn import tree
tree.plot_tree(dtree)

In [None]:
print(X.columns.tolist())       # feature names
print(dtree.classes_.tolist())  # class names, in the order the fitted tree uses

In [None]:
fn = X.columns.tolist()       # feature names as a plain list
cn = dtree.classes_.tolist()  # class names in the tree's own order, so node labels match
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(4, 6), dpi=100)
tree.plot_tree(dtree,
               feature_names=fn,
               class_names=cn,
               filled=True,
               ax=ax)  # draw on the figure that gets saved below
fig.savefig('imagename.png')
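
When a plot is hard to read, `sklearn.tree.export_text` prints the same split rules as indented text. This is a minimal sketch on made-up data; `Feature_A` and `Feature_B` are hypothetical column names, not columns of the kyphosis dataset.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up data for illustration only; the feature names are hypothetical.
X_demo = [[1, 10], [2, 12], [3, 1], [4, 2], [5, 11], [6, 3]]
y_demo = ['absent', 'absent', 'present', 'present', 'absent', 'present']

model = DecisionTreeClassifier(criterion='entropy', random_state=0)
model.fit(X_demo, y_demo)

# Each indented line is one split condition or one leaf with its class.
rules = export_text(model, feature_names=['Feature_A', 'Feature_B'])
print(rules)
```

In this notebook, the equivalent call would be `export_text(dtree, feature_names=X.columns.tolist())`.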