# Decision Trees and Random Forests in Python

## Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Get the Data

In [None]:
df = pd.read_csv('kyphosis.csv')

In [None]:
df.head()

In [None]:
df.info()

## EDA

We'll just check out a simple pairplot for this small dataset.

In [None]:
sns.pairplot(df, hue='Kyphosis')
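The pairplot shows how the features are distributed. It is also worth checking how balanced the two Kyphosis classes are, because with a strongly imbalanced target, plain accuracy can look good even when the minority class is mostly misclassified. On the real data this is just `df['Kyphosis'].value_counts(normalize=True)`. The sketch below runs the same calls on a small synthetic frame so it is self-contained. The values are hypothetical, and the column names Age, Number, Start and the labels 'absent'/'present' are assumptions about kyphosis.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for kyphosis.csv: assumed column names, synthetic values.
rng = np.random.default_rng(0)
n = 80
df_demo = pd.DataFrame({
    "Age": rng.integers(1, 200, size=n),
    "Number": rng.integers(2, 10, size=n),
    "Start": rng.integers(1, 18, size=n),
})
# Synthetic rule for the label, only so both classes appear.
df_demo["Kyphosis"] = np.where(df_demo["Start"] < 6, "present", "absent")

# Absolute counts and proportions of each class label.
counts = df_demo["Kyphosis"].value_counts()
proportions = df_demo["Kyphosis"].value_counts(normalize=True)
print(counts)
print(proportions.round(2))
```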

## Train Test Split

Let's split up the data into a training set and a test set!

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X = df.drop('Kyphosis', axis=1)
y = df['Kyphosis']

In [None]:
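# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)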
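With `test_size=0.3`, 30% of the rows go to the test set. Because this dataset is small, a purely random split can leave the test set with very few examples of the minority class. Passing `stratify=y` keeps the class proportions roughly the same in both halves. A minimal sketch on synthetic data (hypothetical values, same assumed column names as the real file):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the kyphosis data (hypothetical values).
rng = np.random.default_rng(1)
n = 81
X_demo = pd.DataFrame({
    "Age": rng.integers(1, 200, size=n),
    "Number": rng.integers(2, 10, size=n),
    "Start": rng.integers(1, 18, size=n),
})
y_demo = pd.Series(np.where(X_demo["Start"] < 6, "present", "absent"), name="Kyphosis")

# stratify=y_demo keeps the absent/present ratio similar in both splits;
# random_state fixes the shuffle so the result is reproducible.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print(len(X_tr), len(X_te))
print(y_tr.value_counts(normalize=True).round(2))
print(y_te.value_counts(normalize=True).round(2))
```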

## Decision Trees

We'll start by training a single decision tree.
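A decision tree is grown by repeatedly choosing the feature and threshold whose split makes the child nodes as "pure" as possible. scikit-learn's `DecisionTreeClassifier` measures purity with Gini impurity by default (`criterion='gini'`). The sketch below is a simplified, hand-written version of that search for a single feature on toy data. `gini` and `best_threshold` are hypothetical helper names, not scikit-learn functions.

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature, labels):
    """Try each midpoint between sorted unique feature values and return the
    threshold with the lowest size-weighted Gini impurity of the two children."""
    values = np.unique(feature)
    best_t, best_score = None, np.inf
    for lo, hi in zip(values[:-1], values[1:]):
        t = (lo + hi) / 2
        left, right = labels[feature <= t], labels[feature > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy data (hypothetical): a "Start"-like feature and absent/present labels.
start = np.array([1, 3, 4, 5, 9, 11, 12, 14, 15, 16])
label = np.array(["present"] * 4 + ["absent"] * 6)
print(gini(label))                   # impurity of the parent node
print(best_threshold(start, label))  # (threshold, weighted impurity of children)
```

Here the parent node has impurity 1 - (0.4² + 0.6²) = 0.48, and the split at 7.0 separates the classes perfectly, so the weighted impurity of the children drops to 0. scikit-learn repeats this kind of search over all features at every node, which is why a tree with no depth limit keeps splitting until its leaves are pure.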

In [None]:
from sklearn.tree import DecisionTreeClassifier

In [None]:
dtree = DecisionTreeClassifier()

In [None]:
dtree.fit(X_train, y_train)

## Prediction and Evaluation 

Let's evaluate our decision tree.

In [None]:
predictions = dtree.predict(X_test)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

In [None]:
# Fraction of test rows classified correctly
print(accuracy_score(y_test, predictions))

In [None]:
# Per-class precision, recall and F1 score
print(classification_report(y_test, predictions))

In [None]:
# Rows are the true classes, columns are the predicted classes
print(confusion_matrix(y_test, predictions))

## Tree Visualization

Scikit-learn has built-in tools for visualizing a fitted decision tree. You won't use them often, but they show which features and thresholds the tree splits on. The older `export_graphviz` route requires the pydot library (and Graphviz); scikit-learn 0.21 and later also provide `sklearn.tree.plot_tree`, which only needs matplotlib.
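A minimal sketch of visualizing a fitted tree with `plot_tree` (graphical) and `export_text` (plain text), both in `sklearn.tree` from scikit-learn 0.21 onward. So that it runs on its own, it fits a depth-limited tree on synthetic data; the data and `max_depth=3` are illustrative assumptions. In this notebook you would pass `dtree` and `X.columns` instead.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

# Synthetic stand-in for the kyphosis features (hypothetical values).
rng = np.random.default_rng(2)
n = 80
X_demo = pd.DataFrame({
    "Age": rng.integers(1, 200, size=n),
    "Number": rng.integers(2, 10, size=n),
    "Start": rng.integers(1, 18, size=n),
})
y_demo = np.where(X_demo["Start"] < 6, "present", "absent")

# max_depth=3 is an illustrative choice to keep the drawing readable.
tree_demo = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_demo, y_demo)

# Graphical view: only matplotlib is needed, no pydot or Graphviz.
fig, ax = plt.subplots(figsize=(12, 6))
plot_tree(tree_demo, feature_names=list(X_demo.columns),
          class_names=list(tree_demo.classes_), filled=True, ax=ax)
plt.show()

# Plain-text view of the same splits.
rules = export_text(tree_demo, feature_names=list(X_demo.columns))
print(rules)
```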

## Random Forests

Now let's compare the decision tree model to a random forest.
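A random forest trains many decision trees, each on a bootstrap sample of the training rows and each considering only a random subset of features at every split, then combines their predictions. The sketch below hand-rolls that idea with `DecisionTreeClassifier` on synthetic data to show the mechanics. It is an illustration, not how `RandomForestClassifier` is implemented internally: it uses a simple majority vote, whereas scikit-learn averages the trees' predicted class probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (hypothetical) with 3 numeric features.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=3, n_informative=2, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

rng = np.random.default_rng(0)
n_trees = 25  # odd, so a two-class majority vote can never tie
trees = []
for i in range(n_trees):
    # Bootstrap sample: draw len(X_tr) row indices with replacement.
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    # max_features="sqrt": each split considers a random subset of the features.
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X_tr[idx], y_tr[idx]))

# Each row of `votes` holds one tree's 0/1 predictions for the test set.
votes = np.stack([t.predict(X_te) for t in trees])
# Majority vote: predict class 1 when more than half of the trees say 1.
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)

single_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("single tree accuracy:", (single_tree.predict(X_te) == y_te).mean())
print("hand-rolled forest accuracy:", (forest_pred == y_te).mean())
```

Averaging many decorrelated trees mainly reduces variance, which is the usual reason a forest generalizes better than one fully grown tree; on any particular small dataset, though, the gap can be small.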

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
ranfor = RandomForestClassifier(n_estimators=200)

In [None]:
ranfor.fit(X_train, y_train)

## Prediction and Evaluation

Let's evaluate the random forest on the same test set.

In [None]:
pred = ranfor.predict(X_test) 

In [None]:
print(confusion_matrix(y_test, pred))

In [None]:
print(classification_report(y_test, pred))
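To compare the two models directly, it helps to put their accuracies side by side. Accuracy is simply the diagonal of the confusion matrix (the correct predictions) divided by its total. With a dataset this small, a single split's numbers can shift noticeably with `random_state`, so cross-validation gives a steadier comparison. A sketch on synthetic data (hypothetical); in this notebook the single-split equivalent is `accuracy_score(y_test, predictions)` versus `accuracy_score(y_test, pred)`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (hypothetical) standing in for the kyphosis features.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=3, n_informative=2, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
results = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    cm = confusion_matrix(y_te, pred)
    # Correct predictions sit on the diagonal, so trace / total = accuracy.
    acc_from_cm = np.trace(cm) / cm.sum()
    # 5-fold cross-validated accuracy on all rows: less sensitive to one split.
    cv_mean = cross_val_score(model, X_demo, y_demo, cv=5).mean()
    results[name] = (accuracy_score(y_te, pred), acc_from_cm, cv_mean)
    print(f"{name}: test acc={results[name][0]:.3f}, "
          f"from confusion matrix={acc_from_cm:.3f}, 5-fold CV={cv_mean:.3f}")
```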