# Understanding decision trees

This kernel was inspired by [Do You Have Spinal Disease? Decision Tree in R](https://www.kaggle.com/petrkajzar/do-you-have-spinal-disease-decision-tree-in-r).

## Importing the necessary libraries

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import seaborn as sns
from sklearn.metrics import confusion_matrix

# sklearn.externals.six is no longer shipped with scikit-learn;
# io.StringIO from the standard library serves the same purpose here
from io import StringIO
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus

In [None]:
data = pd.read_csv('../input/column_3C_weka.csv')

The dataset used here is [Biomechanical features of orthopedic patients](https://www.kaggle.com/uciml/biomechanical-features-of-orthopedic-patients).

In [None]:
data.info()

In [None]:
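# Calculating the correlation matrix on the numeric columns only
# (the class label column is text and cannot be correlated)
corr = data.select_dtypes(include='number').corr()
# Generating a heatmap
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns)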

In [None]:
sns.pairplot(data)

## Splitting the dataset into independent (x) and dependent (y) variables

In [None]:
x = data.iloc[:,:6].values
y = data.iloc[:,6].values

## Splitting the dataset into train and test data
The training data is used to fit the model, and the test data is used to evaluate the model's performance on samples it has not seen.

In [None]:
x_train , x_test, y_train, y_test = train_test_split(x, y, test_size = 0.25, random_state = 0)
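To make the idea of a 25% hold-out concrete, here is a minimal pure-Python sketch of a shuffled split. It is not how `train_test_split` is implemented; the helper name `holdout_split` and the toy `rows` list are hypothetical and exist only for illustration.

```python
import math
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle row indices and hold out a fraction of them for testing."""
    indices = list(range(len(rows)))
    # A seeded generator makes the shuffle repeatable, like random_state
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(test_fraction * len(rows))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Hypothetical toy data: 20 "samples" identified by their row number
rows = list(range(20))
train_rows, test_rows = holdout_split(rows)
print(len(train_rows), len(test_rows))
```

Every row ends up in exactly one of the two lists, so the model is never scored on a sample it was trained on.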

In [None]:
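# Fit the scaler on the training data only, then apply the same
# transformation to the test data so no test statistics leak into training
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)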

In [None]:
# Choose splits by entropy (information gain) and limit the tree to depth 4
classifier = DecisionTreeClassifier(criterion = 'entropy', max_depth = 4)
classifier.fit(x_train, y_train)
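With `criterion = 'entropy'`, candidate splits are compared by how much they reduce the entropy of the class labels. The sketch below computes that quantity by hand on toy labels. The helpers `entropy` and `information_gain` are hypothetical names written for this illustration, not part of scikit-learn, and the label lists are made up rather than taken from the dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, left, right):
    """Entropy reduction obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical toy labels: a perfectly balanced node and a split
# that separates the two classes completely
parent = ["Normal"] * 4 + ["Hernia"] * 4
left = ["Normal"] * 4
right = ["Hernia"] * 4

print(entropy(parent))
print(information_gain(parent, left, right))
```

A balanced two-class node has the highest possible entropy for two classes, and a split that yields two pure children removes all of it, which is why such a split would be preferred.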

## Making the prediction on the test data

In [None]:
y_pred = classifier.predict(x_test)

In [None]:
cm = confusion_matrix(y_test, y_pred)

In [None]:
# Accuracy = correctly classified samples (the diagonal) / all test samples
accuracy = np.trace(cm) / cm.sum()
print("accuracy = " + str(accuracy))
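The accuracy above is the sum of the confusion matrix diagonal divided by the number of test samples. As a rough illustration, the same calculation is shown here on a made-up 3-class matrix, together with per-class recall, which the matrix also makes available. The function names and the `toy_cm` values are hypothetical and are not results from this dataset.

```python
def accuracy_from_confusion(cm):
    """Fraction of correct predictions: diagonal sum over total count."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """For each true class (row), the fraction of its samples predicted correctly."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


# Hypothetical confusion matrix: rows are true classes, columns are predictions
toy_cm = [
    [10, 2, 0],
    [1, 20, 4],
    [0, 3, 38],
]

print(accuracy_from_confusion(toy_cm))
print(per_class_recall(toy_cm))
```

Per-class recall is worth a look whenever the classes are unevenly sized, since a single accuracy figure can hide a class that is predicted poorly.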

In [None]:
dot_data = StringIO()

export_graphviz(classifier, out_file=dot_data,  
                filled=True, rounded=True,
                special_characters=True)

graph = pydotplus.graph_from_dot_data(dot_data.getvalue())  
Image(graph.create_png())

In [None]:
# Same criterion, but with no max_depth the tree is free to grow much deeper
classifier2 = DecisionTreeClassifier(criterion = 'entropy')
classifier2.fit(x_train, y_train)

In [None]:
y_pred2 = classifier2.predict(x_test)

In [None]:
cm2 = confusion_matrix(y_test, y_pred2)

In [None]:
# Accuracy = correctly classified samples (the diagonal) / all test samples
accuracy2 = np.trace(cm2) / cm2.sum()
print("accuracy = " + str(accuracy2))

In [None]:
dot_data = StringIO()

export_graphviz(classifier2, out_file=dot_data,  
                filled=True, rounded=True,
                special_characters=True)

graph = pydotplus.graph_from_dot_data(dot_data.getvalue())  
Image(graph.create_png())

To learn more about the parameters of `sklearn.tree.DecisionTreeClassifier`, see the [scikit-learn documentation](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html).