# Construct a binary tree from a scikit-learn decision tree classifier

Load a prefit DecisionTreeClassifier trained on the 2014 US Behavioral Risk Factor Surveillance System (BRFSS) data to predict whether or not a respondent was tested for AIDS.

In [1]:
import pickle
from IPython.display import IFrame
from dtvis.tree_constructor import reconstruct_tree, D3Tree

# Open the pickle in a context manager so the file handle is closed after loading
with open('data/decision_tree_AIDS_5lvl.pickle', 'rb') as f:
    clf = pickle.load(f)




The classifier uses a subset of features from the dataset. List their names, along with the class labels:

In [2]:
feature_names = ['x.imprace_1', 'x.imprace_2', 'x.imprace_3', 'x.imprace_4',
       'x.imprace_5', 'x.imprace_6', 'x.impeduc_1', 'x.impeduc_2',
       'x.impeduc_3', 'x.impeduc_4', 'x.impeduc_5', 'x.impeduc_6',
       'x.impmrtl_1', 'x.impmrtl_2', 'x.impmrtl_3', 'x.impmrtl_4',
       'x.impmrtl_5', 'x.impmrtl_6', 'x.impcsex_1', 'x.impcsex_2',
       'x.impcsex_123456789', 'x.asthms1_1', 'x.asthms1_2', 'x.asthms1_3',
       'x.asthms1_9', 'x.incomg_1', 'x.incomg_2', 'x.incomg_3',
       'x.incomg_4', 'x.incomg_5', 'x.incomg_9', 'x.rfseat3_1',
       'x.rfseat3_2', 'x.rfseat3_9', 'x.flshot6_1', 'x.flshot6_2',
       'x.flshot6_9', 'x.flshot6_123456789', 'x.pneumo2_1', 'x.pneumo2_2',
       'x.pneumo2_9', 'x.pneumo2_123456789', 'x.bmi5cat_1', 'x.bmi5cat_2',
       'x.bmi5cat_3', 'x.bmi5cat_4', 'x.bmi5cat_123456789', 'x.rfmam2y_1',
       'x.rfmam2y_2', 'x.rfmam2y_9', 'x.rfmam2y_123456789', 'x.denvst2_1',
       'x.denvst2_2', 'x.denvst2_9', 'x.rfsmok3_1', 'x.rfsmok3_2',
       'x.rfsmok3_9', 'sleptim1', 'x.age80', 'x.impnph', 'htin4', 'wtkg3',
       'drocdy3.', 'x.drnkmo4']
class_names = ['tested', 'not-tested']

## Tree construction

We can construct a hierarchical tree suitable for plotting with a D3 tree layout using the __D3Tree__ class from __dtvis__, and save the reconstructed tree to __'data/d3tree_data.json'__. This path is currently hard-coded in __index.html__, which is where the visualizations look for the data. The __node_type__ option controls which nodes show the number of cases as pie charts: 'all-pie' draws a pie chart at every node, 'leaf-pie' draws pie charts only at the leaf nodes, and 'no-pie' draws no pie charts at all.
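To make the idea of a hierarchical tree concrete, here is a minimal sketch of how flat, index-based tree arrays can be turned into the nested parent/children structure that D3 hierarchy layouts consume. It is not the __dtvis__ implementation: the function `build_hierarchy` and the output keys `id`, `name` and `counts` are hypothetical, and the JSON schema that __D3Tree__ actually writes may differ. The input arrays are hand-written toy data that imitate, in simplified form, the `children_left`, `children_right`, `feature`, `threshold` and `value` arrays of scikit-learn's `tree_` object, where a child index of -1 marks a leaf.

```python
import json

LEAF = -1  # scikit-learn stores -1 as the child index of a leaf node


def build_hierarchy(node_id, children_left, children_right, feature,
                    threshold, value, feature_names, class_names):
    """Recursively convert flat tree arrays into a nested dict (hypothetical schema)."""
    counts = dict(zip(class_names, value[node_id]))
    node = {"id": node_id, "counts": counts}
    left, right = children_left[node_id], children_right[node_id]
    if left == LEAF:
        # Leaf: label the node with its majority class
        node["name"] = max(counts, key=counts.get)
    else:
        # Internal node: label it with the split rule, then recurse into both children
        node["name"] = f"{feature_names[feature[node_id]]} <= {threshold[node_id]}"
        node["children"] = [
            build_hierarchy(child, children_left, children_right, feature,
                            threshold, value, feature_names, class_names)
            for child in (left, right)
        ]
    return node


# Toy tree with one split and two leaves (made-up numbers, not BRFSS results)
children_left = [1, -1, -1]
children_right = [2, -1, -1]
feature = [0, -2, -2]
threshold = [0.5, -2.0, -2.0]
value = [[60, 40], [50, 10], [10, 30]]

tree = build_hierarchy(0, children_left, children_right, feature, threshold,
                       value, ['x.age80'], ['tested', 'not-tested'])
print(json.dumps(tree, indent=2))
```

The nested `children` lists are what make the structure usable by a D3 tree layout, since `d3.hierarchy` reads a `children` property by default.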

The __class_show__ option controls how the classes are visualized as they cascade through the decision tree. With 'all', the width of each link is proportional to the combined number of observations across all classes, and the color of the link indicates the majority class.
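The link styling described for 'all' can be sketched as a small calculation. This is an illustration under stated assumptions, not the code that __dtvis__ or __index.html__ runs: the function `link_style`, the `max_width` parameter and the scaling against the root's observation count are all hypothetical, and the class counts are made-up numbers.

```python
def link_style(class_counts, n_root, max_width=20.0):
    """Return (width, majority_class) for the link leading into a node.

    class_counts maps class name -> number of observations reaching the node;
    n_root is the number of observations at the root, used to scale the width.
    """
    total = sum(class_counts.values())                   # combined count over all classes
    width = max_width * total / n_root                   # width proportional to that count
    majority = max(class_counts, key=class_counts.get)   # class that sets the link color
    return width, majority


# Made-up counts for a root node and its two children
root = {"tested": 60, "not-tested": 40}
left = {"tested": 50, "not-tested": 10}
right = {"tested": 10, "not-tested": 30}

n_root = sum(root.values())
left_width, left_class = link_style(left, n_root)
right_width, right_class = link_style(right, n_root)
print(left_width, left_class)
print(right_width, right_class)
```

Scaling by the root count keeps every link no wider than `max_width`, so the widest link in the picture is the one carrying the most observations.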

In [3]:
d3tree = D3Tree(clf, feature_names, class_names, node_type='leaf-bar', class_show='all', colors='default')
d3tree.export_data(path='data/', filename='d3tree_data.json')
IFrame('index.html', width=1000, height=1000)