---

# Document goal

The goal is to build an interactive decision tree visualization.

Useful Google search keywords:

- interactive decision tree
- sankey tree
- collapse tree (Collapsible Tree)


# Method 1

The two reference sites happen to use different examples, so both are included here.

References:  

- [Interactive Visualization of Decision Trees with Jupyter Widgets](https://towardsdatascience.com/interactive-visualization-of-decision-trees-with-jupyter-widgets-ca15dd312084)

- [python \- Plot Interactive Decision Tree in Jupyter Notebook \- Stack Overflow](https://stackoverflow.com/questions/50755706/plot-interactive-decision-tree-in-jupyter-notebook): see the part of the answer labeled "old answer"


---

## Example 1

In [None]:
from IPython.display import SVG
from graphviz import Source
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import tree
from ipywidgets import interactive
from IPython.display import display  

In [None]:
# load dataset
data = load_iris()

#Get the feature matrix
features = data.data

#Get the labels for the samples
target_label = data.target

#Get feature names
feature_names = data.feature_names

In [None]:
def plot_tree(crit, split, depth, min_split, min_leaf=0.17):
    classifier = DecisionTreeClassifier(random_state = 123, criterion = crit, 
                                        splitter = split, 
                                        max_depth = depth, 
                                        min_samples_split=min_split, 
                                        min_samples_leaf=min_leaf)
    classifier.fit(features, target_label)

    graph = Source(tree.export_graphviz(classifier, out_file=None, 
                                        feature_names=feature_names, 
                                        class_names=['0', '1', '2'], 
                                        filled = True))

    display(SVG(graph.pipe(format='svg')))
    return classifier

In [None]:
decision_plot = interactive(plot_tree, 
                            crit = ["gini", "entropy"], 
                            split = ["best", "random"]  , 
                            depth=[1, 2, 3, 4, 5],
                            min_split=(0.1,1), 
                            min_leaf=(0.1,0.5))

display(decision_plot)

---

## Example 2

In [None]:
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import tree
from sklearn.datasets import load_wine
from IPython.display import SVG
from graphviz import Source
from IPython.display import display                               
from ipywidgets import interactive

In [None]:
# load dataset
data = load_wine()

# feature matrix
X = data.data

# target vector
y = data.target

# feature names (used to label the split nodes)
labels = data.feature_names

In [None]:
def plot_tree(crit, split, depth, min_split, min_leaf=0.2):
    estimator = DecisionTreeClassifier(random_state = 0,
                                   criterion = crit,
                                   splitter = split,
                                   max_depth = depth,
                                   min_samples_split=min_split,
                                   min_samples_leaf=min_leaf)
    
    estimator.fit(X, y)
    
    graph = Source(tree.export_graphviz(estimator,
                                        out_file=None,
                                        feature_names=labels,
                                        class_names=['0', '1', '2'],
                                        filled = True))
    display(SVG(graph.pipe(format='svg')))
    return estimator

In [None]:
inter=interactive(plot_tree ,
                  crit = ["gini", "entropy"],
                  split = ["best", "random"],
                  depth=[1,2,3,4],
                  min_split=(0.1,1),
                  min_leaf=(0.1,0.5))  # a fraction of 1.0 is rejected by scikit-learn; 0.5 matches the first example
display(inter)

---

# Method 2

I have not yet figured out how to render the .json file with D3, so this attempt did not succeed.

[See sklearn trees with D3](https://planspace.org/20151129-see_sklearn_trees_with_d3/)

[python 3\.x \- Create json hierarchy tree from two\-column dataframe \(Python3\) for d3 collapsible tree visual \- Stack Overflow](https://stackoverflow.com/questions/54756195/create-json-hierarchy-tree-from-two-column-dataframe-python3-for-d3-collapsibl)

In [None]:
def rules(clf, features, labels, node_index=0):
    """Structure of rules in a fit decision tree classifier

    Parameters
    ----------
    clf : DecisionTreeClassifier
        A tree that has already been fit.

    features, labels : lists of str
        The names of the features and labels, respectively.

    """
    node = {}
    if clf.tree_.children_left[node_index] == -1:  # indicates leaf
        count_labels = zip(clf.tree_.value[node_index, 0], labels)
        node['name'] = ', '.join(('{} of {}'.format(int(count), label)
                                  for count, label in count_labels))
    else:
        feature = features[clf.tree_.feature[node_index]]
        threshold = clf.tree_.threshold[node_index]
        node['name'] = '{} > {}'.format(feature, threshold)
        left_index = clf.tree_.children_left[node_index]
        right_index = clf.tree_.children_right[node_index]
        node['children'] = [rules(clf, features, labels, right_index),
                            rules(clf, features, labels, left_index)]
    return node

In [None]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()

clf = DecisionTreeClassifier(max_depth=3)
clf.fit(data.data, data.target)

rules(clf, data.feature_names, data.target_names)
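One caveat about the leaf labels produced above: `int(count)` assumes that `clf.tree_.value` holds absolute class counts. To my knowledge, newer scikit-learn releases store per-class fractions there instead, in which case every label would read `0 of ...`. The sketch below is a version-agnostic workaround that normalizes the row and rescales it by the node's sample count. The helper name `class_counts` is hypothetical, and it takes plain sequences so it can be tried without scikit-learn.

```python
def class_counts(value_row, n_node_samples):
    """Return per-class sample counts for one tree node.

    value_row      : per-class values (absolute counts or fractions)
    n_node_samples : number of training samples that reached the node
    """
    total = float(sum(value_row))
    if total == 0:
        return [0 for _ in value_row]
    # Normalizing first makes the result identical for both storage formats.
    return [int(round(v / total * n_node_samples)) for v in value_row]


# The same node expressed once as counts and once as fractions.
print(class_counts([0.0, 47.0, 3.0], 50))
print(class_counts([0.0, 0.94, 0.06], 50))
```

With a fitted classifier the call would look like `class_counts(clf.tree_.value[node_index, 0], clf.tree_.n_node_samples[node_index])`, assuming the samples are unweighted.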

In [None]:
import json

r = rules(clf, data.feature_names, data.target_names)
with open('rules.json', 'w') as f:
    f.write(json.dumps(r))
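Since the D3 rendering step is the part that did not work, here is a minimal sketch of a fallback that needs no D3 at all. It converts the same `name`/`children` hierarchy into nested HTML `<details>` elements, which browsers render as a collapsible tree. This is a swapped-in technique, not the D3 approach from the references; the function name `to_details` and the sample `tree` dict are hypothetical.

```python
import html


def to_details(node):
    """Render a {'name': ..., 'children': [...]} dict as nested <details>."""
    label = html.escape(str(node["name"]))
    children = node.get("children") or []
    if not children:
        # Leaf: a plain list item, nothing to collapse.
        return "<li>{}</li>".format(label)
    inner = "".join(to_details(child) for child in children)
    return "<li><details open><summary>{}</summary><ul>{}</ul></details></li>".format(
        label, inner)


# Hypothetical hierarchy in the same shape that rules() returns.
tree = {
    "name": "petal width (cm) > 0.8",
    "children": [
        {"name": "0 of setosa, 50 of versicolor, 50 of virginica"},
        {"name": "50 of setosa, 0 of versicolor, 0 of virginica"},
    ],
}

page = "<!DOCTYPE html><meta charset='utf-8'><ul>{}</ul>".format(to_details(tree))
print(page)
```

Writing `page` to an `.html` file and opening it gives a click-to-collapse view; inside Jupyter, `IPython.display.HTML(page)` should display it inline.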

---

# Method 3 (not good: it draws a network graph)

[python \- Plot Interactive Decision Tree in Jupyter Notebook \- Stack Overflow](https://stackoverflow.com/questions/50755706/plot-interactive-decision-tree-in-jupyter-notebook)


In [None]:
%%html
<div id="d3-example"></div>
<style>

.node circle {
  cursor: pointer;
  stroke: #3182bd;
  stroke-width: 1.5px;
}

.node text {
  font: 12px sans-serif;
  pointer-events: none;
  text-anchor: middle;
}

line.link {
  fill: none;
  stroke: #9ecae1;
  stroke-width: 1.5px;
}
</style>

In [None]:
%%javascript
// We load the d3.js library from the Web.
require.config({paths:
    {d3: "http://d3js.org/d3.v3.min"}});
require(["d3"], function(d3) {
  // The code in this block is executed when the
  // d3.js library has been loaded.

  // First, we specify the size of the canvas
  // containing the visualization (size of the
  // <div> element).
  var width = 960,
    height = 500,
    root;

  // Node colors come from the color(d) function defined further down,
  // so no d3 color scale is created here (a `var color` would shadow it).

  // We create a force-directed dynamic graph layout.
  var force = d3.layout.force()
    .linkDistance(80)
    .charge(-120)
    .gravity(.05)
    .size([width, height])
    .on("tick", tick);

  // In the <div> element, we create a <svg> graphic
  // that will contain our interactive visualization.
  // If the cell is run again, the existing <svg> is reused.
  var svg = d3.select("#d3-example").select("svg");
  if (svg.empty()) {
    svg = d3.select("#d3-example").append("svg")
          .attr("width", width)
          .attr("height", height);
  }
  var link = svg.selectAll(".link"),
      node = svg.selectAll(".node");
  // We load the JSON file.
  d3.json("graph2.json", function(error, json) { // the file name goes here!
    // In this block, the file has been loaded
    // and 'root' holds the tree hierarchy.
    if (error) throw error;
    test(1);
    root = json;
    test(2);
    console.log(root);
    update();
  });

  // Debug helper: logs a numbered marker to the browser console.
  function test(rr) { console.log('yolo' + String(rr)); }

function update() {
    test(3);
  var nodes = flatten(root),
      links = d3.layout.tree().links(nodes);

  // Restart the force layout.
  force
      .nodes(nodes)
      .links(links)
      .start();

  // Update links.
  link = link.data(links, function(d) { return d.target.id; });

  link.exit().remove();

  link.enter().insert("line", ".node")
      .attr("class", "link");

  // Update nodes.
  node = node.data(nodes, function(d) { return d.id; });

  node.exit().remove();

  var nodeEnter = node.enter().append("g")
      .attr("class", "node")
      .on("click", click)
      .call(force.drag);

  nodeEnter.append("circle")
      .attr("r", function(d) { return Math.sqrt(d.size) / 10 || 4.5; });

  nodeEnter.append("text")
      .attr("dy", ".35em")
      .text(function(d) { return d.name; });

  node.select("circle")
      .style("fill", color);
}
    function tick() {
  link.attr("x1", function(d) { return d.source.x; })
      .attr("y1", function(d) { return d.source.y; })
      .attr("x2", function(d) { return d.target.x; })
      .attr("y2", function(d) { return d.target.y; });

  node.attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; });
}
          function color(d) {
  return d._children ? "#3182bd" // collapsed package
      : d.children ? "#c6dbef" // expanded package
      : "#fd8d3c"; // leaf node
}
      // Toggle children on click.
function click(d) {
  if (d3.event.defaultPrevented) return; // ignore drag
  if (d.children) {
    d._children = d.children;
    d.children = null;
  } else {
    d.children = d._children;
    d._children = null;
  }
  update();
}
    function flatten(root) {
  var nodes = [], i = 0;

  function recurse(node) {
    if (node.children) node.children.forEach(recurse);
    if (!node.id) node.id = ++i;
    nodes.push(node);
  }

  recurse(root);
  return nodes;
}

});

In [None]:
# Take a look at the JSON file

import os

# print graph2.json if it exists
if os.path.isfile('graph2.json'):
    with open('graph2.json') as f:
        print(f.read())
else:
    print("graph2.json not found")

---

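# Method 4

This is the approach I am happiest with so far, although opening the HTML file in Chrome seems to cause problems.  
Also, for now it only outputs a .json file; the visualization itself still needs D3 and HTML written to load that file.

P.S. The reference below includes an HTML template.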

[Visualization of scikit\-learn Decision Trees with d3\.js – fractalytics](http://fractalytics.io/visualization-scikit-learn-decision-trees-d3-js)

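Problem with Chrome not loading the .json file automatically:  
[Ajax 请求本地 Json 出错 · Issue \#10 · endRuz/Blog](https://github.com/endRuz/Blog/issues/10)
> After changing the setting, remember to close every Chrome window; the change only takes effect once Chrome is restarted.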
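An alternative to changing Chrome's settings is to serve the folder over HTTP, since the restriction applies to pages opened through `file://`. A minimal sketch using only the standard library follows. The temporary folder and the file name `structure.json` are placeholders for illustration, and port `0` asks the OS for any free port.

```python
import functools
import http.server
import json
import pathlib
import tempfile
import threading
import urllib.request

# Placeholder folder standing in for the directory that holds index.html
# and the exported JSON file.
folder = pathlib.Path(tempfile.mkdtemp())
(folder / "structure.json").write_text(json.dumps({"name": "Root >"}), encoding="utf-8")


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, *args):
        pass  # keep the notebook output clean


# Serve `folder` on localhost; port 0 lets the OS choose a free port.
handler = functools.partial(QuietHandler, directory=str(folder))
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:{}/structure.json".format(server.server_address[1])
# An empty ProxyHandler makes the request ignore any system proxy settings.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(url) as response:
    loaded = json.loads(response.read().decode("utf-8"))
print(loaded)

server.shutdown()
server.server_close()
```

In practice one would point `directory` at `./sklearn-tree-d3js-master` and open `http://127.0.0.1:<port>/index.html` in the browser. Running `python -m http.server` inside that folder is the command-line equivalent.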


In [None]:
import json
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier  

In [None]:
#load data from flat file
df=pd.read_csv("./sklearn-tree-d3js-master/bank.csv",sep=';')
df.dropna(inplace=True)

In [None]:
# set the label column
label_name = 'y'
df.sort_values([label_name], ascending=[True], inplace=True)

features = (df.drop(label_name,axis=1).columns.values)

is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
boolfeatures= is_number(df.drop(label_name,axis=1).dtypes)

df_dummy = pd.get_dummies(df.drop(label_name,axis=1),prefix_sep='_-_')

In [None]:
def generator_1(clf, features, labels,original_features, node_index=0,side=0):
  
    node = {}
    if clf.tree_.children_left[node_index] == -1:  # indicates leaf
        count_labels = zip(clf.tree_.value[node_index, 0], labels)
        node['name'] = ', '.join(('{} of {}'.format(int(count), label)
                                  for count, label in count_labels))
        node['size'] = sum( clf.tree_.value[node_index, 0]  )   
        node['side'] = 'left' if side == 'l' else 'right'                      
    else:

        count_labels = zip(clf.tree_.value[node_index, 0], labels)
        node['pred'] = ', '.join(('{} of {}'.format(int(count), label)
                                  for count, label in count_labels))
                                      
        node['side'] = 'left' if side == 'l' else 'right'                              
        feature = features[clf.tree_.feature[node_index]]
        threshold = clf.tree_.threshold[node_index]
        
        if ('_-_' in feature) and (feature not in original_features):
            node['name'] =  '{} = {}'.format(feature.split('_-_')[0], feature.split('_-_')[1] )
            node['type'] = 'categorical'
        else:
            node['name'] = '{} > {}'.format(feature, round(threshold,2) )
            node['type'] = 'numerical'
        
        left_index = clf.tree_.children_left[node_index]
        right_index = clf.tree_.children_right[node_index]
        
        node['size'] = sum (clf.tree_.value[node_index, 0])
        node['children'] = [generator_1(clf, features, labels, original_features, right_index,'r'),
                            generator_1(clf, features, labels, original_features, left_index,'l')]
                            
    return node
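The categorical branch above relies on the `_-_` separator that `pd.get_dummies(..., prefix_sep='_-_')` places between the original column name and the category value. As a small illustration, the sketch below isolates that naming logic. The helper name `split_condition` and the sample column names are hypothetical, and splitting only on the first separator is a slight variation on the original code.

```python
SEP = "_-_"


def split_condition(column, original_features, threshold):
    """Return (label, kind) for a split on `column`."""
    if SEP in column and column not in original_features:
        # Dummy column: split only on the first separator so that
        # category values containing the separator stay intact.
        feature, value = column.split(SEP, 1)
        return "{} = {}".format(feature, value), "categorical"
    return "{} > {}".format(column, round(threshold, 2)), "numerical"


original = ["age", "color"]
print(split_condition("color_-_red", original, 0.5))
print(split_condition("age", original, 37.456))
```

The first call yields a categorical label (`color = red`), the second a numerical one with the threshold rounded to two decimals.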

In [None]:
def generator_2(clf, features, labels,original_features, node_index=0,side=0,prev_index=0):

    node = {}
    if clf.tree_.children_left[node_index] == -1:  # indicates leaf
        count_labels = zip(clf.tree_.value[node_index, 0], labels)
        node['pred'] = ', '.join(('{} of {}'.format(int(count), label)
                                  for count, label in count_labels))
                                      
        node['side'] = 'left' if side == 'l' else 'right'                              
        feature = features[clf.tree_.feature[prev_index]]
        threshold = clf.tree_.threshold[prev_index]
        
            
        if node_index == 0:
            node["name"] = 'Root >'
        elif ('_-_' in feature) and (feature not in original_features):
            
            node['name'] =  '{} = {}'.format(feature.split('_-_')[0], feature.split('_-_')[1] ) if side == 'r' else '{} != {}'.format(feature.split('_-_')[0], feature.split('_-_')[1] )  
            node['type'] = 'categorical'
        else:
            node['name'] = '{} > {}'.format(feature, round(threshold,2) ) if side == 'r' else '{} <= {}'.format(feature, round(threshold,2) ) 
            node['type'] = 'numerical'
        
        
        node['size'] = sum (clf.tree_.value[node_index, 0])
           
    else:

        count_labels = zip(clf.tree_.value[node_index, 0], labels)
        node['pred'] = ', '.join(('{} of {}'.format(int(count), label)
                                  for count, label in count_labels))
                                      
        node['side'] = 'left' if side == 'l' else 'right'                              
        feature = features[clf.tree_.feature[prev_index]]
        threshold = clf.tree_.threshold[prev_index]
        
            
        if node_index == 0:
            node["name"] = 'Root >'
        elif ('_-_' in feature) and (feature not in original_features):
            
            node['name'] =  '{} = {}'.format(feature.split('_-_')[0], feature.split('_-_')[1] ) if side == 'r' else '{} != {}'.format(feature.split('_-_')[0], feature.split('_-_')[1] )  
            node['type'] = 'categorical'
        else:
            node['name'] = '{} > {}'.format(feature, round(threshold,2) ) if side == 'r' else '{} <= {}'.format(feature, round(threshold,2) ) 
            node['type'] = 'numerical'
        
        left_index = clf.tree_.children_left[node_index]
        right_index = clf.tree_.children_right[node_index]
        
        node['size'] = sum (clf.tree_.value[node_index, 0])
        node['children'] = [generator_2(clf, features, labels, original_features, right_index,'r',node_index),
                            generator_2(clf, features, labels, original_features, left_index,'l',node_index)]
                            
        
    return node

In [None]:
clf = DecisionTreeClassifier(max_depth=5)
clf.fit(df_dummy, df[label_name])

io=generator_1(clf, df_dummy.columns,np.unique(df[label_name]),features)

print(json.dumps(io, indent=4))

In [None]:
with open('./sklearn-tree-d3js-master/structureC1.json', 'w') as outfile:
    json.dump(io, outfile, indent=4)

In [None]:
io = generator_2(clf, df_dummy.columns,np.unique(df[label_name]),features)
print(json.dumps(io, indent=4))

In [None]:
with open('./sklearn-tree-d3js-master/structureC2.json', 'w') as outfile:
    json.dump(io, outfile, indent=4)
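Before wiring the exported file into the D3 template, it can help to check that the hierarchy has the expected shape. The sketch below counts leaves and measures depth for any `name`/`children` dict. `tree_stats` and the sample dict are hypothetical; in practice the input would be the `io` dict or the loaded JSON file.

```python
def tree_stats(node):
    """Return (number_of_leaves, depth) for a nested name/children dict."""
    children = node.get("children") or []
    if not children:
        return 1, 0
    stats = [tree_stats(child) for child in children]
    leaves = sum(n for n, _ in stats)
    depth = 1 + max(d for _, d in stats)
    return leaves, depth


sample = {
    "name": "Root >",
    "children": [
        {"name": "age > 30.5", "children": [{"name": "a"}, {"name": "b"}]},
        {"name": "age <= 30.5"},
    ],
}
print(tree_stats(sample))
```

For a classifier trained with `max_depth=5`, the reported depth should not exceed 5.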

In [18]:
# References on how to open the HTML file
# https://stackoverflow.com/questions/47926088/how-to-get-webbrowser-module-for-python-3-6-using-pip
# https://programminghistorian.org/en/lessons/creating-and-viewing-html-files-with-python

import webbrowser
# webbrowser.open('file:///E:/NCHC/00_skill_tips/python_decision_tree_interactive/sklearn-tree-d3js-master/index.html')
# webbrowser.open('https://www.facebook.com')
# webbrowser.open_new_tab('file:///E:/NCHC/00_skill_tips/python_decision_tree_interactive/sklearn-tree-d3js-master/index.html')

True
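The commented-out calls above hard-code an absolute `file:///` URL. A portable alternative, sketched below, builds the URL from the current working directory with `pathlib`. The `index.html` location is taken from the folder used earlier, and the actual `webbrowser` call is left commented out so the snippet has no side effects.

```python
import pathlib

# Absolute path to the HTML template, relative to the notebook's working
# directory; adjust to the local folder layout.
html_path = pathlib.Path.cwd() / "sklearn-tree-d3js-master" / "index.html"

# as_uri() turns an absolute path into a file:// URL.
file_url = html_path.as_uri()
print(file_url)

# import webbrowser
# webbrowser.open_new_tab(file_url)
```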

---

# Method 5

This has not worked yet; it seems the `display` function only works inside Azure Databricks.

[Visualizations \- Azure Databricks \| Microsoft Docs](https://docs.microsoft.com/zh-tw/azure/databricks/notebooks/visualizations/)


In [2]:
import pyspark

In [3]:
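# NOTE: `spark` and `display` are predefined only inside Databricks notebooks,
# and the /databricks-datasets paths exist only there; elsewhere this cell
# fails with the NameError shown below.
trainingDF = spark.read.format("libsvm").load("/databricks-datasets/mnist-digits/data-001/mnist-digits-train.txt").cache()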
testDF = spark.read.format("libsvm").load("/databricks-datasets/mnist-digits/data-001/mnist-digits-test.txt").cache()

from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import StringIndexer
from pyspark.ml import Pipeline

indexer = StringIndexer().setInputCol("label").setOutputCol("indexedLabel")

dtc = DecisionTreeClassifier().setLabelCol("indexedLabel")

# Chain indexer + dtc together into a single ML Pipeline.
pipeline = Pipeline().setStages([indexer, dtc])

model = pipeline.fit(trainingDF)
display(model.stages[-1])

NameError: name 'spark' is not defined