forked from ledeprogram/algorithms
-
Notifications
You must be signed in to change notification settings - Fork 0
/
decision_trees.html
150 lines (94 loc) · 5.84 KB
/
decision_trees.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
<!DOCTYPE html>
<html>
<head>
<title>Algorithms - Lede Program</title>
<meta charset="utf-8">
<link rel="stylesheet" href="../slide.css"/>
</head>
<body>
<textarea id="source">
layout:true
<!-- change your name for attribution -->
<p class="footer">
<span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Algorithms</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="http://www.datapolitan.com" property="cc:attributionName" rel="cc:attributionURL">Richard Dunks</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative-Commons-License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/80x15.png" /></a>
</p>
---
class: center,middle
![img-center-50](../images/cl_logo.png)
- - -
# Algorithms
## DecisionsTree.ipynb Explainer
---
<!-- insert content here -->
# Imports
`import pandas as pd` #import pandas – a tool for data analysis and data structures. We import it as ‘pd’, which means that when we want to use the tool, we’ll refer to it as ‘pd’ in the code.
`%matplotlib inline` #in .ipynb you can use % or %% to call on the IPython ‘magic functions’. Line magics (%) get the rest of the line as an argument. Cell magics (%%) get as an argument the whole cell. [Source](https://ipython.org/ipython-doc/3/interactive/tutorial.html)
--
`from sklearn import datasets` # ScienceKit Learn: a machine learning module; Built on NumPy, SciPy, and matplotlib
--
`from pandas.tools.plotting import scatter_matrix` # alternative to matplotlib – actually, we can get rid of this line because we use matplotlib instead
`import matplotlib.pyplot as plt` #We are importing matplotlib.pyplot – which is a collection of functions that “make matplotlib work like MATLAB.” Anytime we write plt followed by a period, we are calling on a function from this collection of functions.
---
# Loading the dataset
`iris = datasets.load_iris()` #Load a dataset called iris, and give it a name (‘iris’) by which we can refer to it
datasets.load_iris() returns data, datatype bunch.
“Bunch is a “Dictionary-like object, the interesting attributes are:
* ‘data’, the data to learn,
* ‘target’, the classification labels,
* ‘target_names’, the meaning of the labels,
* ‘feature_names’, the meaning of the features,
*‘DESCR’, the full description of the dataset.”
[Source: http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html#sklearn.datasets.load_iris)
---
`x = iris.data[:,2:`] #the attributes; dataset.data[rows, columns] ← [all rows, from the second column on]
#returns a list-like structure of values in each column e.g. [ [1 2] [3 4] [5 6] ]
`y = iris.target` #the target variable
#returns a list-like structure of target classifications e.g. [0,1,0,0,1,0]
---
`from sklearn import tree` # module for decision tree models. See the [Sci-Kit Learn documentation](http://scikit-learn.org/stable/modules/tree.html#tree)
`dt = tree.DecisionTreeClassifier()` # Creates our empty tree classifier; uses “gini” as criterion for splitting into nodes by default
# sklearn.tree is a module which includes decision tree-based models for classification and regression.
`dt = dt.fit(x,y)` #creates model using our attributes and target
(vs `lm.fit()`: we need to instantiate dt)
---
# Create .dot file
`from sklearn.externals.six import StringIO` # this input/output library enables us to save the tree to a file
#StringIO allows you to read and write files as string buffer (mutable sequence of characters)
`import pydotplus` # Interface to Graphviz’s Dot language. This module provides with a full interface to create handle modify and process graphs in Graphviz’s dot language. For more information about dot: http://www.graphviz.org/doc/info/output.html
---
#opens dot file in write mode as f:
```
with open("iris.dot", 'w') as f:
f = tree.export_graphviz(dt, out_file=f)` # while iris.dot is open, we write the graphviz dot file to iris.dot
```
`import os` # This module provides a portable way of using operating system dependent functionality. If you just want to read or write a file see `open()`, if you want to manipulate paths, see the `os.path` module, and if you want to read all the lines in all the files on the command line see the `fileinput` module.
Source: https://docs.python.org/2/library/os.html
`os.unlink('iris.dot')` # deletes the file
`dot_data = StringIO()` #creates an empty StringIO object that can accept either Unicode or 8-bit strings
`tree.export_graphviz(dt, out_file=dot_data)` #Export a decision tree in DOT format to an outfile (which is dot_data which we set to a StringIO object). This allows graphical renderings to be generated
---
# Create the PDF
`graph = pydotplus.graph_from_dot_data(dot_data.getvalue())` #creates graph from the values in the dot_data
`graph.write_pdf("iris.pdf")` # writes the graph to a pdf
`from IPython.display import IFrame` #.display is a public api for IPython. An IFrame is an HTML document embedded inside another HTML document on a website
`IFrame("iris.pdf", width=800, height=800) `# print the pdf in an IFrame for which we’ve specified the size (800x800)
---
# Our beautiful tree!
![iris-illustrated](iris.jpg)
---
# What if we want to access a node by name?
![iris-illustrated](iris2.jpg)
---
# Thank you!
![asdf](https://media.giphy.com/media/MJmQapwL7ABTG/giphy.gif)
</textarea>
<script src="../js/remark-latest.min.js">
</script>
<script>
var slideshow = remark.create(
// {
// slideNumberFormat: ""}
);
</script>
</body>
</html>