# Decision Trees Refresher

* Example of a decision tree:

![example of a decision tree](img/decision_tree_s.png "Example of a decision tree")

* Predictions are made by following the decision path from the root to a leaf and returning the value stored in that leaf
* Best attributes for splitting can be found by maximizing Gini gain
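* Formula for Gini impurity: $$G(S)= 1-\sum_{i=1}^k p_i^2$$ where $k$ is the number of classes and $p_i$ is the fraction of samples in $S$ that belong to class $i$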
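* Formula for Gini gain: $$\mathrm{Gain}_{\mathrm{Gini}}(S,A)=G(S)-\sum_{i \in \mathrm{values}(A)} \frac{|S_i|}{|S|}G(S_i)$$ where $S_i$ is the subset of $S$ in which attribute $A$ takes the value $i$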
* When to stop splitting: pick a maximum depth, or a minimum number of samples required to split a node or to form a leaf
    * These are hyperparameters: use the validation set to find good values
* Decision trees can be used for both classification and regression problems
* Random forests are ensembles of decision trees, each trained on a random (bootstrap) sample of the training data and typically restricted to a random subset of features at each split
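
The Gini impurity and Gini gain formulas above can be sketched in plain Python. This is a minimal illustration for a single categorical attribute, not how sklearn implements it; the function names `gini_impurity` and `gini_gain` and the toy data are made up for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """G(S) = 1 - sum_i p_i^2, where p_i is the share of class i in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(labels, attribute_values):
    """Gini gain obtained by splitting labels on one categorical attribute."""
    n = len(labels)
    # Group the labels by attribute value to form the subsets S_i.
    subsets = {}
    for label, value in zip(labels, attribute_values):
        subsets.setdefault(value, []).append(label)
    # Weighted impurity of the subsets: sum_i |S_i| / |S| * G(S_i).
    weighted = sum(len(s) / n * gini_impurity(s) for s in subsets.values())
    return gini_impurity(labels) - weighted


# Toy data, invented for illustration only.
labels = ["yes", "yes", "no", "no", "yes", "no"]
weather = ["sun", "sun", "rain", "rain", "sun", "rain"]  # separates the classes
weekday = ["mon", "tue", "mon", "tue", "mon", "tue"]     # barely informative

print("impurity:", gini_impurity(labels))
print("gain (weather):", gini_gain(labels, weather))
print("gain (weekday):", gini_gain(labels, weekday))
```

The attribute with the larger gain (`weather` here) would be chosen for the split, because it leaves the purest subsets.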

How to create a decision tree in sklearn:

In [None]:
from sklearn.tree import DecisionTreeClassifier

# Limit the tree depth and require at least 10 samples to split a node.
dtc = DecisionTreeClassifier(max_depth=3, min_samples_split=10)

How to fit a decision tree:

In [None]:
dtc.fit(X_train, y_train)

How to evaluate accuracy of a decision tree:

In [None]:
from sklearn.metrics import accuracy_score

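# Compare train and test accuracy; a large gap suggests overfitting.
print("Train accuracy:", accuracy_score(y_train, dtc.predict(X_train)))
print("Test accuracy:", accuracy_score(y_test, dtc.predict(X_test)))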
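
The same accuracy check can be used to pick a stopping criterion such as `max_depth` on a validation set. The sketch below is one possible way to do it, under stated assumptions: the data is a synthetic stand-in from `make_classification`, the depth range 1 to 10 is arbitrary, and the `_demo` variable names are chosen so that nothing defined earlier in the notebook is overwritten.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; use your own features and labels).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=4, random_state=0
)

# Hold out a test set first, then carve a validation set out of the rest.
X_rest, X_te, y_rest, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

# Fit one tree per candidate depth and score it on the validation set.
val_scores = {}
for depth in range(1, 11):
    candidate = DecisionTreeClassifier(max_depth=depth, random_state=0)
    candidate.fit(X_tr, y_tr)
    val_scores[depth] = accuracy_score(y_val, candidate.predict(X_val))

best_depth = max(val_scores, key=val_scores.get)
print("Best max_depth on validation set:", best_depth)

# Refit with the chosen depth and touch the test set only once, at the end.
best_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
best_tree.fit(X_tr, y_tr)
print("Test accuracy:", accuracy_score(y_te, best_tree.predict(X_te)))
```

Keeping the test set out of the selection loop means the final test accuracy is not biased by the choice of depth.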

How to evaluate a decision tree in more detail:

In [None]:
from sklearn.metrics import classification_report

print(classification_report(y_test, dtc.predict(X_test)))

How to access importance of features:

In [None]:
dtc.feature_importances_
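
`feature_importances_` is a plain array in the same order as the input columns, so it is easier to read when paired with feature names. The following is a small sketch that assumes the iris dataset bundled with sklearn as stand-in data; `iris_tree` and `ranked` are names chosen for this example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: the iris dataset ships with sklearn (no download needed).
iris = load_iris()
iris_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
iris_tree.fit(iris.data, iris.target)

# Pair each importance with its feature name and sort, most important first.
ranked = sorted(
    zip(iris.feature_names, iris_tree.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

The importances are normalized to sum to 1, and a feature that is never used for a split gets an importance of 0.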