# Introduction to Decision Tree Algorithm
> This note covers the basics of the Decision Tree, Random Forest and Gradient Boosting algorithms.

> 1.What is a Decision Tree? How does it work?

> 2.How to choose between tree based models and linear models?

> 3.Pros and Cons of Decision Tree?

> 4.Decision Tree implementation with scikit-learn

> 5.Break it down

### 1.What is a Decision Tree? How does it work?
**Tree based learning algorithms** are among the most widely used supervised learning methods. They can give predictive models high accuracy, stability and ease of interpretation. Unlike linear models, they can capture non-linear relationships well, and they apply to both classification and regression problems.

A **Decision Tree** works with both **categorical** and **continuous** input and output variables. In this technique, we **split the population or sample into two or more homogeneous sets** (or sub-populations) based on the **most significant splitter / differentiator** among the input variables.

<img src="asset/Descion_Tree1.png" width="800" height="800" style="float: left;">


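**Entropy** controls how a Decision Tree decides **where to split the data**. Entropy measures the **impurity** of a set of examples: **the larger the entropy, the more impure the examples are**. For a two-class problem the entropy lies between **0 and 1**; with k classes the maximum is log2(k). At each node the tree picks the split that reduces entropy the most, that is, the split with the largest **information gain**.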

<img src="asset/Desicion_Tree_Entropy.png" width="400" height="400" style="float: left;">


<img src="asset/Desicion_Tree_Information_Gain.png" width="500" height="500" style="float: left;">
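
To make entropy and information gain concrete, here is a minimal sketch using NumPy. The helper names `entropy` and `information_gain` and the toy "yes"/"no" labels are assumptions made for illustration; they are not part of scikit-learn or of the figures above.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (base 2) of a 1-D collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy node with 5 "yes" and 5 "no" examples: maximally impure for two classes.
parent = np.array(["yes"] * 5 + ["no"] * 5)

# A split that separates the classes perfectly.
pure_split = [parent[:5], parent[5:]]

# A split that leaves both children almost as mixed as the parent.
mixed_split = [
    np.array(["yes", "yes", "yes", "no", "no"]),
    np.array(["yes", "yes", "no", "no", "no"]),
]

print("parent entropy:", entropy(parent))
print("gain of pure split:", information_gain(parent, pure_split))
print("gain of mixed split:", information_gain(parent, mixed_split))
print("three balanced classes:", entropy(np.array(["a", "b", "c"])))
```

The pure split recovers the full entropy of the parent as gain, while the mixed split gains almost nothing, so a tree choosing by information gain would prefer the first one. The last line shows that entropy can exceed 1 once there are more than two classes.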

### 2. How to choose between tree based models and linear models?

* If the relationship between the dependent and independent variables is well approximated by a linear model, linear regression will usually outperform a tree based model.

* If there is a **highly non-linear and complex relationship** between the dependent and independent variables, a tree model will usually outperform a classical regression method.

* If you need a model that is easy to explain to people, a decision tree is often a good choice: a small tree reads as a sequence of if/else rules, which many audiences find even easier to follow than linear regression coefficients.
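
The first two points can be checked on synthetic data. The sketch below is only an illustration under assumed settings: the two target functions, the noise levels and the unrestricted `DecisionTreeRegressor` are choices made here, not taken from the note.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(400, 1))

# One target that is linear in x and one that is clearly non-linear.
targets = {
    "linear": 3 * X[:, 0] + 1 + rng.normal(0, 1.0, size=400),
    "non-linear": np.sin(X[:, 0]) + rng.normal(0, 0.1, size=400),
}

scores = {}
for name, y in targets.items():
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    linear_model = LinearRegression().fit(X_train, y_train)
    tree_model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
    # .score() returns R^2 on the held-out data for both regressors.
    scores[name] = {
        "linear_regression": linear_model.score(X_test, y_test),
        "decision_tree": tree_model.score(X_test, y_test),
    }
    print(name, scores[name])
```

On the linear target the linear model should score higher, because the fully grown tree also fits the noise; on the sine-shaped target the tree should win, because a straight line cannot follow the curve.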

### 3.Pros and Cons of Decision Tree?
**Pros**:
* Simple to understand and to interpret. Trees can be visualised.

* Requires **little data preparation**. Other techniques often require data normalisation, creation of dummy variables and removal of blank values. Note, however, that depending on the scikit-learn version, the tree module may not support missing values, so check before relying on it.

* The cost of using the tree (i.e., predicting data) is logarithmic in the number of data points used to train the tree.
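
As a small illustration of the first point, a fitted tree can be printed as readable rules. This sketch assumes scikit-learn's `export_text` helper is available (it was added in version 0.21); the `max_depth=2` limit is chosen here only to keep the output short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree keeps the printed rules short enough to read at a glance.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(iris.data, iris.target)

# export_text renders the fitted tree as nested if/else style rules.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each indented line is one test on a feature, and each `class:` line is a leaf, which is what makes small trees easy to explain.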

**Cons**:
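* Decision-tree learners can create **over-complex trees** that do not generalise well to unseen data. This is called **overfitting**. Limiting the maximum depth of the tree or the minimum number of samples per leaf helps to avoid it.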

* Decision trees can be **unstable** because small variations in the data might result in a completely different tree being generated. This problem is mitigated by using decision trees within an ensemble.

* The problem of learning an optimal decision tree is known to be NP-complete under several aspects of optimality and even for simple concepts. Consequently, practical decision-tree learning algorithms are based on heuristic algorithms such as the greedy algorithm where locally optimal decisions are made at each node. Such algorithms cannot guarantee to return the globally optimal decision tree. This can be mitigated by training multiple trees in an ensemble learner, where the features and samples are randomly sampled with replacement.

* Decision tree learners create **biased trees if some classes dominate**. It is therefore recommended to balance the dataset prior to fitting with the decision tree.
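
The overfitting point can be seen by comparing an unrestricted tree with a depth-limited one. This is a sketch on an assumed synthetic dataset: the `make_classification` settings (15% flipped labels) and `max_depth=3` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data: flip_y randomly reassigns about 15% of the labels.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, flip_y=0.15, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

results = {}
for max_depth in (None, 3):  # None lets the tree grow until every leaf is pure
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    clf.fit(X_train, y_train)
    results[max_depth] = {
        "leaves": clf.get_n_leaves(),
        "train": clf.score(X_train, y_train),
        "test": clf.score(X_test, y_test),
    }
    print("max_depth =", max_depth, results[max_depth])
```

The unrestricted tree memorises the training set, noise included, and needs many more leaves to do so. The gap between its training and test accuracy is the symptom of overfitting; the depth-limited tree gives up some training accuracy in exchange for a much simpler model.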

### 4.Decision Tree implementation with scikit-learn

```python
# import libs
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```

```python
# load the iris data and hold out 25% of it for testing
iris = load_iris()
X, y = iris.data, iris.target
train_feature, test_feature, train_label, test_label = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```

```python
# build and train the model
# you can change the criterion to see the difference between 'entropy' and 'gini'
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf = clf.fit(train_feature, train_label)
# accuracy on the training data
clf.score(train_feature, train_label)
```

Output:

```
1.0
```

```python
# make predictions and measure accuracy on the held-out test set
pred = clf.predict(test_feature)
accuracy_score(test_label, pred)
```

Output:

```
0.97368421052631582
```
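
The comment in the training step suggests trying both split criteria. One possible way to compare them is sketched below; using 5-fold cross-validation via `cross_val_score` and fixing `random_state=0` are assumptions added here rather than part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

mean_accuracy = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    # Average accuracy over 5 folds is less sensitive to one lucky split.
    mean_accuracy[criterion] = cross_val_score(clf, X, y, cv=5).mean()
    print(criterion, round(mean_accuracy[criterion], 3))
```

On a small, clean dataset like iris the two criteria tend to give very similar results, so the choice usually matters less than controlling the size of the tree.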

### 5. Break it down
* A Decision Tree splits the population or sample into two or more homogeneous sets using **Entropy**
* A Decision Tree requires **little data preparation**
* A Decision Tree is well suited to **non-linear problems**
* A Decision Tree tends to **overfit** and to be **unstable**