<div style="display: block;margin-left: auto;margin-right: auto;width: 100%;text-align: center;">
<br><a href="https://www.youtube.com/watch?v=RmajweUFKvM&list=PLEiEAq2VkUULYYgj13YHUWmRePqiu8Ddy&index=16"><b> Source</b></a></div>

# 1.  Decision Tree Algorithm
***

### <center>Theoretical part
- A Decision Tree is a tree-shaped diagram used to determine a course of action
- Each branch of the tree represents a possible decision, occurrence or reaction
    
#### Problems that Decision Tree can solve
![](pic/v16/3s.png)

#### Advantages of Decision Tree
- simple to understand, interpret and visualize
- little effort required for data preparation
- can handle both numerical and categorical data
- nonlinear parameters don't affect its performance

#### Disadvantages of Decision Tree
- <b>Overfitting</b>
    - overfitting occurs when the algorithm captures "noise" in the data
- <b>High Variance</b>
    - the model can become unstable, because small variations in the data can produce a very different tree
- <b>Low-bias Tree</b>
    - a highly complicated decision tree tends to have low bias: it fits the training data so closely that it has difficulty working with new data
    
### Decision Tree - Important Terms

- <b>ENTROPY</b>
     - entropy is the measure of randomness or unpredictability in the data set
![](pic/v16/4s.png)  

- <b>INFORMATION GAIN</b>
    - it is the measure of decrease in entropy after the dataset is split
![](pic/v16/5s.png)

- <b>LEAF and ROOT NODE</b>
    - leaf node carries the classification or decision
    - root node is the top most decision
![](pic/v16/6s.png)    
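
The two measures above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual Shannon entropy in bits (the sum of `p * log2(1/p)` over the class proportions `p`) and gain defined as the parent entropy minus the size-weighted entropy of the subsets after the split. The function names and the toy `yes`/`no` labels are made up for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    # p * log2(1/p) summed over classes, with p = count / total
    return sum((c / total) * log2(total / c) for c in Counter(labels).values())


def information_gain(parent, children):
    """Decrease in entropy after splitting `parent` into the `children` subsets."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical labels: a perfectly mixed set of 4 "yes" and 4 "no"
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(entropy(parent))                          # 1.0
print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A split that separates the classes completely removes all the entropy, while a split that leaves both subsets as mixed as the parent gains nothing.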

#### How does a Decision Tree work?
- let's try to classify different types of animals based on their features using a decision tree
![](pic/v16/7s.png)

PROBLEM STATEMENT
- to classify the different types of animals based on their features using a decision tree
- the dataset looks quite messy, and its entropy is high in this case
![](pic/v16/8s.png)

HOW TO SPLIT DATA
- we have to frame the conditions that split the data in such a way that the information gain is the highest
- <b>GAIN</b> is the measure of decrease in entropy after splitting

<b>Let's try to calculate entropy for current set:</b>

![](pic/v16/10s.png)
![](pic/v16/9s.png)
![](pic/v16/11s.png)

- We will calculate the entropy of the dataset similarly after every split to calculate the GAIN
- Now we will try to choose a condition that gives us the highest gain. 
- We will do that by splitting the data using each condition and checking the gain that we get out of them
- The condition that gives us the highest gain will be used to make the first split
![](pic/v16/12s.png)
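
The selection step described above can be sketched as follows. The animals, colors, heights and the three candidate conditions are hypothetical stand-ins (the real values are in the figures), chosen only so that the numbers are easy to check by hand.

```python
from collections import Counter
from math import log2

# Hypothetical toy data: (color, height, animal)
ROWS = [
    ("grey", 10, "elephant"), ("grey", 9, "elephant"),
    ("yellow", 15, "giraffe"), ("yellow", 14, "giraffe"),
    ("brown", 3, "monkey"), ("brown", 2, "monkey"),
    ("yellow", 4, "tiger"), ("yellow", 5, "tiger"),
]

# Hypothetical candidate conditions: name -> test applied to one row
CONDITIONS = {
    "color == yellow": lambda row: row[0] == "yellow",
    "height >= 10": lambda row: row[1] >= 10,
    "color == brown": lambda row: row[0] == "brown",
}


def entropy(rows):
    """Entropy (in bits) of the animal labels in `rows`."""
    total = len(rows)
    counts = Counter(animal for _, _, animal in rows)
    return sum((c / total) * log2(total / c) for c in counts.values())


def gain(rows, condition):
    """Entropy before the split minus the weighted entropy after it."""
    yes = [r for r in rows if condition(r)]
    no = [r for r in rows if not condition(r)]
    weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(rows)
    return entropy(rows) - weighted


# Try every condition and keep the one with the highest gain
gains = {name: gain(ROWS, test) for name, test in CONDITIONS.items()}
best = max(gains, key=gains.get)

for name, value in gains.items():
    print(f"{name}: gain = {value:.3f}")
print("first split:", best)  # first split: color == yellow
```

In this toy data each branch still mixes two kinds of animals after the first split, so further splits are needed before the entropy reaches zero.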

- So, we will split the data
![](pic/v16/13s.png)

- the entropy after splitting has decreased considerably
- however we still need some splitting at both the branches to attain an entropy value equal to zero

![](pic/v16/14s.png)

- so we split both nodes using "height" as the condition
- since every branch now contains a single label type, the entropy has reached its minimum value (zero)
- this tree can now predict all the classes of animals present in the data set with 100% accuracy
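
The whole procedure (pick the highest-gain condition, split, repeat on each branch until the entropy is zero) can be sketched as a short recursive function. This is an illustrative toy under stated assumptions, not how any particular library implements it: the data is hypothetical, numeric features are tested with `>=` and all others with `==`, and ties between equally good conditions go to the first one tried.

```python
from collections import Counter
from math import log2

# Hypothetical toy data, not the values from the figures
ROWS = [
    {"color": "grey", "height": 10, "animal": "elephant"},
    {"color": "grey", "height": 9, "animal": "elephant"},
    {"color": "yellow", "height": 15, "animal": "giraffe"},
    {"color": "yellow", "height": 14, "animal": "giraffe"},
    {"color": "brown", "height": 3, "animal": "monkey"},
    {"color": "brown", "height": 2, "animal": "monkey"},
    {"color": "yellow", "height": 4, "animal": "tiger"},
    {"color": "yellow", "height": 5, "animal": "tiger"},
]
TARGET = "animal"


def entropy(rows):
    total = len(rows)
    counts = Counter(r[TARGET] for r in rows)
    return sum((c / total) * log2(total / c) for c in counts.values())


def matches(row, feature, value):
    # Numeric features are tested with >=, all others with ==
    if isinstance(value, (int, float)):
        return row[feature] >= value
    return row[feature] == value


def best_split(rows):
    """Return (gain, feature, value, yes_rows, no_rows) with the highest gain, or None."""
    best = None
    for feature in rows[0]:
        if feature == TARGET:
            continue
        for value in sorted({r[feature] for r in rows}):
            yes = [r for r in rows if matches(r, feature, value)]
            no = [r for r in rows if not matches(r, feature, value)]
            if not yes or not no:
                continue  # this condition does not separate anything
            weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(rows)
            gain = entropy(rows) - weighted
            if best is None or gain > best[0]:
                best = (gain, feature, value, yes, no)
    return best


def build(rows):
    """Split recursively until a node is pure (entropy 0) or cannot be split."""
    split = None if entropy(rows) == 0 else best_split(rows)
    if split is None:
        return Counter(r[TARGET] for r in rows).most_common(1)[0][0]  # leaf node
    _, feature, value, yes, no = split
    return {"feature": feature, "value": value, "yes": build(yes), "no": build(no)}


def predict(node, row):
    while isinstance(node, dict):  # walk from the root down to a leaf
        branch = "yes" if matches(row, node["feature"], node["value"]) else "no"
        node = node[branch]
    return node


tree = build(ROWS)
correct = sum(predict(tree, r) == r[TARGET] for r in ROWS)
print(f"training accuracy: {correct / len(ROWS):.0%}")  # training accuracy: 100%
```

The 100% here refers to the rows the tree was built from. Accuracy on unseen data can be lower, which is the overfitting risk listed among the disadvantages.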

# 2. Decision Tree - practical example
***

### Loan repayment prediction
PROBLEM STATEMENT: 
- to predict whether a customer will repay the loan amount or not

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree

In [2]:
balance_data = pd.read_csv('Data/loan_data.csv',
sep= ',', header= 0)

In [3]:
print("Dataset Length:", len(balance_data))
print("Dataset Shape:", balance_data.shape)

Dataset Length: 1000
Dataset Shape: (1000, 6)

In [4]:
print ("Dataset:: ")
balance_data.head()

Dataset:: 


| | Result | Inital payment | Last payment | Credit scoring | House number | sum |
|---|---|---|---|---|---|---|
| 0 | yes | 201 | 10018 | 250 | 3046 | 13515 |
| 1 | yes | 205 | 10016 | 395 | 3044 | 13660 |
| 2 | yes | 257 | 10129 | 109 | 3251 | 13746 |
| 3 | yes | 246 | 10064 | 324 | 3137 | 13771 |
| 4 | yes | 117 | 10115 | 496 | 3094 | 13822 |


In [5]:
# separating the features (X, columns 1-4) from the target variable (Y, the Result column)
X = balance_data.values[:, 1:5]
Y = balance_data.values[:,0]

In [6]:
# splitting the dataset into train (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split( X, Y, test_size = 0.3, random_state = 100)
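
What the split does can be illustrated without scikit-learn. The helper below is a hypothetical, simplified stand-in and not scikit-learn's implementation (which also accepts several arrays at once and supports stratification), so it will not select the same rows for the same seed. It only shows the idea: shuffle reproducibly, then hold out 30% of the rows for testing.

```python
import random


def simple_train_test_split(rows, test_size=0.3, seed=100):
    """Shuffle `rows` reproducibly and hold out a `test_size` fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives the same split every run
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# 1000 row indices stand in for the 1000 rows of the loan dataset
train_rows, test_rows = simple_train_test_split(range(1000))
print(len(train_rows), len(test_rows))  # 700 300
```

Fixing the seed plays the same role as `random_state = 100` in the cell above: rerunning the notebook yields the same train and test rows.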

In [7]:
# train a decision tree classifier using entropy as the split criterion
clf_entropy = DecisionTreeClassifier(criterion = "entropy", random_state = 100,
max_depth=3, min_samples_leaf=5)
clf_entropy.fit(X_train, y_train)

DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=3,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=5, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=100,
            splitter='best')

In [8]:
# make predictions on the test set
y_pred_en = clf_entropy.predict(X_test)
y_pred_en

array(['yes', 'yes', 'No', 'yes', 'No', 'yes', 'yes', 'yes', 'No', 'No',
       'No', 'No', 'yes', 'No', 'No', 'yes', 'yes', 'No', 'yes', 'No',
       'No', 'yes', 'No', 'yes', 'yes', 'No', 'No', 'yes', 'No', 'No',
       'No', 'yes', 'yes', 'yes', 'yes', 'No', 'No', 'No', 'yes', 'No',
       'yes', 'yes', 'yes', 'No', 'No', 'yes', 'yes', 'yes', 'No', 'No',
       'yes', 'No', 'yes', 'yes', 'yes', 'yes', 'No', 'yes', 'No', 'yes',
       'yes', 'No', 'yes', 'yes', 'No', 'yes', 'yes', 'yes', 'No', 'No',
       'No', 'No', 'No', 'yes', 'No', 'yes', 'yes', 'No', 'yes', 'No',
       'No', 'No', 'No', 'yes', 'No', 'yes', 'No', 'yes', 'yes', 'No',
       'yes', 'yes', 'yes', 'yes', 'yes', 'No', 'yes', 'yes', 'yes',
       'yes', 'No', 'No', 'yes', 'yes', 'No', 'yes', 'yes', 'yes', 'No',
       'yes', 'yes', 'yes', 'yes', 'No', 'No', 'yes', 'yes', 'yes', 'No',
       'No', 'No', 'No', 'yes', 'yes', 'No', 'yes', 'yes', 'yes', 'No',
       'No', 'yes', 'yes', 'No', 'yes', 'yes', 'yes', 'No', 'ye

In [9]:
# checking accuracy
print("Accuracy is", accuracy_score(y_test, y_pred_en) * 100)

Accuracy is 93.66666666666667
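
Accuracy is simply the fraction of test rows whose predicted label equals the true label. A minimal sketch of that calculation follows; the `accuracy` helper and the short label lists are illustrative only and are not the notebook's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 4 of the 5 predictions are correct
y_true = ["yes", "No", "yes", "yes", "No"]
y_pred = ["yes", "No", "No", "yes", "No"]
print(accuracy(y_true, y_pred) * 100)  # 80.0
```

With 1000 rows and `test_size = 0.3`, the test set has 300 rows, so the reported 93.67% corresponds to 281 correct predictions out of 300.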

#### Model accuracy is about 93.7%, which is very good!
#### So the bank can now use this model to decide whether it should approve a loan request from a particular customer or not.