![NN](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRsBuN7tJaaVkmdDs0lHMwhUh3n24zjrvs_7fJe2CFkTkMNfyT1)

***
**OBJECTIVE:**

To understand decision tree and random forest predictions using the treeinterpreter package, and to see "how" the prediction is arrived at for each observation in a dataset.

Note:

1) This package works only with scikit-learn models.

2) To install treeinterpreter with pip, run `pip install treeinterpreter`. Refer to https://github.com/andosa/treeinterpreter
***

### We start by looking at the decision tree, which is the building block of the random forest.

### How do decision trees work?

***
A decision tree is a tree (and a type of directed acyclic graph) in which the nodes represent decisions (a square box), 
random transitions (a circular box) or terminal nodes, and the edges or branches are binary (yes/no, true/false), 
representing possible paths from one node to another. The specific type of decision tree used for machine learning contains 
no random transitions. To use a decision tree for classification or regression, one takes a row of data (a set of features), 
starts at the root, and follows each subsequent decision node down to a terminal node. The process is very intuitive and
easy to interpret, which allows trained decision trees to be used for variable selection or, more generally, feature engineering.
***
***
For classification trees, the splits are chosen so as to minimize either
    **1) entropy or
      2) Gini impurity in the resulting subsets.**
***

### An example of a learned decision tree for classification to help you make your decision is below:

![](http://dataaspirant.com/wp-content/uploads/2017/01/B03905_05_01-compressor.png)

### Gini Index (Not Gini Impurity) - Difference? Check here - https://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity


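The Gini index used here is the probability that two items selected at random from a population belong to the same class. This probability is 1 if the population is pure. Gini impurity is one minus this value, so with the index a higher score means a purer node.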

### Steps to Calculate Gini for a split

1) Calculate Gini for each sub-node as the sum of the squared probabilities of success and failure (p^2 + q^2).

2) Calculate Gini for the split as the weighted average of the Gini scores of its sub-nodes, weighted by the number of items in each node.
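
The two steps above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `gini_score` and `gini_for_split` and the toy labels are made up for this example, and the p^2 + q^2 formula is generalised to a sum over however many classes are present.

```python
def gini_score(labels):
    """Sum of squared class proportions (p^2 + q^2 for two classes)."""
    n = len(labels)
    return sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_for_split(left, right):
    """Weighted average of the sub-node scores, weighted by node size."""
    n = len(left) + len(right)
    return len(left) / n * gini_score(left) + len(right) / n * gini_score(right)


# Hypothetical split of 10 observations into two sub-nodes
left = ["yes", "yes", "yes", "no"]
right = ["no", "no", "no", "no", "yes", "yes"]

split_score = gini_for_split(left, right)
print("Gini (left): ", gini_score(left))
print("Gini (right):", gini_score(right))
print("Gini (split):", split_score)
```

With this definition a higher score means purer sub-nodes, so candidate splits would be compared by picking the one with the largest weighted score.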

### Information Theory

Entropy, a concept from information theory, measures the degree of disorganization in a system. If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided (50%-50%), the entropy is one.

Entropy can be calculated using the formula:

$$\text{Entropy} = -p \log_2 p - q \log_2 q$$

Here p and q are the probabilities of success and failure, respectively, in that node. Entropy is also used with a categorical target variable. The tree chooses the split with the lowest entropy compared to the parent node and the other candidate splits. The lower the entropy, the better.

### Steps to calculate entropy for a split:

1) Calculate the entropy of the parent node.

2) Calculate the entropy of each individual node of the split, then take the weighted average over all sub-nodes in the split.
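
As a rough sketch of these steps, the snippet below computes the parent entropy, the weighted entropy of a split, and the drop between the two (commonly called information gain). The function names and the toy labels are assumptions made for illustration only.

```python
import math


def entropy(labels):
    """Entropy in bits: -sum(p * log2(p)) over the classes present."""
    n = len(labels)
    total = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        total -= p * math.log2(p)
    return total


def split_entropy(subsets):
    """Weighted average entropy of the sub-nodes, weighted by node size."""
    n = sum(len(s) for s in subsets)
    return sum(len(s) / n * entropy(s) for s in subsets)


# Hypothetical parent node with a 50%-50% class mix
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

gain = entropy(parent) - split_entropy([left, right])
print("Parent entropy:", entropy(parent))
print("Split entropy: ", split_entropy([left, right]))
print("Reduction:     ", gain)
```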

### But the dataset we are looking at has a continuous output, so how does the tree split? 

***
For regression trees, the splits are chosen to minimize either
    **1) Variance (Reduction in Variance approach)
      2) MAE (mean absolute error) within all of the subsets.**
***

#### By default, scikit-learn uses the variance approach (mean squared error) as the splitting criterion for regression.

#### For more, refer - http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html

#### Which tree algorithm does scikit-learn use?  CART it is, more on that here - http://scikit-learn.org/stable/modules/tree.html#tree-algorithms

### Variance (Reduction in Variance approach):

This approach uses the standard variance formula to choose the best split. The split with the lowest weighted variance is selected to split the population:

#### $\text{Variance} = \dfrac{\sum (X - \bar{X})^2}{n}$

Here $\bar{X}$ is the mean of the values, $X$ is an actual value and $n$ is the number of values.

### Steps to calculate Variance:

1) Calculate variance for each node.

2) Calculate variance for each split as weighted average of each node variance.
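
A small sketch of these two steps, using made-up prices and hypothetical helper names (`variance`, `split_variance`), shows why a split that separates cheap from expensive houses would be preferred over one that mixes them.

```python
def variance(values):
    """Population variance: sum((x - mean)^2) / n."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def split_variance(subsets):
    """Weighted average of the node variances, weighted by node size."""
    n = sum(len(s) for s in subsets)
    return sum(len(s) / n * variance(s) for s in subsets)


# Hypothetical target values (e.g. prices in thousands)
prices = [100, 120, 110, 300, 320, 310]

good_split = [[100, 120, 110], [300, 320, 310]]  # separates low from high
bad_split = [[100, 300, 110], [120, 320, 310]]   # mixes low and high

print("Parent variance:    ", variance(prices))
print("Good split variance:", split_variance(good_split))
print("Bad split variance: ", split_variance(bad_split))
```

The split with the lower weighted variance is the one a variance-based criterion would choose.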

### An example of a learned decision tree for regression to help you make your decision is below:

![](https://www.saedsayad.com/images/Decision_tree_r1.png)

### Now let's look at those concepts using the King County house prices data set, which is a regression problem

In [None]:
import pandas as pd
import numpy as np

In [None]:
data = pd.read_csv("../input/kc_house_data.csv")

In [None]:
data.head()

In [None]:
from treeinterpreter import treeinterpreter as ti
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

In [None]:
rf = RandomForestRegressor()
dt = DecisionTreeRegressor()

In [None]:
# Target is the sale price; features are every other column
y = data['price']
x = data.loc[:, data.columns != 'price']

In [None]:
# Drop the date and identifier columns (recent pandas needs the axis given by keyword)
x = x.drop(columns=['date', 'id'])

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

In [None]:
dt.fit(X_train, y_train)

In [None]:
instances = X_test.loc[[735]]
instances

### Turning a black box into a white box: decision paths using treeinterpreter

In [None]:
prediction, bias, contributions = ti.predict(dt, instances)

### Now lets look at the feature contributions

In [None]:
ft_list = []
for i in range(len(instances)):
    print("Bias (trainset mean)", bias[i])
    # Sort features by the absolute size of their contribution, largest first
    for c, feature in sorted(zip(contributions[i], x.columns),
                             key=lambda pair: -abs(pair[0])):
        ft_list.append((feature, round(c, 2)))
    print("-"*50)

In [None]:
labels, values = zip(*ft_list)

In [None]:
ft_list

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['figure.figsize'] = 25, 25

xs = np.arange(len(labels))

# Recent seaborn versions expect x and y as keyword arguments
sns.barplot(x=xs, y=list(values))

plt.xticks(xs, labels)
plt.yticks(values)

plt.show()

### What do the above results mean?

***
#### The TreeInterpreter library decomposes the predictions as the sum of contributions from each feature i.e.

#### prediction = bias + feature(1)contribution + … + feature(n)contribution. 
***
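
To make the decomposition concrete, here is a minimal sketch of how such contributions can be computed by walking a fitted scikit-learn tree from the root to a leaf. It uses synthetic data and a hypothetical helper named `decompose`; it follows the idea described in the referenced blog post and is not the treeinterpreter source code. The bias is the mean stored at the root, and each split credits the change in node mean to the feature that was split on.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def decompose(tree, row):
    """Return (bias, per-feature contributions) for one observation."""
    t = tree.tree_
    node = 0
    bias = t.value[0, 0, 0]               # mean target at the root
    contributions = np.zeros(row.shape[0])
    while t.children_left[node] != -1:    # -1 marks a leaf
        feat = t.feature[node]
        if row[feat] <= t.threshold[node]:
            child = t.children_left[node]
        else:
            child = t.children_right[node]
        # Credit the change in node mean to the feature used for this split
        contributions[feat] += t.value[child, 0, 0] - t.value[node, 0, 0]
        node = child
    return bias, contributions


# Synthetic regression data (assumption: 3 features, the third is pure noise)
rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3)).astype(np.float32)
y = 10 * X[:, 0] + 5 * (X[:, 1] > 0.5) + rng.normal(scale=0.1, size=200)

tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

row = X[0]
bias, contributions = decompose(tree, row)
prediction = tree.predict(row.reshape(1, -1))[0]

print("bias + contributions:", bias + contributions.sum())
print("tree prediction:     ", prediction)
```

Because the per-split differences telescope along the path, the bias plus the contributions adds up to the value stored in the leaf, which is the tree's prediction.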

### Contributions of all features for instance 735 from the test set

In [None]:
contributions

### Prediction by the decision tree regressor

In [None]:
prediction

### Bias term

In [None]:
bias

### Therefore, prediction must equal

In [None]:
print(bias + np.sum(contributions, axis=1))

### As seen in the plot above, only 2 features have a positive impact in driving the prices higher.

### The feature contributions are sorted by their absolute impact. For this instance, the predicted value is lower than the data set mean: latitude has a negative impact, while square footage has a high positive impact. The higher the sqft, the higher the price, which makes sense.

### How did the decision tree arrive at the results? Let's look at the tree fitted on the first 5 rows

In [None]:
top50x = X_train.head(50)
top5x = X_train.head(5)
top50y = y_train.head(50)
top5y = y_train.head(5)

In [None]:
dt1 = DecisionTreeRegressor()
dt1.fit(top5x, top5y)

In [None]:
from io import StringIO  # sklearn.externals.six is no longer available in recent scikit-learn
from IPython.display import Image  
#from sklearn.tree import export_graphviz
#import pydotplus
#dot_data = StringIO()
#export_graphviz(dt1, out_file=dot_data,  
#                filled=True, rounded=True,
 #               special_characters=True)
#graph = pydotplus.graph_from_dot_data(dot_data.getvalue())  
#Image(graph.create_png())

![](https://github.com/rakash/Scripts/blob/master/tree2.png?raw=true)

In [None]:
top5x

In [None]:
top5y

### Now let's move on to a model that is an ensemble of decision trees.

### Yes, I am talking about the random forest

![](https://images.pexels.com/photos/302804/pexels-photo-302804.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260)

The random forest has been a burgeoning machine learning technique in the last few years. It is a non-linear, tree-based model that often provides accurate results. However, being mostly a black box, it is often hard to interpret and fully understand, especially when it comes to explaining the results and the rationale behind them to stakeholders in organizations.

### From decision trees to forest

We started the kernel with decision trees, so how do we move from a decision tree to a forest? 

This is straightforward. Since the prediction of a forest is the average of the predictions of its trees, the forest prediction is simply the average of the bias terms plus the average contribution of each feature.
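
The averaging argument can be checked with a short sketch. It assumes synthetic data and a hypothetical `decompose` helper that walks a single scikit-learn tree and credits each change in node mean to the split feature; it is an illustration of the idea, not the treeinterpreter implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def decompose(tree, row):
    """Return (bias, per-feature contributions) of one tree for one row."""
    t = tree.tree_
    node = 0
    bias = t.value[0, 0, 0]
    contributions = np.zeros(row.shape[0])
    while t.children_left[node] != -1:    # -1 marks a leaf
        feat = t.feature[node]
        if row[feat] <= t.threshold[node]:
            child = t.children_left[node]
        else:
            child = t.children_right[node]
        contributions[feat] += t.value[child, 0, 0] - t.value[node, 0, 0]
        node = child
    return bias, contributions


# Synthetic regression data (assumption, for illustration only)
rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3)).astype(np.float32)
y = 10 * X[:, 0] + 5 * (X[:, 1] > 0.5) + rng.normal(scale=0.1, size=200)

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

row = X[0]
parts = [decompose(est, row) for est in forest.estimators_]

# Average the per-tree bias terms and the per-tree contribution vectors
forest_bias = np.mean([b for b, _ in parts])
forest_contributions = np.mean([c for _, c in parts], axis=0)
forest_prediction = forest.predict(row.reshape(1, -1))[0]

print("avg bias + avg contributions:", forest_bias + forest_contributions.sum())
print("forest prediction:           ", forest_prediction)
```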

### How does it work?

1) Assume the number of cases in the training set is N. Then a sample of these N cases is taken at random, with replacement. This sample will be the training set for growing the tree.

2) If there are M input variables, a number m<M is specified such that at each node, m variables are selected at random out of the M. The best split on these m variables is used to split the node. The value of m is held constant while we grow the forest.

#### The splitting criterion is similar to that of `DecisionTreeRegressor` in sklearn. For more parameter details, refer to http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html

3) Each tree is grown to the largest extent possible, and there is no pruning.

4) Predict new data by aggregating the predictions of all the trees (i.e., majority vote for classification, average for regression).
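
The four steps can be sketched by hand with scikit-learn's `DecisionTreeRegressor` as the base learner. This is a simplified illustration on synthetic data, not how `RandomForestRegressor` is implemented internally; the names `n_trees`, `m` and `forest_predict` are chosen for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption: N = 300 cases, M = 4 input variables)
rng = np.random.RandomState(0)
X = rng.uniform(size=(300, 4))
y = 3 * X[:, 0] + 2 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

n_trees = 25   # number of trees in the hand-made forest
m = 2          # variables considered at each node (m < M)

trees = []
for i in range(n_trees):
    # Step 1: bootstrap sample of N cases, drawn with replacement
    idx = rng.randint(0, len(X), size=len(X))
    # Step 2: max_features=m considers m randomly chosen variables per split
    # Step 3: no depth limit and no pruning, so the tree is fully grown
    tree = DecisionTreeRegressor(max_features=m, random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)


def forest_predict(trees, X_new):
    """Step 4: average the individual tree predictions (regression)."""
    return np.mean([t.predict(X_new) for t in trees], axis=0)


X_new = X[:5]
preds = forest_predict(trees, X_new)
print(preds)
```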

In [None]:
rf.fit(X_train, y_train)

### Again, turning a black box into a white box for a random forest prediction

In [None]:
rf_prediction, rf_bias, rf_contributions = ti.predict(rf, instances)

In [None]:
rf_ft_list = []
for i in range(len(instances)):
    print("Bias (trainset mean)", rf_bias[i])
    # Sort features by the absolute size of their contribution, largest first
    for c, feature in sorted(zip(rf_contributions[i], x.columns),
                             key=lambda pair: -abs(pair[0])):
        rf_ft_list.append((feature, round(c, 2)))
    print("-"*50)

In [None]:
rf_labels, rf_values = zip(*rf_ft_list)

In [None]:
rf_ft_list

In [None]:
import numpy as np                                                               
import matplotlib.pyplot as plt

from pylab import rcParams
rcParams['figure.figsize'] = 25, 25

rf_xs = np.arange(len(rf_labels)) 

plt.bar(rf_xs, rf_values, 0.8, align='center')

plt.xticks(rf_xs, rf_labels)
plt.yticks(rf_values)

plt.show()

### What does the random forest prediction tell us?

### As seen in the plot above, again only 2 features have a positive impact in driving the prices higher, but this time latitude has a very high negative impact, pulling the prediction well below the bias (trainset mean).

In [None]:
rf_contributions

In [None]:
rf_prediction

In [None]:
rf_bias

### Again, prediction must equal " bias + feature(1)contribution + … + feature(n)contribution "

In [None]:
print(rf_bias + np.sum(rf_contributions, axis=1))

### How did the random forest regressor arrive at the results? Let's look at trees fitted on the first 5 rows of the train set

In [None]:
from sklearn.ensemble import RandomForestRegressor
rf_model = RandomForestRegressor(n_estimators=10)

In [None]:
top5xrf = X_train.head(5)
top5yrf = y_train.head(5)

In [None]:
rf_model.fit(top5xrf, top5yrf)

### Extracting only a single tree to visualise

In [None]:
estimator = rf_model.estimators_[5]
estimator1 = rf_model.estimators_[6]

In [None]:
from io import StringIO  # sklearn.externals.six is no longer available in recent scikit-learn
from IPython.display import Image
#from sklearn.tree import export_graphviz
#import pydotplus
#dot_data1 = StringIO()
#export_graphviz(estimator, out_file=dot_data1,  
 #               filled=True, rounded=True,
  #              special_characters=True)
#graph = pydotplus.graph_from_dot_data(dot_data1.getvalue())  
#Image(graph.create_png())

![](https://raw.githubusercontent.com/rakash/Scripts/master/tree.png)

### Another tree

In [None]:
from io import StringIO  # sklearn.externals.six is no longer available in recent scikit-learn
from IPython.display import Image  
#from sklearn.tree import export_graphviz
#import pydotplus
#dot_data3 = StringIO()
#export_graphviz(estimator1, out_file=dot_data3,  
 #               filled=True, rounded=True,
  #              special_characters=True)
#graph = pydotplus.graph_from_dot_data(dot_data3.getvalue())  
#Image(graph.create_png())

![](https://github.com/rakash/Scripts/blob/master/tree1.png?raw=true)

### Although one image is not going to solve the issue, looking at an individual decision tree shows us that a random forest is not an unexplainable method but a sequence of logical questions and answers. Every prediction can be trivially presented as a sum of feature contributions, showing how the features lead to a particular prediction.

### This opens up a lot of opportunities in practical machine learning tasks.

#### REFERENCES:

#### 1) https://github.com/andosa/treeinterpreter

#### 2) http://blog.datadive.net/interpreting-random-forests/
