### Random Forest
A Random Forest is an ensemble of Decision Trees, generally trained via the bagging method (or sometimes pasting), typically with `max_samples` set to the size of the training set.

In [4]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
import pandas as pd

In [8]:
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data['data'], data['target'], test_size=0.20)

In [9]:
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(X_train, y_train)

RandomForestClassifier(max_leaf_nodes=16, n_estimators=500, n_jobs=-1)

In [12]:
y_cap = rnd_clf.predict(X_test)
accuracy_score(y_test, y_cap)

0.9298245614035088

In [14]:
# Use grid search to tune the hyperparameters.
from sklearn.model_selection import GridSearchCV

grid_parameters = [
    {'n_estimators':[100, 150, 200, 250, 500], 'max_leaf_nodes':[4, 8, 16, 32], 'criterion':["gini", "entropy"]},
    {'n_estimators':[200, 250, 500], 'max_leaf_nodes':[4, 16, 32], 'criterion':["gini", "entropy"]}
]
rfc = RandomForestClassifier()

gsearch = GridSearchCV(estimator=rfc, param_grid = grid_parameters, cv=3, verbose=2, n_jobs=-1)

In [15]:
gsearch.fit(X_train, y_train)

Fitting 3 folds for each of 58 candidates, totalling 174 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done  17 tasks      | elapsed:    5.2s
[Parallel(n_jobs=-1)]: Done 138 tasks      | elapsed:   11.6s
[Parallel(n_jobs=-1)]: Done 174 out of 174 | elapsed:   14.2s finished


GridSearchCV(cv=3, estimator=RandomForestClassifier(), n_jobs=-1,
             param_grid=[{'criterion': ['gini', 'entropy'],
                          'max_leaf_nodes': [4, 8, 16, 32],
                          'n_estimators': [100, 150, 200, 250, 500]},
                         {'criterion': ['gini', 'entropy'],
                          'max_leaf_nodes': [4, 16, 32],
                          'n_estimators': [200, 250, 500]}],
             verbose=2)

In [16]:
gsearch.best_params_

{'criterion': 'entropy', 'max_leaf_nodes': 16, 'n_estimators': 200}

In [17]:
rnd_clf1 = RandomForestClassifier(n_estimators=200, max_leaf_nodes=16, n_jobs=-1, criterion='entropy')
rnd_clf1.fit(X_train, y_train)
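# Evaluate the tuned model rnd_clf1. The score recorded below was produced
# with rnd_clf, which is why it matches the first run exactly.
y_cap = rnd_clf1.predict(X_test)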
accuracy_score(y_test, y_cap)

0.9298245614035088
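
Because `GridSearchCV` refits the best configuration on the whole training set by default (`refit=True`), the tuned model can also be evaluated directly through `best_estimator_` instead of being rebuilt by hand. The following is a minimal, self-contained sketch; the reduced grid and `random_state=42` are assumptions chosen only to keep the run short and repeatable, and they are not the settings used in the cells above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data["data"], data["target"], test_size=0.20, random_state=42
)

# Deliberately small grid (an assumption for illustration only).
param_grid = {"n_estimators": [50, 100], "max_leaf_nodes": [8, 16]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=3)
search.fit(X_train, y_train)

# best_estimator_ has already been refit on all of X_train with best_params_.
best_model = search.best_estimator_
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
print(search.best_params_, test_accuracy)
```

Using `best_estimator_` avoids copying hyperparameters by hand, which is where mix-ups between model variables tend to creep in.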

With a few exceptions, a RandomForestClassifier has all the hyperparameters of a DecisionTreeClassifier (to control how trees are grown), plus all the hyperparameters of a BaggingClassifier to control the ensemble itself.

__The Random Forest algorithm introduces extra randomness when growing trees: instead of searching for the very best feature when splitting a node, it searches for the best feature among a random subset of features.__ This results in greater tree diversity, which (once again) trades a higher bias for a lower variance, generally yielding a better model overall.
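
In scikit-learn the size of that random feature subset is controlled by the `max_features` hyperparameter. The sketch below assumes `max_features="sqrt"`, 50 trees, and `random_state=0` (none of which are set in the cells above) and inspects how many features a fitted tree considers at each split. The breast cancer dataset has 30 features, so with the square-root rule each split looks at only 5 of them.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(n_estimators=50, max_features="sqrt", random_state=0)
forest.fit(X, y)

n_features = X.shape[1]
# max_features_ is the resolved integer number of features a fitted tree
# samples as split candidates at each node.
features_per_split = forest.estimators_[0].max_features_
print(n_features, features_per_split)
```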

Random Forest:
```python
rnd_clf1 = RandomForestClassifier(n_estimators=200, max_leaf_nodes=16, n_jobs=-1, criterion='entropy')
```

A roughly equivalent ensemble built with `BaggingClassifier`:
```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(splitter="random", max_leaf_nodes=16),
    n_estimators=500, max_samples=1.0, bootstrap=True, n_jobs=-1,
)
```
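
To check that the two formulations behave similarly, both can be fit on the same split and scored. This is only a sketch: `random_state=42` and `n_estimators=100` are assumptions made to keep it quick and repeatable, and the two accuracies should be close but need not be identical, since the ensembles inject randomness in different ways.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

forest = RandomForestClassifier(n_estimators=100, max_leaf_nodes=16, random_state=42)

# Bagging over randomized trees, mirroring the BaggingClassifier setup above.
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(splitter="random", max_leaf_nodes=16),
    n_estimators=100, max_samples=1.0, bootstrap=True, random_state=42,
)

scores = {}
for name, model in [("random_forest", forest), ("bagging", bagged_trees)]:
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
print(scores)
```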

At each node we pick a candidate feature (or features) and calculate the Gini index of the child nodes that the split would produce. The example below shows how the choice of feature, together with the class labels, affects the Gini impurity.

|x1 |x2 |$y$|
|---|---|---|
|2  |1  |0  |
|3  |2  |0  |
|4  |1  |0  |
|5  |2  |1  |
|5  |1  |1  |

__Condition 1: predict 0 if x1 < 5, else 1__

Gini index of node 1 => $1 - (\frac{3}{3})^{2} - (\frac{0}{3})^{2} = 0$

Gini index of node 2 => $1 - (\frac{2}{2})^{2} - (\frac{0}{2})^{2} = 0$

Weighted Gini impurity for the split on x1 = $\frac{3}{3+2} \times 0 + \frac{2}{3+2} \times 0 = 0$

__Condition 2: predict 0 if x2 == 1, else 1__

Gini index of node 1 => $1 - (\frac{2}{3})^{2} - (\frac{1}{3})^{2} = \frac{4}{9} \approx 0.444$

Gini index of node 2 => $1 - (\frac{1}{2})^{2} - (\frac{1}{2})^{2} = 0.5$

Weighted Gini impurity for the split on x2 = $\frac{3}{3+2} \times \frac{4}{9} + \frac{2}{3+2} \times \frac{1}{2} = \frac{7}{15} \approx 0.467$

Hence condition 1 is preferred, because it gives the lower weighted Gini impurity.
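
The arithmetic above can be reproduced in a few lines of Python. The helper names `gini` and `weighted_gini` are hypothetical, and the per-node class counts are taken from the fractions in the formulas above; this is a sketch of the calculation, not of how scikit-learn implements it internally.

```python
def gini(class_counts):
    """Gini impurity of one node, given the number of samples in each class."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def weighted_gini(children):
    """Impurity of a split: child impurities weighted by child size."""
    n_samples = sum(sum(counts) for counts in children)
    return sum(sum(counts) / n_samples * gini(counts) for counts in children)


# Each child node is [count of class 0, count of class 1].
condition_1 = [[3, 0], [0, 2]]  # both children are pure
condition_2 = [[2, 1], [1, 1]]  # both children are mixed

split_1 = weighted_gini(condition_1)
split_2 = weighted_gini(condition_2)
print(round(split_1, 3), round(split_2, 3))

# The split with the lower weighted impurity is the one to keep.
best = "condition 1" if split_1 < split_2 else "condition 2"
print(best)
```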

### proximity matrix?


## Extra Tree Classifiers
When growing a tree in a Random Forest, only a random subset of the features is considered for splitting at each node (as discussed earlier). It is possible to make trees even __more random by also using random thresholds__ for each feature rather than searching for the best possible thresholds.

A forest of such trees is called an Extremely Randomized Trees ensemble, or Extra-Trees for short.

This trades more bias for a lower variance.

It also makes Extra-Trees much faster to train than regular Random Forests since finding the best possible threshold for each feature at every node is one of the most time-consuming tasks of growing a tree.
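
A rough way to see this trade-off is to cross-validate both ensembles on the same data and time them. The sketch below assumes 100 trees, 5-fold cross-validation, and `random_state=42`, none of which come from the text above. Timings and scores depend on the machine and the dataset, so no particular result is implied.

```python
import time

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

results = {}
for model_class in (RandomForestClassifier, ExtraTreesClassifier):
    model = model_class(n_estimators=100, random_state=42)
    start = time.perf_counter()
    fold_scores = cross_val_score(model, X, y, cv=5)
    elapsed = time.perf_counter() - start
    # Store mean cross-validated accuracy and wall-clock time per ensemble.
    results[model_class.__name__] = (fold_scores.mean(), elapsed)

for name, (mean_accuracy, seconds) in results.items():
    print(f"{name}: accuracy={mean_accuracy:.3f}, time={seconds:.2f}s")
```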

Feature importances can also be read from a fitted scikit-learn ensemble through its `feature_importances_` attribute.
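
As a short illustration, the sketch below fits a forest on the breast cancer dataset and lists the five features with the highest importance. The choices of 100 trees and `random_state=42` are assumptions; the ranking can shift with different seeds or hyperparameters.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(data["data"], data["target"])

# One impurity-based importance per feature; the values sum to 1.
importances = forest.feature_importances_

# Indices of the five largest importances, in descending order.
top_five = np.argsort(importances)[::-1][:5]
for index in top_five:
    print(f"{data['feature_names'][index]}: {importances[index]:.3f}")
```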
