# Testing Random Forests with Foresight

There are two sets of objects that we'll have to test comprehensively.

| Basic                  | Foresight enabled        |
|------------------------|--------------------------|
| DecisionTreeClassifier | FSDecisionTreeClassifier |
| DecisionTreeRegressor  | FSDecisionTreeRegressor  | 
| RandomForestClassifier | FSRandomForestClassifier | 
| RandomForestRegressor  | FSRandomForestRegressor  |

**Basic** and **Foresight enabled** differ in only one place: how the candidate features are drawn. In **Basic**, `n_features` features are randomly selected with *uniform weights* for all features. In **Foresight enabled**, `n_features` features are randomly selected with *mutual information* used as weights.

The arguments passed to both sets of classes are the same as well, apart from the extra `n_feat` argument passed to the Foresight-enabled forest in the cells below.
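To make the difference concrete, here is a minimal sketch of uniform versus weighted feature sampling. It is not the `TreeMethods` implementation: the helper `sample_features` and the `mi_scores` values are hypothetical, and in practice the scores would come from a mutual information estimate on the training data.

```python
import numpy as np

def sample_features(n_total, n_features, weights=None, seed=None):
    """Draw n_features distinct column indices out of n_total.

    weights=None gives uniform sampling (the Basic behaviour);
    otherwise the weights are normalised into sampling probabilities.
    """
    rng = np.random.RandomState(seed)
    if weights is None:
        p = None
    else:
        w = np.asarray(weights, dtype=float)
        p = w / w.sum()
    return rng.choice(n_total, size=n_features, replace=False, p=p)

# Hypothetical mutual information scores for 5 features (illustrative values only).
mi_scores = np.array([0.50, 0.05, 0.30, 0.10, 0.05])

uniform_pick = sample_features(5, 2, seed=0)                      # every feature equally likely
weighted_pick = sample_features(5, 2, weights=mi_scores, seed=0)  # high-score features favoured
```

With weights like these, features 0 and 2 are drawn far more often than the others, whereas the uniform draw treats all five alike.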


In [1]:
import sys
sys.path.append('./TreeMethods/')
from TreeMethods import  DecisionTreeClassifier, DecisionTreeRegressor, RandomForestClassifier, RandomForestRegressor
from TreeMethods import  FSDecisionTreeRegressor, FSRandomForestClassifier, FSRandomForestRegressor
reload(FSRandomForestClassifier)

<module 'TreeMethods.FSRandomForestClassifier' from 'TreeMethods/FSRandomForestClassifier.pyc'>

In [2]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification as makec
from sklearn.datasets import load_wine, load_iris       # classification 
from sklearn.datasets import load_boston, load_diabetes # regression 
from sklearn.metrics import classification_report, confusion_matrix

## Classification

We'll be comparing `FSRandomForestClassifier` with `RandomForestClassifier` to see if `Foresight` leads to any improvement.

In [3]:
w = load_wine()
i = load_iris()
###
# feature_names is already a list of column names, so no extra brackets are needed
wine = pd.DataFrame(w.data,columns=w.feature_names)
wine['Target'] = pd.Series(data=w.target)
iris = pd.DataFrame(i.data,columns=i.feature_names)
iris['Target'] = pd.Series(data=i.target)

In [4]:
rfc   = RandomForestClassifier.RandomForestClassifier(n_trees=10,max_depth=2,min_size=2,cost='gini')
fsrfc = FSRandomForestClassifier.FSRandomForestClassifier(n_feat=4,n_trees=10,max_depth=2,min_size=2,cost='gini')

In [5]:
print(wine.columns)
print(iris.columns)

Index([u'alcohol', u'malic_acid', u'ash', u'alcalinity_of_ash', u'magnesium',
       u'total_phenols', u'flavanoids', u'nonflavanoid_phenols',
       u'proanthocyanins', u'color_intensity', u'hue',
       u'od280/od315_of_diluted_wines', u'proline', u'Target'],
      dtype='object')
Index([u'sepal length (cm)', u'sepal width (cm)', u'petal length (cm)',
       u'petal width (cm)', u'Target'],
      dtype='object')


In [6]:
rfc.fit(wine,target='Target')
fsrfc.fit(wine,target='Target')

  y = column_or_1d(y, warn=True)
  wgts = 1. / self.mi_features[final_features[-1], get_a_heap_of_features]
  wgts /= np.sum(wgts)
  final_features.append(np.random.choice(get_a_heap_of_features, size=1, p=wgts)[0])


In [7]:
res_nfs = []
res_fs  = []
for idx in wine.index:
    res_nfs.append(rfc.predict(wine.loc[[idx]].squeeze()))
    res_fs.append(fsrfc.predict(wine.loc[[idx]].squeeze()))

In [8]:
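# sklearn's signature is confusion_matrix(y_true, y_pred). Predictions are passed
# first here, so rows are predicted classes and columns are true classes.
print(confusion_matrix(res_nfs,wine['Target']))
print(confusion_matrix(res_fs,wine['Target']))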

[[53  0  0]
 [ 6 71  4]
 [ 0  0 44]]
[[48  0  0]
 [11 71  4]
 [ 0  0 44]]
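As a quick check on how to read these matrices, the sketch below recomputes the overall accuracy and the per-class fractions directly from the numbers printed above. The helper `summarize` is our own name, not part of `TreeMethods` or scikit-learn. Keep in mind that the predictions were made on the same rows the forests were fitted on, so these are training-set figures.

```python
import numpy as np

# Confusion matrices copied from the output above.
# Rows follow the first argument (predictions), columns the second (true labels).
cm_basic = np.array([[53, 0, 0], [6, 71, 4], [0, 0, 44]])
cm_foresight = np.array([[48, 0, 0], [11, 71, 4], [0, 0, 44]])

def summarize(cm):
    """Return accuracy and the per-class share of correct entries along each axis."""
    correct = np.diag(cm).astype(float)
    accuracy = correct.sum() / cm.sum()
    per_true_class = correct / cm.sum(axis=0)       # share of each true class recovered
    per_predicted_class = correct / cm.sum(axis=1)  # share of each predicted class that is right
    return accuracy, per_true_class, per_predicted_class

acc_basic, true_basic, pred_basic = summarize(cm_basic)
acc_fs, true_fs, pred_fs = summarize(cm_foresight)
```

On this run the basic forest gets 168 of 178 rows right and the Foresight-enabled forest 163 of 178. The only difference is in true class 0, where the Foresight-enabled forest assigns 11 rows to class 1 instead of 6.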


In [9]:
tn = ['class 0', 'class 1', 'class 2']
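# sklearn's signature is classification_report(y_true, y_pred). Predictions are passed
# first here, so the precision and recall columns are swapped relative to the usual
# convention, and support counts predicted labels rather than true labels.
print(classification_report(res_nfs,wine['Target'],target_names=tn))
print(classification_report(res_fs,wine['Target'],target_names=tn))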

             precision    recall  f1-score   support

    class 0       0.90      1.00      0.95        53
    class 1       1.00      0.88      0.93        81
    class 2       0.92      1.00      0.96        44

avg / total       0.95      0.94      0.94       178

             precision    recall  f1-score   support

    class 0       0.81      1.00      0.90        48
    class 1       1.00      0.83      0.90        86
    class 2       0.92      1.00      0.96        44

avg / total       0.93      0.92      0.92       178
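The numbers above are computed on the same rows the forests were fitted on, so they likely overstate how well either model generalizes. A fairer comparison would score both models on held-out rows. The sketch below shows the pattern with scikit-learn's own `RandomForestClassifier` as a stand-in, since the `TreeMethods` classes are not assumed to be importable here; the split size and random seeds are arbitrary choices.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier as SkRandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

data = load_wine()

# Hold out 30% of the rows, keeping the class proportions the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, stratify=data.target, random_state=0)

# Stand-in model (scikit-learn, not TreeMethods) with settings similar to the cells above.
model = SkRandomForestClassifier(n_estimators=10, max_depth=2, random_state=0)
model.fit(X_train, y_train)

# Accuracy on rows the model never saw during fitting.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
```

The same split could be reused for `rfc` and `fsrfc`, fitting on the training rows and predicting only the held-out rows, so that both forests are scored on identical unseen data.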



### Set and get params

`RandomForestClassifier`, `RandomForestRegressor`, `FSRandomForestClassifier`, and `FSRandomForestRegressor` all have `setparams` and `getparams` methods.

For the regular Random Forests, the parameter dictionary has `max_depth`, `min_size`, and `n_trees`. For the Foresight-enabled Random Forests, the dictionary additionally has `n_features`, which here holds the value passed as `n_feat`.

In [10]:
rfc.getparams(), fsrfc.getparams()

({'max_depth': 2, 'min_size': 2, 'n_trees': 10},
 {'max_depth': 2, 'min_size': 2, 'n_features': 4, 'n_trees': 10})
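For readers who want to see how such a pair of methods might be put together, here is a minimal sketch. The classes `ForestParams` and `FSForestParams` are hypothetical stand-ins, not the `TreeMethods` code, and the keyword-argument signature of `setparams` is an assumption, since the notebook only shows `getparams` being called.

```python
class ForestParams(object):
    """Hypothetical stand-in showing one way to expose hyperparameters."""
    _param_names = ('max_depth', 'min_size', 'n_trees')

    def __init__(self, n_trees=10, max_depth=2, min_size=2):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.min_size = min_size

    def getparams(self):
        # Collect the current value of every registered parameter.
        return {name: getattr(self, name) for name in self._param_names}

    def setparams(self, **params):
        # Assumed signature: keyword arguments, rejecting unknown names.
        for name, value in params.items():
            if name not in self._param_names:
                raise ValueError('unknown parameter: %s' % name)
            setattr(self, name, value)
        return self


class FSForestParams(ForestParams):
    """Same idea with the extra n_features entry."""
    _param_names = ForestParams._param_names + ('n_features',)

    def __init__(self, n_features=4, **kwargs):
        super(FSForestParams, self).__init__(**kwargs)
        self.n_features = n_features


basic_params = ForestParams().getparams()
fs_params = FSForestParams().setparams(n_trees=20).getparams()
```

Keeping the parameter names in one tuple means the subclass only has to extend that tuple to get `n_features` into both methods.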