# Hoeffding Tree model tests

Like the logistic-regression-tests.ipynb notebook, this notebook implements a Hoeffding Tree model to be used as a black box for evaluating feature selection performance on datasets.

## Dataset setup

In [None]:
import pandas as pd

dataset_path = "../datasets/pima-indians-diabetes-database/diabetes.csv"
data = pd.read_csv(dataset_path)
print(data.info(verbose=True))

y = data['Outcome']
X = data.drop('Outcome', axis=1)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None


## Model setup and evaluation

In [None]:
from river.tree import HoeffdingTreeClassifier
from river.metrics import ClassificationReport
from river.metrics import ConfusionMatrix

model = HoeffdingTreeClassifier()
report = ClassificationReport()
cm = ConfusionMatrix()

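# Prequential (test-then-train) loop: predict each sample before learning from it.
for i, x in X.iterrows():
    predicted_y = model.predict_one(x)

    # Only score predictions whose type is comparable to the label; this skips
    # e.g. the None returned before the model has learned from any sample.
    if type(predicted_y) == type(y[i]):
        report.update(y[i], predicted_y)
        cm.update(y[i], predicted_y)

    model.learn_one(x, y[i])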

print(report)
print(cm)

           Precision   Recall   F1       Support  
                                                  
       0      78.70%   79.80%   79.25%       500  
       1      61.15%   59.55%   60.34%       267  
                                                  
   Macro      69.93%   69.68%   69.79%            
   Micro      72.75%   72.75%   72.75%            
Weighted      72.59%   72.75%   72.66%            

                 72.75% accuracy                  
    0     1    
0   399   101  
1   108   159  
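
As a sanity check, the per-class figures in the report can be recomputed by hand from the confusion matrix. The sketch below is plain Python and does not use river. It assumes that rows are true labels and columns are predicted labels, which matches the Support column (399 + 101 = 500 for class 0). The helper names `precision`, `recall` and `f1` are illustrative, not part of any library.

```python
# Confusion matrix copied from the output above.
# Assumption: outer key = true label, inner key = predicted label.
cm = {0: {0: 399, 1: 101}, 1: {0: 108, 1: 159}}
labels = sorted(cm)

def precision(label):
    # Correct predictions of `label` over everything predicted as `label`.
    predicted = sum(cm[true][label] for true in labels)
    return cm[label][label] / predicted

def recall(label):
    # Correct predictions of `label` over all samples whose true label is `label`.
    support = sum(cm[label].values())
    return cm[label][label] / support

def f1(label):
    # Harmonic mean of precision and recall.
    p, r = precision(label), recall(label)
    return 2 * p * r / (p + r)

total = sum(sum(row.values()) for row in cm.values())
accuracy = sum(cm[label][label] for label in labels) / total

for label in labels:
    print(f"{label}: precision={precision(label):.2%} "
          f"recall={recall(label):.2%} f1={f1(label):.2%}")
print(f"accuracy={accuracy:.2%} over {total} scored samples")
```

The matrix sums to 767 scored samples rather than the 768 rows in the dataset, which is consistent with one prediction (most likely the very first, made before the model had learned anything) being left out of the metrics.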
