## OOB Evaluation

Out-of-Bag (OOB) evaluation is a technique used in Random Forests to estimate the performance of the model without the need for a separate validation set. It leverages the concept of bootstrapping, where multiple training datasets are created by sampling with replacement from the original dataset. Here's how OOB evaluation works in Random Forest:

### Out-of-Bag Evaluation Process:

1. **Bootstrap Sampling**:
   - When constructing each decision tree in the Random Forest, a bootstrap sample is created by drawing n samples with replacement from the original dataset of n samples, so some samples appear more than once and others not at all.
   - On average, about 63.2% of the distinct original samples appear in each bootstrap sample (the fraction approaches 1 - 1/e as n grows), and the remaining 36.8% (approximately) are left out.

2. **Out-of-Bag Samples**:
   - The samples that are not included in the bootstrap sample for a particular tree are called out-of-bag (OOB) samples.
   - Each tree in the Random Forest uses a different bootstrap sample, so each tree has its own set of OOB samples. These sets overlap across trees: a given sample is OOB for roughly a third of the trees.

3. **Prediction**:
   - The OOB samples of a tree play no part in training that tree, so they act as unseen data for it.
   - Once the tree is built, each of its OOB samples is passed through it, and the prediction is recorded.

4. **Aggregation**:
   - After training all the trees in the Random Forest, the recorded predictions are aggregated per sample, using only the trees for which that sample was out-of-bag.
   - For classification tasks, the mode (most frequent class) of those predictions is taken as the sample's OOB prediction.
   - For regression tasks, those predictions are averaged to obtain the sample's OOB prediction.
   - Comparing the OOB predictions with the true targets gives the OOB score.
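
The four steps above can be sketched by hand with bagged decision trees. This is a minimal illustration, not how scikit-learn implements it internally: the function name `manual_oob_accuracy`, the synthetic dataset from `make_classification`, and the choice of 50 trees are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def manual_oob_accuracy(X, y, n_trees=50, seed=0):
    """Hypothetical helper: bagged trees with a hand-rolled OOB accuracy."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    classes = np.unique(y)
    # votes[i, k] counts trees that had sample i out-of-bag and predicted class k
    votes = np.zeros((n_samples, classes.size), dtype=int)
    in_bag_fractions = []

    for _ in range(n_trees):
        # Step 1: bootstrap sample, n_samples row indices drawn with replacement
        boot_idx = rng.integers(0, n_samples, size=n_samples)
        in_bag = np.zeros(n_samples, dtype=bool)
        in_bag[boot_idx] = True
        in_bag_fractions.append(in_bag.mean())

        # Step 2: rows that were never drawn are OOB for this tree
        oob_idx = np.flatnonzero(~in_bag)

        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1 << 31))
        )
        tree.fit(X[boot_idx], y[boot_idx])

        # Step 3: predict only the rows this tree did not train on
        pred_cols = np.searchsorted(classes, tree.predict(X[oob_idx]))
        votes[oob_idx, pred_cols] += 1

    # Step 4: majority vote per sample, over the trees where it was OOB
    has_vote = votes.sum(axis=1) > 0
    oob_pred = classes[votes.argmax(axis=1)]
    accuracy = float(np.mean(oob_pred[has_vote] == y[has_vote]))
    return accuracy, float(np.mean(in_bag_fractions))


# Synthetic data, used only to make the sketch self-contained
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
oob_accuracy, in_bag_fraction = manual_oob_accuracy(X, y)
print("OOB accuracy:", oob_accuracy)
print("Mean in-bag fraction:", in_bag_fraction)
```

The mean in-bag fraction should land close to the 63.2% figure quoted in step 1, and the accuracy is computed without any sample ever being scored by a tree that trained on it.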

### Advantages of OOB Evaluation:

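1. **Honest Estimate**: Every OOB prediction comes only from trees that never saw that sample, so the OOB score approximates performance on unseen data without the need for a separate validation set. It tends to be slightly pessimistic, because each sample is scored by only a subset of the trees rather than the full forest.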
  
2. **Efficiency**: It leverages the available data more efficiently by using all samples for both training and evaluation.

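3. **Helps Detect Overfitting**: OOB evaluation does not prevent overfitting by itself, but it measures performance on data each tree has not seen, so a large gap between training accuracy and the OOB score is a sign that the model is overfitting.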

### Implementation in Scikit-learn:

In Scikit-learn's Random Forest implementation, OOB evaluation can be enabled by setting the `oob_score` parameter to `True` when creating the Random Forest classifier or regressor. This requires `bootstrap=True`, which is the default. After training the model, you can access the OOB score using the `oob_score_` attribute: by default it is the accuracy for a classifier and the R² score for a regressor.

### Example Code:

```python
from sklearn.ensemble import RandomForestClassifier

# Create Random Forest classifier with OOB evaluation
rf_classifier = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)

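# Train the model (X_train and y_train are assumed to be defined already,
# for example by a train_test_split call like the one in the walkthrough that follows)
rf_classifier.fit(X_train, y_train)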

# OOB score
oob_score = rf_classifier.oob_score_
print("OOB Score:", oob_score)
```

In this example, `oob_score_` provides an estimate of the model's accuracy based on the out-of-bag samples. It serves as a useful diagnostic tool for evaluating the Random Forest's performance during training.
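
To see where that accuracy comes from, the per-sample OOB class probabilities are exposed through the `oob_decision_function_` attribute of a fitted classifier. The sketch below recomputes the accuracy from it and compares it with `oob_score_`. The synthetic dataset and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only to keep the example self-contained
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Shape (n_samples, n_classes): class probabilities for each training sample,
# averaged over the trees for which that sample was out-of-bag
oob_proba = rf.oob_decision_function_

# Turn the probabilities into class labels and score them against y
oob_pred = rf.classes_[np.argmax(oob_proba, axis=1)]
recomputed_accuracy = np.mean(oob_pred == y)

print("oob_score_:         ", rf.oob_score_)
print("recomputed accuracy:", recomputed_accuracy)
```

With enough trees, every training sample is out-of-bag for at least some of them, so the two numbers should agree. With very few trees, some rows may have no OOB prediction at all, and scikit-learn warns about it.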

```python
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score
```

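```python
df = pd.read_csv('heart.csv')
df.head()
```

|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|-----|-----|----|----------|------|-----|---------|---------|-------|---------|-------|----|------|--------|
| 0 | 63 | 1 | 3 | 145 | 233 | 1 | 0 | 150 | 0 | 2.3 | 0 | 0 | 1 | 1 |
| 1 | 37 | 1 | 2 | 130 | 250 | 0 | 1 | 187 | 0 | 3.5 | 0 | 0 | 2 | 1 |
| 2 | 41 | 0 | 1 | 130 | 204 | 0 | 0 | 172 | 0 | 1.4 | 2 | 0 | 2 | 1 |
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 4 | 57 | 0 | 0 | 120 | 354 | 0 | 1 | 163 | 1 | 0.6 | 2 | 0 | 2 | 1 |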


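```python
# Features are every column except the last; the last column is the target
X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```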

```python
# Enable OOB scoring, then fit on the training split
rf = RandomForestClassifier(oob_score=True)
rf.fit(X_train, y_train)
```

```python
rf.oob_score_
```

```
0.8057851239669421
```

```python
y_pred = rf.predict(X_test)
accuracy_score(y_test, y_pred)
```

```
0.8688524590163934
```
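
The OOB score and the test accuracy above differ by a few points. With a 20% hold-out from a small dataset, both numbers are noisy, and the forest above was created without a fixed `random_state`, so the exact values change from run to run. One way to get a feel for that variation is to repeat the split and fit with several seeds. The sketch below does so on a synthetic stand-in dataset (an assumption, since `heart.csv` is not bundled with scikit-learn); the sizes and names are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a small tabular dataset (not heart.csv)
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

gaps = []
for seed in range(10):
    # New split and new forest for every seed
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=seed)
    rf.fit(X_tr, y_tr)
    # Positive gap: test accuracy is higher than the OOB score
    gaps.append(rf.score(X_te, y_te) - rf.oob_score_)

gaps = np.array(gaps)
print("test accuracy minus OOB score, per seed:", np.round(gaps, 3))
print("mean gap:", gaps.mean(), "std:", gaps.std())
```

If the gaps scatter around zero with both signs, a single-run difference like the one above is more likely split noise than a systematic problem with either estimate.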