CaliForest: Calibrated Random Forest Model #995

Open

kumar70uiuc wants to merge 13 commits into sunlabuiuc:master from kumar70uiuc:cs598-califorest-model

Conversation


@kumar70uiuc kumar70uiuc commented Apr 15, 2026

CaliForest: Calibrated Random Forest Model

Contributor:

Type: Model Contribution (Option 2)

Paper: CaliForest: Calibrated Random Forest for Clinical Risk Prediction (https://pmc.ncbi.nlm.nih.gov/articles/PMC8299436/)

Description

This PR implements CaliForest, a calibrated Random Forest that uses Out-of-Bag (OOB) predictions for internal calibration, eliminating the need for a separate calibration holdout set. This is particularly valuable in data-limited clinical settings.

Key Features

  • OOB-based calibration (no data splitting required)
  • Support for Isotonic Regression and Platt Scaling
  • Configurable minimum OOB trees for reliability filtering
  • Full compatibility with scikit-learn interface

Files to Review

  1. pyhealth/models/califorest.py - Main implementation
  2. tests/test_califorest.py - Test suite
  3. examples/califorest_mortality_ablation.py - Ablation study
  4. docs/api/models/pyhealth.models.califorest.rst - Documentation

All 8 tests pass, using synthetic data.

@joshuasteier (Collaborator)

Hi Abhinav (@kumar70uiuc), Saswati, and Zhi, thanks for the detailed implementation. A few things here are genuinely strong: the hand-rolled OOB aggregation walking through rf_.estimators_ and rf_.estimators_samples_, the min_oob_trees reliability threshold from the paper, and the clean separation between fit, predict_proba, predict, and get_oob_calibration_data. The tests cover shapes and edge cases well.

I want to flag one thing up front. There is a parallel PR at #999 that also implements CaliForest, and it uses the BaseModel integration pattern that the rest of pyhealth/models/ follows. Could you take a look at how it is structured and compare with your approach? Specifically, PyHealth models inherit from pyhealth.models.BaseModel, take a SampleDataset in the constructor, and return {"loss", "y_prob", "y_true", "logit"} from forward so they work with pyhealth.trainer.Trainer. A good reference is pyhealth/models/logistic_regression.py, which despite the name is a torch-based model built on EmbeddingModel. The sklearn base class and raw X, y interface here would not plug into the trainer the same way.

The most valuable path forward for your PR is to port it to inherit BaseModel and take a SampleDataset. The sklearn RF and your calibration logic can stay inside the class. You just wrap them behind the forward dict. The min_oob_trees reliability filtering you implemented is a real contribution that is not in #999, and if you get the architecture aligned we would want your version to be what lands.
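For concreteness, here is a standalone mock of the forward contract described above. It does not import pyhealth, and the loss is a numpy stand-in; the real port would inherit BaseModel and return torch tensors so the Trainer can backprop.

```python
import numpy as np

class CaliForestForwardMock:
    """Illustrative stand-in for the PyHealth forward contract; not pyhealth code."""

    def forward(self, y_true, y_prob):
        eps = 1e-12
        p = np.clip(y_prob, eps, 1.0 - eps)
        # Binary cross-entropy stands in for the torch loss the Trainer expects.
        loss = float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))
        logit = np.log(p / (1 - p))
        return {"loss": loss, "y_prob": y_prob, "y_true": y_true, "logit": logit}
```

The point is only the dict shape: whatever sklearn machinery lives inside the class, forward must surface {"loss", "y_prob", "y_true", "logit"}.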

Smaller items worth fixing alongside the refactor:

  1. predict_proba is hardcoded to binary (raw_proba[:, 1] and the 2-column output), while fit partially generalizes with n_classes and pos_idx. On a 3-class input this would silently calibrate on class index 1. Please enforce binary in fit with a clear ValueError, or implement multiclass properly.

  2. The hand-rolled OOB aggregation duplicates what sklearn gives you as rf_.oob_decision_function_ when oob_score=True. Please either use the built-in, or add a test that asserts the hand-rolled output matches rf_.oob_decision_function_ on samples that pass the tree-count threshold. Right now there is no cross-check.

  3. The 21 tests cover shapes and smoke behavior but not the paper's main claim. A test that compares Brier score or ECE between CaliForest and a plain RandomForestClassifier on synthetic data would directly support the calibration improvement claim. A test that min_oob_trees above 1 actually filters samples out of the calibration fit would verify your threshold logic.

  4. tests/utils.py is added but never imported by the test file. It also has a print("Testing Utils") inside the function body above the docstring, which will show up in test output. If you have a use for the helper please import it, otherwise remove the file.

  5. except Exception: continue inside the OOB loop swallows every error silently. Please narrow it to the specific exceptions you expect, or log a warning.

  6. np.clip(calibrated_proba, 0.0, 1.0) after fitting isotonic with y_min=0.0, y_max=1.0, out_of_bounds="clip" is redundant. Either the clip or the bounds can go.
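A sketch of the binary guard suggested in item 1; the function name and message are illustrative, not the PR's actual code.

```python
import numpy as np

def ensure_binary(y):
    """Raise a clear error for non-binary targets instead of silently miscalibrating."""
    classes = np.unique(y)
    if classes.shape[0] != 2:
        raise ValueError(
            f"CaliForest supports binary classification only; got {classes.shape[0]} classes."
        )
    return classes
```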
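The cross-check test in item 2 could look like this sketch, assuming scikit-learn >= 1.4 for `estimators_samples_` on forests: with oob_score=True, sklearn exposes rf.oob_decision_function_, and the hand-rolled aggregation should match it on samples left out by at least one tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=1)
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=1).fit(X, y)

n = X.shape[0]
votes = np.zeros((n, rf.n_classes_))
counts = np.zeros(n)
for tree, sampled in zip(rf.estimators_, rf.estimators_samples_):
    oob = np.ones(n, dtype=bool)
    oob[sampled] = False
    votes[oob] += tree.predict_proba(X[oob])
    counts[oob] += 1

mask = counts > 0
manual = votes[mask] / counts[mask][:, None]
# The hand-rolled average should match sklearn's built-in OOB output.
assert np.allclose(manual, rf.oob_decision_function_[mask])
```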
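A sketch of the calibration comparison in item 3, using sklearn pieces as a stand-in for CaliForest (the PR's import path is not assumed here). It computes Brier scores for raw and OOB-isotonic-calibrated probabilities; the sketch checks validity rather than asserting the improvement, which depends on the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Noisy synthetic task where a plain RF tends to be miscalibrated.
X, y = make_classification(n_samples=2000, flip_y=0.2, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=7).fit(X_tr, y_tr)
raw = rf.predict_proba(X_te)[:, 1]

# Fit the calibrator on OOB probabilities (the CaliForest idea),
# then map the test-set probabilities through it.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(rf.oob_decision_function_[:, 1], y_tr)
cal = iso.predict(raw)

brier_raw = brier_score_loss(y_te, raw)
brier_cal = brier_score_loss(y_te, cal)
```

In the actual test, `brier_cal` would be compared against `brier_raw` (and against a plain RandomForestClassifier baseline) to support the paper's claim.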
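For item 5, narrowing the blanket except might look like the following; the exception types and helper name are illustrative guesses, not the PR's code.

```python
import warnings

def aggregate_oob_steps(per_tree_steps):
    """Run each per-tree step; skip only the expected failure modes, loudly."""
    results = []
    for i, step in enumerate(per_tree_steps):
        try:
            results.append(step())
        except (IndexError, ValueError) as exc:  # illustrative: only the errors you expect
            warnings.warn(f"skipping tree {i} in OOB aggregation: {exc}")
    return results
```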
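Item 6 can be checked directly: IsotonicRegression fitted with y_min=0.0, y_max=1.0, out_of_bounds="clip" already bounds its predictions, so the follow-up np.clip is a no-op.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit([0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1])

# Inputs far outside the training range are clipped by the regressor itself.
pred = iso.predict([-5.0, 0.5, 5.0])
assert np.array_equal(pred, np.clip(pred, 0.0, 1.0))  # the extra clip changes nothing
```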

Happy to answer questions as you work on the refactor.

@kumar70uiuc (Author)

We have addressed the review comments provided above.
