CaliForest: Calibrated Random Forest Model #995

Open

kumar70uiuc wants to merge 13 commits into sunlabuiuc:master from kumar70uiuc:cs598-califorest-model

Conversation


@kumar70uiuc kumar70uiuc commented Apr 15, 2026

CaliForest: Calibrated Random Forest Model

Contributor:

Type: Model Contribution (Option 2)

Paper: CaliForest: Calibrated Random Forest for Clinical Risk Prediction (https://pmc.ncbi.nlm.nih.gov/articles/PMC8299436/)

Description

This PR implements CaliForest, a calibrated Random Forest that uses Out-of-Bag (OOB) predictions for internal calibration, eliminating the need for a separate calibration holdout set. This is particularly valuable in data-limited clinical settings.

Key Features

  • OOB-based calibration (no data splitting required)
  • Support for Isotonic Regression and Platt Scaling
  • Configurable minimum OOB trees for reliability filtering
  • Full compatibility with scikit-learn interface

Files to Review

  1. pyhealth/models/califorest.py - Main implementation
  2. tests/test_califorest.py - Test suite
  3. examples/califorest_mortality_ablation.py - Ablation study
  4. docs/api/models/pyhealth.models.califorest.rst - Documentation

All 8 tests pass, using synthetic data.

@joshuasteier (Collaborator)

Hi Abhinav (@kumar70uiuc), Saswati, and Zhi, thanks for the detailed implementation. A few things here are genuinely strong: the hand-rolled OOB aggregation walking through rf_.estimators_ and rf_.estimators_samples_, the min_oob_trees reliability threshold from the paper, and the clean separation between fit, predict_proba, predict, and get_oob_calibration_data. The tests cover shapes and edge cases well.

I want to flag one thing up front. There is a parallel PR at #999 that also implements CaliForest, and it uses the BaseModel integration pattern that the rest of pyhealth/models/ follows. Could you take a look at how it is structured and compare with your approach? Specifically, PyHealth models inherit from pyhealth.models.BaseModel, take a SampleDataset in the constructor, and return {"loss", "y_prob", "y_true", "logit"} from forward so they work with pyhealth.trainer.Trainer. A good reference is pyhealth/models/logistic_regression.py, which despite the name is a torch-based model built on EmbeddingModel. The sklearn base class and raw X, y interface here would not plug into the trainer the same way.

The most valuable path forward for your PR is to port it to inherit BaseModel and take a SampleDataset. The sklearn RF and your calibration logic can stay inside the class. You just wrap them behind the forward dict. The min_oob_trees reliability filtering you implemented is a real contribution that is not in #999, and if you get the architecture aligned we would want your version to be what lands.
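For concreteness, here is a standalone mock of the forward contract described above. It does not import pyhealth, and the loss is a numpy stand-in; the real port would inherit BaseModel and return torch tensors so the Trainer can backprop.

```python
import numpy as np

class CaliForestForwardMock:
    """Illustrative stand-in for the PyHealth forward contract; not pyhealth code."""

    def forward(self, y_true, y_prob):
        eps = 1e-12
        p = np.clip(y_prob, eps, 1.0 - eps)
        # Binary cross-entropy stands in for the torch loss the Trainer expects.
        loss = float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))
        logit = np.log(p / (1 - p))
        return {"loss": loss, "y_prob": y_prob, "y_true": y_true, "logit": logit}
```

The point is only the dict shape: whatever sklearn machinery lives inside the class, forward must surface {"loss", "y_prob", "y_true", "logit"}.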

Smaller items worth fixing alongside the refactor:

  1. predict_proba is hardcoded to binary (raw_proba[:, 1] and the 2-column output), while fit partially generalizes with n_classes and pos_idx. On a 3-class input this would silently calibrate on class index 1. Please enforce binary in fit with a clear ValueError, or implement multiclass properly.

  2. The hand-rolled OOB aggregation duplicates what sklearn gives you as rf_.oob_decision_function_ when oob_score=True. Please either use the built-in, or add a test that asserts the hand-rolled output matches rf_.oob_decision_function_ on samples that pass the tree-count threshold. Right now there is no cross-check.

  3. The 21 tests cover shapes and smoke behavior but not the paper's main claim. A test that compares Brier score or ECE between CaliForest and a plain RandomForestClassifier on synthetic data would directly support the calibration improvement claim. A test that min_oob_trees above 1 actually filters samples out of the calibration fit would verify your threshold logic.

  4. tests/utils.py is added but never imported by the test file. It also has a print("Testing Utils") inside the function body above the docstring, which will show up in test output. If you have a use for the helper please import it, otherwise remove the file.

  5. except Exception: continue inside the OOB loop swallows every error silently. Please narrow it to the specific exceptions you expect, or log a warning.

  6. np.clip(calibrated_proba, 0.0, 1.0) after fitting isotonic with y_min=0.0, y_max=1.0, out_of_bounds="clip" is redundant. Either the clip or the bounds can go.
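A sketch of the binary guard suggested in item 1; the function name and message are illustrative, not the PR's actual code.

```python
import numpy as np

def ensure_binary(y):
    """Raise a clear error for non-binary targets instead of silently miscalibrating."""
    classes = np.unique(y)
    if classes.shape[0] != 2:
        raise ValueError(
            f"CaliForest supports binary classification only; got {classes.shape[0]} classes."
        )
    return classes
```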
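The cross-check test in item 2 could look like this sketch, assuming scikit-learn >= 1.4 for `estimators_samples_` on forests: with oob_score=True, sklearn exposes rf.oob_decision_function_, and the hand-rolled aggregation should match it on samples left out by at least one tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=1)
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=1).fit(X, y)

n = X.shape[0]
votes = np.zeros((n, rf.n_classes_))
counts = np.zeros(n)
for tree, sampled in zip(rf.estimators_, rf.estimators_samples_):
    oob = np.ones(n, dtype=bool)
    oob[sampled] = False
    votes[oob] += tree.predict_proba(X[oob])
    counts[oob] += 1

mask = counts > 0
manual = votes[mask] / counts[mask][:, None]
# The hand-rolled average should match sklearn's built-in OOB output.
assert np.allclose(manual, rf.oob_decision_function_[mask])
```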
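A sketch of the calibration comparison in item 3, using sklearn pieces as a stand-in for CaliForest (the PR's import path is not assumed here). It computes Brier scores for raw and OOB-isotonic-calibrated probabilities; the sketch checks validity rather than asserting the improvement, which depends on the data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Noisy synthetic task where a plain RF tends to be miscalibrated.
X, y = make_classification(n_samples=2000, flip_y=0.2, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=7).fit(X_tr, y_tr)
raw = rf.predict_proba(X_te)[:, 1]

# Fit the calibrator on OOB probabilities (the CaliForest idea),
# then map the test-set probabilities through it.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(rf.oob_decision_function_[:, 1], y_tr)
cal = iso.predict(raw)

brier_raw = brier_score_loss(y_te, raw)
brier_cal = brier_score_loss(y_te, cal)
```

In the actual test, `brier_cal` would be compared against `brier_raw` (and against a plain RandomForestClassifier baseline) to support the paper's claim.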
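For item 5, narrowing the blanket except might look like the following; the exception types and helper name are illustrative guesses, not the PR's code.

```python
import warnings

def aggregate_oob_steps(per_tree_steps):
    """Run each per-tree step; skip only the expected failure modes, loudly."""
    results = []
    for i, step in enumerate(per_tree_steps):
        try:
            results.append(step())
        except (IndexError, ValueError) as exc:  # illustrative: only the errors you expect
            warnings.warn(f"skipping tree {i} in OOB aggregation: {exc}")
    return results
```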
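Item 6 can be checked directly: IsotonicRegression fitted with y_min=0.0, y_max=1.0, out_of_bounds="clip" already bounds its predictions, so the follow-up np.clip is a no-op.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit([0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1])

# Inputs far outside the training range are clipped by the regressor itself.
pred = iso.predict([-5.0, 0.5, 5.0])
assert np.array_equal(pred, np.clip(pred, 0.0, 1.0))  # the extra clip changes nothing
```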

Happy to answer questions as you work on the refactor.

@kumar70uiuc (Author)

We have addressed the review comments provided above.
