bench: add Polars streaming vs in-memory loader benchmark by AhmedAli58 · Pull Request #871 · sunlabuiuc/PyHealth

AhmedAli58 · 2026-02-24T00:58:52Z

What this does

Adds a benchmarking script that compares PyHealth's new Polars streaming data loader
against the legacy in-memory loader across RAM usage, wall-clock time, and throughput
at 3 dataset scales (100, 1k, 5k patients).
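
For readers without the script at hand, the kind of measurement described above can be sketched roughly as follows. This is not the code in benchmarks/loader_benchmark.py: `eager_loader` and `streaming_loader` are hypothetical stand-ins, not PyHealth APIs, and `tracemalloc` only tracks Python-level allocations, so native buffers (such as those Polars allocates) would need a process-level measure like RSS instead.

```python
import time
import tracemalloc


def eager_loader(n_patients):
    # Hypothetical stand-in for an in-memory loader: builds every record up front.
    return [{"patient_id": i, "visits": [i] * 10} for i in range(n_patients)]


def streaming_loader(n_patients):
    # Hypothetical stand-in for a streaming loader: yields one record at a time.
    for i in range(n_patients):
        yield {"patient_id": i, "visits": [i] * 10}


def benchmark(loader, n_patients):
    """Iterate over a loader once and report time, peak memory, and throughput."""
    tracemalloc.start()
    start = time.perf_counter()
    count = 0
    for _ in loader(n_patients):
        count += 1
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return {
        "patients": count,
        "seconds": elapsed,
        "peak_bytes": peak,
        "patients_per_sec": count / elapsed if elapsed > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Same three scales as the PR description.
    for n in (100, 1_000, 5_000):
        for name, loader in (("in-memory", eager_loader), ("streaming", streaming_loader)):
            print(name, n, benchmark(loader, n))
```

Running each scale several times and reporting the median would make the timings less noisy; the sketch does a single pass to stay short.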

Why it matters

The new streaming loader was added without systematic benchmarks. This PR gives
maintainers and users data to make informed decisions about which loader to use
based on dataset size and available memory.

How to run

python benchmarks/loader_benchmark.py

Results

See benchmarks/results.csv and benchmarks/benchmark_chart.png for outputs.

Logiquo · 2026-02-24T01:06:31Z

InMemorySampleDataset is primarily designed for unit tests; it is not as battle-tested as SampleDataset.

I also think InMemorySampleDataset lacks some functionality compared with SampleDataset.

jhnwu3 · 2026-02-24T15:10:08Z

Hey, we did run some tests across a variety of things to make our decision: https://pyhealth.readthedocs.io/en/latest/why_pyhealth.html

I'll be closing this PR. Would love to talk more on the Discord if you want to discuss benchmarking PyHealth 2.0 in general.

ahmed79x7 added 2 commits February 23, 2026 09:29

feat: add streaming vs in-memory loader benchmark

44f1d2b

bench: add Polars streaming vs in-memory loader benchmark

c16e4aa

jhnwu3 closed this Feb 24, 2026

Logiquo mentioned this pull request Mar 31, 2026

feat: add streaming vs in-memory loader benchmark #869

Closed

AhmedAli58 wants to merge 2 commits into sunlabuiuc:master from AhmedAli58:bench/polars-streaming-loader-benchmark

4 participants
