Heartbeat backfill #1191

Merged
michaeldjeffrey merged 7 commits into main from mj/heartbeat-backfill on May 15, 2026

@michaeldjeffrey
Contributor

Backfill CLI command for heartbeats

  • heartbeat timestamp is the received_timestamp from ingest
  • we don't have historical asserted locations

Many of the early heartbeat files contain ~500k valid heartbeats.

Locally, I've been running with --batch-size 20,000,000 so as not to create a snapshot per file.
This gives us an Arrow file of ~2.5 GB before upload. The Parquet files in Iceberg are ~32 MB.
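
As a rough back-of-the-envelope check on those numbers, here is a small sketch. It assumes --batch-size counts heartbeat records (not bytes) and reads ~2.5 GB as 2.5 × 10^9 bytes; both are assumptions, and every figure is approximate.

```python
# Approximate figures quoted in the description above.
HEARTBEATS_PER_FILE = 500_000      # "~500k valid heartbeats" per early file
BATCH_SIZE = 20_000_000            # assumed to be a record count
ARROW_BYTES = 2.5e9                # "~2.5 GB" Arrow file, read as decimal GB

# How many early heartbeat files fit into one batch (and so one snapshot).
files_per_batch = BATCH_SIZE // HEARTBEATS_PER_FILE

# Rough in-memory cost of one heartbeat in the Arrow file.
bytes_per_heartbeat = ARROW_BYTES / BATCH_SIZE

print(files_per_batch)        # 40
print(bytes_per_heartbeat)    # 125.0
```

So under these assumptions one batch covers roughly 40 of the early files instead of producing a snapshot for each one.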

Because of the size of heartbeat files, and the constraints we've been running the backfill jobs with, I would also suggest providing --batch-timeout 30min so that batches don't roll at the default 1min timeout and create a load of snapshots while waiting on parsing.
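
To illustrate why the timeout matters, here is a simplified simulation. It is not the code in this PR: the `Batcher` class, the 90 s per-file parse time, and the "check the timeout when records arrive" behaviour are all hypothetical, chosen only to show how a short timeout turns slow parsing into many small snapshots.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Batcher:
    """Hypothetical batcher: commits a snapshot when full or when the timeout elapses."""
    batch_size: int
    batch_timeout_s: float
    pending: int = 0
    opened_at: Optional[float] = None
    snapshots: List[int] = field(default_factory=list)

    def add(self, count: int, now: float) -> None:
        # Roll the open batch first if it has been waiting longer than the timeout.
        if self.pending and now - self.opened_at >= self.batch_timeout_s:
            self.flush()
        if self.pending == 0:
            self.opened_at = now
        self.pending += count
        # Roll on size once the batch is full.
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.snapshots.append(self.pending)
            self.pending = 0
            self.opened_at = None


def simulate(batch_timeout_s: float, files: int = 40,
             parse_seconds: float = 90.0) -> int:
    """Return the number of snapshots created for `files` heartbeat files."""
    batcher = Batcher(batch_size=20_000_000, batch_timeout_s=batch_timeout_s)
    for i in range(1, files + 1):
        # Assumed: each file takes `parse_seconds` to parse and yields ~500k heartbeats.
        batcher.add(500_000, now=i * parse_seconds)
    batcher.flush()  # commit whatever is left at the end of the run
    return len(batcher.snapshots)


print(simulate(60))       # 1min timeout  -> 40 snapshots (one per file)
print(simulate(30 * 60))  # 30min timeout -> 2 snapshots
```

With the assumed 90 s parse time, the 1min timeout expires between every pair of files, so each file lands in its own snapshot; the 30min timeout lets about 20 files accumulate per snapshot.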

(--batch-size and --batch-timeout were also added as args for bans, speedtests, and speedtest-avgs.)

@michaeldjeffrey michaeldjeffrey requested review from bbalser and macpie May 13, 2026 23:40
@michaeldjeffrey michaeldjeffrey force-pushed the mj/heartbeat-backfill branch from a23455d to c6b7c97 Compare May 13, 2026 23:51
@michaeldjeffrey michaeldjeffrey force-pushed the mj/heartbeat-backfill branch from c6b7c97 to 02c0365 Compare May 13, 2026 23:57
@michaeldjeffrey michaeldjeffrey merged commit cee96ef into main May 15, 2026
54 of 55 checks passed
@michaeldjeffrey michaeldjeffrey deleted the mj/heartbeat-backfill branch May 15, 2026 21:28
2 participants