Replace PostgreSQL partitioned schema with TimescaleDB hypertable:
- 1-hour chunks with 32 hash partitions on node_id
- Continuous aggregates: event_stats_1m (2min refresh), event_stats_1h
- Compression after 2h (segmentby node_id + event_type)
- Retention: raw 7d, 1m aggs 30d, 1h aggs 365d
- event_type SMALLINT (was INTEGER), event_id BIGINT (no PK)
- No per-row triggers (app-level batch stats instead)
- Event types lookup table with convenience view
- Docker image: timescale/timescaledb:latest-pg16
- Remove all partition management code (ensure_partitions_exist, spawn_partition_maintenance, shutdown, check_partition_health, PartitionHealth struct)
- Add store_events_batch() with COPY BINARY for >10 events, simple INSERT for <=10 events
- Add update_node_stats() using unnest() for concurrent-safe batch updates
- Update event_type from i32 to i16 (SMALLINT)
- Update store_nodes_connected_batch() with address parameter
- Add ping(), get_node_by_id(), get_cores_telemetry_agg()
- Remove PartitionHealth export from lib.rs
- Adapt all query methods for parameterized DurationPreset intervals
Replace single-task batch writer with parallel workers:
- Arc<Mutex<Receiver>> shared across all workers
- Each worker drains events into a local batch (up to 16k), then flushes via store.store_events_batch()
- Timeout-based accumulation (100ms) prevents tiny flushes
- Separate stats flusher task aggregates node counts every 5s
- node_connected() now takes an address parameter
- flush() sends a sentinel to all workers and waits for responses
- Channel buffer: 5M events
- health.rs: remove partition_check() (no partitions in TimescaleDB)
- main.rs: remove partition health check (5 checks instead of 6)
- rate_limiter.rs: make MAX_CONNECTIONS pub for test access
- Restructure TelemetryServer to store TcpListener (enables port 0 binding)
- Add with_options() constructor with no_rate_limit parameter
- Add local_addr(), wait_for_connections() for deterministic test sync
- Add connection_watch channel for tracking connection count changes
- Remove BytesMutExt trait in favor of bytes::Buf
- Remove read timeouts (handled by TCP keepalive)
- api.rs: remove secondary_interval() params from store query call sites
…eBuilt
These events had stub encoding (0 bytes), which caused data-driven tests to silently fail: events were sent as empty payloads and never stored.
- Add 13 data-driven tests validating JSONB query paths against real events
- Add now_jce_micros() helper for realistic test timestamps
- Update all test setup functions to use port 0 + local_addr() pattern
- Set test cache TTL to zero to avoid stale cache hits
- Use realistic JCE-relative timestamps so events pass time-window filters
Performance: TimescaleDB migration + parallel batch writer
Motivation
We ran jip-3-spammer against v0.3.0 with 300 nodes at realistic rates (~258K events/s total) and hit performance problems almost immediately: the single batch writer couldn't keep up, events started dropping, and the write buffer filled up within seconds of the nodes connecting.
Why
The original PostgreSQL schema uses a single flat `events` table with 6+ indexes. At 3M events/s from 1024 nodes, every INSERT must update all indexes, all writes hit the same table, and aggregate queries (dashboards, stats) scan the entire table. This doesn't scale.
Database: PostgreSQL → TimescaleDB
TimescaleDB is a PostgreSQL extension — same Postgres, just with time-series superpowers.
Hypertable with automatic chunking
The `events` table is split into 1-hour chunks automatically. Queries like "events in the last hour" scan 1-2 chunks instead of the whole table. Old data is dropped per-chunk (`DROP TABLE`, instant) instead of `DELETE` + vacuum.
32 hash partitions on `node_id`
Each 1-hour chunk is further split into 32 sub-chunks by hashing `node_id`. This spreads writes from 1024 nodes across 32 parallel physical tables — 32x less lock contention on indexes and WAL. Queries filtering by `node_id` only scan 1/32 of each chunk. This is a DB-internal detail, transparent to application code.
Continuous aggregates (pre-computed rollups)
Instead of running `COUNT(*)` over billions of rows on every API request:
- `event_stats_1m` — per-minute counts per node/type, refreshed every 2 min
- `event_stats_1h` — per-hour counts, built from the 1m aggregate (not raw events)

These are incrementally maintained by TimescaleDB — only changed chunks get re-aggregated.
Data retention pyramid
After 7 days raw event data is gone, but you still know how many events each node sent — per-minute for 30 days, per-hour for a year.
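The retention tiers above correspond to per-object drop policies: one on the raw hypertable and one on each continuous aggregate (a sketch using TimescaleDB's standard policy API; object names assumed from this description):

```sql
-- Raw events: keep 7 days, then drop whole chunks (instant, no vacuum).
SELECT add_retention_policy('events', INTERVAL '7 days');

-- Rollups outlive the raw data they were computed from.
SELECT add_retention_policy('event_stats_1m', INTERVAL '30 days');
SELECT add_retention_policy('event_stats_1h', INTERVAL '365 days');
```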
Compression after 2 hours
Columnar compression grouped by `(node_id, event_type)`, ordered by `timestamp DESC`. Typical 10-20x compression ratio. Queries can skip irrelevant segments without decompressing.
Other schema changes
- `event_type`: `INTEGER` → `SMALLINT` (130 types fit in 2 bytes, saves ~500GB/day at full throughput)
- `id BIGSERIAL PRIMARY KEY` → `event_id BIGINT` (no PK — hypertables don't support it, and ON CONFLICT dedup is too expensive at 3M/s)
- `postgres:18-alpine` → `timescale/timescaledb:latest-pg16`
Ingestion: Work-stealing batch writer pool
Single writer replaced with 32 parallel workers sharing an `Arc<Mutex<Receiver>>`: no more per-event `store_event()` — everything is batched.
Server improvements
- `debug!` to `trace!` to reduce log noise
- `wait_for_connections()` watch channel for deterministic test synchronization
Tests
- `WorkPackageReceived`, `Authorized`, `Refined`, `GuaranteeBuilt` now have real encodings and data-driven test coverage (previously stub-encoded as 0 bytes)
Bug fixes
- `da_stats` INTEGER overflow: status fields (`num_shards`, `num_preimages`, `preimages_size`) cast to `BIGINT` instead of `INTEGER`
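For reference, the setup described in the "Compression after 2 hours" section expressed as DDL (a sketch using TimescaleDB's standard compression options; the `events` table name and `timestamp` column name are assumed):

```sql
-- Columnar compression: one compressed segment per (node_id, event_type),
-- rows inside each segment ordered by timestamp DESC.
ALTER TABLE events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'node_id, event_type',
    timescaledb.compress_orderby   = 'timestamp DESC'
);

-- Compress chunks once they are 2 hours old.
SELECT add_compression_policy('events', INTERVAL '2 hours');
```

Segmenting by `(node_id, event_type)` is what lets per-node, per-type queries skip whole segments without decompressing them.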