Downsampling with aggregation instead of compaction
When first launched, M3 downsampled stored metrics with batch reads, and later added a more efficient stream-based downsampling that could run alongside it. However, we found it much more efficient, and far simpler operationally, to downsample metrics upfront and avoid batch read downsampling altogether. M3’s stream-based downsampling originally relied on batch read downsampling to catch up on time periods that were missed while the stream downsampling process was not running.
Batch read downsampling, however, was far less efficient than stream-based downsampling; it could sometimes take several hours to downsample periods missed by stream-based downsampling, leading to stale metrics. With sufficient downtime of stream-based downsampling, batch read downsampling could take half the time it took to retain the metrics read during peak times. For these reasons, we switched to purely stream-based downsampling with replication and leader election.
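To make the distinction concrete, here is a minimal sketch of stream-based downsampling: samples are aggregated into fixed-resolution windows as they arrive, so nothing has to be read back from storage later. This is an illustration only, not M3's implementation; the `StreamDownsampler` class, its method names, and the choice of a mean as the aggregation are all assumptions made for the example, and replication and leader election are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StreamDownsampler:
    """Hypothetical in-memory downsampler (illustrative, not M3's API)."""

    resolution_s: int
    # Window start timestamp -> (running sum, sample count) for open windows.
    _open: Dict[int, Tuple[float, int]] = field(default_factory=dict)

    def add(self, ts: int, value: float) -> List[Tuple[int, float]]:
        """Ingest one sample; return (window_start, mean) for closed windows."""
        start = ts - ts % self.resolution_s
        total, count = self._open.get(start, (0.0, 0))
        self._open[start] = (total + value, count + 1)
        # Simplifying assumption: any window that starts before the current
        # sample's window is considered closed. Late samples are not handled.
        closed = sorted(w for w in self._open if w < start)
        return [self._emit(w) for w in closed]

    def flush(self) -> List[Tuple[int, float]]:
        """Emit every remaining open window, for example on shutdown."""
        return [self._emit(w) for w in sorted(self._open)]

    def _emit(self, start: int) -> Tuple[int, float]:
        total, count = self._open.pop(start)
        return (start, total / count)


if __name__ == "__main__":
    ds = StreamDownsampler(resolution_s=60)
    for ts, value in [(0, 1.0), (30, 3.0), (60, 10.0)]:
        for window in ds.add(ts, value):
            print(window)
    for window in ds.flush():
        print(window)
```

A batch read approach would instead query the raw samples back out of storage and aggregate them after the fact, which is the extra read load described above.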
The M3 aggregator provides all the downsampling capabilities for large deployments, while the M3 coordinator embeds the aggregator and provides in-process stream-based downsampling when used as a sidecar for Prometheus. If Prometheus is deployed in a highly available (HA) configuration, with multiple instances providing active-active scraping, the M3 coordinator can be deployed alongside these instances to provide HA downsampling in a similar manner.