[Bug] event_store on_disk_data_size can overflow to EiB #4062

@tenfyzhong

Description

The Grafana panel "Data Size On Disk" occasionally shows EiB-scale spikes, which are far larger than the real data size.

Context

Panel query:
sum(ticdc_event_store_on_disk_data_size{...}) by (instance)

Metric source:
collectAndReportStoreMetrics calls diskSpaceUsage(stats) and exports
ticdc_event_store_on_disk_data_size.

In diskSpaceUsage, pebble.Metrics.Compact.InProgressBytes (int64) is cast
directly to uint64 and added:
usageBytes += uint64(m.Compact.InProgressBytes)

In Pebble, InProgressBytes is a signed counter and may go negative when compactions finish.
When it is negative, the uint64 cast wraps to a value just below 2^64 bytes (about 16 EiB), which shows up as an EiB-scale spike.
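
To illustrate the wraparound, here is a minimal standalone sketch. It does not reproduce the actual diskSpaceUsage code; naiveAdd and the byte counts are hypothetical and only mirror the cast-then-add pattern described above.

```go
package main

import "fmt"

// naiveAdd is a hypothetical stand-in for the pattern in the report:
// the signed counter is cast straight to uint64 and added to the total.
func naiveAdd(usageBytes uint64, inProgressBytes int64) uint64 {
	return usageBytes + uint64(inProgressBytes)
}

func main() {
	// Assumed readings: 1 KiB from the other terms and a transiently
	// negative in-progress counter.
	usage := uint64(1024)
	inProgress := int64(-4096)

	// The cast reinterprets the two's-complement bits, so -4096
	// becomes 2^64 - 4096.
	fmt.Println(uint64(inProgress))

	// The sum is taken modulo 2^64 and lands just below 2^64.
	total := naiveAdd(usage, inProgress)
	fmt.Println(total)

	// Whole EiB in the result (1 EiB = 2^60 bytes).
	fmt.Println(total >> 60)
}
```

Because uint64 addition is modular, the wrapped term only surfaces as a huge total when the negative magnitude exceeds the sum of the other terms, for example on a nearly empty store. That is a property of Go arithmetic, not a statement about which case produced the spikes reported here.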

Relevant code:

  • logservice/eventstore/event_store.go:1090-1103, 1138-1144
  • pebble/metrics.go: Compact.InProgressBytes is int64

Expected vs Actual

Expected: on_disk_data_size reflects real on-disk bytes and never shows EiB spikes.
Actual: occasional EiB-level spikes on the panel.

Proposed Solution

Clamp Compact.InProgressBytes to 0 when it is negative (or otherwise ensure a non-negative
value) before casting to uint64 in diskSpaceUsage.
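
A minimal sketch of the clamp, assuming a small helper is acceptable. The name nonNegative is hypothetical and is not taken from the repository or from Pebble.

```go
package main

import "fmt"

// nonNegative converts v to uint64, treating negative values as zero so
// the conversion can never wrap. (Hypothetical helper, not existing code.)
func nonNegative(v int64) uint64 {
	if v < 0 {
		return 0
	}
	return uint64(v)
}

func main() {
	// Assumed sum of the other terms, in bytes.
	var usageBytes uint64 = 1024

	// A transiently negative counter now contributes nothing.
	usageBytes += nonNegative(-4096)
	fmt.Println(usageBytes)

	// Non-negative values pass through unchanged.
	usageBytes += nonNegative(2048)
	fmt.Println(usageBytes)
}
```

In diskSpaceUsage this would amount to replacing the direct cast with something like usageBytes += nonNegative(m.Compact.InProgressBytes). Clamping under-reports slightly while the counter is negative, which seems preferable to a wrapped value on the panel.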

Screenshot

Image
