This repository was archived by the owner on Sep 30, 2024. It is now read-only.

insights: over-reported numbers from backend #18582

@emidoots

Description

Whoopsies, I made a mistake here: the numbers reported by the backend are wrong in the 3.25 release, over-reported by roughly 2x depending on the situation. The data isn't recorded wrong, it's queried wrong.

I originally added the aggregation behavior in https://github.com/sourcegraph/sourcegraph/pull/18506, on the assumption that we store one data point every 12h and thus that SUMing all data points over a 12h span would give the accurate count.

@efritz called out that this might be wrong, and I created https://github.com/sourcegraph/sourcegraph/issues/18510, which describes a problem with this type of SUM aggregation during periods of frequent service restarts. But I failed to realize there were two other edge cases:

  1. If the 12h aggregation window does not align exactly with the interval at which data is recorded, we will sometimes get two of the 12h data points SUM'd together.
  2. The above also depends on whether the data point is recorded exactly on the 12h interval or is delayed a bit. A delay could happen because the search takes e.g. 59s to complete, or in the future because the data back-filler will operate on a different interval schedule than the regular enqueuer.

SUM was a bad choice; we likely need MAX.
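
To make the failure mode concrete, here is a minimal simulation of the two edge cases. It is only a sketch and not the actual query: the `aggregate` helper, the fixed 12h windows anchored at second 0, the 11:59:30 recording schedule, and the constant value of 100 are all assumptions made up for illustration.

```python
from collections import defaultdict

WINDOW_S = 12 * 60 * 60  # 12h aggregation window, in seconds


def aggregate(points, agg):
    """Group (recorded_at_s, value) points into fixed 12h windows,
    then reduce each window's values with agg (e.g. sum or max)."""
    windows = defaultdict(list)
    for recorded_at_s, value in points:
        windows[recorded_at_s // WINDOW_S].append(value)
    return {index: agg(values) for index, values in sorted(windows.items())}


# Hypothetical schedule: one recording every 12h, starting at 11:59:30,
# so recordings do not line up with the window boundaries (edge case 1).
first = 11 * 3600 + 59 * 60 + 30
on_schedule = [(first + i * WINDOW_S, 100) for i in range(3)]

# Same series, but the first search took 59s to complete (edge case 2),
# which pushes its data point across the boundary into the next window.
delayed = [(first + 59, 100)] + on_schedule[1:]

print(aggregate(on_schedule, sum))  # {0: 100, 1: 100, 2: 100}
print(aggregate(delayed, sum))      # {1: 200, 2: 100}
print(aggregate(delayed, max))      # {1: 100, 2: 100}
```

With SUM, the window that happens to catch two data points reports 200 even though the true count never exceeded 100. With MAX, that same window reports 100.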

Labels: code-insights (Issues related to the Code Insights product)
