tkaemming
Contributor

This adds a new data type, frequency tables, to the time series database and the API methods for:

  • adding items,
  • retrieving scores of known items over a time series (or aggregated over the series),
  • retrieving the most frequent items seen over a time series.

The notable implementation is in the Redis backend, which uses Lua scripting to implement the frequency table data structure as a combination of a top-N index (modeled as a sorted set in Redis) and an estimation matrix for a Count-Min Sketch (implemented using a hash table). Together, these two structures allow estimated top-N queries over data of high (effectively unbounded) cardinality in almost fixed space. Counts are 100% accurate until the index is filled (and no extra space is used for the estimation matrix until this point). After that, the data structure switches to a probabilistic implementation: accuracy begins to degrade for less frequently observed items, but remains accurate for more frequently observed items. See the comment in src/sentry/scripts/tsdb/cmsketch.lua for a more detailed explanation of the implementation.
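To make the exact-then-probabilistic behavior described above concrete, here is a minimal in-memory sketch in Python. It is not the Lua script from this PR: the class name, method names, default dimensions, SHA-1 based hashing, and the choice to seed the matrix from the index at switchover are all assumptions made for illustration. A dict stands in for the Redis sorted set, and a sparse dict stands in for the hash table that holds the estimation matrix.

```python
import hashlib
from typing import Dict, Iterator, List, Tuple


class FrequencyTable:
    """Hypothetical top-N index plus Count-Min Sketch (illustration only)."""

    def __init__(self, capacity: int, depth: int = 3, width: int = 64) -> None:
        if capacity < 1 or depth < 1 or width < 1:
            raise ValueError("capacity, depth and width must be at least 1")
        self.capacity = capacity
        self.depth = depth
        self.width = width
        self.index: Dict[str, int] = {}  # stand-in for the sorted set
        self.matrix: Dict[Tuple[int, int], int] = {}  # sparse; empty while exact

    def _cells(self, item: str) -> Iterator[Tuple[int, int]]:
        # One (row, column) cell per row, chosen by a row-salted hash.
        for row in range(self.depth):
            digest = hashlib.sha1(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def _estimate(self, item: str) -> int:
        # Count-Min estimate: the minimum over the item's cells.
        return min(self.matrix.get(cell, 0) for cell in self._cells(item))

    def _increment(self, item: str, count: int) -> int:
        for cell in self._cells(item):
            self.matrix[cell] = self.matrix.get(cell, 0) + count
        return self._estimate(item)

    def add(self, item: str, count: int = 1) -> None:
        if not self.matrix:
            # Exact mode: the index alone holds every observed item.
            if item in self.index or len(self.index) < self.capacity:
                self.index[item] = self.index.get(item, 0) + count
                return
            # Index is full and a new item arrived: seed the matrix with
            # the exact counts (an assumption of this sketch) and switch.
            for known, score in self.index.items():
                self._increment(known, score)
        estimate = self._increment(item, count)
        if item in self.index:
            self.index[item] = estimate
            return
        # Replace the smallest indexed item if the estimate beats it.
        smallest = min(self.index, key=self.index.get)
        if estimate > self.index[smallest]:
            del self.index[smallest]
            self.index[item] = estimate

    def get(self, item: str) -> int:
        if item in self.index:
            return self.index[item]
        return self._estimate(item) if self.matrix else 0

    def most_frequent(self, limit: int) -> List[Tuple[str, int]]:
        ranked = sorted(self.index.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:limit]
```

While fewer than `capacity` distinct items have been seen, `matrix` stays empty and `get` returns exact counts. Once a new item arrives at a full index, estimates can only overcount (never undercount), and rarely seen items are the ones most affected by hash collisions.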

This adds 6 new metrics to the TSDB:

# number of events sent to server for an organization (key is always 0)
frequent_organization_received_by_system = 400
# number of events rejected by server for an organization (key is always 0)
frequent_organization_rejected_by_system = 401
# number of events blacklisted by server for an organization (key is always 0)
frequent_organization_blacklisted_by_system = 402
# number of events seen for a project, by organization
frequent_projects_by_organization = 403
# number of issues seen for a project, by project
frequent_issues_by_project = 404
# number of issues seen for a tag value, by issue:tag
frequent_values_by_issue_tag = 405

This depends on getsentry/rb#14, which adds a cluster.execute_commands method that can be used to run pipelines containing Lua scripts.

Contributor Author

This is weird because there is an assumption in the API that all metrics have keys (in the case of the frequency table, a composite key) and I didn't want to overload the internal label with (0) another data type in case they end up having collisions.

@dcramer
Member

dcramer commented Feb 3, 2016

#2487

@tkaemming tkaemming force-pushed the frequency-tables branch 3 times, most recently from a012dd2 to 7fca744 Compare February 9, 2016 00:59
@codecov-io

Current coverage is 82.72%

Merging #2637 into master will decrease coverage by 0.45% as of 8b04d56

@@            master   #2637   diff @@
======================================
  Files          862     864     +2
  Stmts        32699   32946   +247
  Branches         0       0       
  Methods          0               
======================================
+ Hit          27197   27256    +59
  Partial          0       0       
- Missed        5502    5690   +188

Review entire Coverage Diff as of 8b04d56


Uncovered Suggestions

  1. +0.07% via ...try/utils/apidocs.py#414...436
  2. +0.07% via ...try/utils/apidocs.py#117...139
  3. +0.07% via ...gs/sentry_helpers.py#191...211
  4. See 7 more...


tkaemming pushed a commit that referenced this pull request Feb 9, 2016
@tkaemming tkaemming merged commit 68e16be into master Feb 9, 2016
tkaemming added a commit that referenced this pull request Feb 10, 2016
- Revert "Merge pull request #2637 from getsentry/frequency-tables" This
  reverts commit 68e16be, reversing
  changes made to 1136acd.
- Revert "Fix shadowing behavior that resulted in incorrect routing for
  frequencies." This reverts commit
  e68ab9b.
@tkaemming tkaemming deleted the frequency-tables branch February 10, 2016 00:38
@tkaemming tkaemming restored the frequency-tables branch February 10, 2016 00:38
@tkaemming tkaemming deleted the frequency-tables branch February 10, 2016 00:41
@github-actions github-actions bot locked and limited conversation to collaborators Dec 23, 2020