Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(txnames): Mark project as ready after N clusterer runs #48993

Merged
merged 17 commits into from
May 22, 2023

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented May 12, 2023

Previously, Relay would mark any URL transaction as sanitized as long as there is at least one clusterer rule. The idea was to keep new projects in << unparameterized >> mode until they had some rules, but this backfired because some projects regressed after their single rule had expired.

To make sure that no project ever flips from "sanitize everything" back to << unparametrized >>, this PR introduces a counter stored in project options. Once the cluster has run 10 times, set a flag in the project config so Relay knows it can mark everything as sanitized.

After deploying this + the accompanying Relay change, we should closely monitor the following metrics:

  • relay_pop.event.transaction_name_changes with source_in:url, source_out:sanitized - we should see an increase here.
  • sentry.sentry_metrics.indexer.process_messages.dropped_message. The cardinality rate limiter might drop more messages because more transactions get marked as sanitized.

Requires getsentry/relay#2128

ref: https://github.com/getsentry/team-ingest/issues/124

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label May 12, 2023
@codecov
Copy link

codecov bot commented May 12, 2023

Codecov Report

Merging #48993 (a8bb34b) into master (f0d75d3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head a8bb34b differs from pull request most recent head 6e1f07c. Consider uploading reports for the commit 6e1f07c to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #48993   +/-   ##
=======================================
  Coverage   80.95%   80.95%           
=======================================
  Files        4819     4820    +1     
  Lines      202111   202132   +21     
  Branches    11365    11365           
=======================================
+ Hits       163616   163635   +19     
- Misses      38241    38243    +2     
  Partials      254      254           
Impacted Files Coverage Δ
src/sentry/ingest/transaction_clusterer/meta.py 100.00% <100.00%> (ø)
src/sentry/ingest/transaction_clusterer/tasks.py 97.67% <100.00%> (+0.11%) ⬆️
src/sentry/relay/config/__init__.py 96.69% <100.00%> (+0.05%) ⬆️

... and 4 files with indirect coverage changes

jjbayer added a commit to getsentry/relay that referenced this pull request May 16, 2023
Add a flag to project configs which indicates whether URL transaction
names should always be considered `sanitized`, even if there are no
clustering rules.

This PR does not yet _use_ the flag, it is just a prerequisite for
merging getsentry/sentry#48993.

ref: getsentry/team-ingest#124
@jjbayer jjbayer changed the base branch from master to feat/relay-version-2 May 17, 2023 08:56
@jjbayer jjbayer marked this pull request as ready for review May 17, 2023 09:33
@jjbayer jjbayer requested a review from a team May 17, 2023 09:33
Base automatically changed from feat/relay-version-2 to master May 17, 2023 12:38
@asottile-sentry asottile-sentry requested a review from a team as a code owner May 17, 2023 12:38
jjbayer added a commit that referenced this pull request May 17, 2023
Use newest Relay library version. Required for
#48993.
Comment on lines +13 to +14
first_run: int
last_run: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to keep track of the first and last runs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason, just for debugging.

@jjbayer jjbayer merged commit 3c55f87 into master May 22, 2023
53 checks passed
@jjbayer jjbayer deleted the feat/txnames-cluster-meta branch May 22, 2023 11:38
jjbayer added a commit to getsentry/relay that referenced this pull request May 23, 2023
Start using the flag introduced by
getsentry/sentry#48993 and
#2128.

Once the clusterer has run ~10 times, trust that all the rules than can
be discovered have been discovered, and mark all incoming URL
transactions as sanitized.

ref: getsentry/team-ingest#124
jjbayer added a commit to getsentry/relay that referenced this pull request May 25, 2023
We now mark all URL transactions as sanitized after 10 clusterer runs
(see getsentry/sentry#48993), except 404s. Tag
those explicitly on the metric so we can confirm that non-404s are now
sanitized at 100%.

ref: getsentry/team-ingest#124
volokluev pushed a commit that referenced this pull request May 30, 2023
Use newest Relay library version. Required for
#48993.
volokluev pushed a commit that referenced this pull request May 30, 2023
Previously, Relay would mark _any_ URL transaction as sanitized as long
as there is at least one clusterer rule. The idea was to keep new
projects in `<< unparameterized >>` mode until they had some rules, but
this backfired because some projects regressed after their single rule
had expired.

To make sure that no project ever flips from "sanitize everything" back
to `<< unparametrized >>`, this PR introduces a counter stored in
project options. Once the cluster has run 10 times, set a flag in the
project config so Relay knows it can mark everything as sanitized.

After deploying this + the accompanying Relay change, we should closely
monitor the following metrics:

* `relay_pop.event.transaction_name_changes` with `source_in:url,
source_out:sanitized` - we should see an increase here.
* `sentry.sentry_metrics.indexer.process_messages.dropped_message`. The
cardinality rate limiter might drop more messages because more
transactions get marked as `sanitized`.

Requires getsentry/relay#2128

ref: getsentry/team-ingest#124
@github-actions github-actions bot locked and limited conversation to collaborators Jun 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants