Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ddm): Allow multiple use case ids to be queried at once and parallelize #66298

Merged
merged 5 commits into from Mar 5, 2024

Conversation

iambriccardo
Copy link
Member

@iambriccardo iambriccardo commented Mar 5, 2024

This PR implements a new fully parallelized implementation of fetching metrics meta. The need for such implementation arose for two reasons:

  1. The previous implementation was slow
  2. The previous implementation didn't support multiple use case ids to be fetched in a single request

The new implementation parallelizes all queries for fetching data across entities and use case ids. It also maximizes parallelization when reverse resolving metric ids.

Closes: #66126

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 5, 2024
request = Request(
dataset=Dataset.Metrics.value,
dataset=Dataset.Metrics.value
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason, the wrong dataset was used but we still got data back, interesting.

break

stored_metrics = get_stored_metrics_of_projects(projects, use_case_ids, start, end)
metrics_blocking_state = (
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to move out the check for the use case, since it's not a responsibility of get_metrics_blocking_state_of_projects since it should be use case agnostic.

@iambriccardo iambriccardo marked this pull request as ready for review March 5, 2024 09:51
@iambriccardo iambriccardo requested a review from a team as a code owner March 5, 2024 09:51
Copy link

codecov bot commented Mar 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.29%. Comparing base (e4571d1) to head (abcd6c3).
Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #66298      +/-   ##
==========================================
- Coverage   84.29%   84.29%   -0.01%     
==========================================
  Files        5306     5306              
  Lines      237101   237095       -6     
  Branches    41031    41033       +2     
==========================================
- Hits       199866   199859       -7     
- Misses      37016    37017       +1     
  Partials      219      219              
Files Coverage Δ
src/sentry/api/endpoints/organization_metrics.py 91.37% <100.00%> (+0.15%) ⬆️
src/sentry/snuba/metrics/datasource.py 97.61% <100.00%> (-0.18%) ⬇️
src/sentry/snuba/metrics/fields/base.py 94.55% <100.00%> (+0.03%) ⬆️

... and 6 files with indirect coverage changes

start: datetime | None = None,
end: datetime | None = None,
) -> Sequence[MetricMeta]:
if not projects:
return []

stored_metrics = get_stored_metrics_of_projects(projects, use_case_id, start, end)
metrics_blocking_state = get_metrics_blocking_state_of_projects(projects, use_case_id)
has_custom_use_case_id = False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
has_custom_use_case_id = False
has_custom_use_case_id = UseCaseID.CUSTOM in use_case_ids

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work for lists? I will check now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works, nice!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ofc, python is pseudo code 😅

Comment on lines 313 to 334
entity_keys = defaultdict(set)
for use_case_id in use_case_ids:
entity_keys[use_case_id] = entity_keys[use_case_id].union(
get_entity_keys_of_use_case_id(use_case_id=use_case_id)
)

grouped_stored_metrics = {}
for stored_metric in stored_metrics:
grouped_stored_metrics.setdefault(stored_metric["metric_id"], []).append(
stored_metric["project_id"]
)
# We compute a list of all the queries that we want to run in parallel across entities and use cases.
requests = []
use_case_id_to_index = defaultdict(list)
for use_case_id, entity_keys in entity_keys.items():
for entity_key in entity_keys:
requests.append(
_get_metrics_by_project_for_entity_query(
entity_key=entity_key,
project_ids=project_ids,
org_id=org_id,
use_case_id=use_case_id,
start=start,
end=end,
)
)
use_case_id_to_index[use_case_id].append(len(requests) - 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i would do these two things in the same nested loops

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which two things?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

computing entity_keys dict and then computing a list of requests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did put them separate to make the implementation work even if you pass two use case ids and if entity keys are different, but it's a bit overengineered. I can do like you said, will simplify! Thanks for the suggestion

Comment on lines 54 to 69
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.TRANSACTIONS])
# assert mris == {
# "d:transactions/duration@millisecond": [self.project.id],
# }
#
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.SESSIONS])
# assert mris == {
# "d:sessions/duration@second": [self.project.id],
# "c:sessions/session@none": [self.project.id],
# "s:sessions/user@none": [self.project.id],
# }
#
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.CUSTOM])
# assert mris == {
# custom_mri: [self.project.id],
# }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.TRANSACTIONS])
# assert mris == {
# "d:transactions/duration@millisecond": [self.project.id],
# }
#
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.SESSIONS])
# assert mris == {
# "d:sessions/duration@second": [self.project.id],
# "c:sessions/session@none": [self.project.id],
# "s:sessions/user@none": [self.project.id],
# }
#
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.CUSTOM])
# assert mris == {
# custom_mri: [self.project.id],
# }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol

@iambriccardo iambriccardo merged commit ebfdc02 into master Mar 5, 2024
49 checks passed
@iambriccardo iambriccardo deleted the riccardo/feat/multi-use-case-endpoint branch March 5, 2024 11:46
Copy link

sentry-io bot commented Mar 5, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ QueryExecutionTimeMaximum: DB::Exception: Estimated query execution time (71.86432810202771 seconds) is too long. Maximum: 3... /api/0/organizations/{organization_slug}/metric... View Issue
  • ‼️ TypeError: 'NoneType' object is not iterable /api/0/organizations/{organization_slug}/metric... View Issue
  • ‼️ SnubaError: HTTPConnectionPool(host='127.0.0.1', port=10006): Read timed out. (read timeout=30) /api/0/organizations/{organization_slug}/metric... View Issue

Did you find this useful? React with a 👍 or 👎

@github-actions github-actions bot locked and limited conversation to collaborators Mar 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SPIKE] Figure out if we can query multiple use cases in the metrics meta (at once)
2 participants