feat(ddm): Allow multiple use case ids to be queried at once and parallelize #66298
request = Request( | ||
dataset=Dataset.Metrics.value, | ||
dataset=Dataset.Metrics.value |
For some reason, the wrong dataset was used but we still got data back, interesting.
break | ||
|
||
stored_metrics = get_stored_metrics_of_projects(projects, use_case_ids, start, end) | ||
metrics_blocking_state = ( |
I decided to move the use case check out of get_metrics_blocking_state_of_projects: the check is not that function's responsibility, since the function should be use case agnostic.
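A minimal sketch of what moving the check to the caller could look like, assuming blocking state is only relevant when the custom use case is requested. The helper body, the `fetch_blocking_state` wrapper, and the string use case ids are hypothetical stand-ins, not the PR's actual code:

```python
def get_metrics_blocking_state_of_projects(projects):
    # Hypothetical stand-in: use case agnostic, it only knows about projects.
    return {project: [] for project in projects}


def fetch_blocking_state(projects, use_case_ids):
    # The caller owns the use case check (assumption: only the custom
    # use case needs blocking state), so the helper stays generic.
    if "custom" in use_case_ids:
        return get_metrics_blocking_state_of_projects(projects)
    return {}
```

With this split, the helper can be reused by any caller without carrying use case logic.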
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #66298 +/- ##
==========================================
- Coverage 84.29% 84.29% -0.01%
==========================================
Files 5306 5306
Lines 237101 237095 -6
Branches 41031 41033 +2
==========================================
- Hits 199866 199859 -7
- Misses 37016 37017 +1
Partials 219 219
|
start: datetime | None = None, | ||
end: datetime | None = None, | ||
) -> Sequence[MetricMeta]: | ||
if not projects: | ||
return [] | ||
|
||
stored_metrics = get_stored_metrics_of_projects(projects, use_case_id, start, end) | ||
metrics_blocking_state = get_metrics_blocking_state_of_projects(projects, use_case_id) | ||
has_custom_use_case_id = False |
has_custom_use_case_id = False | |
has_custom_use_case_id = UseCaseID.CUSTOM in use_case_ids |
Does it work for lists? I will check now
It works, nice!
ofc, python is pseudo code 😅
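For reference, a small self-contained check of why the suggested `in` test works on a list. The `UseCaseID` enum below is a hypothetical stand-in with illustrative values, not Sentry's real definition:

```python
from enum import Enum


class UseCaseID(Enum):
    # Hypothetical stand-in; member values are illustrative only.
    TRANSACTIONS = "transactions"
    SESSIONS = "sessions"
    CUSTOM = "custom"


use_case_ids = [UseCaseID.TRANSACTIONS, UseCaseID.CUSTOM]

# `in` on a list compares each element with ==, so enum members
# can be tested for membership directly.
has_custom_use_case_id = UseCaseID.CUSTOM in use_case_ids
has_sessions_use_case_id = UseCaseID.SESSIONS in use_case_ids
```

The same expression works unchanged for tuples and sets, so the type of `use_case_ids` does not matter here.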
entity_keys = defaultdict(set) | ||
for use_case_id in use_case_ids: | ||
entity_keys[use_case_id] = entity_keys[use_case_id].union( | ||
get_entity_keys_of_use_case_id(use_case_id=use_case_id) | ||
) | ||
|
||
grouped_stored_metrics = {} | ||
for stored_metric in stored_metrics: | ||
grouped_stored_metrics.setdefault(stored_metric["metric_id"], []).append( | ||
stored_metric["project_id"] | ||
) | ||
# We compute a list of all the queries that we want to run in parallel across entities and use cases. | ||
requests = [] | ||
use_case_id_to_index = defaultdict(list) | ||
for use_case_id, entity_keys in entity_keys.items(): | ||
for entity_key in entity_keys: | ||
requests.append( | ||
_get_metrics_by_project_for_entity_query( | ||
entity_key=entity_key, | ||
project_ids=project_ids, | ||
org_id=org_id, | ||
use_case_id=use_case_id, | ||
start=start, | ||
end=end, | ||
) | ||
) | ||
use_case_id_to_index[use_case_id].append(len(requests) - 1) |
nit: i would do these two things in the same nested loops
Which two things?
computing entity_keys dict and then computing a list of requests
I kept them separate so the implementation would still work if you pass two use case ids whose entity keys differ, but it's a bit overengineered. I can do it like you said, will simplify! Thanks for the suggestion.
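A rough sketch of the simplification discussed in this thread: building the request list and the index mapping in the same nested loops instead of first materializing an entity_keys dict. The two helper functions are hypothetical stand-ins for `get_entity_keys_of_use_case_id` and `_get_metrics_by_project_for_entity_query`, and the entity key names are placeholders:

```python
from collections import defaultdict


def get_entity_keys_of_use_case_id(use_case_id):
    # Hypothetical stand-in; entity key names are placeholders.
    mapping = {
        "transactions": ["entity_a"],
        "custom": ["entity_b", "entity_c"],
    }
    return mapping.get(use_case_id, [])


def build_query(entity_key, use_case_id):
    # Hypothetical stand-in for the real query builder.
    return {"entity_key": entity_key, "use_case_id": use_case_id}


def build_requests(use_case_ids):
    requests = []
    use_case_id_to_index = defaultdict(list)
    for use_case_id in use_case_ids:
        for entity_key in get_entity_keys_of_use_case_id(use_case_id):
            requests.append(build_query(entity_key, use_case_id))
            # Remember which request positions belong to this use case.
            use_case_id_to_index[use_case_id].append(len(requests) - 1)
    return requests, use_case_id_to_index
```

The index mapping is what later lets the results of a bulk query be attributed back to the use case that produced each request.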
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.TRANSACTIONS]) | ||
# assert mris == { | ||
# "d:transactions/duration@millisecond": [self.project.id], | ||
# } | ||
# | ||
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.SESSIONS]) | ||
# assert mris == { | ||
# "d:sessions/duration@second": [self.project.id], | ||
# "c:sessions/session@none": [self.project.id], | ||
# "s:sessions/user@none": [self.project.id], | ||
# } | ||
# | ||
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.CUSTOM]) | ||
# assert mris == { | ||
# custom_mri: [self.project.id], | ||
# } |
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.TRANSACTIONS]) | |
# assert mris == { | |
# "d:transactions/duration@millisecond": [self.project.id], | |
# } | |
# | |
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.SESSIONS]) | |
# assert mris == { | |
# "d:sessions/duration@second": [self.project.id], | |
# "c:sessions/session@none": [self.project.id], | |
# "s:sessions/user@none": [self.project.id], | |
# } | |
# | |
# mris = get_stored_metrics_of_projects([self.project], [UseCaseID.CUSTOM]) | |
# assert mris == { | |
# custom_mri: [self.project.id], | |
# } |
lol
Suspect Issues
This pull request was deployed and Sentry observed the following issues:
This PR introduces a new, fully parallelized implementation of fetching metrics meta. The need for such an implementation arose for two reasons:
The new implementation parallelizes all queries for fetching data across entities and use case ids. It also maximizes parallelization when reverse resolving metric ids.
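As an illustration of the fan-out described above, here is a minimal sketch that runs one query per (use case, entity) pair concurrently and regroups the results by use case. It uses a plain thread pool as a generic stand-in; the mechanism the PR actually uses to parallelize queries may differ, and `run_query` plus the request and row shapes are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor


def run_query(request):
    # Hypothetical stand-in for executing a single query; returns fake rows.
    return [{"metric_id": f"{request['use_case_id']}:{request['entity_key']}"}]


def run_in_parallel(requests, use_case_id_to_index):
    with ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        # Executor.map preserves input order, so results[i] matches requests[i].
        results = list(pool.map(run_query, requests))
    # Regroup the flat result list by use case using the recorded positions.
    return {
        use_case_id: [row for i in indexes for row in results[i]]
        for use_case_id, indexes in use_case_id_to_index.items()
    }
```

Keeping a position index per use case avoids having to tag every result row with its use case after the fact.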
Closes: #66126