Skip to content

feat(perf): Calculate min and max for span histograms#32836

Merged
udameli merged 8 commits into
masterfrom
du/feat/calculate-mix-max-for-span-histograms
Mar 25, 2022
Merged

feat(perf): Calculate min and max for span histograms#32836
udameli merged 8 commits into
masterfrom
du/feat/calculate-mix-max-for-span-histograms

Conversation

@udameli

@udameli udameli commented Mar 22, 2022

Copy link
Copy Markdown
Contributor

This PR calculates min and max exclusive time duration for a given span and sets them as histogram params. I've also added more tests.

Fixes PERF-1487

Comment thread src/sentry/search/events/datasets/discover.py
[
Lambda(
[
"x",

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but is x required if we're not using it in the lambda?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was mostly because arrayFilter returns the first column, so exclusive_time has to be selected as x here

Comment on lines -29 to -34
if "start_timestamp" not in kwargs:
kwargs["start_timestamp"] = self.min_ago

if "timestamp" not in kwargs:
kwargs["timestamp"] = self.min_ago + timedelta(seconds=8)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove these? Were they not actually being used? If they were used we should make sure to keep dates consistent to avoid flake, if there weren't, ignore me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this was used anywhere 🤔

assert response.status_code == 200
for bucket in response.data:
assert bucket["count"] == 0
assert response.data[0] == {"bin": min, "count": 0}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also be curious to test what max is here, basically asserting what max became when the param is undefined

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, so when min < queried max then only one bucket for that min is returned. this is consistent with how other histograms behave I believe, so i'm making an assertion here that there's only one element in the returned data

If left unspecified, it is queried using `user_query` and `params`.
:param str user_query: Filter query string to create conditions from.
:param {str: str} params: Filtering parameters with start, end, project_id, environment
:param str data_filter: Indicate the filter strategy to be applied to the data.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You aren't using data_filter.. yet?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet, I was gonna get more context on it before implementing it. I wonder under what circumstances do we want to exclude outliers for spans histogram. Doesn't it make sense to just show all data?

return HistogramParams(num_buckets, bucket_size, start_offset, multiplier)


def find_span_histogram_min_max(span, min_value, max_value, user_query, params, data_filter=None):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be merging this with the original? or decompose the original into functions which they can both compose from? It feels like a lot of this code is almost identical minus the columns/span info and lack of data filter?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I've noticed it too. I can follow up with a clean up

@udameli udameli merged commit 427b8df into master Mar 25, 2022
@udameli udameli deleted the du/feat/calculate-mix-max-for-span-histograms branch March 25, 2022 15:02
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants