feat(perf): Calculate min and max for span histograms#32836
Conversation
| [ | ||
| Lambda( | ||
| [ | ||
| "x", |
There was a problem hiding this comment.
Not sure, but is x required if we're not using it in the lambda?
There was a problem hiding this comment.
it was mostly because arrayFilter returns the first column, so exclusive_time has to be selected as x here
| if "start_timestamp" not in kwargs: | ||
| kwargs["start_timestamp"] = self.min_ago | ||
|
|
||
| if "timestamp" not in kwargs: | ||
| kwargs["timestamp"] = self.min_ago + timedelta(seconds=8) | ||
|
|
There was a problem hiding this comment.
Why did you remove these? Were they not actually being used? If they were used we should make sure to keep dates consistent to avoid flake, if there weren't, ignore me.
There was a problem hiding this comment.
I don't think this was used anywhere 🤔
| assert response.status_code == 200 | ||
| for bucket in response.data: | ||
| assert bucket["count"] == 0 | ||
| assert response.data[0] == {"bin": min, "count": 0} |
There was a problem hiding this comment.
I'd also be curious to test what max is here, basically asserting what max became when the param is undefined
There was a problem hiding this comment.
interesting, so when min < queried max then only one bucket for that min is returned. this is consistent with how other histograms behave I believe, so i'm making an assertion here that there's only one element in the returned data
| If left unspecified, it is queried using `user_query` and `params`. | ||
| :param str user_query: Filter query string to create conditions from. | ||
| :param {str: str} params: Filtering parameters with start, end, project_id, environment | ||
| :param str data_filter: Indicate the filter strategy to be applied to the data. |
There was a problem hiding this comment.
You aren't using data_filter.. yet?
There was a problem hiding this comment.
not yet, I was gonna get more context on it before implementing it. I wonder under what circumstances do we want to exclude outliers for spans histogram. Doesn't it make sense to just show all data?
| return HistogramParams(num_buckets, bucket_size, start_offset, multiplier) | ||
|
|
||
|
|
||
| def find_span_histogram_min_max(span, min_value, max_value, user_query, params, data_filter=None): |
There was a problem hiding this comment.
Should we be merging this with the original? or decompose the original into functions which they can both compose from? It feels like a lot of this code is almost identical minus the columns/span info and lack of data filter?
There was a problem hiding this comment.
yeah I've noticed it too. I can follow up with a clean up
This PR calculates min and max exclusive time duration for a given span and sets them as histogram params. I've also added more tests.
Fixes PERF-1487