-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add runner for paging through composite aggregations #1526
Conversation
@@ -732,7 +733,7 @@ def __repr__(self, *args, **kwargs): | |||
return "node-stats" | |||
|
|||
|
|||
def parse(text: BytesIO, props: List[str], lists: List[str] = None) -> dict: | |||
def parse(text: BytesIO, props: List[str], lists: List[str] = None, objects: List[str] = None) -> dict: | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you mind adding some information about the new objects
parameter to the docstring for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in 2f868ba
esrally/driver/runner.py
Outdated
|
||
paths_to_composite = paths_to_composite_agg(body, []) | ||
if not paths_to_composite or len(paths_to_composite) != 1: | ||
raise Exception("Unique path to composite agg required") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you raise exceptions.DataError
instead? We typically use that for cases where the provided parameters don't match expectations. This would also apply everywhere else exceptions are raised below within this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, fixed in 2f868ba
return results | ||
|
||
def select_aggs(obj): | ||
if isinstance(obj, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can refactor the isinstance
checks in this function (as well as in paths_to_composite_agg
, resolve_composite_agg
, and tree_copy_composite_agg
below.) As far as I can tell--and let me know if I'm missing something--everywhere isinstance
is called in these functions, the variable whose type is being inspected can only ever be a dict
or None
.
This is because body
is what's being passed through the chain of function calls, and it can only ever be a dict
. Along the way, we can also get intermediate None
return values. I think that covers all cases, unless I'm misreading.
If so, this function could look something like:
def select_aggs(obj):
if obj is not None:
return obj.get("aggs") or obj.get("aggregations")
else:
return None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's two types of functions here. The first ones are paths_to_composite_agg
(and the select_aggs
function that is being used within), which figures out whether the given user-defined request body contains a composite agg definition. The parsing here cannot assume for the request body to be well-formed, and is rather lenient in the parsing (i.e. double-checks that we have dicts in the right place) so that it can later throw a proper error stating that it couldn't find the composite agg. If we were to just assume that some of these definitions would be dicts, we would give a cryptic Python error message accessing a non-dict in the wrong way.
For the above I prefer the current parsing code.
There are two more functions, resolve_composite_agg
and tree_copy_composite_agg
, where we use as input whatever was derived via paths_to_composite_agg
, and hence we can assume that the right structure and types are in place. I made the code in those functions less defensive, as paths_to_composite_agg
has already checked the right shape of dicts.
I hope this addresses your concerns.
esrally/driver/runner.py
Outdated
return paths | ||
|
||
def resolve_composite_agg(obj, key_path): | ||
if isinstance(obj, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar refactoring as the above two examples would work here, too.
esrally/driver/runner.py
Outdated
raise Exception("Could not find composite agg - parser inconsistency") | ||
|
||
def tree_copy_composite_agg(obj, key_path): | ||
if isinstance(obj, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here, as well.
@ywelsch Thanks for putting so much effort into this! Very thorough. It looks good overall. I left some comments after a first pass, but I'm just about at the end of my day. I'll give it another look tomorrow and take a closer look at the tests in particular. |
Okay, I have no further suggestions. If you could address the comments I've left, we'll be good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
Thanks @michaelbaamonde! |
Adds a "composite-agg" runner that allows paginating through composite aggregations. This allows benchmarking realistic use of composite aggs better (e.g. ML transforms) as users of composite aggs are typically paginating through all pages.
Relates elastic/elasticsearch#70035