Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add runner for paging through composite aggregations #1526

Merged
merged 9 commits into from
Jun 27, 2022

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Jun 22, 2022

Adds a "composite-agg" runner that allows paginating through composite aggregations. This allows benchmarking realistic use of composite aggs better (e.g. ML transforms) as users of composite aggs are typically paginating through all pages.

Relates elastic/elasticsearch#70035

@dliappis dliappis added enhancement Improves the status quo :Track Management New operations, changes in the track format, track download changes and the like labels Jun 22, 2022
@@ -732,7 +733,7 @@ def __repr__(self, *args, **kwargs):
return "node-stats"


def parse(text: BytesIO, props: List[str], lists: List[str] = None) -> dict:
def parse(text: BytesIO, props: List[str], lists: List[str] = None, objects: List[str] = None) -> dict:
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding some information about the new objects parameter to the docstring for this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added in 2f868ba


paths_to_composite = paths_to_composite_agg(body, [])
if not paths_to_composite or len(paths_to_composite) != 1:
raise Exception("Unique path to composite agg required")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you raise exceptions.DataError instead? We typically use that for cases where the provided parameters don't match expectations. This would also apply everywhere else exceptions are raised below within this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, fixed in 2f868ba

return results

def select_aggs(obj):
if isinstance(obj, dict):
Copy link
Contributor

@michaelbaamonde michaelbaamonde Jun 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can refactor the isinstance checks in this function (as well as in paths_to_composite_agg, resolve_composite_agg, and tree_copy_composite_agg below.) As far as I can tell--and let me know if I'm missing something--everywhere isinstance is called in these functions, the variable whose type is being inspected can only ever be a dict or None.

This is because body is what's being passed through the chain of function calls, and it can only ever be a dict. Along the way, we can also get intermediate None return values. I think that covers all cases, unless I'm misreading.

If so, this function could look something like:

def select_aggs(obj):
    if obj is not None:
        return obj.get("aggs") or obj.get("aggregations")
    else:
        return None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's two types of functions here. The first ones are paths_to_composite_agg (and the select_aggs function that is being used within), which figures out whether the given user-defined request body contains a composite agg definition. The parsing here cannot assume for the request body to be well-formed, and is rather lenient in the parsing (i.e. double-checks that we have dicts in the right place) so that it can later throw a proper error stating that it couldn't find the composite agg. If we were to just assume that some of these definitions would be dicts, we would give a cryptic Python error message accessing a non-dict in the wrong way.

For the above I prefer the current parsing code.

There are two more functions, resolve_composite_agg and tree_copy_composite_agg, where we use as input whatever was derived via paths_to_composite_agg, and hence we can assume that the right structure and types are in place. I made the code in those functions less defensive, as paths_to_composite_agg has already checked the right shape of dicts.

I hope this addresses your concerns.

esrally/driver/runner.py Show resolved Hide resolved
return paths

def resolve_composite_agg(obj, key_path):
if isinstance(obj, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar refactoring as the above two examples would work here, too.

raise Exception("Could not find composite agg - parser inconsistency")

def tree_copy_composite_agg(obj, key_path):
if isinstance(obj, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here, as well.

@michaelbaamonde
Copy link
Contributor

@ywelsch Thanks for putting so much effort into this! Very thorough. It looks good overall. I left some comments after a first pass, but I'm just about at the end of my day. I'll give it another look tomorrow and take a closer look at the tests in particular.

@michaelbaamonde
Copy link
Contributor

Okay, I have no further suggestions. If you could address the comments I've left, we'll be good to go.

Copy link
Contributor

@michaelbaamonde michaelbaamonde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@ywelsch ywelsch merged commit b7159de into elastic:master Jun 27, 2022
@ywelsch
Copy link
Contributor Author

ywelsch commented Jun 27, 2022

Thanks @michaelbaamonde!

@pquentin pquentin added this to the 2.5.1 milestone Jul 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improves the status quo :Track Management New operations, changes in the track format, track download changes and the like
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants