Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TEST] Fix slowness of AutoDateHistogramAggregatorTests #35072

Merged

Conversation

ekalgolas
Copy link
Contributor

Fixes #33181

Randomize test assertion and test set size instead of asserting on an
exhaustive list of dates with fixed test set size.

Also refactor common objects used to avoid recreating them, avoid date
to string conversion and reduce duplicate test code

Randomize test assertion and test set size instead of asserting on an
exhaustive list of dates with fixed test set size.

Also refactor common objects used to avoid recreating them, avoid date
to string conversion and reduce duplicate test code
@colings86 colings86 added >test Issues or PRs that are addressing/adding tests :Analytics/Aggregations Aggregations v7.0.0 v6.6.0 labels Oct 30, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@cbuescher
Copy link
Member

@ekalgolas thanks for opening the PR looks really great at first glance but since this is a larger change it will require some more detailed attention.
Let me run some initial CI checks.
@elasticmachien test this please

@cbuescher
Copy link
Member

cbuescher commented Nov 2, 2018

@ekalgolas there are a couple of checkstyle errors coming from AutoDateHistogramAggregatorTests (see https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request/1581/console). Could you fix those so I can re-run the tests? In general, if you run ./gradlew precommit you can catch those problems locally without having to run the whole test suite (you can even reduce that to the server sub-project with ./gradlew -p server precommit

@cbuescher cbuescher self-assigned this Nov 2, 2018
@ekalgolas
Copy link
Contributor Author

Sure, thanks for the info. Will upload a patch

Randomize test assertion and test set size instead of asserting on an
exhaustive list of dates with fixed test set size.

Also refactor common objects used to avoid recreating them, avoid date
to string conversion and reduce duplicate test code
…ests' into fix-AutoDateHistogramAggregatorTests

# Conflicts:
#	server/src/test/java/org/elasticsearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java
@ekalgolas
Copy link
Contributor Author

Fixed now

@cbuescher
Copy link
Member

@elasticmachine test this please

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekalgolas great change, I quickly compared the runtime of the test on my local machine and where it took ~25sec on average before it is now down to 10sec. Sorry it took a bit, but the change set is quite large due to the many formatting changes, but I think I got the gist of it now.

I left a few short questions and will kick of running the tests in the meantime.

histogram -> {
final List<? extends Histogram.Bucket> buckets = histogram.getBuckets();
final int expectedDocCount = bucketsToExpectedDocCountMap.get(randomNumberOfBuckets);
final int expectedSizeCeil = (seed + expectedDoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did this test move to? I'm not sure but I think it has a readon to be here and I cannot find its replacement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this test as it felt kind of redundant. In this test case we first set the buckets so that data point is a key with doc count as 1, which is tested in testRandomHourInterval and testIntervalHour. Next, this sets the number of buckets to 6, a number such that keys are now 3 hour intervals, which is again tested by testRandomHourInterval. I did not see a special case being tested cases which the existing hour test cases could not test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that part of this seems to be covered by the other randomized test, the case here seems to be somehow more carfuly crafted so I assumed it might have a special purpose. Maybe @colings86 who authored the initial commit of this test can elaborate more on this? If we want to keep it we should probably add a short note about the special purpose of this test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on removing this test.

@cbuescher
Copy link
Member

@elasticmachine test this please

@ekalgolas
Copy link
Contributor Author

Also I see the build has failed due to "> Task :distribution:bwc:next-bugfix-snapshot:buildBwcVersion FAILED" which might be due to #28653

…ests' into fix-AutoDateHistogramAggregatorTests

# Conflicts:
#	server/src/test/java/org/elasticsearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java
@ekalgolas
Copy link
Contributor Author

Uploaded changes for renaming seed and undoing the conditional operator

Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ekalgolas, thanks for the update. I think the PR looks good in general but I'm wondering if we can simplify the new randomized tests a bit by making the dataset size fixed to avoid a few of the more complicated calculations now needed in the test. I left a comment along those lines, would be nice if you could try this on one method and if this still keeps runtime down apply this to all the other methods as well.

histogram -> {
final List<? extends Histogram.Bucket> buckets = histogram.getBuckets();
final int expectedDocCount = bucketsToExpectedDocCountMap.get(randomNumberOfBuckets);
final int expectedSizeCeil = (seed + expectedDoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that part of this seems to be covered by the other randomized test, the case here seems to be somehow more carfuly crafted so I assumed it might have a special purpose. Maybe @colings86 who authored the initial commit of this test can elaborate more on this? If we want to keep it we should probably add a short note about the special purpose of this test.

Copy link
Contributor

@pcsanwald pcsanwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this code vs master; cut down test run time from 30-40s to about 10s (local machine, small sample size, etc). Nice work! There's probably some further work we can do to reduce times, but we can tweak the randomization parameters to find a balance between test coverage and execution time.

histogram -> {
final List<? extends Histogram.Bucket> buckets = histogram.getBuckets();
final int expectedDocCount = bucketsToExpectedDocCountMap.get(randomNumberOfBuckets);
final int expectedSizeCeil = (seed + expectedDoc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on removing this test.

…ests' into fix-AutoDateHistogramAggregatorTests

# Conflicts:
#	server/src/test/java/org/elasticsearch/search/aggregations/bucket/histogram/AutoDateHistogramAggregatorTests.java
Copy link
Member

@cbuescher cbuescher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ekalgolas Thanks for the last update, LGTM now. I will kick of the CI tests and merge after we get a successful run.

@cbuescher
Copy link
Member

@elasticmachine test this please

@cbuescher
Copy link
Member

@ekalgolas there were some unused imports creating issues with our "checkstyle" tests. I removed them with my last commit. I hope you don't mind me pushing to your branch to speed up getting a clean CI built.
@elasticmachine test this please

@cbuescher
Copy link
Member

@elasticmachine test this please

@cbuescher cbuescher merged commit 450db7f into elastic:master Nov 19, 2018
@cbuescher
Copy link
Member

cbuescher commented Nov 19, 2018

@ekalgolas thanks a lot for working on this, I have merged to master and will also backport to 6.x. Great speed improvement for this test indeed!

cbuescher pushed a commit that referenced this pull request Nov 19, 2018
Randomize test assertion and test set size instead of asserting on an
exhaustive list of dates with fixed test set size. Also refactor common
objects used to avoid recreating them, avoid date to string conversion
and reduce duplicate test code

Closes #33181
@ekalgolas ekalgolas deleted the fix-AutoDateHistogramAggregatorTests branch November 19, 2018 21:33
@ekalgolas
Copy link
Contributor Author

Thanks for removing the unused imports @cbuescher

@pcsanwald
Copy link
Contributor

@ekalgolas thank you for your contribution here, much appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >test Issues or PRs that are addressing/adding tests v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AutoDateHistogramAggregatorTests is slow
5 participants