Add a track for Metricbeat data #56

Merged

pcsanwald merged 13 commits into master from add-metricbeat

Mar 18, 2019

Contributor

pcsanwald commented Dec 12, 2018

This PR creates a track for Metricbeat data, which I'd like to use to benchmark the autohisto aggregation. I've marked this "WIP (work in progress)" because I still need to generate a realistic dataset, but I'd also like feedback on what I've done so far, as this is the first Rally track I've created.

I hope that Metricbeat data will be broadly useful, and not just for my use case, per the discussion in #19. My idea for generating a dataset is to run two Metricbeat modules that collect data at different intervals, perhaps an elasticsearch module running hourly and a system module running every minute. We'd then run the autohisto aggregation on a field that's only present in elasticsearch (node_stats maybe), across a time range and bucket count combination that would result in minute-level buckets.
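As a rough illustration of the query shape described above, here is a minimal sketch of a search body that restricts results to documents carrying an elasticsearch-only field and then runs `auto_date_histogram` over the timestamp. The field names (`@timestamp`, `elasticsearch.node.stats`), the aggregation label `over_time`, and the bucket count are assumptions for illustration and are not taken from this PR.

```python
import json


def auto_histo_body(date_field, buckets, required_field):
    """Build a search body: keep only docs that have `required_field`,
    then bucket them over `date_field` with auto_date_histogram."""
    return {
        "size": 0,  # aggregation results only, no hits
        "query": {"exists": {"field": required_field}},
        "aggs": {
            "over_time": {  # hypothetical aggregation label
                "auto_date_histogram": {
                    "field": date_field,
                    "buckets": buckets,  # target number of buckets
                }
            }
        },
    }


# Assumed field names; 1440 is simply the number of minutes in a day.
body = auto_histo_body("@timestamp", 1440, "elasticsearch.node.stats")
print(json.dumps(body, indent=2))
```

Which interval Elasticsearch actually picks depends on the time range covered by the matching documents, so the bucket count would need tuning against the real dataset.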

Paul Sanwald added 2 commits

December 10, 2018 13:34


          WIP commit for metricbeat stuffs

a8d1c9e


          update default tracks

9d6c6da

pcsanwald added the WIP label

pcsanwald requested review from danielmitterdorfer and dliappis

December 12, 2018 00:54

danielmitterdorfer reviewed

Member

danielmitterdorfer left a comment

Thanks for the PR. I did a first pass and left a couple of comments / suggestions.

metricbeat/challenges/default.json

+                      },
+                      {
+                        "operation": "index-append",
+                        "warmup-time-period": 0,

Member

danielmitterdorfer Dec 12, 2018

I guess bulk-indexing the document corpus is considered more of a setup task here (i.e. you're not interested in bulk-indexing throughput)? In that case setting warmup-time-period to zero is fine.

metricbeat/challenges/default.json Outdated

+                        "clients": 1
+                      },
+                      {
+                        "operation": "index-stats",

Member

danielmitterdorfer Dec 12, 2018

I don't think we need to benchmark index stats here?

metricbeat/challenges/default.json Outdated

+                        "target-throughput": 100
+                      },
+                      {
+                        "operation": "node-stats",

Member

danielmitterdorfer Dec 12, 2018

I don't think we need to benchmark node stats here?

metricbeat/challenges/default.json Outdated

+                    ]
+                  },
+                  {
+                    "name": "append-no-conflicts-index-only",

Member

danielmitterdorfer Dec 12, 2018

I would not include any of the other challenges unless you are interested in those metrics.

metricbeat/challenges/default.json Outdated

+                        "target-throughput": 100
+                      },
+                      {
+                        "operation": "default",

Member

danielmitterdorfer Dec 12, 2018

Is it intentional that the aggregation queries are not referenced here?
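One way to catch this kind of omission before review is to compare the operations a track defines with the ones a challenge's schedule refers to. The sketch below assumes schedule steps reference operations by name (as in the fragments above) and uses a hypothetical operation name `autohisto_agg` and challenge name `example-challenge`; it is not part of Rally itself.

```python
def unreferenced_operations(operations, challenge):
    """Return names of defined operations that no schedule step refers to."""
    used = {step["operation"] for step in challenge["schedule"]}
    return [op["name"] for op in operations if op["name"] not in used]


# "autohisto_agg" is a hypothetical name for the aggregation operation.
operations = [
    {"name": "index-append", "operation-type": "bulk"},
    {"name": "default", "operation-type": "search"},
    {"name": "autohisto_agg", "operation-type": "search"},
]

# Hypothetical challenge; indexing is treated as setup, hence no warmup.
challenge = {
    "name": "example-challenge",
    "schedule": [
        {"operation": "index-append", "warmup-time-period": 0, "clients": 1},
        {"operation": "default", "clients": 1, "target-throughput": 100},
    ],
}

missing = unreferenced_operations(operations, challenge)
print(missing)
```

Here the aggregation operation is defined but never scheduled, which is the situation the comment asks about.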

metricbeat/files.txt Outdated

		@@ -0,0 +1 @@
		documents.json

Member

danielmitterdorfer Dec 12, 2018

This file is needed for the script that prepares the corpus for offline usage (see https://github.com/elastic/rally-tracks/blob/master/download.sh). I think when you're done this should only contain the compressed corpus file (+ a smaller file that contains the test corpus, see our docs).

Member

danielmitterdorfer Mar 6, 2019

I think this should contain:

documents.json.bz2
documents-1k.json.bz2

For details please see my comment above.
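A minimal sketch of how the two files could be produced from an uncompressed `documents.json`, assuming one JSON document per line and that the test corpus is simply the first 1,000 documents. The helper names are hypothetical, and the demo runs against a throwaway file in a temporary directory rather than the real corpus.

```python
import bz2
import itertools
import os
import tempfile


def write_corpus_files(source_path, test_docs=1000):
    """Write <name>.json.bz2 (full corpus) and <name>-1k.json.bz2 (test corpus)."""
    base, ext = os.path.splitext(source_path)  # e.g. ".../documents", ".json"
    full_out = source_path + ".bz2"
    test_out = base + "-1k" + ext + ".bz2"
    with open(source_path, "rb") as src, bz2.open(full_out, "wb") as dst:
        for line in src:
            dst.write(line)
    with open(source_path, "rb") as src, bz2.open(test_out, "wb") as dst:
        # Keep only the first `test_docs` lines for the test corpus.
        for line in itertools.islice(src, test_docs):
            dst.write(line)
    return full_out, test_out


def count_lines(bz2_path):
    """Count documents (lines) in a bz2-compressed corpus file."""
    with bz2.open(bz2_path, "rb") as f:
        return sum(1 for _ in f)


# Demo on synthetic data: 1500 tiny documents in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "documents.json")
    with open(source, "w") as f:
        for i in range(1500):
            f.write('{"n": %d}\n' % i)
    full_path, test_path = write_corpus_files(source)
    names = (os.path.basename(full_path), os.path.basename(test_path))
    counts = (count_lines(full_path), count_lines(test_path))

print(names, counts)
```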

metricbeat/index.json Outdated (resolved)

metricbeat/track.json Outdated (resolved)

metricbeat/track.json Outdated (resolved)

Paul Sanwald added 5 commits

March 4, 2019 09:00


          Merge branch 'master' into add-metricbeat

5d112a3


          remove unnecessary benchmarks

aa24271


          update index to use latest metricbeat

af261d7


          address code review comments

04a1f75


          update challenges to use autohisto stuff

95d322f

pcsanwald removed the WIP label

Contributor Author

pcsanwald commented Mar 4, 2019

@danielmitterdorfer I think I've addressed most of your comments, so this is ready for another review.

Note that I have gotten an actual slug of metricbeat data from infra-stats, so, when we are ready to merge this, I can upload that to S3 and update the config to account for the zipped version, etc.

pcsanwald requested a review from danielmitterdorfer

March 4, 2019 22:00

danielmitterdorfer reviewed

Member

danielmitterdorfer left a comment

Thanks for iterating @pcsanwald! I had another look and it looks mostly fine to me. The only remaining things to do are IMHO:

- Upload the compressed version of the corpus to the S3 bucket and create a smaller test corpus (see Rally docs).
- Update the mapping to 7.0.0 to avoid deprecation warnings (see my comments inline).
- Add a README file similar to the other tracks (see geonames for an example). The most important part is IMHO that we choose a license, and I suggest defaulting to Apache 2.

Once this is done I think we should run the track on our nightly benchmarking hardware to see whether the chosen target throughput for the aggregations is fine (hard to tell upfront).

metricbeat/operations/default.json Outdated

+                    "ingest-percentage": {{ingest_percentage | default(100)}}
+                  },
+                  {
+                    "name": "default",

Member

danielmitterdorfer Mar 6, 2019

Are you interested in measuring the performance of match_all? If not, I suggest removing it from the track.

metricbeat/index.json Outdated

    
                  "index.search.slowlog.threshold.fetch.debug" : "0s",

                  "index.search.slowlog.threshold.query.debug" : "0s"

                },

                "mappings": {

Member

danielmitterdorfer Mar 6, 2019

Types have since been removed on master, so this will issue deprecation warnings. I have extracted the Metricbeat mapping from 7.0.0-beta1 in https://gist.github.com/danielmitterdorfer/72686146100dea7b775f0efdc80bd2f1. Can you please use that mapping instead so we avoid the deprecation warnings?

metricbeat/track.json Outdated

+                  {
+                    "name": "metricbeat",
+                    "body": "index.json",
+                    "types": ["_doc"]

Member

danielmitterdorfer Mar 6, 2019

As types have been removed on master you can just remove this property (the most recent version of Rally will allow you to do this).

Paul Sanwald added 3 commits

March 8, 2019 10:06


          remove match_all from challenge, was just using for testing

ecfb545


          compress files and add one for test mode

21ff318


          use new mappings for metricbeat

2bbb86e

pcsanwald requested a review from danielmitterdorfer

March 8, 2019 21:36

Contributor Author

pcsanwald commented Mar 8, 2019

Certainly no urgency on this, but I think I've addressed the review comments and this should be ready for another review.

danielmitterdorfer reviewed

Member

danielmitterdorfer left a comment

Thanks for creating this and iterating with me on it. The track looks good to me but I'd like to execute it at least once to ensure I did not miss anything. Can you please upload the data files to the S3 bucket @pcsanwald? If you need help with this please ping me.

Contributor Author

pcsanwald commented Mar 18, 2019

I've pinged Daniel and we are uploading the track data.

danielmitterdorfer reviewed

metricbeat/track.json

+                ],
+                "corpora": [
+                  {
+                    "name": "metricbeat",

Member

danielmitterdorfer Mar 18, 2019

Can you please add the following property:

"base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/metricbeat"

Contributor Author

pcsanwald Mar 18, 2019

I pushed b9326

danielmitterdorfer reviewed

metricbeat/track.json Outdated

+                    "name": "metricbeat",
+                    "documents": [
+                      {
+                        "source-file": "documents.json",

Member

danielmitterdorfer Mar 18, 2019

This should be documents.json.bz2 so it is using the compressed version.
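Taken together with the `base-url` request above, the corpus entry would look roughly like the output of the sketch below. The helper name is hypothetical, and the document count and byte sizes are placeholders (set to zero here) because the real values are not stated in this thread; they would need to be filled in from the uploaded files.

```python
import json


def corpus_entry(name, base_url, source_file, document_count,
                 compressed_bytes, uncompressed_bytes):
    """Build one entry for the "corpora" list of a track.json."""
    return {
        "name": name,
        "base-url": base_url,
        "documents": [
            {
                "source-file": source_file,
                "document-count": document_count,
                "compressed-bytes": compressed_bytes,
                "uncompressed-bytes": uncompressed_bytes,
            }
        ],
    }


entry = corpus_entry(
    name="metricbeat",
    base_url="http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/metricbeat",
    source_file="documents.json.bz2",  # compressed corpus, as requested
    document_count=0,       # placeholder, real value not given in the thread
    compressed_bytes=0,     # placeholder
    uncompressed_bytes=0,   # placeholder
)
print(json.dumps({"corpora": [entry]}, indent=2))
```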

Member

danielmitterdorfer commented Mar 18, 2019

The track data are now uploaded to S3 and I tested it locally. I left two more comments (see above). Can you please address them @pcsanwald? Then I think we're good to merge.

Paul Sanwald added 3 commits

March 18, 2019 11:11


          add base url

ee5245c


          add base url

cc7b712


          add compressed version of file

b93264f

Contributor Author

pcsanwald commented Mar 18, 2019

@danielmitterdorfer thanks for the review! I believe I have addressed both your comments.

danielmitterdorfer approved these changes

Member

danielmitterdorfer left a comment

Thanks for your PR and iterating with me @pcsanwald! LGTM - Feel free to merge at any time.

pcsanwald merged commit c3a3ec9 into master

pcsanwald deleted the add-metricbeat branch

March 18, 2019 15:33

dliappis mentioned this pull request

Fix the metricbeat track #164

Merged
