Adds a geoshapes track #61

Merged · 10 commits merged into elastic:master on Feb 26, 2019

Conversation

@imotov (Contributor) commented Jan 2, 2019

Adds a track from OSM-derived geoshape data.

Closes #60

@jpountz (Contributor) left a comment:

Woohoo! I'm not familiar enough with Rally to do a proper review though so let's wait for Daniel to review. :)

@@ -0,0 +1,2 @@
documents.json.bz2
documents-1k.json.bz2
Contributor:

is the latter still needed?

Member:

Yes, it is needed in order to support test-mode.

@jpountz (Contributor) commented Jan 2, 2019

@imotov I'm curious whether you have already run this benchmark with both the legacy quad-tree-based geo shapes and the new BKD-backed geo shapes?

@imotov (Contributor, Author) commented Jan 2, 2019

@jpountz I tried to run it on 6.5.4 with default settings, but it died after going through 3% of the data set. I suspect it ran out of memory, but I don't know for sure.

The results for master can be found here. There are some failures that still need to be investigated. I am pretty sure the indexing issues are related to elastic/elasticsearch#26286, but the search issues are somewhat puzzling. I am planning to look into them next.

@danielmitterdorfer (Member) left a comment:

Thanks for the PR. I left a few comments. I also tried to run it against the latest master of ES (revision 046f86f) with:

esrally --track=geoshape --on-error=abort

However, indexing failed immediately with:

[ERROR] Cannot race. Error in load generator [3]
	('Request returned an error. Error type: bulk, Description: HTTP status: 400, message: failed to parse field [shape] of type [geo_shape]', None)

The mapping it created looks fine to me:

{
  "osmgeoshapes" : {
    "mappings" : {
      "_doc" : {
        "dynamic" : "strict",
        "properties" : {
          "shape" : {
            "type" : "geo_shape"
          }
        }
      }
    }
  }
}

This is probably related to what you mention in #61 (comment).

I also checked your gist quickly, and from the large variation in indexing throughput it seems to me that the system is not properly warmed up. Thus I suggest you increase the warmup time period.
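As a rough illustration (not part of this PR), a longer warmup could be expressed in the challenge schedule along these lines; the operation name and concrete values are assumptions:

```json
{
  "operation": "index-append",
  "warmup-time-period": 1800,
  "clients": 8
}
```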

geoshape/README.md (outdated)
* `bulk_size` (default: 5000)
* `bulk_indexing_clients` (default: 8): Number of clients that issue bulk indexing requests.
* `ingest_percentage` (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.
* `conflict_probability` (default: 25): A number between 0 and 100 that defines the probability of id conflicts. This requires running the respective challenge.
Member:

Are we interested in update performance? If not, I suggest removing this parameter.

* `bulk_indexing_clients` (default: 8): Number of clients that issue bulk indexing requests.
* `ingest_percentage` (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.
* `conflict_probability` (default: 25): A number between 0 and 100 that defines the probability of id conflicts. This requires running the respective challenge.
* `on_conflict` (default: "index"): Whether to use an "index" or an "update" action when simulating an id conflict.
Member:

Are we interested in update performance? If not, I suggest removing this parameter.

* `ingest_percentage` (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.
* `conflict_probability` (default: 25): A number between 0 and 100 that defines the probability of id conflicts. This requires running the respective challenge.
* `on_conflict` (default: "index"): Whether to use an "index" or an "update" action when simulating an id conflict.
* `recency` (default: 0): A number between 0 and 1 that defines whether to bias towards more recent ids when simulating conflicts. See the [Rally docs](http://esrally.readthedocs.io/en/latest/track.html#bulk) for the full definition of this parameter. This requires running the respective challenge.
Member:

Are we interested in update performance? If not, I suggest removing this parameter.

### Example Document

```json
{
Member:

Can you please provide the script that you used to generate the corpus? Otherwise we will be unable to update it if that becomes necessary.

Contributor (Author):

Sure. The script basically takes the file that @nknize provided in #60 (comment) and converts it into single-element JSON, so we are unlikely to ever run it again. But I will add it just in case.

]
},
{
"name": "append-no-conflicts-index-only",
Member:

I think we should remove this challenge. geopoints still has it, but only for backwards-compatibility reasons, and we will remove it there at some point as well.

]
},
{
"name": "append-fast-with-conflicts",
Member:

I think we should remove this challenge unless we are interested in benchmarking update performance with this specific corpus.


@@ -0,0 +1,19 @@
{
"settings": {
"index.number_of_shards": {{number_of_shards | default(5)}},
Member:

Unless you want to compare geoshape with geopoint, we should probably go with the new Elasticsearch default (which is one shard) for new tracks. Wdyt?

Contributor (Author):

Makes sense.
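For illustration, the settings block with the new one-shard default would then read roughly as follows (a sketch, not the final diff; only the default value changes):

```json
"settings": {
  "index.number_of_shards": {{number_of_shards | default(1)}},
  "index.number_of_replicas": {{number_of_replicas | default(0)}}
}
```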

"ingest-percentage": {{ingest_percentage | default(100)}}
},
{
"name": "index-update",
Member:

This is only necessary if we are interested in update benchmarks.

@jpountz (Contributor) commented Jan 3, 2019

+1 to skipping update benchmarks. Updates make some sense because they trigger increased merging activity, and multi-dimensional points are slower than other Lucene data structures at merging when the number of dimensions is greater than 1. That said, I could easily live without them.

@imotov (Contributor, Author) commented Jan 3, 2019

I pushed some changes.

However, indexing failed immediately with...

I spent some time analyzing the failures. It looks like we have 3 types of failures there. As I mentioned before, some of them seem to be related to elastic/elasticsearch#26286. However, I also found some instances of elastic/elasticsearch#36883 and a few instances of "Invalid polygon, interior cannot share more than one point with the exterior". For the last set of errors, some cases seem pretty obvious, but others less so. I am still spot-checking the failures.

I also checked your gist quickly and from the large variation in indexing throughput it seems to me the system is not properly warmed up.

The test was running for several hours and I noticed a significant slowdown towards the end. So I think it might be something else.

@danielmitterdorfer (Member):

I reran the benchmark once more on the same environment and let it continue when errors occur. This is the error rate (on a per document basis, not per bulk request) over time:

[chart: per-document error rate over time]

I also created a chart for the indexing throughput over time:

[chart: indexing throughput over time]

In both charts, the warmup is the grey part and the actual measurement phase is the green part. Leaving the errors aside, I think it's problematic that the benchmark never reaches a steady state. While at the beginning of the measurement phase we reach roughly 45500 docs/s, throughput declines over time and reaches 10900 docs/s at the end of the benchmark. I think it is dangerous to report a median throughput in that case because it suggests that a higher throughput can be reached over a longer period of time when in fact it is heavily dependent on how long we execute the benchmark. One thing I am wondering about is why we never reach steady state. Maybe @nknize has more thoughts on this?

@jpountz (Contributor) commented Jan 4, 2019

why we never reach steady state

I suspect it's related to the fact that merging becomes increasingly costly as the index size grows with multi-dimensional points since the whole BKD tree needs to be rebuilt from scratch. Geo points and range fields have the issue as well, but geo shapes make it worse by using 4 indexed dimensions when ranges and geo points only use 2.

"clients": 1,
"warmup-iterations": 200,
"iterations": 100,
"target-throughput": 2
Member:

The empirical data that I have gathered on hardware that is similar to our nightly environment suggests that this throughput is too high. With service times around 2.5 seconds, I suggest a target throughput of 0.3 (meaning that a query will be scheduled roughly every 3 seconds).

"clients": 1,
"warmup-iterations": 200,
"iterations": 100,
"target-throughput": 2
Member:

The empirical data that I have gathered on hardware that is similar to our nightly environment suggests that this throughput is too high. With service times around 3 to 3.5 seconds, I suggest a target throughput of 0.25 (meaning that a query will be scheduled every 4 seconds).
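For illustration, the adjusted schedule entry would look roughly like this with one of the suggested values (0.3 for the first query, 0.25 for this one); the remaining fields are copied from the excerpt above and should be treated as a sketch:

```json
{
  "clients": 1,
  "warmup-iterations": 200,
  "iterations": 100,
  "target-throughput": 0.25
}
```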

@danielmitterdorfer (Member):

That sounds reasonable. If I understand you correctly, this means we will never reach a steady state, though. I wonder how we can make the benchmark meaningful w.r.t. throughput metrics.

FWIW the complete benchmark has finished by now and I did not see any errors for any of the queries (in Igor's benchmark results we see a positive error rate for polygon queries).

@imotov (Contributor, Author) commented Jan 4, 2019

That sounds reasonable. If I understand you correctly, this means we will never reach a steady state, though. I wonder how we can make the benchmark meaningful w.r.t. throughput metrics.

Maybe we can create a new index every 5 million records or so? It is strange, though; I was expecting that the limit on the maximum segment size would take care of that.

in Igor's benchmark results we see a positive error rate for polygon queries

Yeah, my machine was unresponsive at the end of the indexing run. I am not sure if it was thermal throttling or something else.

@jpountz (Contributor) commented Jan 4, 2019

I was kind of expecting that limit on the maximum segment size should take care of that.

This is a good point; the index is much larger than 5GB in the end, so merging activity should reach a steady state at some point.

@iverase (Contributor) commented Jan 7, 2019

Not sure if this is relevant, but I would not expect a constant indexing throughput because of the nature of the data. Indexing a document with a point (one triangle) is not the same as indexing a document with a polygon of thousands of points (~thousands of triangles plus tessellator overhead).

I think the batch size might be too big, as some of the documents can be quite large (polygons with thousands of points). In my experience I had to limit the batch size to 2500.

We should expect 8751 documents to fail.
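If we want to try a smaller bulk size without editing the track, the parameter could be overridden via `--track-params`, for example with a parameters file like the one below (a sketch; the file name `params.json` is hypothetical and this assumes a Rally version that accepts a JSON file for `--track-params`, otherwise the same values can be passed as key:value pairs):

```json
{
  "bulk_size": 2500,
  "bulk_indexing_clients": 8
}
```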

@imotov (Contributor, Author) commented Jan 7, 2019

Not sure if relevant but I would not expect a constant indexing throughput because of the nature of the data.

Indeed, the test file starts with a whole bunch of LINESTRINGS followed by MULTILINES and then POLYGONS. However, I thought we randomize the records that we index. If I am mistaken, then this might indeed be an issue.

@imotov (Contributor, Author) commented Jan 8, 2019

As per our discussion, I am going to split this PR into 3 separate tracks for LINESTRINGS, MULTILINES and POLYGONS and will remove all shapes that Elasticsearch cannot process at the moment.

@danielmitterdorfer (Member):

As per our discussion, I am going to split this PR into 3 separate tracks for LINESTRINGS, MULTILINES and POLYGONS and will remove all shapes that Elasticsearch cannot process at the moment.

Thanks for iterating, @imotov. Just a suggestion: you can keep everything in one track, but create three different corpora and three challenges (one for each corpus). That makes it a bit easier to maintain because everything stays in the same track. I'm happy to assist if you have more specific questions.

- splits shapes into 3 separate corpora
- removes failed polygons
- reduces default bulk size
@imotov (Contributor, Author) commented Jan 12, 2019

I ran the new track without any errors overnight and got interesting results. The first graph is the average throughput for the first two hours of the run, and the second is the average throughput for the entire run:
[chart: average throughput, first two hours of the run]
[chart: average throughput, entire run]

@danielmitterdorfer (Member):

Thanks for rerunning it. Could this be explained by #61 (comment)?

merging becomes increasingly costly as the index size grows with multi-dimensional points since the whole BKD tree needs to be rebuilt from scratch.

Btw, I think it is great that we have isolated the three different types now and can see how they behave.

@imotov (Contributor, Author) commented Jan 14, 2019

Could this be explained by #61 (comment)?

I am not sure. The index is much bigger than the maximum segment size, and I don't understand the mechanism that prevents it from stabilizing. We shouldn't be modifying the older segments, so I am not sure what exactly kicks in to slow down the overall performance. The change is quite drastic though, especially at the beginning, where we index linestrings for 1 hour and the performance steadily drops from 24,000 to 8,000. That's a 3x difference. At the end of the test our throughput is 2,000, which is 10x slower than when we started.

Btw, I think it is great that we have isolated the three different types now and can see how they behave.

@danielmitterdorfer I wonder if it would make sense to also index them into 3 different indices instead of a single index as I do at the moment. Any other ideas on how to make this test more meaningful? My main concern is that if we settle on using the overall test running time as "a canary in a coal mine", whatever is causing the slowdown on the large index will overshadow the smaller regressions and improvements we introduce until we figure out and fix the source of this slowdown.

@danielmitterdorfer (Member):

Could this be explained by #61 (comment)?

I am not sure.

You could try rerunning with a profiler (`--telemetry=jfr` to capture a flight recording), or maybe we need to run a benchmark with more verbose logging enabled so we can see what is happening?

I wonder if it would make sense to also index them in 3 different indices instead of a single index as I do at the moment.

That would make sense to me, yes.
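A possible shape for this in the track's corpora section, heavily abbreviated (the index and source-file names other than osmlinestrings are assumptions, and required fields such as document counts and base URLs are omitted):

```json
"corpora": [
  {
    "name": "linestrings",
    "documents": [ { "source-file": "documents-linestrings.json.bz2", "target-index": "osmlinestrings" } ]
  },
  {
    "name": "multilinestrings",
    "documents": [ { "source-file": "documents-multilinestrings.json.bz2", "target-index": "osmmultilinestrings" } ]
  },
  {
    "name": "polygons",
    "documents": [ { "source-file": "documents-polygons.json.bz2", "target-index": "osmpolygons" } ]
  }
]
```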

Switch to using different indices for different shape types.
@imotov (Contributor, Author) commented Jan 15, 2019

Could this be explained by #61 (comment)?

I think the answer is actually yes; my assumption was wrong. With a final index size of 71.6gb, here are the 10 largest segments (out of 42 in total) at the end of the linestrings run. So it looks like there is still plenty of merging going on. I was watching this test run for a while, and while the biggest 2.7GB segment was formed quite early in the run, the second and third showed up only closer to the end.

index               shard prirep ip        segment generation docs.count docs.deleted    size size.memory committed searchable version compound
osmlinestrings      0     p      127.0.0.1 _d4            472    6065818            0   2.7gb     3660671 true      true       8.0.0   false
osmlinestrings      0     p      127.0.0.1 _m7            799    5512526            0   2.5gb     2858274 true      true       8.0.0   false
osmlinestrings      0     p      127.0.0.1 _se           1022    4794982            0   2.3gb     2125728 true      true       8.0.0   false
osmlinestrings      0     p      127.0.0.1 _xx           1221      83458            0     1gb      370221 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _xl           1209      85501            0     1gb      366481 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _wq           1178     484087            0 944.3mb      526567 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _x0           1188      69701            0 934.9mb      345104 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _wi           1170     486499            0 532.5mb      398799 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _w8           1160     309137            0 399.5mb      240792 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _vo           1140     250744            0 348.5mb      248503 true      true       8.0.0   true

I also realized that I need to wait for the merging of linestrings to finish before starting to index multilinestrings, since it looks like merging continues for a while and interferes with multilinestrings indexing.

@danielmitterdorfer (Member):

@imotov is this PR superseded by #62 or do you intend to still work on this one?

@imotov (Contributor, Author) commented Feb 14, 2019

Is this PR superseded by #62 or do you intend to still work on this one?

@danielmitterdorfer I would say #62 is a parallel effort. Knowing how geoshapes work for points is very useful for the effort to consolidate the geo_point and geo_shape data types, but it is not sufficient as a geo_shape metric. We still need a full-blown geo_shape test with different geo_shape types.

I think I addressed your comments from this PR as much as I could. If you see any additional issues I can address, please let me know. The slowdown over the course of the test is still present, but I think it's more a reflection of reality than a flaw in the test. Hopefully, @iverase's merge optimization will reduce this issue.

@danielmitterdorfer (Member):

Thanks for the update @imotov! I'll do a final pass very soon.

@danielmitterdorfer (Member) left a comment:

Looks mostly fine, but I left a few comments around the removal of types and the bulk size.


This track allows overriding the following parameters with Rally 0.8.0+ using `--track-params`:

* `bulk_size` (default: 500)
Member:

This differs from the value in the track (100). Also, I am not sure whether the same bulk size for all corpora is right; I wonder whether we need to introduce dedicated parameters per corpus?
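One possible shape for dedicated per-corpus parameters (the parameter names here are hypothetical, not necessarily what the track ended up using):

```json
{
  "name": "index-append-linestrings",
  "operation-type": "bulk",
  "bulk-size": {{linestrings_bulk_size | default(100)}}
}
```

with analogous `multilinestrings_bulk_size` and `polygons_bulk_size` parameters for the other two corpora.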

"index.number_of_replicas": {{number_of_replicas | default(0)}}
},
"mappings": {
"_doc": {
Member:

This works for 6.x but not on master anymore, as types have now been removed.
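For comparison, a typeless variant of index.json that works once types are removed would look roughly like this (a sketch based on the mapping shown earlier in this thread; the one-shard default is carried over from the earlier discussion):

```json
{
  "settings": {
    "index.number_of_shards": {{number_of_shards | default(1)}},
    "index.number_of_replicas": {{number_of_replicas | default(0)}}
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "shape": {
        "type": "geo_shape"
      }
    }
  }
}
```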

{
"name": "index-append-linestrings",
"operation-type": "bulk",
"bulk-size": {{bulk_size | default(100)}},
Member:

As I've mentioned in my feedback for the README, I'm not sure whether we should have a different bulk size per corpus?

{
"name": "osmlinestrings",
"body": "index.json",
"types": ["_doc"]
Member:

Support for types has been removed on master, so this works against 6.x but not against master.

"name": "linestrings",
"base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geoshape",
"target-index": "osmlinestrings",
"target-type": "_doc",
Member:

Support for types has been removed on master, so this works against 6.x but not against master.

Adds different bulk size settings for different 
shape types and removes doc type.
@danielmitterdorfer (Member) left a comment:

Thanks for iterating @imotov. LGTM. Feel free to merge at any time.

@imotov imotov merged commit 1952843 into elastic:master Feb 26, 2019