Adds a geoshapes track #61

Merged · 10 commits merged into elastic:master on Feb 26, 2019

Conversation

@imotov (Contributor) commented Jan 2, 2019

Adds a track from OSM-derived geoshape data.

Closes #60

@jpountz (Contributor) left a comment:

Woohoo! I'm not familiar enough with Rally to do a proper review though so let's wait for Daniel to review. :)

@@ -0,0 +1,2 @@
documents.json.bz2
documents-1k.json.bz2
Contributor:

is the latter still needed?

Member:

Yes, it is needed in order to support test-mode.

@jpountz (Contributor) commented Jan 2, 2019

@imotov I'm curious whether you have already run this benchmark with both the legacy quad-tree-based geo shapes and the new BKD-backed geo shapes?

@imotov (Contributor, Author) commented Jan 2, 2019

@jpountz I tried to run it on 6.5.4 with default settings, but it died after going through 3% of the data set. I suspect it ran out of memory, but I don't know for sure.

The results for master can be found here. There are some failures that still need to be investigated. I am pretty sure the indexing issues are related to elastic/elasticsearch#26286, but the search issues are somewhat puzzling. I am planning to look into them next.

@danielmitterdorfer (Member) left a comment:

Thanks for the PR. I left a few comments. I also tried to run it against the latest master of ES (revision 046f86f) with:

esrally --track=geoshape --on-error=abort

However, indexing failed immediately with:

[ERROR] Cannot race. Error in load generator [3]
	('Request returned an error. Error type: bulk, Description: HTTP status: 400, message: failed to parse field [shape] of type [geo_shape]', None)

The mapping it created looks fine to me:

{
  "osmgeoshapes" : {
    "mappings" : {
      "_doc" : {
        "dynamic" : "strict",
        "properties" : {
          "shape" : {
            "type" : "geo_shape"
          }
        }
      }
    }
  }
}

This is probably related to what you mention in #61 (comment).

I also checked your gist quickly, and from the large variation in indexing throughput it seems to me that the system is not properly warmed up. Thus I suggest you increase the warmup time period.
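As a rough illustration (not part of this PR), a longer warmup could be expressed in the challenge schedule along these lines; the operation name and concrete values are assumptions:

```json
{
  "operation": "index-append",
  "warmup-time-period": 1800,
  "clients": 8
}
```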

geoshape/README.md (outdated)
* `bulk_size` (default: 5000)
* `bulk_indexing_clients` (default: 8): Number of clients that issue bulk indexing requests.
* `ingest_percentage` (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.
* `conflict_probability` (default: 25): A number between 0 and 100 that defines the probability of id conflicts. This requires running the respective challenge.
Member:

Are we interested in update performance? If not, I suggest removing this parameter.

* `bulk_indexing_clients` (default: 8): Number of clients that issue bulk indexing requests.
* `ingest_percentage` (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.
* `conflict_probability` (default: 25): A number between 0 and 100 that defines the probability of id conflicts. This requires running the respective challenge.
* `on_conflict` (default: "index"): Whether to use an "index" or an "update" action when simulating an id conflict.
Member:

Are we interested in update performance? If not, I suggest removing this parameter.

* `ingest_percentage` (default: 100): A number between 0 and 100 that defines how much of the document corpus should be ingested.
* `conflict_probability` (default: 25): A number between 0 and 100 that defines the probability of id conflicts. This requires running the respective challenge.
* `on_conflict` (default: "index"): Whether to use an "index" or an "update" action when simulating an id conflict.
* `recency` (default: 0): A number between 0 and 1 that defines whether to bias towards more recent ids when simulating conflicts. See the [Rally docs](http://esrally.readthedocs.io/en/latest/track.html#bulk) for the full definition of this parameter. This requires running the respective challenge.
Member:

Are we interested in update performance? If not, I suggest removing this parameter.

### Example Document

```json
{
Member:

Can you please provide the script that you used to generate the corpus? Otherwise we will be unable to update it if that becomes necessary.

Contributor (Author):

Sure. The script basically takes the file that @nknize provided in #60 (comment) and converts it into single-element JSON, so we are unlikely to ever run it again. But I will add it just in case.

]
},
{
"name": "append-no-conflicts-index-only",
Member:

I think we should remove this challenge. geopoints still has it, but only for backwards-compatibility reasons, and we will remove it there at some point as well.

]
},
{
"name": "append-fast-with-conflicts",
Member:

I think we should remove this challenge unless we are interested in benchmarking update performance with this specific corpus.


@@ -0,0 +1,19 @@
{
"settings": {
"index.number_of_shards": {{number_of_shards | default(5)}},
Member:

Unless you want to compare geoshape with geopoint, we should probably go with the new Elasticsearch default (which is one shard) for new tracks. Wdyt?

Contributor (Author):

Makes sense.
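For illustration, the settings block with the new one-shard default would then read roughly as follows (a sketch, not the final diff; only the default value changes):

```json
"settings": {
  "index.number_of_shards": {{number_of_shards | default(1)}},
  "index.number_of_replicas": {{number_of_replicas | default(0)}}
}
```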

"ingest-percentage": {{ingest_percentage | default(100)}}
},
{
"name": "index-update",
Member:

This is only necessary if we are interested in update benchmarks.

@jpountz (Contributor) commented Jan 3, 2019

+1 to skipping update benchmarks. Updates make some sense because they trigger increased merging activity, and multi-dimensional points are slower than other Lucene data structures at merging when the number of dimensions is greater than 1. That said, I could easily live without them.

@imotov (Contributor, Author) commented Jan 3, 2019

I pushed some changes.

However, indexing failed immediately with...

I spent some time analyzing the failures. It looks like we have 3 types of failures there. As I mentioned before, some of them seem to be related to elastic/elasticsearch#26286. However, I also found some instances of elastic/elasticsearch#36883 and a few instances of "Invalid polygon, interior cannot share more than one point with the exterior". For the last set of errors, some cases seem pretty obvious, but others less so. I am still spot-checking the failures.

I also checked your gist quickly and from the large variation in indexing throughput it seems to me the system is not properly warmed up.

The test was running for several hours and I noticed a significant slowdown towards the end. So I think it might be something else.

@danielmitterdorfer (Member):

I reran the benchmark once more on the same environment and let it continue when errors occur. This is the error rate (on a per document basis, not per bulk request) over time:

[chart: per-document error rate over time]

I also created a chart for the indexing throughput over time:

[chart: indexing throughput over time]

In both charts, the warmup is the grey part and the actual measurement phase is the green part. Leaving the errors aside, I think it's problematic that the benchmark never reaches a steady state. While at the beginning of the measurement phase we reach roughly 45500 docs/s, throughput declines over time and reaches 10900 docs/s at the end of the benchmark. I think it is dangerous to report a median throughput in that case because it suggests that a higher throughput can be reached over a longer period of time when in fact it is heavily dependent on how long we execute the benchmark. One thing I am wondering about is why we never reach steady state. Maybe @nknize has more thoughts on this?

@jpountz (Contributor) commented Jan 4, 2019

why we never reach steady state

I suspect it's related to the fact that merging becomes increasingly costly as the index size grows with multi-dimensional points since the whole BKD tree needs to be rebuilt from scratch. Geo points and range fields have the issue as well, but geo shapes make it worse by using 4 indexed dimensions when ranges and geo points only use 2.

"clients": 1,
"warmup-iterations": 200,
"iterations": 100,
"target-throughput": 2
Member:

The empirical data that I have gathered on hardware that is similar to our nightly environment suggests that this throughput is too high. With service times around 2.5 seconds, I suggest a target throughput of 0.3 (meaning that a query will be scheduled roughly every 3 seconds).

"clients": 1,
"warmup-iterations": 200,
"iterations": 100,
"target-throughput": 2
Member:

The empirical data that I have gathered on hardware that is similar to our nightly environment suggests that this throughput is too high. With service times around 3 to 3.5 seconds, I suggest a target throughput of 0.25 (meaning that a query will be scheduled every 4 seconds).
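For illustration, the adjusted schedule entry would look roughly like this with one of the suggested values (0.3 for the first query, 0.25 for this one); the remaining fields are copied from the excerpt above and should be treated as a sketch:

```json
{
  "clients": 1,
  "warmup-iterations": 200,
  "iterations": 100,
  "target-throughput": 0.25
}
```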

@danielmitterdorfer (Member):

That sounds reasonable. If I understand you correctly, this means we will never reach a steady state, though. I wonder how we can make the benchmark meaningful w.r.t. throughput metrics.

FWIW the complete benchmark has finished by now and I did not see any errors for any of the queries (in Igor's benchmark results we see a positive error rate for polygon queries).

@imotov (Contributor, Author) commented Jan 4, 2019

That sounds reasonable. If I understand you correctly, this means we will never reach a steady state, though. I wonder how we can make the benchmark meaningful w.r.t. throughput metrics.

Maybe we can create a new index every 5 million records or so? It is strange, though; I was expecting that the limit on the maximum segment size would take care of that.

in Igor's benchmark results we see a positive error rate for polygon queries

Yeah, my machine was unresponsive at the end of the indexing run. I am not sure if it was thermal throttling or something else.

@jpountz (Contributor) commented Jan 4, 2019

I was kind of expecting that limit on the maximum segment size should take care of that.

This is a good point; the index is much larger than 5GB in the end, so merging activity should reach a steady state at some point.

@iverase (Contributor) commented Jan 7, 2019

Not sure if this is relevant, but I would not expect a constant indexing throughput because of the nature of the data. Indexing a document with a point (one triangle) is not the same as indexing a document with a polygon of thousands of points (~thousands of triangles plus tessellator overhead).

I think the batch size might be too big, as some of the documents can be quite large (polygons with thousands of points). In my experience I had to limit the batch size to 2500.

We should expect 8751 documents to fail.
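If we want to try a smaller bulk size without editing the track, the parameter could be overridden via `--track-params`, for example with a parameters file like the one below (a sketch; the file name `params.json` is hypothetical and this assumes a Rally version that accepts a JSON file for `--track-params`, otherwise the same values can be passed as key:value pairs):

```json
{
  "bulk_size": 2500,
  "bulk_indexing_clients": 8
}
```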

@imotov (Contributor, Author) commented Jan 7, 2019

Not sure if relevant but I would not expect a constant indexing throughput because of the nature of the data.

Indeed, the test file starts with a whole bunch of LINESTRINGS followed by MULTILINES and then POLYGONS. However, I thought we randomize the records that we index. If I am mistaken, then this might indeed be an issue.

@imotov (Contributor, Author) commented Jan 8, 2019

As per our discussion, I am going to split this PR into 3 separate tracks for LINESTRINGS, MULTILINES and POLYGONS and will remove all shapes that Elasticsearch cannot process at the moment.

@danielmitterdorfer (Member):

As per our discussion, I am going to split this PR into 3 separate tracks for LINESTRINGS, MULTILINES and POLYGONS and will remove all shapes that Elasticsearch cannot process at the moment.

Thanks for iterating, @imotov. Just a suggestion: you can keep everything in one track, but create three different corpora and three challenges (one for each corpus). That makes it a bit easier to maintain because everything stays in the same track. I'm happy to assist if you have more specific questions.

- splits shapes into 3 separate corpora
- removes failed polygons
- reduces default bulk size
@imotov (Contributor, Author) commented Jan 12, 2019

I ran the new track without any errors overnight and got interesting results. The first graph is the average throughput for the first two hours of the run, and the second is the average throughput for the entire run:
[chart: average throughput, first two hours of the run]
[chart: average throughput, entire run]

@danielmitterdorfer (Member):

Thanks for rerunning it. Could this be explained by #61 (comment)?

merging becomes increasingly costly as the index size grows with multi-dimensional points since the whole BKD tree needs to be rebuilt from scratch.

Btw, I think it is great that we have isolated the three different types now and can see how they behave.

@imotov (Contributor, Author) commented Jan 14, 2019

Could this be explained by #61 (comment)?

I am not sure. The index is much bigger than the maximum segment size, and I don't understand the mechanism that prevents it from stabilizing. We shouldn't be modifying the older segments, so I am not sure what exactly kicks in to slow down the overall performance. The change is quite drastic though, especially at the beginning, where we index linestrings for 1 hour and the performance steadily drops from 24,000 to 8,000. That's a 3x difference. At the end of the test our throughput is 2,000, which is 10x slower than when we started.

Btw, I think it is great that we have isolated the three different types now and can see how they behave.

@danielmitterdorfer I wonder if it would make sense to also index them into 3 different indices instead of a single index as I do at the moment. Any other ideas on how to make this test more meaningful? My main concern is that if we settle on using the overall test running time as "a canary in a coal mine", whatever is causing the slowdown on the large index will overshadow the smaller regressions and improvements we introduce until we figure out and fix the source of this slowdown.

@danielmitterdorfer (Member):

Could this be explained by #61 (comment)?

I am not sure.

You could try rerunning with a profiler (`--telemetry=jfr` to capture a flight recording), or maybe we need to run a benchmark with more verbose logging enabled so we can see what is happening?

I wonder if it would make sense to also index them in 3 different indices instead of a single index as I do at the moment.

That would make sense to me, yes.
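A possible shape for this in the track's corpora section, heavily abbreviated (the index and source-file names other than osmlinestrings are assumptions, and required fields such as document counts and base URLs are omitted):

```json
"corpora": [
  {
    "name": "linestrings",
    "documents": [ { "source-file": "documents-linestrings.json.bz2", "target-index": "osmlinestrings" } ]
  },
  {
    "name": "multilinestrings",
    "documents": [ { "source-file": "documents-multilinestrings.json.bz2", "target-index": "osmmultilinestrings" } ]
  },
  {
    "name": "polygons",
    "documents": [ { "source-file": "documents-polygons.json.bz2", "target-index": "osmpolygons" } ]
  }
]
```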

Switch to using different indices for different shape types.
@imotov (Contributor, Author) commented Jan 15, 2019

Could this be explained by #61 (comment)?

I think the answer is actually yes; my assumption was wrong. With a final index size of 71.6gb, here are the 10 largest segments (out of 42 in total) at the end of the linestrings run. So it looks like there is still plenty of merging going on. I was watching this test run for a while, and while the biggest 2.7GB segment was formed quite early in the run, the second and third showed up only closer to the end.

index               shard prirep ip        segment generation docs.count docs.deleted    size size.memory committed searchable version compound
osmlinestrings      0     p      127.0.0.1 _d4            472    6065818            0   2.7gb     3660671 true      true       8.0.0   false
osmlinestrings      0     p      127.0.0.1 _m7            799    5512526            0   2.5gb     2858274 true      true       8.0.0   false
osmlinestrings      0     p      127.0.0.1 _se           1022    4794982            0   2.3gb     2125728 true      true       8.0.0   false
osmlinestrings      0     p      127.0.0.1 _xx           1221      83458            0     1gb      370221 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _xl           1209      85501            0     1gb      366481 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _wq           1178     484087            0 944.3mb      526567 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _x0           1188      69701            0 934.9mb      345104 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _wi           1170     486499            0 532.5mb      398799 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _w8           1160     309137            0 399.5mb      240792 true      true       8.0.0   true
osmlinestrings      0     p      127.0.0.1 _vo           1140     250744            0 348.5mb      248503 true      true       8.0.0   true

I also realized that I need to wait for the merging of linestrings to finish before starting to index multilinestrings, since it looks like merging continues for a while and interferes with multilinestrings indexing.

@danielmitterdorfer (Member):

@imotov is this PR superseded by #62 or do you intend to still work on this one?

@imotov (Contributor, Author) commented Feb 14, 2019

Is this PR superseded by #62 or do you intend to still work on this one?

@danielmitterdorfer I would say #62 is a parallel effort. Knowing how geoshapes work for points is very useful for the effort to consolidate the geo_point and geo_shape data types, but it is not sufficient as a geo_shape metric. We still need a full-blown geo_shape test with different geo_shape types.

I think I addressed your comments from this PR as much as I could. If you see any additional issues I can address, please let me know. The slowdown over the course of the test is still present, but I think it's more a reflection of reality than a flaw in the test. Hopefully, @iverase's merge optimization will reduce this issue.

@danielmitterdorfer (Member):

Thanks for the update @imotov! I'll do a final pass very soon.

@danielmitterdorfer (Member) left a comment:

Looks mostly fine, but I left a few comments around the removal of types and the bulk size.


This track allows overriding the following parameters with Rally 0.8.0+ using `--track-params`:

* `bulk_size` (default: 500)
Member:

This differs from the value in the track (100). Also, I am not sure whether the same bulk size for all corpora is right; I wonder whether we need to introduce dedicated parameters per corpus?
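One possible shape for dedicated per-corpus parameters (the parameter names here are hypothetical, not necessarily what the track ended up using):

```json
{
  "name": "index-append-linestrings",
  "operation-type": "bulk",
  "bulk-size": {{linestrings_bulk_size | default(100)}}
}
```

with analogous `multilinestrings_bulk_size` and `polygons_bulk_size` parameters for the other two corpora.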

"index.number_of_replicas": {{number_of_replicas | default(0)}}
},
"mappings": {
"_doc": {
Member:

This works for 6.x but not on master anymore, as types have now been removed.
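For comparison, a typeless variant of index.json that works once types are removed would look roughly like this (a sketch based on the mapping shown earlier in this thread; the one-shard default is carried over from the earlier discussion):

```json
{
  "settings": {
    "index.number_of_shards": {{number_of_shards | default(1)}},
    "index.number_of_replicas": {{number_of_replicas | default(0)}}
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "shape": {
        "type": "geo_shape"
      }
    }
  }
}
```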

{
"name": "index-append-linestrings",
"operation-type": "bulk",
"bulk-size": {{bulk_size | default(100)}},
Member:

As I've mentioned in my feedback for the README, I'm not sure whether we should have a different bulk size per corpus?

{
"name": "osmlinestrings",
"body": "index.json",
"types": ["_doc"]
Member:

Support for types has been removed on master, so this works against 6.x but not against master.

"name": "linestrings",
"base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geoshape",
"target-index": "osmlinestrings",
"target-type": "_doc",
Member:

Support for types has been removed on master, so this works against 6.x but not against master.

Adds different bulk size settings for different 
shape types and removes doc type.
@danielmitterdorfer (Member) left a comment:

Thanks for iterating @imotov. LGTM. Feel free to merge at any time.

@imotov imotov merged commit 1952843 into elastic:master Feb 26, 2019