
[ingest] geo-ip performance improvements #33029

Merged
merged 4 commits into elastic:master on Sep 5, 2018

Conversation

jakelandis
Contributor

Re-implement the cache to avoid Jackson JSON de-serialization for
every IP lookup. The built-in MaxMind cache caches JsonNode objects,
which requires de-serialization on every lookup, even when the object
is found in the cache. Profiling shows that this is very expensive (CPU).

The cache will now consist of the fully de-serialized objects.
Profiling shows that the new footprint for the CityDB is ~6KB per cache
entry. This may result in a ~6MB increase with the default of 1000 entries.
Performance has been measured to be up to 40% faster on a modern 4-core/8-thread
CPU for an ingest (minimal indexing) workflow.

Further, since the prior implementation cached JsonNode objects,
and there is not a 1:1 relationship between an IP lookup and a JsonNode
object, the default cache size was most likely too small to be very
effective. While this change keeps the default cache size of 1000,
it will now effectively cache more, since there is now a 1:1 relationship
between an IP lookup and a value in the cache.
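For illustration, here is a minimal sketch of the caching approach described above. The class name, method signature, and key layout are illustrative stand-ins, not the exact code in this PR: a bounded, access-ordered `LinkedHashMap` stands in for the real cache implementation, and the key is scoped by response type as well as IP.

```java
import com.maxmind.geoip2.model.AbstractResponse;

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class GeoIpCacheSketch {
    private final Map<String, AbstractResponse> cache;

    GeoIpCacheSketch(int maxSize) {
        // A bounded, access-ordered LinkedHashMap stands in for the real cache implementation.
        this.cache = new LinkedHashMap<String, AbstractResponse>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, AbstractResponse> eldest) {
                return size() > maxSize;
            }
        };
    }

    synchronized <T extends AbstractResponse> T putIfAbsent(String ip, Class<T> responseType,
                                                            Function<String, T> retrieve) {
        // Scope the key by response type so City/Country/ASN lookups for the same IP do not
        // collide. On a hit, the already de-serialized response is returned directly; Jackson
        // is only involved on a miss, when `retrieve` reads the database.
        String key = responseType.getName() + "|" + ip;
        return responseType.cast(cache.computeIfAbsent(key, k -> retrieve.apply(ip)));
    }
}
```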


This issue was discovered while adding ingest node benchmarks to Rally. It is believed that this is the same root cause as maxmind/GeoIP2-java#68.

Profiling before this change:

Via Rally and JFR:
Considerably more expensive than Grok: [profiler screenshot]
The majority of the cost comes from Jackson deserialization: [profiler screenshot]
Similar results as measured locally with YourKit: [profiler screenshot]
The fully populated cache is quite small, 1000 entries ~= 216KB: [heap screenshot]

Profiling after this change:

The Jackson hotspot is now gone: [profiler screenshot]
However, our cache size grew to ~6MB for 1000 entries: [heap screenshot]

As mentioned in the commit, there is now a much stronger correlation between one cache entry and one IP address lookup. So not only are we avoiding the hotspot, we are also effectively caching more with the same default of 1000 entries.

The difference in cache size is substantial, but IMO ~6MB is still within reason, especially given the performance gains.

Rally benchmark

Using a track that turns off indexing for all fields (to better isolate ingest performance), I was able to see a ~40% increase in performance on an i7 4-core/8-thread physical node.

The baseline is before this change, and the contender is this change:

|                               Metric |          Task |   Baseline |   Contender |     Diff |   Unit |
|-------------------------------------:|--------------:|-----------:|------------:|---------:|-------:|
|                  Total indexing time |               |    64.8516 |     93.7348 |  28.8832 |    min |
|          Min indexing time per shard |               |    64.8516 |     93.7348 |  28.8832 |    min |
|       Median indexing time per shard |               |    64.8516 |     93.7348 |  28.8832 |    min |
|          Max indexing time per shard |               |    64.8516 |     93.7348 |  28.8832 |    min |
|                     Total merge time |               |    37.6405 |     34.9398 |  -2.7008 |    min |
|             Min merge time per shard |               |    37.6405 |     34.9398 |  -2.7008 |    min |
|          Median merge time per shard |               |    37.6405 |     34.9398 |  -2.7008 |    min |
|             Max merge time per shard |               |    37.6405 |     34.9398 |  -2.7008 |    min |
|            Total merge throttle time |               |    29.8068 |     25.7024 | -4.10447 |    min |
|    Min merge throttle time per shard |               |    29.8068 |     25.7024 | -4.10447 |    min |
| Median merge throttle time per shard |               |    29.8068 |     25.7024 | -4.10447 |    min |
|    Max merge throttle time per shard |               |    29.8068 |     25.7024 | -4.10447 |    min |
|                   Total refresh time |               |     1.8583 |     1.81025 | -0.04805 |    min |
|           Min refresh time per shard |               |     1.8583 |     1.81025 | -0.04805 |    min |
|        Median refresh time per shard |               |     1.8583 |     1.81025 | -0.04805 |    min |
|           Max refresh time per shard |               |     1.8583 |     1.81025 | -0.04805 |    min |
|                     Total flush time |               |    1.41893 |     1.51308 |  0.09415 |    min |
|             Min flush time per shard |               |    1.41893 |     1.51308 |  0.09415 |    min |
|          Median flush time per shard |               |    1.41893 |     1.51308 |  0.09415 |    min |
|             Max flush time per shard |               |    1.41893 |     1.51308 |  0.09415 |    min |
|                   Total Young Gen GC |               |     159.79 |     207.766 |   47.976 |      s |
|                     Total Old Gen GC |               |     18.628 |      17.926 |   -0.702 |      s |
|                           Store size |               |    5.07259 |     6.90976 |  1.83717 |     GB |
|                        Translog size |               |   0.563634 |    0.603898 |  0.04026 |     GB |
|                           Index size |               |     5.6045 |     5.73432 |  0.12982 |     GB |
|                      Totally written |               |    88.6814 |     90.8917 |  2.21029 |     GB |
|               Heap used for segments |               |    83.5198 |     83.7625 |  0.24275 |     MB |
|             Heap used for doc values |               | 0.00259399 |  0.00382614 |  0.00123 |     MB |
|                  Heap used for terms |               |    77.4618 |     77.7049 |  0.24318 |     MB |
|                  Heap used for norms |               |          0 |           0 |        0 |     MB |
|                 Heap used for points |               |    1.15786 |     1.16739 |  0.00953 |     MB |
|          Heap used for stored fields |               |    4.89755 |     4.88636 | -0.01119 |     MB |
|                        Segment count |               |         40 |          59 |       19 |        |
|                       Min Throughput | process-geoip |    82937.4 |      115209 |  32271.5 | docs/s |
|                    Median Throughput | process-geoip |      85895 |      120224 |  34329.3 | docs/s |
|                       Max Throughput | process-geoip |    93623.2 |      128194 |  34570.7 | docs/s |
|              50th percentile latency | process-geoip |    440.402 |     313.561 | -126.841 |     ms |
|              90th percentile latency | process-geoip |    622.882 |     410.827 | -212.056 |     ms |
|              99th percentile latency | process-geoip |    876.212 |     697.616 | -178.595 |     ms |
|            99.9th percentile latency | process-geoip |    1356.22 |     1148.03 | -208.195 |     ms |
|           99.99th percentile latency | process-geoip |    1673.39 |     1270.14 | -403.247 |     ms |
|             100th percentile latency | process-geoip |    1882.06 |     1362.75 | -519.308 |     ms |
|         50th percentile service time | process-geoip |    440.402 |     313.561 | -126.841 |     ms |
|         90th percentile service time | process-geoip |    622.882 |     410.827 | -212.056 |     ms |
|         99th percentile service time | process-geoip |    876.212 |     697.616 | -178.595 |     ms |
|       99.9th percentile service time | process-geoip |    1356.22 |     1148.03 | -208.195 |     ms |
|      99.99th percentile service time | process-geoip |    1673.39 |     1270.14 | -403.247 |     ms |
|        100th percentile service time | process-geoip |    1882.06 |     1362.75 | -519.308 |     ms |
|                           error rate | process-geoip |          0 |           0 |        0 |      % |

@jakelandis jakelandis added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Aug 21, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

* cost of deserialization for each lookup (cached or not). This comes at the slight expense of higher memory usage, but a significant
* reduction of CPU usage.
*/
class GeoIpCache {
Contributor Author


Moved this to an inner class since there needs to be one instance per plugin (the same cache is used across multiple processors).
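As a hedged illustration of that wiring (hypothetical class names, simplified factory shape, reusing `GeoIpCacheSketch` from the earlier sketch): the plugin owns a single cache and hands the same instance to every processor it builds.

```java
// Illustrative only: one cache per plugin, shared by all geoip processors.
class GeoIpPluginSketch {
    private final GeoIpCacheSketch cache = new GeoIpCacheSketch(1000); // single instance per plugin

    GeoIpProcessorSketch newProcessor() {
        // Every processor receives the plugin's cache rather than creating its own.
        return new GeoIpProcessorSketch(cache);
    }
}

class GeoIpProcessorSketch {
    private final GeoIpCacheSketch cache;

    GeoIpProcessorSketch(GeoIpCacheSketch cache) {
        this.cache = cache;
    }
}
```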

@jakelandis
Contributor Author

Jenkins test this please.

@talevy
Contributor

talevy commented Aug 23, 2018

these performance improvements are great! sorry I didn't get a chance to look at this yet. Will allocate some time this week!

@jakelandis
Contributor Author

@talevy @rjernst ping

Member

@rjernst rjernst left a comment


Looks ok to me, although I am not that familiar with the previous inner workings.

* cost of deserialization for each lookup (cached or not). This comes at the slight expense of higher memory usage, but a significant
* reduction of CPU usage.
*/
class GeoIpCache {
Member


can't this be static?

Contributor Author


I don't think it makes sense to allow instances of this class to exist outside of the IngestGeoIpPlugin in which it is nested. Hence not static.

Member


Static is about whether there is state needed from the outer class. This is still package private, so I don't see it as a problem for someone trying to construct this (especially since it is all within the geoip plugin project anyway).

Contributor Author


> Static is about whether there is state needed from the outer class.

Non-static inner classes are also effective for defining the logical scope to which something belongs (even if the outer class's state is not needed). In this case you need an instance of the plugin to instantiate a cache, which helps to enforce the relationship between the two classes. IMO this helps make it clear that this cache is not intended to be used by any other class (especially not the processor class... we don't want a cache per processor instance), since its logical scope is per-plugin. If I could, it would be fully private, but for testing reasons it is not.

This is mostly academic and I don't have strong opposition, just wanted to describe my rationale in more detail.

I'll defer to your opinion and change this (and below) to static.
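For readers following along, a generic illustration of the distinction being debated (plain Java, not code from this PR): a non-static inner class carries an implicit reference to its enclosing instance and can only be created through one, whereas a static nested class carries no such reference.

```java
class Outer {
    class Inner {           // non-static: implicitly tied to an Outer instance
    }

    static class Nested {   // static: no hidden reference to Outer
    }

    void demo() {
        Inner viaInstance = new Inner();   // needs the enclosing Outer instance (this)
        Nested standalone = new Nested();  // no enclosing instance required
    }
}

class Elsewhere {
    void demo() {
        // Outer.Inner inner = new Outer().new Inner();  // must go through an Outer instance
        Outer.Nested nested = new Outer.Nested();         // fine without one
    }
}
```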

* provides a means to safely cast the return objects.
* @param <T> The AbstractResponse type used to scope the key and cast the result.
*/
private class CacheKey<T extends AbstractResponse> {
Member


can't this be static?

Contributor Author


I don't think it makes sense to allow instances of this class to exist outside the IngestGeoIpPlugin.GeoIpCache class in which it is nested. Hence not static.
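A minimal sketch of the CacheKey idea under discussion (illustrative names, not the PR's exact code): the key pairs the looked-up IP with the expected response type, so different databases can cache results for the same IP without colliding, and the type parameter allows a checked cast of the stored value.

```java
import com.maxmind.geoip2.model.AbstractResponse;

import java.net.InetAddress;
import java.util.Objects;

final class CacheKeySketch<T extends AbstractResponse> {
    private final InetAddress ip;
    private final Class<T> responseType;

    CacheKeySketch(InetAddress ip, Class<T> responseType) {
        this.ip = ip;
        this.responseType = responseType;
    }

    T cast(Object cachedValue) {
        // Class#cast gives a checked cast instead of an unchecked (T) cast.
        return responseType.cast(cachedValue);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        CacheKeySketch<?> other = (CacheKeySketch<?>) o;
        return ip.equals(other.ip) && responseType.equals(other.responseType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(ip, responseType);
    }
}
```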

//package private for testing
GeoIpCache(long maxSize) {
    if (maxSize < 0) {
        throw new IllegalArgumentException("geoip cache size must be 0 or greater");
Member


cache size -> max cache size?

Contributor Author


The setting which affects this is ingest.geoip.cache_size, hence the wording. I don't have a strong opinion, but I do believe it is pretty unlikely anyone would set the value to a negative number.

Member


Since this is actually the max size, I think we should use that wording? We aren't referring to the setting directly in the exception, and documentation + renaming the setting long term can fix the confusion?

Contributor Author


Done, updated to "max cache size".
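The diff excerpt above still shows the earlier message; after this thread the guard presumably reads along these lines (exact final wording and surrounding code may differ):

```java
class GeoIpCacheGuardSketch {
    private final long maxSize;

    GeoIpCacheGuardSketch(long maxSize) {
        // Validation message now refers to the max cache size rather than the setting name.
        if (maxSize < 0) {
            throw new IllegalArgumentException("geoip max cache size must be 0 or greater");
        }
        this.maxSize = maxSize;
    }
}
```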

@jakelandis
Contributor Author

updated branch to re-run CI

Member

@rjernst rjernst left a comment


Thanks, LGTM.

//package private for testing
GeoIpCache(long maxSize) {
    if (maxSize < 0) {
        throw new IllegalArgumentException("geoip cache size must be 0 or greater");
Member


Since this is actually the max size, I think we should use that wording? We aren't referring to the setting directly in the exception, and documentation + renaming the setting long term can fix the confusion?

@jakelandis jakelandis merged commit 6f9c9ab into elastic:master Sep 5, 2018
jakelandis added a commit to jakelandis/elasticsearch that referenced this pull request Sep 5, 2018
jakelandis added a commit that referenced this pull request Sep 6, 2018
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement v6.5.0 v7.0.0-beta1

5 participants