Percolate performance difference in 1.0.0 and 1.2.2 #6806

Closed
quhar opened this issue Jul 9, 2014 · 10 comments · Fixed by #7081

Comments

@quhar

commented Jul 9, 2014

Hi,

I was running some tests with the percolator and noticed a huge performance difference between 1.0.0 and 1.2.2. In 1.0.0, indexing percolator queries takes much longer than in 1.2.2 (around 10 times), but percolating a document is much faster. With 400k queries, percolating a document takes tens of ms in 1.0.0 and a couple of seconds in 1.2.2. I ran all tests with default settings, always on a clean ES instance. A registered percolator query looks like:

{
"query": {
   "filtered": {
      "filter": {   
         "and": [      
            {
               "geo_distance": {
                  "distance": "94km",
                  "location": {
                     "lat": "97",
                     "lon": "-76"
                  }       
               }       
            }       
         ]       
      },
      "query": {
         "match": {
            "_all": "note father surprise"
         }       
      }       
   }       
},
"type": "offer"
}

and the percolate request:

GET /test-index/offer/_percolate
{
    "doc": {
        "name": "note",
        "location": [-7,-80]
    }
}
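
A minimal sketch of how the two requests above could be built in Python, without sending anything. The helper names (`register_query_request`, `percolate_request`) are hypothetical, and the paths assume the ES 1.x convention of storing percolator queries under the reserved `.percolator` type of the index.

```python
import json

INDEX = "test-index"   # index name taken from the percolate request above
DOC_TYPE = "offer"     # document type taken from the percolate request above


def register_query_request(query_id, text, lat, lon, distance):
    """Return (method, path, body) for storing one percolator query."""
    body = {
        "query": {
            "filtered": {
                "filter": {
                    "and": [
                        {"geo_distance": {
                            "distance": distance,
                            "location": {"lat": lat, "lon": lon},
                        }}
                    ]
                },
                "query": {"match": {"_all": text}},
            }
        },
        "type": DOC_TYPE,
    }
    # Assumption: ES 1.x keeps percolator queries under the ".percolator" type.
    path = "/%s/.percolator/%s" % (INDEX, query_id)
    return "PUT", path, json.dumps(body)


def percolate_request(doc):
    """Return (method, path, body) for percolating one document."""
    path = "/%s/%s/_percolate" % (INDEX, DOC_TYPE)
    return "GET", path, json.dumps({"doc": doc})


# Illustrative values only; the coordinates here are made up.
print(register_query_request("q1", "note father surprise", 40.0, -76.0, "94km")[1])
print(percolate_request({"name": "note", "location": [-7, -80]})[1])
```

The tuples can be handed to any HTTP client; nothing in the sketch depends on a particular client library.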

Has something changed in the default ES settings between these versions regarding percolation? I couldn't find anything in the release notes.

@martijnvg

Member

commented Jul 10, 2014

I can't reproduce the slowdown that you're reporting. I benchmarked different types of queries (term, range and geo_distance) on 1.0 and 1.2.2. In fact, on 1.2.2 I get slightly better performance. Can you share a more detailed reproduction of the big performance difference that you're experiencing?

The indexing slowdown in 1.0.0 was due to [1] and has been resolved in 1.0.2.
1: #5339

@quhar

Author

commented Jul 10, 2014

I've run additional tests, using 1.0.2 (because inserting percolator queries was very slow in 1.0.0) and 1.2.2. The difference is much smaller (previously the number of queries differed between the runs), but 1.2.2 is still about 3 times slower.
Test script
I ran two tests, one with 200k queries and a second with 300k. In both cases I percolated the same document 100 times and computed the average. Here are the detailed results.

With 1.0.2 I got an average of 198 ms per percolate with 200k queries and 312 ms with 300k queries.
With 1.2.2 the values are 667 ms and 1008 ms respectively.
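
The measurement described above (percolate the same document 100 times, take the mean) and the resulting slowdown factors can be sketched roughly as follows. This is not the linked test script: `average_latency_ms` and `percolate_once` are hypothetical names, and the caller is assumed to supply a function that sends one percolate request.

```python
import time


def average_latency_ms(percolate_once, runs=100):
    """Call percolate_once() `runs` times and return the mean wall-clock time in ms."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        percolate_once()
        total += time.perf_counter() - start
    return total / runs * 1000.0


# Averages reported above, in ms: {query_count: (1.0.2, 1.2.2)}
reported = {200000: (198.0, 667.0), 300000: (312.0, 1008.0)}

# Slowdown factor of 1.2.2 relative to 1.0.2 for each query count.
slowdown = {n: new / old for n, (old, new) in reported.items()}

for n in sorted(slowdown):
    print("%d queries: 1.2.2 is %.2fx slower" % (n, slowdown[n]))
```

Both factors come out a little above 3, which matches the "3 times slower" estimate.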

Edit:
In both cases it's the standard ES deb package, without any config or runtime changes.

@martijnvg

Member

commented Jul 11, 2014

@quhar I can confirm the performance regression:

1.0.3:
No handlers could be found for logger "elasticsearch"
Indexed 300000 queries into precolator, took 234.444020987 seconds
After 100 tests, average query time: 288.63

1.2.2:
No handlers could be found for logger "elasticsearch"
Indexed 300000 queries into precolator, took 250.046157837 seconds
After 100 tests, average query time: 846.89

The following change introduced in 1.2.2 is causing this slowdown:
#6578

On 1.2.1 the performance is similar to 1.0.3:

1.2.1:
No handlers could be found for logger "elasticsearch"
Indexed 300000 queries into precolator, took 217.544229984 seconds
After 100 tests, average query time: 270.79

This change disabled caching at all times in the percolator, for both the filter cache and field data. In general this is a good improvement, since caching for an in-memory index makes no sense. However, disabling field data caching has a drawback: even constructing the field data structures for the percolator's in-memory index is relatively expensive, and that is what you're noticing when you percolate across 300k queries with a geo_distance filter that relies on field data.

So the percolator is creating the same data structure 300k times, which is causing the performance regression. Instead, the percolator should temporarily cache field data for the duration of a percolate request, which should fix the regression.
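
A toy illustration of the difference, not the actual Elasticsearch code: `FieldDataLoader` is a hypothetical stand-in for the expensive field data build, and the two functions only count how often it runs when every query touches the same field.

```python
class FieldDataLoader:
    """Hypothetical stand-in for building field data on the in-memory index."""

    def __init__(self):
        self.builds = 0

    def load(self, field):
        self.builds += 1          # each call represents one expensive build
        return {"field": field}


def percolate_uncached(loader, queries):
    """Rebuild field data for every query (the behaviour described for 1.2.2)."""
    for query in queries:
        loader.load(query["field"])
    return len(queries)


def percolate_request_cached(loader, queries):
    """Build field data once per field and reuse it for the rest of the request."""
    cache = {}                    # lives only for the duration of one request
    for query in queries:
        field = query["field"]
        if field not in cache:
            cache[field] = loader.load(field)
    return len(queries)


queries = [{"field": "location"}] * 300000

uncached = FieldDataLoader()
percolate_uncached(uncached, queries)

cached = FieldDataLoader()
percolate_request_cached(cached, queries)

print(uncached.builds, cached.builds)
```

The cache is deliberately scoped to a single call, mirroring the idea of caching only while one percolate request executes.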

@martijnvg

Member

commented Jul 13, 2014

The performance regression only occurs if percolator queries contain filters that rely on field data (e.g. the geo_distance filter). The best way to avoid the field data loading overhead is to enable doc values for geo fields.

@quhar If you set doc_values to true on the location field, the performance is similar to 1.0.3:

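TYPE = 'offer'  # assumption: the document type used in the percolate request above

mappings = {
    'mappings': {
        TYPE: {
            'properties': {
                'name': {
                    'type': 'string',
                },
                # enable doc values on the geo field
                'location': {
                    'type': 'geo_point',
                    'doc_values': 'true',
                }
            }
        }
    }
}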

With doc_values enabled, the cost of building field data is paid only once instead of 300k times. I think this is the best way to get around this performance regression.
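
To compare both configurations in a benchmark, the index body could be generated with the setting toggled. This is a sketch only: `build_index_body` is a hypothetical helper, and it writes `doc_values` as a JSON boolean rather than the string used above.

```python
import json


def build_index_body(doc_type, geo_doc_values):
    """Return an index-creation body with doc values on or off for `location`."""
    location = {"type": "geo_point"}
    if geo_doc_values:
        location["doc_values"] = True
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    "name": {"type": "string"},
                    "location": location,
                }
            }
        }
    }


# "offer" is the document type from the percolate request earlier in the thread.
print(json.dumps(build_index_body("offer", True), indent=2))
print(json.dumps(build_index_body("offer", False), indent=2))
```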

@quhar

Author

commented Jul 14, 2014

Thanks. I will take a look at doc_values then.

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Aug 4, 2014
Before the index reader used by the percolator didn't allow to register a CoreCloseListener, but now it does, making it safe to cache index field data cache entries.
Creating field data structures is relatively expensive and caching them can save a lot of noise if many queries are evaluated in a percolator call.

Closes elastic#6806
Closes elastic#7081
martijnvg added a commit that referenced this issue Aug 4, 2014
Before the index reader used by the percolator didn't allow to register a CoreCloseListener, but now it does, making it safe to cache index field data cache entries.
Creating field data structures is relatively expensive and caching them can save a lot of noise if many queries are evaluated in a percolator call.

Closes #6806
Closes #7081
martijnvg added a commit that referenced this issue Sep 8, 2014
@dennisgorelik


commented Feb 16, 2015

We found that when we use doc_values, the percolator finds about 30% fewer matches (queries).
The percolator (against an index with the doc_values setting on the Location property) does not return matches that it clearly should have found (the query searches for a keyword that is present in the percolated item, and the percolated item is within the location range specified by the query).

Execution time was about 30% lower too.

We do batch percolation.
We percolate 50 items per batch.
It took:
28.225s to percolate 50 items without "doc_values" option (found 14787 matching queries).
19.855s to percolate 50 items with "doc_values" option (found 10781 matching queries).
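
The "about 30%" figures can be checked directly from these two measurements. A small sketch (`percent_drop` is a hypothetical helper name):

```python
def percent_drop(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / float(before) * 100.0


matches_drop = percent_drop(14787, 10781)   # matching queries found
time_drop = percent_drop(28.225, 19.855)    # seconds per batch of 50 items

print("matches: -%.1f%%, time: -%.1f%%" % (matches_drop, time_drop))
```

This gives roughly 27% fewer matches and roughly 30% less time.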

We made this change to our index:
Replaced:

"Location": {
    "type": "geo_point"
}

with:

"Location": {
    "type": "geo_point",
    "doc_values": "true"
}

Number of queries to percolate against: 154,195
Version: ElasticSearch 1.2.2

@martijnvg

Member

commented Feb 16, 2015

@dennisgorelik ouch, does this also occur with a more recent version of ES (1.3.8 / 1.4.3)? If so, can you open a new issue for this and, if possible, share a smaller reproduction of the bug?

@dennisgorelik


commented Feb 16, 2015

Martin,
We are going to install ElasticSearch 1.4.3 and test the doc_values effect there (probably in a day or two).
If we reproduce the same percolation mismatch on a smaller set of data, we will file a separate issue.

@dennisgorelik


commented Feb 17, 2015

Martin,
We were able to reproduce the doc_values mismatch.
I created a separate issue for it:
#9714

@dennisgorelik


commented Feb 17, 2015

We measured the same percolate batch (of 50 items) on ElasticSearch 1.4.3.
Performance improved by ~25%:
28.225s on ElasticSearch 1.2.2
21.270s on ElasticSearch 1.4.3

Using doc_values=true on ElasticSearch 1.4.3 did not improve performance (but caused the percolator to miss ~30% of the queries it should have found, the same issue as in ElasticSearch 1.2.2).
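
One way to narrow down which queries go missing is to diff the matched query ids from two runs of the same percolate request, one per mapping. A sketch, assuming each response is a parsed JSON dict with a `matches` list whose entries carry an `_id` (the shape of a 1.x percolate response); the function names and sample ids are hypothetical.

```python
def matched_ids(percolate_response):
    """Collect the ids of all queries that matched in one percolate response."""
    return {match["_id"] for match in percolate_response.get("matches", [])}


def missing_matches(baseline_response, candidate_response):
    """Ids matched in the baseline run but absent from the candidate run."""
    return sorted(matched_ids(baseline_response) - matched_ids(candidate_response))


# Made-up responses for illustration only.
without_doc_values = {"total": 3, "matches": [{"_id": "q1"}, {"_id": "q2"}, {"_id": "q3"}]}
with_doc_values = {"total": 2, "matches": [{"_id": "q1"}, {"_id": "q3"}]}

print(missing_matches(without_doc_values, with_doc_values))
```

The ids that come back are the queries worth re-running individually when building a small reproduction.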

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015