field collapsing very slow #24900

Closed
zplzpl opened this issue May 26, 2017 · 7 comments

Comments

@zplzpl

zplzpl commented May 26, 2017

Elasticsearch version: v5.3.2

Plugins installed: [ik,pinyin]

JVM version (java -version): openjdk version "1.8.0_121"

OS version (uname -a if on a Unix-like system): Linux es1 2.6.32-431.11.25.el6.ucloud.x86_64 #1 SMP Tue Jul 19 10:06:12 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Only one field is used for collapsing. The field type is keyword.

hits.total = 1130

The same request without collapsing takes only 13 milliseconds; with collapsing it takes 130 milliseconds. (These times are the curl starttransfer time.)

That is 10x slower.

I am using the REST API.

@clintongormley

How many hits are you asking for?

@clintongormley

And how well distributed are your collapse field values? It has to keep pulling hits until it finds size distinct values.

And are you asking for inner hits as well?

That's why we ask for a recreation.

@zplzpl
Author

zplzpl commented May 26, 2017

My doc count: 12000

This is my collapse DSL:

"collapse": {
   "field": "spu",
   "inner_hits": {
      "name": "skus",
      "size": 50,
      "_source": {
         "includes": [
            "name",
            "url_path",
            "image_url"
         ]
      }
   },
   "max_concurrent_group_searches": 1
}
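
For context, the fragment above could sit inside a full search body along the lines of the sketch below. Only the `collapse` section comes from the post; the `match_all` query, the default `size` of 10, and the helper name `build_body` are assumptions made for illustration, since the thread never shows the rest of the request.

```python
import json


def build_body(size=10, inner_size=50, max_concurrent=1, with_inner_hits=True):
    """Build a search body that collapses results on the "spu" field.

    The query and the default size are placeholders (assumptions); the
    collapse section mirrors the DSL posted in the thread.
    """
    collapse = {"field": "spu"}
    if with_inner_hits:
        collapse["inner_hits"] = {
            "name": "skus",
            "size": inner_size,
            "_source": {"includes": ["name", "url_path", "image_url"]},
        }
        # Only meaningful when inner hits trigger extra group searches.
        collapse["max_concurrent_group_searches"] = max_concurrent
    return {
        "query": {"match_all": {}},  # assumption: the real query is not shown
        "size": size,
        "collapse": collapse,
    }


if __name__ == "__main__":
    # Body matching the post, and a lighter variant without inner hits.
    print(json.dumps(build_body(), indent=2))
    print(json.dumps(build_body(with_inner_hits=False), indent=2))
```

Comparing timings for the two variants would be one way to separate the cost of collapsing itself from the cost of expanding each group.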

@zplzpl
Author

zplzpl commented May 26, 2017

> how many hits are you asking for?

My hits.total is 1130.

@clintongormley

You don't say what you set size to on the search query, but let's assume it uses the default of 10. For each of those hits, it has to run a subsequent query to return the top 50 inner hits, plus you're telling it that it can't run those 10 search requests in parallel.

Your collapsed query is doing much more work than your other query, so it is not surprising that it is slower.
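
The point about parallelism can be made concrete with a rough model. The sketch below assumes one inner-hits search per collapsed hit and treats `max_concurrent_group_searches` as a simple batch width; that batching model and the function name `expand_phase_plan` are simplifications introduced here, not a description of the actual scheduler.

```python
import math


def expand_phase_plan(size, max_concurrent):
    """Estimate the extra searches needed to expand collapsed hits.

    Assumes a single inner_hits definition, so each of the `size`
    collapsed hits needs exactly one follow-up group search, and that
    searches run in batches of at most `max_concurrent`.
    """
    if size < 0 or max_concurrent < 1:
        raise ValueError("size must be >= 0 and max_concurrent >= 1")
    group_searches = size
    sequential_rounds = math.ceil(group_searches / max_concurrent)
    return group_searches, sequential_rounds


if __name__ == "__main__":
    print(expand_phase_plan(10, 1))  # (10, 10): every search waits for the last
    print(expand_phase_plan(10, 4))  # (10, 3): same work, fewer rounds
```

Under this model, raising the concurrency limit does not reduce the total work, only how much of it is serialized.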

@jimczi
Contributor

jimczi commented May 26, 2017

@zplzpl the took time in the response does not take into account the inner_hits retrieval. I am working on a fix for this discrepancy, but as @clintongormley said, an inner_hits size of 50 means that you retrieve up to 10*50+10 hits instead of 10, so it is indeed slower.
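
One way to observe the discrepancy described here is to compare the `took` value in the response with the wall-clock time measured on the client. The sketch below is a minimal illustration: `fake_search` is a hypothetical stand-in for a real client call, and its numbers are invented, not measurements from this issue.

```python
import time


def timed_search(run_search):
    """Run a search callable and return (took_ms, wall_clock_ms).

    `run_search` must return a dict-like response with a "took" field,
    as the search API response does.
    """
    start = time.perf_counter()
    response = run_search()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return response["took"], elapsed_ms


def fake_search():
    # Hypothetical stand-in: pretends the server reported 5 ms while the
    # whole round trip (including group expansion) took about 50 ms.
    time.sleep(0.05)
    return {"took": 5}


if __name__ == "__main__":
    took_ms, elapsed_ms = timed_search(fake_search)
    print(f"took={took_ms} ms, wall clock={elapsed_ms:.0f} ms")
```

A large gap between the two numbers on a collapsed query with inner hits would be consistent with the expand phase not being counted in `took`, although network latency and response serialization also contribute to wall-clock time.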

@zplzpl
Author

zplzpl commented May 26, 2017

@clintongormley
@jimczi

Query size: 36

Thank you for the replies, good people.
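
With the reported query size of 36, the estimate given earlier in the thread (size * inner size + size) can be recomputed. The sketch below does that arithmetic; the optional cap by the total match count assumes that each document holds a single `spu` value, so inner hits across groups cannot overlap or exceed hits.total. That assumption and the function name `max_hits_fetched` are introduced here for illustration.

```python
def max_hits_fetched(size, inner_size, total_matches=None):
    """Upper bound on hits retrieved: top hits plus inner hits per group.

    If `total_matches` is given, inner hits are capped by it, assuming
    every matching document belongs to exactly one collapse group.
    """
    inner_hits = size * inner_size
    if total_matches is not None:
        inner_hits = min(inner_hits, total_matches)
    return size + inner_hits


if __name__ == "__main__":
    print(max_hits_fetched(10, 50))        # 510, the default-size case
    print(max_hits_fetched(36, 50))        # 1836 with the reported size
    print(max_hits_fetched(36, 50, 1130))  # 1166 when capped by hits.total
```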

jimczi added a commit to jimczi/elasticsearch that referenced this issue May 26, 2017
The took time computed for search requests does not take in account the expand search phase.
This change delays the computation to after the expand phase finishes.

Relates elastic#24900
jimczi added a commit that referenced this issue May 31, 2017
The took time computed for search requests does not take in account the expand search phase.
This change delays the computation to after the expand phase finishes.

Relates #24900