Performance of Unified Highlighter is significantly worse than Postings highlighter in Elasticsearch 5.4.1 #25699

Closed
ShilpiRachna opened this issue Jul 13, 2017 · 6 comments
Labels: :Search Relevance/Highlighting (How a query matched a document), Team:Search Relevance (Meta label for the Search Relevance team in Elasticsearch)

Comments

ShilpiRachna commented Jul 13, 2017

Elasticsearch version: 5.5.0
JVM version: 1.8.0_72
OS version: Windows 10

Description of the problem including expected versus actual behavior:
We indexed around 1 million documents into a single-shard, zero-replica index with _source enabled. Each document has around 40 to 50 fields. All documents share a single mapping, which has around 500 fields. We highlight only 12 fields. I ran a multi-match query with the Unified and the Postings highlighter, and the Unified highlighter is much slower than the Postings highlighter:

Comparison of performance for Multi-match query:

No highlighter: 4 ms
Postings highlighter: 250 ms
Unified Highlighter: 4.5 - 7 s

Interestingly, a match_all query with the Unified highlighter is also much slower (~15 s). Shouldn't the highlighter have no impact on query performance when there are no terms to highlight?

The timings in the table above are fairly consistent even when the same queries are fired back to back multiple times.

Is this a known issue? We are forced to go back to Postings highlighter till this issue is fixed.
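
For reference, the kind of request compared above can be sketched as follows. This is only an illustration: the field names and query text are hypothetical (they are not taken from the attached mapping), and `build_search_body` is a helper invented for this sketch. The only thing that changes between the two runs is the highlighter `type`.

```python
import json


def build_search_body(query_text, query_fields, highlight_fields, highlighter_type):
    """Build a multi_match search body that highlights the given fields.

    highlighter_type is the value of the highlight "type" setting,
    for example "unified" or "postings" in Elasticsearch 5.x.
    """
    return {
        "query": {
            "multi_match": {"query": query_text, "fields": query_fields}
        },
        "highlight": {
            "fields": {
                name: {"type": highlighter_type} for name in highlight_fields
            }
        },
    }


# Hypothetical field names and query text, for illustration only.
fields = ["title", "description"]
unified_body = build_search_body("error timeout", fields, fields, "unified")
postings_body = build_search_body("error timeout", fields, fields, "postings")

print(json.dumps(unified_body, indent=2))
```

Sending both bodies to the same index and comparing the `took` value in each response is one way to reproduce a comparison like the one in the table.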

--

@danielmitterdorfer
Member

@jimczi can you please comment on that?

danielmitterdorfer added the :Search Relevance/Highlighting label Jul 13, 2017
@jimczi
Contributor

jimczi commented Jul 13, 2017

@ShilpiRachna thanks for reporting. Can you share your mapping and an example doc?
The performance of the unified and postings highlighters should be similar if the field is indexed with index_options: offsets. Some benchmarks confirmed this, but you may be hitting a bug or a regression, so more details would help.
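
As a point of reference, a mapping that indexes the highlighted fields with offsets might look like the sketch below. It assumes the Elasticsearch 5.x create-index body, which still nests properties under a mapping type; the type name `doc` and the field names are hypothetical, and `build_mapping` is a helper made up for this example.

```python
def text_field_with_offsets():
    """Text field whose postings also store offsets."""
    return {"type": "text", "index_options": "offsets"}


def build_mapping(type_name, highlighted_fields):
    """Build a 5.x-style create-index body for the given fields.

    type_name is the mapping type (hypothetical here); every field in
    highlighted_fields is indexed with index_options set to offsets.
    """
    return {
        "mappings": {
            type_name: {
                "properties": {
                    name: text_field_with_offsets()
                    for name in highlighted_fields
                }
            }
        }
    }


# Hypothetical type and field names, for illustration only.
mapping_body = build_mapping("doc", ["title", "description"])
```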

@ShilpiRachna
Author

@jimczi I have uploaded the final mapping and a sample document.

We are blocked by this issue. We cannot revert to the postings highlighter because of another issue that is fixed only in the unified highlighter: #25131

SampleDocument.txt
Mapping.txt

Thanks!

jimczi added a commit to jimczi/elasticsearch that referenced this issue Jul 17, 2017
…mapping

This commit changes how the offset source is picked for each field using the es mapping rather than the underlying Lucene field infos.
It's mandatory for large mappings where field infos retrieval can be costly (the global field infos is merged for each highlighted field in every hit by the Lucene impl).

Fixes elastic#25699
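
The idea in the commit message above can be illustrated with a simplified sketch. This is not the actual Elasticsearch or Lucene code: `pick_offset_source` is a hypothetical function, the returned strings only mirror the names of Lucene's `UnifiedHighlighter.OffsetSource` values, and the combined postings-plus-term-vectors case is left out for brevity. The point is that the decision reads only the field's own mapping entry, so its cost does not grow with the total number of fields in the index.

```python
# term_vector settings that store offsets (simplified list, an assumption
# of this sketch rather than an exhaustive enumeration).
TERM_VECTOR_SETTINGS_WITH_OFFSETS = ("with_offsets", "with_positions_offsets")


def pick_offset_source(field_mapping):
    """Choose where highlight offsets come from, using only the mapping.

    field_mapping is the mapping entry of a single field, e.g.
    {"type": "text", "index_options": "offsets"}.
    """
    if field_mapping.get("index_options") == "offsets":
        # Offsets are stored in the postings list.
        return "POSTINGS"
    if field_mapping.get("term_vector") in TERM_VECTOR_SETTINGS_WITH_OFFSETS:
        # Offsets are stored in term vectors.
        return "TERM_VECTORS"
    # No stored offsets: the field text has to be re-analyzed.
    return "ANALYSIS"


# Hypothetical field mappings, for illustration only.
print(pick_offset_source({"type": "text", "index_options": "offsets"}))
print(pick_offset_source({"type": "text", "term_vector": "with_positions_offsets"}))
print(pick_offset_source({"type": "text"}))
```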
@jimczi
Contributor

jimczi commented Jul 17, 2017

Thanks @ShilpiRachna
I opened #25747 to speed up highlighting for big mappings with a lot of fields. However, the number of fields in your mapping is also problematic for other operations, so I'd advise you to change your design to use fewer fields.

jimczi added a commit that referenced this issue Jul 17, 2017
…mapping (#25747)

This commit changes how the offset source is picked for each field using the es mapping rather than the underlying Lucene field infos.
It's mandatory for large mappings where field infos retrieval can be costly (the global field infos is merged for each highlighted field in every hit by the Lucene impl).

Fixes #25699
@bittusarkar

Thank you @jimczi for taking up the fix for this. We understand that having a lot of fields is somewhat problematic, and we've tried our best to address those issues. We've redesigned our solution to query only a small number of fields (~10) and to highlight only a small number of fields (~10). Queries on any individual field are still allowed, though. This improved both our query performance and our highlighting performance. What other problems are you aware of that we could face because of the large number of fields?

@ShilpiRachna
Author

Thanks a lot @jimczi for the fix. Is there any tentative release date for Elasticsearch 5.6.0? It would help us plan if we could get the release date. Thanks again!
