Performance of Unified Highlighter is significantly worse than Postings highlighter in Elasticsearch 5.4.1 #25699
Comments
@jimczi can you please comment on that?
@ShilpiRachna thanks for reporting. Can you share your mapping and an example doc?
@jimczi I have uploaded the final mapping and a sample document. We are blocked because of this issue. We cannot revert to the postings highlighter due to another issue that is fixed only in unified: #25131 SampleDocument.txt Thanks!
…mapping This commit changes how the offset source is picked for each field using the es mapping rather than the underlying Lucene field infos. It's mandatory for large mappings where field infos retrieval can be costly (the global field infos is merged for each highlighted field in every hit by the Lucene impl). Fixes elastic#25699
Thanks @ShilpiRachna
…mapping (#25747) This commit changes how the offset source is picked for each field using the es mapping rather than the underlying Lucene field infos. It's mandatory for large mappings where field infos retrieval can be costly (the global field infos is merged for each highlighted field in every hit by the Lucene impl). Fixes #25699
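To illustrate the idea in the commit message, here is a minimal sketch of choosing an offset source from a field's mapping alone, without touching Lucene field infos. It is not the code from the commit: the enum names mirror Lucene's `UnifiedHighlighter.OffsetSource`, while the function `offset_source_from_mapping` and its exact decision rules are assumptions made for illustration.

```python
from enum import Enum


class OffsetSource(Enum):
    # Names mirror Lucene's UnifiedHighlighter.OffsetSource; values are illustrative.
    POSTINGS = "postings"
    POSTINGS_WITH_TERM_VECTORS = "postings_with_term_vectors"
    TERM_VECTORS = "term_vectors"
    ANALYSIS = "analysis"


# Term vector settings that store offsets (values of the mapping's `term_vector` option).
TERM_VECTORS_WITH_OFFSETS = {
    "with_offsets",
    "with_positions_offsets",
    "with_positions_offsets_payloads",
}


def offset_source_from_mapping(field_mapping):
    """Hypothetical helper: pick an offset source from one field's mapping dict.

    Only the mapping is inspected, so the cost does not depend on how many
    fields the index contains.
    """
    index_options = field_mapping.get("index_options")
    term_vector = field_mapping.get("term_vector", "no")
    if index_options == "offsets":
        # Offsets are stored in the postings lists.
        if term_vector != "no":
            return OffsetSource.POSTINGS_WITH_TERM_VECTORS
        return OffsetSource.POSTINGS
    if term_vector in TERM_VECTORS_WITH_OFFSETS:
        return OffsetSource.TERM_VECTORS
    # Nothing stored: re-analyze the field text at highlight time.
    return OffsetSource.ANALYSIS
```

With this kind of lookup, each highlighted field costs one dictionary read per request instead of a merge of the global field infos for every field in every hit.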
Thank you @jimczi for taking the fix for this. We understand that having a lot of fields is somewhat problematic, and we've tried our best to address those issues. We've redesigned our solution to query only a small number of fields (~10) and to highlight only a small number of fields (~10). Queries are still allowed on any individual field, though. By doing this we improved both our query and our highlighting performance. What other problems that you know of might we face due to a large number of fields?
Thanks a lot @jimczi for the fix. Is there a tentative release date for Elasticsearch 5.6.0? Knowing the date would help us plan. Thanks again!
Elasticsearch version: 5.5.0
JVM version: 1.8.0_72
OS version: Windows 10
Description of the problem including expected versus actual behavior:
We indexed around 1 million documents in a single shard, zero replica index with _source enabled. Each document has around 40 to 50 fields. There is only one mapping used for all documents and it has around 500 fields. We highlight only 12 fields. I did a multi-match query with Unified and Postings highlighter and the performance with Unified is much worse than with Postings highlighter:
Comparison of performance for Multi-match query:
| Highlighter | Time |
| --- | --- |
| No highlighter | 4 ms |
| Postings highlighter | 250 ms |
| Unified highlighter | 4.5 - 7 s |
Interestingly, the performance of a match_all query with the Unified highlighter is also much worse (~15 s). Shouldn't the highlighter have no impact on query performance when there are no terms to highlight?
The timings in the table above are fairly consistent even when the same queries are fired back to back multiple times.
Is this a known issue? We are forced to go back to Postings highlighter till this issue is fixed.
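For anyone trying to reproduce the comparison, the two requests can be sketched roughly as follows. This is only an illustration: the field names (`field_0` to `field_11`) and the query text are placeholders, not the fields from our mapping, and the helper `build_search_body` is hypothetical. The only thing that changes between the two runs is the per-field highlighter `type`.

```python
import json


def build_search_body(query_text, fields, highlighter_type):
    """Build a multi_match search body that highlights the queried fields.

    `highlighter_type` is set per field, e.g. "unified" or "postings".
    """
    return {
        "query": {
            "multi_match": {"query": query_text, "fields": fields},
        },
        "highlight": {
            "fields": {name: {"type": highlighter_type} for name in fields},
        },
    }


# Placeholder field names standing in for the 12 highlighted fields.
fields = ["field_{}".format(i) for i in range(12)]

unified_body = build_search_body("example text", fields, "unified")
postings_body = build_search_body("example text", fields, "postings")

# Print one body as JSON so it can be pasted into a search request.
print(json.dumps(unified_body, indent=2))
```

Sending both bodies to the same index and comparing the `took` value in each response gives the kind of timings reported above.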