Add a best_speed value for index.codec setting #71788

Closed
alexklibisz opened this issue Apr 17, 2021 · 3 comments
Labels
>enhancement, :Search Foundations/Mapping, Team:Search Foundations

Comments

alexklibisz commented Apr 17, 2021

ES lets you configure codec compression via the "index.codec" setting and/or via a MapperService. You can use "default" or "best_compression", but you can't use "best_speed". I'd really like a way to set "best_speed" and have it apply to binary doc values. This would really simplify the elastiknn plugin, which uses binary doc values to store vectors. Currently I'm implementing my own EngineFactory, just to provide a CodecService which uses the default Lucene87Codec to encode binary doc values. Providing my own EngineFactory actually seems to interfere with some other Elasticsearch functionality. Instead, it would be a lot better and simpler to do one of two things:

  1. (preferred) Allow a "best_speed" option for the index.codec setting. Under the hood you would simply instantiate a new Lucene87Codec(BEST_SPEED).
  2. Provide a way for a custom MappedFieldType to expose its compression level. Then my plugin could decide the compression level for a given type.
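
To make option 1 concrete, here is a rough sketch of the kind of name-to-mode lookup it implies. Everything in it is a hypothetical stand-in: `CodecNameSketch`, its `Mode` enum, and `resolve` are made up for illustration and are not the actual Elasticsearch CodecService or the Lucene `Mode` enum. In the real change, the `BEST_SPEED` branch would presumably construct the Lucene87Codec with BEST_SPEED as described above.

```java
// Hypothetical illustration only: not Elasticsearch's CodecService.
final class CodecNameSketch {

    // Stand-in for the two explicit compression modes discussed in this issue.
    enum Mode { BEST_SPEED, BEST_COMPRESSION }

    // Maps an index.codec-style name to a mode.
    // "best_speed" is the value this issue proposes adding.
    static Mode resolve(String name) {
        switch (name) {
            case "best_speed":
                return Mode.BEST_SPEED;
            case "best_compression":
                return Mode.BEST_COMPRESSION;
            default:
                // Other values ("default", "lucene_default") are deliberately
                // not modeled in this sketch.
                throw new IllegalArgumentException("unknown codec name: " + name);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("best_speed"));
    }
}
```

The sketch rejects unknown names outright, so a typo in the setting would fail loudly instead of silently falling back to some other mode.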

Some related observations:

Binary doc values are hard-coded to use best compression: code

Lucene 8.8 recently added a BEST_SPEED setting which makes a big difference for binary doc values. Best speed is actually the default for Lucene87Codec: code

Not sure if it's intentional, but if the MapperService is null, ES will actually use BEST_SPEED: code (since the default constructor for the 87 codec uses BEST_SPEED).

You can technically already do this by setting index.codec to lucene_default, again because the default constructor uses BEST_SPEED, but there are several places in the docs which discourage users from using lucene_default outside of testing.

alexklibisz added the >enhancement and needs:triage labels on Apr 17, 2021
alexklibisz changed the title from "Allow best_speed setting for binary doc values" to "Add a best_speed value for index.codec setting" on Apr 17, 2021
@jtibshirani
Contributor

Hello @alexklibisz! After more consideration, Lucene developers decided to remove compression on binary doc values (https://issues.apache.org/jira/browse/LUCENE-9843). So there will no longer be a best_speed/best_compression option for doc values; the behavior will essentially always be best_speed.

Once we upgrade to that Lucene version, it should simplify your set-up. In the meantime, I'd prefer to wait for that change to land instead of introducing a temporary configuration option.

jtibshirani added the :Search Foundations/Mapping label and removed the needs:triage label on Apr 22, 2021
elasticmachine added the Team:Search label on Apr 22, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@alexklibisz
Author

Thanks for linking that issue. I was not aware of it. Totally understand and agree. I'll close and just wait until that change lands in Lucene.

javanna added the Team:Search Foundations label and removed the Team:Search label on Jul 16, 2024