Switch to Lucene's new IntField/LongField/FloatField/DoubleField. #93165

jpountz · 2023-01-23T15:06:16Z

Lucene introduced new numeric fields that index both points and doc values. They have the same semantics as indexing one field for points and another for doc values, as we did before, but covering both data structures in a single field yielded a speedup in Lucene's nightly benchmarks (see annotation [AH](http://people.apache.org/~mikemccand/lucenebench/sparseResults.html#index_throughput)), which we would like to benefit from too.

This commit does not yet switch to the query factory methods such as `LongField#newRangeQuery`; we'll need to look into that in a follow-up.

elasticsearchmachine · 2023-01-23T15:10:38Z

Hi @jpountz, I've created a changelog YAML for you.

elasticsearchmachine · 2023-01-23T15:10:39Z

Pinging @elastic/es-search (Team:Search)

benwtrent

I think we should apply this to UnsignedLongFieldMapper as well. It also has doc values and is indexed.

benwtrent · 2023-01-23T15:57:16Z

server/src/main/java/org/elasticsearch/index/mapper/DateFieldMapper.java

+        } else {
            context.addToFieldNames(fieldType().name());
        }


Previously, we only called addToFieldNames if at least store was true. Now, we call addToFieldNames no matter the store value. Is this intentional?

Thanks for catching this. I had in mind that we should add to field names whenever doc values are disabled, but this field and others do indeed only do so if the field is also indexed or stored.
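To make the outcome of this thread concrete, here is a minimal sketch of the per-value decision as described in the PR description and the reply above. It is a hypothetical stand-alone model, not the actual mapper code: the class `NumericFieldPlan` and its `plan` method are invented for illustration, the Lucene field types are represented only by their names, and the choice of `SortedNumericDocValuesField`, `LongPoint`, and `StoredField` for the non-combined cases is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the indexing decision; not the actual Elasticsearch mapper code. */
final class NumericFieldPlan {
    /** Names of the Lucene field types that would be added to the document. */
    final List<String> luceneFields;
    /** Whether the field name would also be recorded through addToFieldNames. */
    final boolean addToFieldNames;

    private NumericFieldPlan(List<String> luceneFields, boolean addToFieldNames) {
        this.luceneFields = luceneFields;
        this.addToFieldNames = addToFieldNames;
    }

    static NumericFieldPlan plan(boolean indexed, boolean hasDocValues, boolean stored) {
        List<String> fields = new ArrayList<>();
        if (indexed && hasDocValues) {
            // A single combined field covers both points and doc values.
            fields.add("LongField");
        } else if (hasDocValues) {
            // Doc values only (assumed field type).
            fields.add("SortedNumericDocValuesField");
        } else if (indexed) {
            // Points only (assumed field type).
            fields.add("LongPoint");
        }
        if (stored) {
            fields.add("StoredField");
        }
        // Rule from the reply above: record the field name only when doc values
        // are disabled and the field is indexed or stored.
        boolean addToFieldNames = hasDocValues == false && (indexed || stored);
        return new NumericFieldPlan(List.copyOf(fields), addToFieldNames);
    }

    public static void main(String[] args) {
        // Print the plan for every combination of the three flags.
        boolean[] flags = { false, true };
        for (boolean indexed : flags) {
            for (boolean hasDocValues : flags) {
                for (boolean stored : flags) {
                    NumericFieldPlan p = plan(indexed, hasDocValues, stored);
                    System.out.println("indexed=" + indexed + " docValues=" + hasDocValues
                        + " stored=" + stored + " -> " + p.luceneFields
                        + " addToFieldNames=" + p.addToFieldNames);
                }
            }
        }
    }
}
```

In this model, the case the reviewer flagged (doc values disabled, not indexed, not stored) adds nothing to the document and does not record the field name.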

javanna

LGTM

javanna · 2023-01-27T17:24:26Z

The bwc test failure has happened before and does not look like it is caused by your test; see #92058. The percolator failure seems legit :)

jpountz · 2023-01-27T17:36:56Z

> The percolator failure seems legit

Absolutely, it's caused by apache/lucene#12109. The test failure should go away when we upgrade to the final release.

javanna · 2023-01-31T20:25:25Z

@elasticmachine update branch

javanna · 2023-01-31T21:57:45Z

@jpountz you were right about the percolator test failure, also auto-merge for the win here :)

jpountz added 2 commits January 23, 2023 15:57

Space consistency.

5ba0668

elasticsearchmachine added the needs:triage (Requires assignment of a team area label) and v8.7.0 labels Jan 23, 2023

jpountz added the >enhancement, :Search Foundations/Mapping (Index mappings, including merging and defining field types), and auto-merge (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) labels Jan 23, 2023

elasticsearchmachine added the Team:Search (Meta label for search team) label Jan 23, 2023

Update docs/changelog/93165.yaml

878a389

elasticsearchmachine removed the needs:triage (Requires assignment of a team area label) label Jan 23, 2023

benwtrent reviewed Jan 23, 2023

feedback + tests

8fcee69

benwtrent approved these changes Jan 23, 2023

jpountz added 2 commits January 24, 2023 10:22

Merge branch 'main' into lucene_new_fields

e7816e6

Test fixes.

30b301a

javanna approved these changes Jan 27, 2023

Merge branch 'main' into lucene_new_fields

81be1c1

elasticsearchmachine merged commit c21ee47 into elastic:main Jan 31, 2023

jpountz deleted the lucene_new_fields branch January 31, 2023 21:10

jpountz mentioned this pull request Feb 27, 2023

Indexing speed ups #94154 (Open, 11 tasks)
