Change numeric data types to use SORTED_NUMERIC docvalues type #6967

rmuir · 2014-07-22T18:49:13Z

Change numeric data types to use SORTED_NUMERIC docvalues type instead of a custom encoding in BINARY.

In low-level benchmarks this is 2x to 5x faster; it's also optimized for the common case where a field contains at most one value per document.

Additionally SORTED_NUMERIC doesn't lose values if they appear more than once, so mathematical computations such as averages are correct.
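To make the point about averages concrete, here is a minimal sketch in plain Java. It does not use the Lucene or Elasticsearch APIs; the class `DocValueAverages` and both method names are hypothetical, and it assumes that an encoding which "loses" repeated values behaves like a set of distinct values.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Illustrative only: plain Java, not the Lucene or Elasticsearch API. */
final class DocValueAverages {

    /** Average over distinct values only, as if repeated values had been collapsed (assumption). */
    static double averageDistinct(long[] values) {
        TreeSet<Long> distinct = new TreeSet<>();
        for (long v : values) {
            distinct.add(v);
        }
        long sum = 0;
        for (long v : distinct) {
            sum += v;
        }
        return distinct.isEmpty() ? Double.NaN : (double) sum / distinct.size();
    }

    /** Average over a sorted copy that keeps every occurrence, duplicates included. */
    static double averageAll(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted); // sorting is not needed for the sum; it only mirrors sorted storage
        long sum = 0;
        for (long v : sorted) {
            sum += v;
        }
        return sorted.length == 0 ? Double.NaN : (double) sum / sorted.length;
    }

    public static void main(String[] args) {
        long[] ratings = {5, 5, 2};
        System.out.println(averageDistinct(ratings)); // prints 3.5
        System.out.println(averageAll(ratings));      // prints 4.0
    }
}
```

With the values 5, 5 and 2, dropping the repeated 5 gives 3.5, while keeping it gives the true mean of 4.0.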


s1monw · 2014-07-22T19:26:55Z

I did a review and it looks great. One thing I would really want to see here is a BWC test that creates and uses the numeric variants with doc values on a mixed-version cluster, then upgrades the cluster and checks that we are still operating fine. One way of doing this is to simply add some sorting on doubles / longs to BasicBackwardsCompatibilityTest#testIndexRollingUpgrade as well as BasicBackwardsCompatibilityTest#testIndexAndSearch. The dynamic index template should take care of randomly selecting docvalues there, so simple sorting or faceting should do the job.

Today we only do count searches to ensure sane results are returned after upgrading etc. This change adds sorting to the picture asserting on simple numeric sorting that uses field data etc. after upgrading. Relates to elastic#6967

jpountz · 2014-07-22T20:36:04Z

src/main/java/org/elasticsearch/index/fielddata/plain/DocValuesIndexFieldData.java

@@ -107,7 +109,13 @@ public Builder numericType(NumericType type) {
                assert !numericType.isFloatingPoint();
                return new NumericDVIndexFieldData(index, fieldNames, mapper.fieldDataType());
            } else if (numericType != null) {
-                return new BinaryDVNumericIndexFieldData(index, fieldNames, numericType, mapper.fieldDataType());
+                Version version = indexSettings.getAsVersion(IndexMetaData.SETTING_VERSION_CREATED, org.elasticsearch.Version.CURRENT);


There is the Version.indexCreated method that does the same with additional assertions

thanks I will switch!


rmuir · 2014-07-23T03:29:06Z

@jpountz spotted a horrible backwards-compatibility break here; I'm too used to the Lucene system. I will make sure the back-compat tests fail with the current PR first, and then update...

rmuir · 2014-07-23T16:22:41Z

I fixed the back compat: backwards tests pass now.

s1monw · 2014-07-23T16:43:30Z

LGTM. @jpountz, can you take another look please?

jpountz · 2014-07-23T16:43:55Z

I just started. :)

jpountz · 2014-07-23T16:50:02Z

src/main/java/org/elasticsearch/index/mapper/core/NumberFieldMapper.java

@@ -189,6 +199,8 @@ protected NumberFieldMapper(Names names, int precisionStep, float boost, FieldTy
        }
        this.ignoreMalformed = ignoreMalformed;
        this.coerce = coerce;
+        Version v = indexSettings == null ? Version.CURRENT : Version.indexCreated(indexSettings);


I think we should open an issue about these null settings, that's worrying!

I opened #6993 as a followup!
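The two diff hunks in this review both read the index-created version, and the second one falls back to the current version when the settings are null. A rough sketch of how such a check could gate the doc values format is below. Everything in it is hypothetical: `DocValuesFormatChooser`, `Format`, the integer version ids and the cutoff are stand-ins for illustration, not Elasticsearch classes or constants, and it assumes that indices created before the change keep the BINARY encoding.

```java
/** Illustrative only: hypothetical names and version ids, not the Elasticsearch classes. */
final class DocValuesFormatChooser {

    enum Format { BINARY, SORTED_NUMERIC }

    // Assumed ids for illustration; the real values live in Elasticsearch's Version class.
    static final int CUTOFF_ID = 10400;
    static final int CURRENT_ID = 10400;

    /**
     * Picks the format from the version the index was created with.
     * A null id stands in for missing index settings and is treated as the
     * current version, mirroring the null fallback in the diff hunk.
     */
    static Format formatFor(Integer indexCreatedId) {
        int created = indexCreatedId == null ? CURRENT_ID : indexCreatedId;
        // Assumption: older indices must stay readable, so they keep the old encoding.
        return created >= CUTOFF_ID ? Format.SORTED_NUMERIC : Format.BINARY;
    }

    public static void main(String[] args) {
        System.out.println(formatFor(10300)); // prints BINARY
        System.out.println(formatFor(10400)); // prints SORTED_NUMERIC
        System.out.println(formatFor(null));  // prints SORTED_NUMERIC
    }
}
```

The null branch is the part flagged as worrying in this thread: silently assuming the current version would pick the new format even for an index that was actually created earlier.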

jpountz · 2014-07-23T16:50:57Z

LGTM

jpountz · 2014-07-23T16:51:24Z

sorry, closed by mistake. Just reopened


rmuir added v1.4.0 labels Jul 22, 2014

s1monw removed the review label Jul 22, 2014

s1monw mentioned this pull request Jul 22, 2014

Test: Add simple sort assertions for bwc tests #6968

Merged

jpountz reviewed Jul 22, 2014

rmuir added 2 commits July 23, 2014 06:48

Merge branch 'master' into sortednumerics

8450e3a

use indexCreated macro and add correct back compat

231be2d

rmuir added the review label Jul 23, 2014

jpountz reviewed Jul 23, 2014

jpountz closed this Jul 23, 2014

jpountz reopened this Jul 23, 2014

rmuir mentioned this pull request Jul 23, 2014

Investigate null Settings in mappers #6993

Closed

rmuir added 3 commits July 23, 2014 13:38

Merge branch 'master' into sortednumerics

1af6b46

Merge branch 'master' into sortednumerics

d4a1b5c

syncup to comparator api

e58f9d5

rmuir closed this in 66825ac Jul 23, 2014

jpountz removed the review label Jul 24, 2014

clintongormley changed the title ~~Change numeric data types to use SORTED_NUMERIC docvalues type~~ Internal: Change numeric data types to use SORTED_NUMERIC docvalues type Sep 8, 2014

clintongormley added the >enhancement label Sep 8, 2014

clintongormley added the :Core/Infra/Core label Jun 7, 2015

clintongormley changed the title ~~Internal: Change numeric data types to use SORTED_NUMERIC docvalues type~~ Change numeric data types to use SORTED_NUMERIC docvalues type Jun 7, 2015
