-
Notifications
You must be signed in to change notification settings - Fork 24.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change numeric data types to use SORTED_NUMERIC docvalues type #6967
Conversation
instead of a custom encoding in BINARY. In low level benchmarks this is 2x to 5x faster: its also optimized for the common case where fields actually only contain at most one value for each document. Additionally SORTED_NUMERIC doesn't lose values if they appear more than once, so mathematical computations such as averages are correct.
I did a review and it looks great. One thing that I really would want to see here is a BWC test that creates & uses the numeric variants with DV on a mixed version cluster and then upgrades the cluster and checks if we are still operating fine. One way of doing this is to simply add some sorting with doubles / longs to |
Today we only do count searches to ensure sane results are returned after upgrading etc. This change adds sorting to the picture asserting on simple numeric sorting that uses field data etc. after upgrading. Relates to elastic#6967
@@ -107,7 +109,13 @@ public Builder numericType(NumericType type) { | |||
assert !numericType.isFloatingPoint(); | |||
return new NumericDVIndexFieldData(index, fieldNames, mapper.fieldDataType()); | |||
} else if (numericType != null) { | |||
return new BinaryDVNumericIndexFieldData(index, fieldNames, numericType, mapper.fieldDataType()); | |||
Version version = indexSettings.getAsVersion(IndexMetaData.SETTING_VERSION_CREATED, org.elasticsearch.Version.CURRENT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is the Version.indexCreated method that does the same with additional assertions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks I will switch!
Today we only do count searches to ensure sane results are returned after upgrading etc. This change adds sorting to the picture asserting on simple numeric sorting that uses field data etc. after upgrading. Relates to #6967
@jpountz spotted a horrible backwards break here, I'm too used to the lucene system. I will make sure back compat tests are failing with the current PR first and then update... |
I fixed the back compat: backwards tests pass now. |
LGTM @jpountz can you take another look please |
I just started. :) |
@@ -189,6 +199,8 @@ protected NumberFieldMapper(Names names, int precisionStep, float boost, FieldTy | |||
} | |||
this.ignoreMalformed = ignoreMalformed; | |||
this.coerce = coerce; | |||
Version v = indexSettings == null ? Version.CURRENT : Version.indexCreated(indexSettings); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should open an issue about these null settings, that's worrying!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #6993 as a followup!
LGTM |
sorry, closed by mistake. Just reopened |
instead of a custom encoding in BINARY. In low level benchmarks this is 2x to 5x faster: its also optimized for the common case where fields actually only contain at most one value for each document. Additionally SORTED_NUMERIC doesn't lose values if they appear more than once, so mathematical computations such as averages are correct. Closes #6967
Change numeric data types to use SORTED_NUMERIC docvalues type instead of a custom encoding in BINARY.
In low level benchmarks this is 2x to 5x faster: its also optimized for the common case where fields actually only contain at most one value for each document.
Additionally SORTED_NUMERIC doesn't lose values if they appear more than once, so mathematical computations such as averages are correct.