Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change numeric data types to use SORTED_NUMERIC docvalues type #6967

Closed
wants to merge 6 commits into from

Conversation

Projects
None yet
4 participants
@rmuir
Copy link
Contributor

commented Jul 22, 2014

Change numeric data types to use SORTED_NUMERIC docvalues type instead of a custom encoding in BINARY.

In low level benchmarks this is 2x to 5x faster: its also optimized for the common case where fields actually only contain at most one value for each document.

Additionally SORTED_NUMERIC doesn't lose values if they appear more than once, so mathematical computations such as averages are correct.

Change numeric data types to use SORTED_NUMERIC docvalues type
instead of a custom encoding in BINARY.

In low level benchmarks this is 2x to 5x faster: its also optimized
for the common case where fields actually only contain at most one
value for each document.

Additionally SORTED_NUMERIC doesn't lose values if they appear more
than once, so mathematical computations such as averages are correct.

@rmuir rmuir added v1.4.0 labels Jul 22, 2014

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jul 22, 2014

I did a review and it looks great. One thing that I really would want to see here is a BWC test that creates & uses the numeric variants with DV on a mixed version cluster and then upgrades the cluster and checks if we are still operating fine. One way of doing this is to simply add some sorting with doubles / longs to BasicBackwardsCompatibilityTest#testIndexRollingUpgrade as well as BasicBackwardsCompatibilityTest#testIndexAndSearch the dynamic index template should take care of randomly selecting docvalues there so simple sorting or faceting should do the job

@s1monw s1monw removed the review label Jul 22, 2014

s1monw added a commit to s1monw/elasticsearch that referenced this pull request Jul 22, 2014

[TEST] Add simple sort assertions for bwc tests
Today we only do count searches to ensure sane results are returned
after upgrading etc. This change adds sorting to the picture asserting
on simple numeric sorting that uses field data etc. after upgrading.

Relates to elastic#6967
@@ -107,7 +109,13 @@ public Builder numericType(NumericType type) {
assert !numericType.isFloatingPoint();
return new NumericDVIndexFieldData(index, fieldNames, mapper.fieldDataType());
} else if (numericType != null) {
return new BinaryDVNumericIndexFieldData(index, fieldNames, numericType, mapper.fieldDataType());
Version version = indexSettings.getAsVersion(IndexMetaData.SETTING_VERSION_CREATED, org.elasticsearch.Version.CURRENT);

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 22, 2014

Contributor

There is the Version.indexCreated method that does the same with additional assertions

This comment has been minimized.

Copy link
@rmuir

rmuir Jul 22, 2014

Author Contributor

thanks I will switch!

default:
assert !numericType.isFloatingPoint();
return new LongValuesComparatorSource(this, missingValue, sortMode);
}

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 22, 2014

Contributor

Do you think we would gain something by using directly the long bits to sort? (in the future, we need to switch to the lucene comparators first)

This comment has been minimized.

Copy link
@rmuir

rmuir Jul 22, 2014

Author Contributor

I'm not sure: we should benchmark that it actually would matter, because it would introduce some complexity. The lucene comparators don't even do this, they do the same thing being done here at the moment for float and double types.

s1monw added a commit that referenced this pull request Jul 22, 2014

[TEST] Add simple sort assertions for bwc tests
Today we only do count searches to ensure sane results are returned
after upgrading etc. This change adds sorting to the picture asserting
on simple numeric sorting that uses field data etc. after upgrading.

Relates to #6967
@rmuir

This comment has been minimized.

Copy link
Contributor Author

commented Jul 23, 2014

@jpountz spotted a horrible backwards break here, I'm too used to the lucene system. I will make sure back compat tests are failing with the current PR first and then update...

@rmuir rmuir added the review label Jul 23, 2014

@rmuir

This comment has been minimized.

Copy link
Contributor Author

commented Jul 23, 2014

I fixed the back compat: backwards tests pass now.

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jul 23, 2014

LGTM @jpountz can you take another look please

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Jul 23, 2014

I just started. :)

@@ -189,6 +199,8 @@ protected NumberFieldMapper(Names names, int precisionStep, float boost, FieldTy
}
this.ignoreMalformed = ignoreMalformed;
this.coerce = coerce;
Version v = indexSettings == null ? Version.CURRENT : Version.indexCreated(indexSettings);

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 23, 2014

Contributor

I think we should open an issue about these null settings, that's worrying!

This comment has been minimized.

Copy link
@rmuir

rmuir Jul 23, 2014

Author Contributor

I opened #6993 as a followup!

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Jul 23, 2014

LGTM

@jpountz jpountz closed this Jul 23, 2014

@jpountz jpountz reopened this Jul 23, 2014

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Jul 23, 2014

sorry, closed by mistake. Just reopened

@rmuir rmuir closed this in 66825ac Jul 23, 2014

rmuir added a commit that referenced this pull request Jul 23, 2014

Change numeric data types to use SORTED_NUMERIC docvalues type
instead of a custom encoding in BINARY.

In low level benchmarks this is 2x to 5x faster: its also optimized
for the common case where fields actually only contain at most one
value for each document.

Additionally SORTED_NUMERIC doesn't lose values if they appear more
than once, so mathematical computations such as averages are correct.

Closes #6967

@jpountz jpountz removed the review label Jul 24, 2014

@clintongormley clintongormley changed the title Change numeric data types to use SORTED_NUMERIC docvalues type Internal: Change numeric data types to use SORTED_NUMERIC docvalues type Sep 8, 2014

@clintongormley clintongormley changed the title Internal: Change numeric data types to use SORTED_NUMERIC docvalues type Change numeric data types to use SORTED_NUMERIC docvalues type Jun 7, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.