Sort: Support "missing" specific handling, include _last, _first, and custom value (for string values) #896

Closed
jfiedler opened this Issue May 2, 2011 · 13 comments

Comments

Projects
None yet
Contributor

jfiedler commented May 2, 2011

This is a follow up to #772 that supports special sorting for 'null' values for numeric fields. The same would be very useful for (not analyzed) string fields as well. If feasible, special handling for empty and blank strings would be useful. The latter is however optional as it is possible to achieve uniform handling by filtering out blank values at indexing time.

fabian commented Aug 17, 2011

+1

This is a great suggestion, and fairly inevitable, I would imagine.

+1

ro-ka commented Nov 29, 2011

+1

mleglise commented Mar 2, 2012

+1

@ghost

ghost commented Sep 20, 2012

+1

+1. Just wondering if anyone is working on this? thanks!

kretes commented Jan 25, 2013

When this is handled - it should be for all types, e.g. dates, not just string fields

It supports dates already, the documentation probably needs to be updated.

Owner

clintongormley commented Apr 4, 2013

As of 0.90.0.RC2, this is supported for numbers, dates, strings, geo-locations

Does anyone know if there is a way to make the default be {missing: '_last'}?

Owner

clintongormley commented May 23, 2013

Apparently I was incorrect about this being supported for strings already, so reopening this issue

@ghost ghost assigned martijnvg May 23, 2013

+1

+1

@ghost ghost assigned bleskes Jul 23, 2013

mobilutz commented Aug 5, 2013

+1

@ghost ghost assigned jpountz Aug 6, 2013

jpountz added a commit to jpountz/elasticsearch that referenced this issue Aug 26, 2013

Configurable sort order for missing string values.
This commit allows for configuring the sort order of missing values in BytesRef
comparators (used for strings) with the following options:
 - _first: missing values will appear in the first positions,
 - _last: missing values will appear in the last positions (default),
 - <any value>: documents with missing sort value will use the given value when
   sorting.

Since the default is _last, sorting by string value will have a different
behavior than in previous versions of elasticsearch which used to insert missing
value in the first positions when sorting in ascending order.

Implementation notes:
 - Nested sorting is supported through the implementation of
   NestedWrappableComparator,
 - BytesRefValComparator was mostly broken since no field data implementation
   was using it, it is now tested through NoOrdinalsStringFieldDataTests,
 - Specialized BytesRefOrdValComparators have been removed now that the ordinals
   rely on packed arrays instead of raw arrays,
 - Field data tests hierarchy has been changed so that the numeric tests don't
   inherit from the string tests anymore,
 - When _first or _last is used, internally the comparators are told to use
   null or BytesRefFieldComparatorSource.MAX_TERM to replace missing values
   (depending on the sort order),
 - BytesRefValComparator just replaces missing values with the provided value
   and uses them for comparisons,
 - BytesRefOrdValComparator multiplies ordinals by 4 so that it can find
   ordinals for the missing value and the bottom value which are directly
   comparable with the segment ordinals. For example, if the segment values and
   ordinals are (a,1) and (b,2), they will be stored internally as (a,4) and
   (b,8) and if the missing value is 'ab', it will be assigned 6 as an ordinal,
   since it is between 'a' and 'b'. Then if the bottom value is 'abc', it will
   be assigned 7 as an ordinal since if it between 'ab' and 'b'.

Closes #896

@jpountz jpountz closed this in db46946 Aug 27, 2013

jpountz added a commit that referenced this issue Aug 27, 2013

Configurable sort order for missing string values.
This commit allows for configuring the sort order of missing values in BytesRef
comparators (used for strings) with the following options:
 - _first: missing values will appear in the first positions,
 - _last: missing values will appear in the last positions (default),
 - <any value>: documents with missing sort value will use the given value when
   sorting.

Since the default is _last, sorting by string value will have a different
behavior than in previous versions of elasticsearch which used to insert missing
value in the first positions when sorting in ascending order.

Implementation notes:
 - Nested sorting is supported through the implementation of
   NestedWrappableComparator,
 - BytesRefValComparator was mostly broken since no field data implementation
   was using it, it is now tested through NoOrdinalsStringFieldDataTests,
 - Specialized BytesRefOrdValComparators have been removed now that the ordinals
   rely on packed arrays instead of raw arrays,
 - Field data tests hierarchy has been changed so that the numeric tests don't
   inherit from the string tests anymore,
 - When _first or _last is used, internally the comparators are told to use
   null or BytesRefFieldComparatorSource.MAX_TERM to replace missing values
   (depending on the sort order),
 - BytesRefValComparator just replaces missing values with the provided value
   and uses them for comparisons,
 - BytesRefOrdValComparator multiplies ordinals by 4 so that it can find
   ordinals for the missing value and the bottom value which are directly
   comparable with the segment ordinals. For example, if the segment values and
   ordinals are (a,1) and (b,2), they will be stored internally as (a,4) and
   (b,8) and if the missing value is 'ab', it will be assigned 6 as an ordinal,
   since it is between 'a' and 'b'. Then if the bottom value is 'abc', it will
   be assigned 7 as an ordinal since if it between 'ab' and 'b'.

Closes #896

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Configurable sort order for missing string values.
This commit allows for configuring the sort order of missing values in BytesRef
comparators (used for strings) with the following options:
 - _first: missing values will appear in the first positions,
 - _last: missing values will appear in the last positions (default),
 - <any value>: documents with missing sort value will use the given value when
   sorting.

Since the default is _last, sorting by string value will have a different
behavior than in previous versions of elasticsearch which used to insert missing
value in the first positions when sorting in ascending order.

Implementation notes:
 - Nested sorting is supported through the implementation of
   NestedWrappableComparator,
 - BytesRefValComparator was mostly broken since no field data implementation
   was using it, it is now tested through NoOrdinalsStringFieldDataTests,
 - Specialized BytesRefOrdValComparators have been removed now that the ordinals
   rely on packed arrays instead of raw arrays,
 - Field data tests hierarchy has been changed so that the numeric tests don't
   inherit from the string tests anymore,
 - When _first or _last is used, internally the comparators are told to use
   null or BytesRefFieldComparatorSource.MAX_TERM to replace missing values
   (depending on the sort order),
 - BytesRefValComparator just replaces missing values with the provided value
   and uses them for comparisons,
 - BytesRefOrdValComparator multiplies ordinals by 4 so that it can find
   ordinals for the missing value and the bottom value which are directly
   comparable with the segment ordinals. For example, if the segment values and
   ordinals are (a,1) and (b,2), they will be stored internally as (a,4) and
   (b,8) and if the missing value is 'ab', it will be assigned 6 as an ordinal,
   since it is between 'a' and 'b'. Then if the bottom value is 'abc', it will
   be assigned 7 as an ordinal since if it between 'ab' and 'b'.

Closes #896
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment