Sort: Support "missing" specific handling, include _last, _first, and custom value (for string values) #896

Closed
jfiedler opened this issue May 2, 2011 · 13 comments

@jfiedler
Contributor

commented May 2, 2011

This is a follow-up to #772, which added special sorting of 'null' values for numeric fields. The same would be very useful for (not analyzed) string fields as well. If feasible, special handling for empty and blank strings would also be useful. The latter is optional, however, since uniform handling can be achieved by filtering out blank values at indexing time.

@fabian

commented Aug 17, 2011

+1

@nickhoffman

commented Nov 16, 2011

This is a great suggestion, and fairly inevitable, I would imagine.

+1

@ro-ka

commented Nov 29, 2011

+1

@mleglise

commented Mar 2, 2012

+1

@ghost

commented Sep 20, 2012

+1

@ajhalani

commented Dec 24, 2012

+1. Just wondering if anyone is working on this? thanks!

@kretes

commented Jan 25, 2013

When this is handled, it should cover all types (e.g. dates), not just string fields.

@ajhalani

commented Jan 25, 2013

It supports dates already, the documentation probably needs to be updated.

@clintongormley

Member

commented Apr 4, 2013

As of 0.90.0.RC2, this is supported for numbers, dates, strings, geo-locations

@matthuhiggins

commented May 9, 2013

Does anyone know if there is a way to make the default be {missing: '_last'}?

@clintongormley

Member

commented May 23, 2013

Apparently I was incorrect about this being supported for strings already, so reopening this issue

@ghost ghost assigned martijnvg May 23, 2013

@bgadoury

commented Jul 16, 2013

+1

@tommymonk

commented Jul 20, 2013

+1

@ghost ghost assigned bleskes Jul 23, 2013

@mobilutz

commented Aug 5, 2013

+1

@ghost ghost assigned jpountz Aug 6, 2013

jpountz added a commit to jpountz/elasticsearch that referenced this issue Aug 26, 2013

Configurable sort order for missing string values.
This commit allows for configuring the sort order of missing values in BytesRef
comparators (used for strings) with the following options:
 - _first: missing values will appear in the first positions,
 - _last: missing values will appear in the last positions (default),
 - <any value>: documents with missing sort value will use the given value when
   sorting.
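
The three options can be illustrated with a small Python model. This is only a sketch of the semantics described in the list above, not the elasticsearch implementation: `sort_docs` and the sample documents are hypothetical, and it assumes that `_first` and `_last` refer to result positions regardless of the sort order.

```python
def sort_docs(docs, field, order="asc", missing="_last"):
    """Sort dicts by `field`, placing docs without the field per `missing`.

    Hypothetical helper that models the option semantics only.
    """
    reverse = order == "desc"
    present = [d for d in docs if d.get(field) is not None]
    absent = [d for d in docs if d.get(field) is None]

    if missing in ("_first", "_last"):
        # Assumption: the position is absolute, independent of asc/desc.
        present.sort(key=lambda d: d[field], reverse=reverse)
        return absent + present if missing == "_first" else present + absent

    # Any other value: documents without the field sort as if they had it.
    def key(d):
        value = d.get(field)
        return missing if value is None else value

    return sorted(docs, key=key, reverse=reverse)


docs = [{"id": 1, "name": "b"}, {"id": 2}, {"id": 3, "name": "a"}]
ids_last = [d["id"] for d in sort_docs(docs, "name")]
ids_first = [d["id"] for d in sort_docs(docs, "name", missing="_first")]
ids_custom = [d["id"] for d in sort_docs(docs, "name", missing="ab")]
```

With the sample data, the document without a `name` ends up last by default, first with `_first`, and between "a" and "b" when the custom value "ab" is used.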

Since the default is _last, sorting by string value will behave differently
from previous versions of elasticsearch, which placed missing values in the
first positions when sorting in ascending order.

Implementation notes:
 - Nested sorting is supported through the implementation of
   NestedWrappableComparator,
 - BytesRefValComparator was mostly broken since no field data implementation
   was using it; it is now tested through NoOrdinalsStringFieldDataTests,
 - Specialized BytesRefOrdValComparators have been removed now that the ordinals
   rely on packed arrays instead of raw arrays,
 - Field data tests hierarchy has been changed so that the numeric tests don't
   inherit from the string tests anymore,
 - When _first or _last is used, internally the comparators are told to use
   null or BytesRefFieldComparatorSource.MAX_TERM to replace missing values
   (depending on the sort order),
 - BytesRefValComparator just replaces missing values with the provided value
   and uses them for comparisons,
 - BytesRefOrdValComparator multiplies ordinals by 4 so that it can find
   ordinals for the missing value and the bottom value which are directly
   comparable with the segment ordinals. For example, if the segment values and
   ordinals are (a,1) and (b,2), they will be stored internally as (a,4) and
   (b,8) and if the missing value is 'ab', it will be assigned 6 as an ordinal,
   since it is between 'a' and 'b'. Then if the bottom value is 'abc', it will
   be assigned 7 as an ordinal since it is between 'ab' and 'b'.
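
The ordinal arithmetic in the last note can be checked with a short Python sketch. It reproduces only the worked example from the commit message; the function names are hypothetical, and the rule used to place the bottom value relative to the missing value (one slot above or below it) is an assumption, not taken from the BytesRefOrdValComparator source.

```python
from bisect import bisect_left


def scaled_ordinal(sorted_terms, value):
    """Return the x4-scaled ordinal of `value` for a segment.

    Segment terms have 1-based ordinals, so term i is stored as 4 * i.
    A value absent from the segment gets the middle slot of the gap it
    falls into (4 * i + 2), leaving one free slot on each side.
    """
    i = bisect_left(sorted_terms, value)
    if i < len(sorted_terms) and sorted_terms[i] == value:
        return (i + 1) * 4
    return i * 4 + 2


def bottom_ordinal(sorted_terms, bottom, missing):
    """Scaled ordinal for the bottom value, kept comparable with `missing`.

    Assumption: when both values fall into the same gap, the bottom value
    takes the free slot just above or below the missing value.
    """
    ordinal = scaled_ordinal(sorted_terms, bottom)
    if ordinal % 4 == 0:
        return ordinal  # the bottom value is a real segment term
    if ordinal == scaled_ordinal(sorted_terms, missing) and bottom != missing:
        ordinal += 1 if bottom > missing else -1
    return ordinal


terms = ["a", "b"]  # segment ordinals 1 and 2, stored as 4 and 8
missing_ord = scaled_ordinal(terms, "ab")
bottom_ord = bottom_ordinal(terms, "abc", "ab")
```

Here `missing_ord` is 6 and `bottom_ord` is 7, matching the example above. The factor of 4 is what leaves room for two extra values between any pair of neighbouring segment ordinals.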

Closes elastic#896

@jpountz jpountz closed this in db46946 Aug 27, 2013

jpountz added a commit that referenced this issue Aug 27, 2013

Configurable sort order for missing string values.

Closes #896

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Configurable sort order for missing string values.

Closes elastic#896