Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fielddata: Remove custom comparators and use Lucene's instead #6981

Closed
wants to merge 3 commits into from

Conversation

Projects
None yet
4 participants
@jpountz
Copy link
Contributor

commented Jul 23, 2014

This commit removes custom comparators in favor of the ones that are in Lucene.

The major change is for nested documents: instead of having a comparator wrapper
that deals with nested documents, this is done at the fielddata level by having
a selector that returns the value to use for comparison.

Sorting with custom missing string values might be slower since it is using
TermValComparator since Lucene's TermOrdValComparator only supports sorting
missing values first or last. But other than this particular case, this change
will allow us to benefit from improvements on comparators from the Lucene side.

Close #5980

Fielddata: goodbye comparators.
This commit removes custom comparators in favor of the ones that are in Lucene.

The major change is for nested documents: instead of having a comparator wrapper
that deals with nested documents, this is done at the fielddata level by having
a selector that returns the value to use for comparison.

Sorting with custom missing string values might be slower since it is using
TermValComparator since Lucene's TermOrdValComparator only supports sorting
missing values first or last. But other than this particular case, this change
will allow us to benefit from improvements on comparators from the Lucene side.

Close #5980

@jpountz jpountz added v2.0.0 labels Jul 23, 2014

* parent + 1, or 0 if there is no previous parent, and R (excluded).
*/
// TODO: better name
public static class NestedLayout {

This comment has been minimized.

Copy link
@s1monw

s1monw Jul 23, 2014

Contributor

just call it Nested maybe?

}

final IndexFieldData.XFieldComparatorSource fieldComparatorSource;
if ("string".equals(type)) {

This comment has been minimized.

Copy link
@s1monw

s1monw Jul 23, 2014

Contributor

should we use constants here?

This comment has been minimized.

Copy link
@s1monw

s1monw Jul 23, 2014

Contributor

oh java 7 supports switch case on strings btw.

}
}
}
if (cmpValue == null) {
cmpValue = missingValue;
if ("_first".equals(missingValue)) {

This comment has been minimized.

Copy link
@s1monw

s1monw Jul 23, 2014

Contributor

maybe use switch / case here too

This comment has been minimized.

Copy link
@jpountz

jpountz Jul 23, 2014

Author Contributor

unfortunately missingValue is an Object since it can also be a BytesRef in case of custom missing value

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jul 23, 2014

I left a bunch of cosmetic comments - this looks very clean and a nice reduction of complexity :)

@jpountz

This comment has been minimized.

Copy link
Contributor Author

commented Jul 23, 2014

@s1monw pushed a new commit to address your comments

} else {
final FixedBitSet fixedBitSet = new FixedBitSet(maxDoc);
fixedBitSet.or(set.iterator());
return fixedBitSet;

This comment has been minimized.

Copy link
@rmuir

rmuir Jul 23, 2014

Contributor

can we deprecate the method so it doesnt spread? If it spreads somewhere else, then it would need more logic to handle all the cases (e.g. DISI.iterator() == null etc)

@s1monw

This comment has been minimized.

Copy link
Contributor

commented Jul 23, 2014

LGTM maybe @rmuir can take another look?

return new BytesRefOrdValComparator((IndexOrdinalsFieldData) indexFieldData, numHits, sortMode, missingBytes);
// The ordinal-based comparator only supports sorting missing values first or last so when
// a missing value is provided we fall back to the (slow) value-based comparator
if (sortMissingFirst(missingValue) || sortMissingLast(missingValue)) {

This comment has been minimized.

Copy link
@rmuir

rmuir Jul 23, 2014

Contributor

Can you add a TODO/followup here to just make SortedDocValues "view" that exposes 'missing' as arbitrary value? I can work on it, just so its always fast...

private final int missingSortCmp;

// TODO: add missing first/last support here?

This comment has been minimized.

Copy link
@rmuir

rmuir Jul 23, 2014

Contributor

mental note to remove this outdated TODO in lucene

return new DoubleValuesComparator(indexFieldData, dMissingValue, numHits, sortMode);
// NOTE: it's important to pass null as a missing value in the constructor so that
// the comparator doesn't check docsWithField
return new FieldComparator.DoubleComparator(numHits, null, null, null) {

This comment has been minimized.

Copy link
@rmuir

rmuir Jul 23, 2014

Contributor

maybe just extend the note to say: doesn't check docsWithField since we replace missing values in select() ?
Took me a while to figure out why this was safe

final Long dMissingValue = (Long) missingObject(missingValue, reversed);
// NOTE: it's important to pass null as a missing value in the constructor so that
// the comparator doesn't check docsWithField
return new FieldComparator.LongComparator(numHits, null, null, null) {

This comment has been minimized.

Copy link
@rmuir

rmuir Jul 23, 2014

Contributor

just looking at this one again, we (lucene) have a little trap in that things are setup to use fieldcache in the 4.x branch unless docvalues are present. This is fixed in trunk totally, but for now, maybe we should have something in the tests that asserts we are never accidentally using fieldcache? just a check in tearDown() that asserts FieldCache.getCacheEntries().length == 0 or something? Or maybe we already have this?

} else {
accumulated = apply(accumulated, clone);
}
}

This comment has been minimized.

Copy link
@rmuir

rmuir Jul 23, 2014

Contributor

should apply() do the deepCopyOf only for the one it needs to return?

This comment has been minimized.

Copy link
@rmuir

rmuir Jul 23, 2014

Contributor

or maybe we just only clone if accumulated == innerValue after calling apply?

@rmuir

This comment has been minimized.

Copy link
Contributor

commented Jul 23, 2014

Awesome, this looks great. I just added minor questions/comments/TODOs, but i think its ready.

@jpountz

This comment has been minimized.

Copy link
Contributor Author

commented Jul 23, 2014

@rmuir pushed a new commit

@rmuir

This comment has been minimized.

Copy link
Contributor

commented Jul 23, 2014

+1

@jpountz jpountz closed this Jul 23, 2014

@jpountz jpountz deleted the jpountz:fix/comparators branch Jul 23, 2014

@jpountz jpountz removed review labels Jul 24, 2014

@jpountz jpountz changed the title Fielddata: goodbye comparators. Fielddata: remove custom comparators and use Lucene's. Jul 24, 2014

@clintongormley clintongormley changed the title Fielddata: remove custom comparators and use Lucene's. Fielddata: Remove custom comparators and use Lucene's instead Sep 10, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.