New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wildcard field names in highlighting should only return fields that can be highlighted #11364

Merged
merged 1 commit into from May 27, 2015

Conversation

Projects
None yet
4 participants
@brwe
Contributor

brwe commented May 27, 2015

When we highlight on fields using wildcards then fields might match that cannot
be highlighted by the specified highlighter. The whole search request then
failed. Instead, check that the field can be highlighted and ignore the field
if it can't.

In addition ignore the exception thrown by plain highlighter if a field conatins
terms larger than 32766.

closes #9881

highlightFields.put(highlightField.name(), highlightField);
}
} catch (FetchPhaseExecutionException e) {
if (fieldNameContainsWildcards && e.getCause() instanceof BytesRefHash.MaxBytesLengthExceededException) {

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

is this a plain highlighter only problem? Not sure, but if so maybe we could handle it in the PlainHighlighter directly rather than in the HighlightPhase?

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

Yes makes sense. I'll make it so that this Exception will always be caught even if no wildcards are used in the field name.

@@ -26,4 +28,6 @@
String[] names();
HighlightField highlight(HighlighterContext highlighterContext);
public boolean cannotHighlight(FieldMapper fieldMapper);

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

I find the negation a bit weird, but I see why you went for it. Still, last chance to change it to canHighlight , would it be bad?

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

Ha! Just saw that @eebbrr started working on that already a year ago and did the exact same thing! You are two already so yeah, I'll rename to canHighlight.

@@ -164,6 +164,11 @@ public int compare(TextFragment o1, TextFragment o2) {
return null;
}
@Override
public boolean cannotHighlight(FieldMapper fieldMapper) {
return false;

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

as said above, the double negation here feels weird to me, but again maybe it's just me

if (highlightField != null) {
highlightFields.put(highlightField.name(), highlightField);
if (highlighter.cannotHighlight(fieldMapper) && fieldNameContainsWildcards) {

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

thinking out loud, maybe I am getting confused, but in order for a field to get highlighted, doesn't it need to be stored too or we need to have the _source at least? but metadata fields, which match * are not part of the _source hence they need to be stored or excluded from highlighting by definition. I have the feeling we should do something more to address that...

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

When source is disabled and field is not stored highlighter.highlight() just returns null for plain highlighter. This has not caused any problems. Should I add a dedicated test for fields that have no source and are not stored? (updated because the initial reply was a little confused)

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

I see, I was wonder if it even makes sense to call highlight when we know that the source is not available, but I'm forgetting custom highlighters that might want to retrieve field values from other places, so this should still be implementation dependent, I think it should stay as-is.

I think we should already have dedicated tests for fields that have no source and are not stored, can you double check?

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

I did not find any in HighlighterSearchTests.

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

oh boy... would be great to have them then

@@ -91,6 +93,7 @@ public void hitExecute(SearchContext context, HitContext hitContext) {
}
}
boolean fieldNameContainsWildcards = field.field().contains("*");

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

I think we should try and use the cannotHighlight method also when it comes to choosing the type of highlighter based on the field mapper (when no highlighter type is specified). Thoughts?

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

yeah, using the cannotHighlight methods now

This comment has been minimized.

@nik9000

nik9000 May 27, 2015

Contributor

There should be a "resolve" button in github like there is in googledocs so we can ignore long resolved recommendations.

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

totally!

@javanna

This comment has been minimized.

Member

javanna commented May 27, 2015

I left some comments ;)

@clintongormley clintongormley changed the title from highlighting: don't fail search request when name of highlighted fiel… to Wildcard field names in highlighting should only return fields that can be highlighted May 27, 2015

@brwe

This comment has been minimized.

Contributor

brwe commented May 27, 2015

@javanna thanks a lot for the review! addressed all comments. want to have another look?

if (useFastVectorHighlighter) {
highlighterType = "fvh";
} else if (fieldMapper.fieldType().indexOptions() == IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS) {
} else if (highlighters.get("postings").canHighlight(fieldMapper)) {

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

maybe overkill, but is it worth keeping a list of highlighters ordered by precedence instead? it could even allow plugins to provide their own precedence, but that's maybe too much? I am not sure, just asking

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

If anyone actually needs that maybe have another issue for it?

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

alright, that makes sense, apart from the feature for plugins, it might still be nicer to have a sorted list and go through the available highlighters, call canHighlight and highlight using the first one that returns true? I mean rather than getting them by name one by one? Do you like this idea?

This comment has been minimized.

@nik9000

nik9000 May 27, 2015

Contributor

It does seem a bit cleaner to use a list.

For what its worth my highlighter would just return true from canHighlight because its defaults let it pick its hit detection strategy based on what is in the mapping.

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

ok, added a new commit that adds a list. is that what you meant?

// if several fieldnames matched the wildcard then we want to skip those that we cannot highlight
continue;
}
logger.info("using highlighter type {} for field {}", highlighterType, fieldName);

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

debug or info? I would be more for debug

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

uh.. this is actually a leftover from debugging. I'll remove it.

@@ -126,6 +126,10 @@ public int compare(Snippet o1, Snippet o2) {
return null;
}
public boolean canHighlight(FieldMapper fieldMapper) {

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

missing @Override

@javanna

This comment has been minimized.

Member

javanna commented May 27, 2015

looks great, left a few more comments, I think this should marked as breaking as it breaks plugins that plug in custom highlighters, maybe @nik9000 would like to have a look too?

@brwe brwe added the >breaking label May 27, 2015

@nik9000

This comment has been minimized.

Contributor

nik9000 commented May 27, 2015

looks great, left a few more comments, I think this should marked as breaking as it breaks plugins that plug in custom highlighters, maybe @nik9000 would like to have a look too?

Just did.

I think its fine. I agree with your point about a sorted list being easier to read. I also think its worth moving any concerns about highlighter plugins registering themselves to the list to another issue. At this point I don't think that is really an important feature.

sortedListBuilder.add(highlighters.get("fvh"));
sortedListBuilder.add(highlighters.get("postings"));
sortedListBuilder.add(highlighters.get("plain"));
for (Map.Entry highlighter : highlighters.entrySet()) {

This comment has been minimized.

@nik9000

nik9000 May 27, 2015

Contributor

This probably isn't required - other highlighters can just be called out in the request if you want them.

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

agreed we can keep custom highlighters out of the loop for now, anyways they can currently be used only by referring to them with highlighter_type parameter if I remember correctly

This comment has been minimized.

@nik9000

nik9000 May 27, 2015

Contributor

That's right - you need to call them out explicitly.

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

ok, removed them again

@brwe

This comment has been minimized.

Contributor

brwe commented May 27, 2015

ok, changed that now to use a list. want to take another look?

@Inject
public HighlightPhase(Settings settings, Highlighters highlighters) {
super(settings);
this.highlighters = highlighters;
// TODO: would be nice if each highlighter would have a precedence and we could just sort here by this value
ImmutableList.Builder sortedListBuilder = ImmutableList.builder();

This comment has been minimized.

@nik9000

nik9000 May 27, 2015

Contributor

This is what I meant, yeah. I'd have made it a private static final ImmutableList<String> instead of Immutable<Highlighter> but it doesn't make much difference.

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

+1 that is what I would do too

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

ok, done. But not static, that seems wrong and highlighters is not static either.

@brwe

This comment has been minimized.

Contributor

brwe commented May 27, 2015

all done. want to take another look?

@Inject
public HighlightPhase(Settings settings, Highlighters highlighters) {
super(settings);
this.highlighters = highlighters;
// TODO: would be nice if each highlighter would have a precedence and we could just sort here by this value
// we could then also add custom highlighters to this list and they could be selected automatically depending on precendence

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

can you remove this TODO? I'm not sure we are going to implement this after all, nobody needs it for now :)

sortedListBuilder.add("fvh");
sortedListBuilder.add("postings");
sortedListBuilder.add("plain");
sortedHighlighters = sortedListBuilder.build();

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

don't you have generic warnings here given that you don't specify the <String> type? I would maybe do ImmutableList.<String>of("fvh","postings","plain") ? why not static out of curiosity? I think it would be ok, list is going to be the same anyway on all nodes in the same jvm?

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

ok, was still thinking about the list with the custom highlighters. but now they are removed static is fine indeed.

@@ -26,4 +28,6 @@
String[] names();
HighlightField highlight(HighlighterContext highlighterContext);
public boolean canHighlight(FieldMapper fieldMapper);

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

you can drop public here

@@ -45,4 +47,8 @@ public Highlighter get(String type) {
return parsers.get(type);
}
public ImmutableSet<Map.Entry<String, Highlighter>> entrySet() {

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

is this still needed?

@javanna

This comment has been minimized.

Member

javanna commented May 27, 2015

done another review, few more comments :)

@brwe

This comment has been minimized.

Contributor

brwe commented May 27, 2015

pushed another commit

@@ -63,7 +63,7 @@ public HighlightField highlight(HighlighterContext highlighterContext) {
FetchSubPhase.HitContext hitContext = highlighterContext.hitContext;
FieldMapper mapper = highlighterContext.mapper;
if (!(mapper.fieldType().storeTermVectors() && mapper.fieldType().storeTermVectorOffsets() && mapper.fieldType().storeTermVectorPositions())) {
if (canHighlight(mapper) == false) {

This comment has been minimized.

@nik9000

nik9000 May 27, 2015

Contributor

!canHighlight(mapper) ?

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

the == false is done on purpose to make these comparisons more explicit

This comment has been minimized.

@nik9000

nik9000 May 27, 2015

Contributor

Fair enough.

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

I get the comment usually the other way round with the argument that '== false' is much easier to read and a '!' is easy to miss so I'd rather leave it this way.

if (highlighters.get(highlighterCandidate).canHighlight(fieldMapper)) {
highlighterType = highlighterCandidate;
break;
}
}

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

can we add an assert to make sure that highlighterType != null here? it really should since we know that plain highlighter always returns true, but the assert would make it more explicit that it is expected

This comment has been minimized.

@nik9000

nik9000 May 27, 2015

Contributor

Makes sense to me. The random null pointer exception you'd get later on if this went wrong would be unpleasant to users. Probably best to use an explicit check rather than an assert.

This comment has been minimized.

@brwe

brwe May 27, 2015

Contributor

I'd rather make it an assertion. it is an internal thing and should not happen at all. I have trouble coming up with a proper message that would even explain what is happening to a user...

@@ -46,6 +45,7 @@
public class HighlightPhase extends AbstractComponent implements FetchSubPhase {
private final Highlighters highlighters;
private static final ImmutableList<String> sortedHighlighters = ImmutableList.of("fvh", "postings", "plain");

This comment has been minimized.

@javanna

javanna May 27, 2015

Member

rename to STANDARD_HIGHLIGHTERS_BY_PRECEDENCE or something? I would like to remark in the name that they are sorted by precedence and its only the standard highlighters, not custom ones (maybe a comment would do it too).

This comment has been minimized.

@nik9000

nik9000 May 27, 2015

Contributor

Yeah - at least needs the ALL_UPPER_CASE naming. Probably should be moved to the top of the class too because that's where I usually look for static stuff.

@javanna

This comment has been minimized.

Member

javanna commented May 27, 2015

LGTM besides the two very minor comments I left, thanks for taking care of this @brwe !

@nik9000

This comment has been minimized.

Contributor

nik9000 commented May 27, 2015

LGTM besides the two very minor comments I left, thanks for taking care of this @brwe !

I left another distinct minor comment but yeah, its great! This'll make highlighting so much less blowy-upy.

@brwe

This comment has been minimized.

Contributor

brwe commented May 27, 2015

addressed all comments

@javanna

This comment has been minimized.

Member

javanna commented May 27, 2015

LGTM

highlighting: don't fail search request when name of highlighted fiel…
…d contains wildcards

When we highlight on fields using wildcards then fields might match that cannot
be highlighted by the specified highlighter. The whole search request then
failed. Instead, check that the field can be highlighted and ignore the field
if it can't.
In addition ignore the exception thrown by plain highlighter if a field conatins
terms larger than 32766.

closes #9881

brwe added a commit that referenced this pull request May 27, 2015

Merge pull request #11364 from brwe/highlighter-wildcard
Wildcard field names in highlighting should only return fields that can be highlighted

@brwe brwe merged commit ceb0782 into elastic:master May 27, 2015

1 check passed

CLA Commit author is a member of Elasticsearch
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment