New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Phrase Suggester: Add option to filter out phrase suggestions #6773
Conversation
…ching any documents for a given query The newly added filter option will let the user provide a template query which will be executed for every phrase suggestions generated to ensure that the suggestion matches at least one document for the query. The filter query is only executed on the local node for now. When the new filter option is used, the size of the suggestion is restricted to 20. Closes elastic#3482
} | ||
CompiledScript compiledScript = suggester.scriptService().compile("mustache", templateNameOrTemplateContent); | ||
suggestion.setFilterQueryScript(compiledScript); | ||
if (suggestion.getSize() >= MAX_RESULT_SIZE_FILTER) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we should do this.... we don't introduce limits elsewhere though let them figure that out themself :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed limits check
I like this PR a lot. left a bunch of comments |
Updated PR; incorporated feedback and Thanks for the review! |
/** | ||
* Routing Preference Type | ||
*/ | ||
public final class PreferenceType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd name it just Preference
...maybe we can make this an enum and have a parse method on it like:
public Preference parse(String string) {
switch (valueBeforeColon(string)) {
case "_shards":
return SHARDS;
case "_prefer_node":
case "_preferNode":
return PREFER_NODE;
//...
}
}
/**
* this splits off the values ie in _prefer_node / _shards*/
public abstract String values();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to Preference
and changed it to an enum, but it seems the way I have it currently, there is more boiler plate added rather then being a clean solution. Examples:
Preference.ONLY_LOCAL.type()
is needed for the string representation of '_local_only'.
The values() business feels wrong on the enum, some options have to have values others don't and its not obvious wants happening.
I got rid of all the changes to the unrelated files for Preference
, but still left the implementation for the feedback and how to move forward with it.
I left some more comments. This looks great though. I'd like to get @clintongormley to give some feedback to make sure the REST interface is good |
… API; Added support for accepting filters and template params
I have incorporated the feedback (would like to get some comments on the After discussion on the REST interface, this is what it looks like now: curl -XPOST 'localhost:9200/_search' -d {
"suggest": {
"text": "Xor the Got-Jewel",
"simple_phrase": {
"phrase": {
"field": "body",
"size": 5,
"shard_size": 10,
"confidence": 2,
"collate": {
"query": {
"body": "{{suggestion}}"
},
"preference": "_primary"
}
}
}
}
}
|
case "_onlyLocal": | ||
return ONLY_LOCAL; | ||
case "_only_node": | ||
return ONLY_NODE.setValue(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default case should throw an exception instead of return null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
I left some cosmetic comments. I think this looks very good @clintongormley WDYT... In a second PR we could add the ability to just mark the results if the query / filter matched or not instead of filtering that might help folks to see what is going on. |
@@ -161,6 +161,54 @@ can contain misspellings (See parameter descriptions below). | |||
in a row are changed the entire phrase of changed tokens | |||
is wrapped rather than each token. | |||
|
|||
`collate`:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a suggestion to make the docs slightly easier to read:
`collate`::
Checks each suggestion against the specified `query` or `filter` to
prune suggestions for which not matching docs exist in the index. Either
a `query` or a `filter` must be specified, and it is run as a <<query-
dsl-template-query,`template` query>>. The current suggestion is
automatically made available as the `{{suggestion}}` variable, which
should be used in your query/filter. You can still specify your own
template `params` -- the `suggestion` value will be added to the
variables you specify. You can also specify a `preference` to control
on which shards the query is executed (see <<search-request-preference).
The default value is `_only_local`.
+
[source,js]
--------------------------------------------------
curl -XPOST 'localhost:9200/_search' -d {
"suggest" : {
"text" : "Xor the Got-Jewel",
"simple_phrase" : {
"phrase" : {
"field" : "bigram",
"size" : 1,
"direct_generator" : [ {
"field" : "body",
"suggest_mode" : "always",
"min_word_length" : 1
} ],
"collate": {
"query": { <1>
"match": {
"{{field_name}}" : "{{suggestion}}" <2>
}
},
"params": {"field_name" : "title"}, <3>
"preference": "_primary", <4>
}
}
}
}
}
--------------------------------------------------
<1> This query will be run once for every suggestion.
<2> The `{{suggestion}}` variable will be replaced by the text
of each suggestion.
<3> An additional `field_name` variable has been specified in
`params` and is used by the `match` query.
<4> The default `preference` has been changed to `_primary`.
I also removed most of the options from the suggest request which are not relevant to collate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @clintongormley, I have updated the docs
Suggested some doc changes, but otherwise LGTM |
I have updated the PR with the feedback. I think the enhancement is ready but still not happy with the state of |
I didn't see a comment, what are you not happy with? this LGTM actually and I think we should move forward with it unless the issue is major which I didn't see yet? |
It seems the way I have Preference.ONLY_LOCAL.type() is needed for '_local_only'. The values() business feels wrong on the enum, some options have to have values others don't and its not obvious wants happening. That was my concern, I think if we plan to have it as is, I will atleast want to use the new |
Updated PR:
|
} | ||
if ("_local".equals(preference)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think now that we have an enum we should just use a switch / case statement here?
LGTM please squash and push |
The newly added filter option will let the user provide a template query which will be executed for every
phrase suggestions generated to ensure that the suggestion matches at least one document for the query.
The filter query is only executed on the local node for now. When the new filter option is used, the
size of the suggestion is restricted to 20.
Closes #3482