Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Exact Match option to Suggest #15410

Closed
wants to merge 4 commits into from

Conversation

Projects
None yet
6 participants
@ricardocerq
Copy link

commented Dec 13, 2015

Solves #11579.

Added option to term suggest that returns the input term if it exists. Turned off by default so it doesn't break normal behaviour.

The example query given in #11579 would now be:

"Suggest": {
      "term": {
        "field": "word",
        "suggest_mode": "popular",
        "size": 2,
        "prefix_len": 1,
        "analyzer": "default",
        "exact_matching": true
      },
      "text": "Software"
    }

And the result:

"Suggest": [
      {
         "text": "software",
         "offset": 0,
         "length": 8,
         "options": [
            {
               "text": "software",
               "score": 1,
               "freq": 1
            }
         ]
      }
   ]

The output now clearly indicates if the term is present. If it is, its score is always 1, representing an exact match, which differentiates this result from other suggestions.

@clintongormley

This comment has been minimized.

Copy link
Member

commented Dec 14, 2015

@areek could you have a look at this please.
@ricardocerq you'll also need a test in the PR before it will be merged.

@@ -56,7 +59,10 @@ public TermSuggestion innerExecute(String name, TermSuggestionContext suggestion
);
Text key = new BytesText(new BytesArray(token.term.bytes()));
TermSuggestion.Entry resultEntry = new TermSuggestion.Entry(key, token.startOffset, token.endOffset - token.startOffset);
for (SuggestWord suggestWord : suggestedWords) {
if(suggestion.getDirectSpellCheckerSettings().exactMatch()){
addExactMatch( suggestion, indexReader, token, resultEntry);

This comment has been minimized.

Copy link
@areek

areek Dec 14, 2015

Contributor

I think we can remove addExactMatch method altogether and just use the TermsEnum#seekExact to see if the term is present in the term dictionary:

final TermsEnum termsEnum = MultiFields.getTerms(indexReader, token.term.field()).iterator();
if (termsEnum.seekExact(token.term.bytes())) {
    Text word = new StringText(token.term.text());
    resultEntry.addOption(new TermSuggestion.Entry.Option(word, termsEnum.docFreq(), 1f));
}
@@ -56,7 +59,10 @@ public TermSuggestion innerExecute(String name, TermSuggestionContext suggestion
);
Text key = new BytesText(new BytesArray(token.term.bytes()));
TermSuggestion.Entry resultEntry = new TermSuggestion.Entry(key, token.startOffset, token.endOffset - token.startOffset);
for (SuggestWord suggestWord : suggestedWords) {
if(suggestion.getDirectSpellCheckerSettings().exactMatch()){

This comment has been minimized.

Copy link
@areek

areek Dec 14, 2015

Contributor

code formatting:
if (suggestion.getDirectSpellCheckerSettings().exactMatch()) {
instead of if(suggestion.getDirectSpellCheckerSettings().exactMatch()){.

@@ -97,5 +103,36 @@ private Token(Term term, int startOffset, int endOffset) {
}

}

private void addExactMatch(TermSuggestionContext suggestion, IndexReader indexReader, Token token, TermSuggestion.Entry resultEntry) throws IOException{

This comment has been minimized.

Copy link
@areek

areek Dec 14, 2015

Contributor

See the comment about using TermsEnum#seekExact instead of manually iterating through the term dictionary.

@@ -108,3 +108,6 @@ doesn't take the query into account that is part of request.
usually spelled correctly on top of this also improves the spellcheck
performance. The shard level document frequencies are used for this
option.
`exact_match`::
Indicate whether to include the text input as a suggest candidate
if a term is found to match it exactly.

This comment has been minimized.

Copy link
@areek

areek Dec 14, 2015

Contributor

we could add Defaults to False to indicate the default for this setting

@areek

This comment has been minimized.

Copy link
Contributor

commented Dec 14, 2015

Thanks @ricardocerq for the PR :). I left some comments. Would be awesome if we could have a test for the new setting too. You could have a look at SuggestSearchTests.java for some inspiration.

@ricardocerq

This comment has been minimized.

Copy link
Author

commented Dec 15, 2015

Thanks for the attention and the feedback! I'm making the necessary changes. :)

@clintongormley

This comment has been minimized.

Copy link
Member

commented Mar 10, 2016

@areek could you review this again please?

@dakrone

This comment has been minimized.

Copy link
Member

commented Apr 6, 2016

pinging @areek again for review :)

@@ -56,7 +59,14 @@ public TermSuggestion innerExecute(String name, TermSuggestionContext suggestion
);
Text key = new BytesText(new BytesArray(token.term.bytes()));
TermSuggestion.Entry resultEntry = new TermSuggestion.Entry(key, token.startOffset, token.endOffset - token.startOffset);
for (SuggestWord suggestWord : suggestedWords) {
if (suggestion.getDirectSpellCheckerSettings().exactMatch()){
final TermsEnum termsEnum = MultiFields.getTerms(indexReader, token.term.field()).iterator();

This comment has been minimized.

Copy link
@areek

areek Apr 7, 2016

Contributor

indentation is off here

@areek

This comment has been minimized.

Copy link
Contributor

commented Apr 7, 2016

@ricardocerq sorry for the delay, this LGTM, would you mind updating the PR with master?

@dakrone

This comment has been minimized.

Copy link
Member

commented Sep 12, 2016

@ricardocerq are you able to update this by rebasing or merging master in so it can be merged?

@ricardocerq

This comment has been minimized.

Copy link
Author

commented Oct 2, 2016

Sorry about the delay :(. I was unable to retest the changes. If more changes are necessary, please let me know.

@worldsayshi

This comment has been minimized.

Copy link

commented Oct 13, 2016

@areek @clintongormley Can this be merged?

@s1monw
Copy link
Contributor

left a comment

the merge wasn't really successful afaik. There are about 1400 lines of unnecessary code in this PR. Otherwise it looks ok but I can't merge it like this. I left comments inline

@@ -0,0 +1,301 @@
/*

This comment has been minimized.

Copy link
@s1monw

s1monw Oct 13, 2016

Contributor

this entire class seems unused?

@@ -137,6 +139,14 @@ public void minDocFreq(float minDocFreq) {
this.minDocFreq = minDocFreq;
}

public boolean exactMatch() {

This comment has been minimized.

Copy link
@s1monw

s1monw Oct 13, 2016

Contributor

can we have javadocs on this?

@@ -0,0 +1,1329 @@
/*

This comment has been minimized.

Copy link
@s1monw

s1monw Oct 13, 2016

Contributor

this is a full copy of our tests that has been moved back to core into org.elasticsearch.search.suggest.SuggestSearchIT as far as I can tell only testExactMatch should be added?

@clintongormley

This comment has been minimized.

Copy link
Member

commented Nov 6, 2016

@ricardocerq are you still interested in completing this PR?

@clintongormley

This comment has been minimized.

Copy link
Member

commented Feb 15, 2017

Closing in favour of #22902

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.