Phrase suggest direct generator possibly not obeying min_word_len 0.90 #3037

Closed
jtreher opened this Issue May 14, 2013 · 10 comments

@jtreher

commented May 14, 2013

I ran into an issue where the phrase suggester does not seem to generate candidates for words shorter than the default minimum length of four, even with min_word_len set to 0, 1, 2, or 3. When I run a term suggest, the term comes back as expected.

Here is a gist reproducing the issue:
https://gist.github.com/jtreher/5577747

@clintongormley

Member

commented May 15, 2013

The parameter is prefix_len not prefix_leng:

@clintongormley

Member

commented May 15, 2013

    curl -XPOST '127.0.0.1:9200/test/_suggest?pretty' -d'{
      "did_you_mean": {
        "text": "ice",
        "term": {
          "field": "name",
          "max_edits":2,
          "suggest_mode": "always",
          "min_word_len": 0,
          "prefix_len": 0
        }
      }
    }'
    {
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "failed" : 0
      },
      "did_you_mean" : [ {
        "text" : "ice",
        "offset" : 0,
        "length" : 3,
        "options" : [ {
          "text" : "iced",
          "score" : 0.6666666,
          "freq" : 1
        } ]
      } ]
    }
@jtreher

Author

commented May 15, 2013

@clintongormley While I did have a typo in the term suggest, the phrase suggest example is working and demonstrates the issue. Could you reopen?

To clarify: the gist demonstrates that the term suggest returns the term "iced", but I believe the candidate generator in the phrase suggest is not passing "iced" along for the phrase suggest to consider, because of the word length.

@clintongormley

Member

commented May 15, 2013

@jtreher sorry - got that completely wrong. I'll reopen

I'm seeing the same thing you're seeing.

@s1monw ?

@ghost ghost assigned s1monw May 15, 2013

@s1monw

Contributor

commented May 15, 2013

This seems to be a bug in the min_doc_freq smoothing. The good news is that it only happens if your query term has a freq = 1 and the replacement has a freq = 1 as well, so in practice this might not be an issue. I will have a fix soon; in the meantime this should help:

    curl -XPOST 'localhost:9200/test/_suggest?pretty=true' -d '{
      "text": "ice tea",
      "did_you_mean": {
        "phrase": {
          "field": "name_shingled",
          "gram_size": 3,
          "direct_generator": [
            {
              "field": "name",
              "max_edits": 2,
              "suggest_mode": "always",
              "min_word_len": 0,
              "prefix_len": 0,
              "min_doc_freq": 1.0
            }
          ]
        }
      }
    }'

@s1monw s1monw closed this in 8235b89 May 15, 2013

s1monw added a commit that referenced this issue May 15, 2013

Don't apply min frequency smoothing if suggest type is 'always'
Using an automatically detected 'min_doc_freq' if suggest type is set to
'always' is counter intuitive. If we suggest always ignore the frequency and
set threshold frequency to 0 to allow all possible candidates to be drawn if
they are within the given bounds.

Closes #3037
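
The behaviour this commit message describes can be sketched roughly as below. This is an illustrative Python sketch only, not the Elasticsearch source: the function name `threshold_frequency` and its `auto_detected` argument are hypothetical, and the real change may differ in detail.

```python
def threshold_frequency(suggest_mode: str, auto_detected: float) -> float:
    """Return the minimum frequency a candidate must reach (hypothetical helper).

    With suggest_mode "always" the frequency is ignored, so the threshold
    is 0 and every candidate within the other bounds can be drawn.
    For any other mode the automatically detected value is kept.
    """
    if suggest_mode == "always":
        return 0.0
    return auto_detected


# "always" drops the threshold; other modes keep the detected value.
print(threshold_frequency("always", 1.0))
print(threshold_frequency("missing", 1.0))
```
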
@jtreher

Author

commented May 15, 2013

@s1monw Is there any chance that max_term_freq is not being obeyed either with "always"? While this patch fixed this test issue, I actually have a situation where ice appears thousands of times and iced several hundred. I see "iced" come back from the term suggest, but it's as if the phrase suggest never gets it.

@s1monw

Contributor

commented May 15, 2013

max_term_freq is the maximum threshold (default: 0.01f) of documents a query term can appear in and still receive suggestions. This means you won't even get a candidate for ice if its freq exceeds max_term_freq. Maybe I am not understanding your question right?
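
A rough sketch of the check described here, in Python rather than the actual Elasticsearch/Lucene code. The function name `query_term_too_frequent` is hypothetical, and the sketch assumes that values below 1 are read as a fraction of all documents while values of 1 or more are read as an absolute document count. The document counts in the usage lines are made up for illustration.

```python
import math


def query_term_too_frequent(doc_freq: int, max_doc: int,
                            max_term_freq: float = 0.01) -> bool:
    """Return True when the query term is too common to receive suggestions.

    Assumption: max_term_freq < 1 is a fraction of max_doc,
    max_term_freq >= 1 is an absolute number of documents.
    """
    if max_term_freq >= 1:
        return doc_freq > max_term_freq
    return doc_freq > math.ceil(max_term_freq * max_doc)


# Made-up numbers: "ice" occurs in 5,000 of 100,000 documents.
print(query_term_too_frequent(5000, 100000))         # default 0.01 threshold
print(query_term_too_frequent(5000, 100000, 0.999))  # raised threshold
```

With the default threshold the term is skipped in this made-up case; raising max_term_freq close to 1 lets it through.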

@jtreher

Author

commented May 16, 2013

@s1monw I have to set max_term_freq in the term suggest to 0.999 (99.9%) to have the term show up. However, when I do the same in the phrase suggest, it's as though the candidate is not generated.
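
The phrase suggest request in question is not shown in this thread. As a sketch of what it might look like, the Python below builds a request body that reuses the field names from the earlier workaround and adds max_term_freq to the direct generator. Placing max_term_freq there is an assumption about the request, not something confirmed above.

```python
import json

# Hypothetical request body: the earlier phrase suggest example with
# max_term_freq raised on the direct generator instead of min_doc_freq.
phrase_suggest_body = {
    "text": "ice tea",
    "did_you_mean": {
        "phrase": {
            "field": "name_shingled",
            "gram_size": 3,
            "direct_generator": [
                {
                    "field": "name",
                    "max_edits": 2,
                    "suggest_mode": "always",
                    "min_word_len": 0,
                    "prefix_len": 0,
                    "max_term_freq": 0.999,
                }
            ],
        }
    },
}

# Serialize to the JSON that would be sent as the body of a _suggest request.
print(json.dumps(phrase_suggest_body, indent=2))
```
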

@jtreher

Author

commented Jun 7, 2013

@s1monw This did fix the issue I had. It seems to respect max_term_freq now with 0.9.1

@s1monw

Contributor

commented Jun 7, 2013

@jtreher I think I did!

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Don't apply min frequency smoothing if suggest type is 'always'
Using an automatically detected 'min_doc_freq' if suggest type is set to
'always' is counter intuitive. If we suggest always ignore the frequency and
set threshold frequency to 0 to allow all possible candidates to be drawn if
they are within the given bounds.

Closes elastic#3037