
Sometimes CALL SUGGEST does not suggest a correction for a single-letter typo #271

Closed
Maximusya opened this issue Oct 16, 2019 · 9 comments

@Maximusya

Maximusya commented Oct 16, 2019

Manticore Search version:
Manticore 3.1.2

OS version:
Image used at https://play.manticoresearch.com/didyoumean/
Linux didyoumean-557fd77c77-j52qr 4.15.0-50-generic #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019 x86_64 GNU/Linux

Build version:
Manticore 3.1.2 47b6bc2@190822 release

Description of the issue:
CALL SUGGEST('ravge', 'movies'); does not suggest range
yet both CALL SUGGEST('rvnge', 'movies'); and CALL SUGGEST('ranve', 'movies'); do suggest range.

Steps to reproduce:

  1. Start the course at https://play.manticoresearch.com/didyoumean/
  2. Open a MySQL shell:
    # mysql -P9306 -h0
  3. Find suggestions for rvnge (a one-letter misspelling of range):
MySQL [(none)]> CALL SUGGEST('rvnge','movies');
+---------+----------+------+
| suggest | distance | docs |
+---------+----------+------+
| range   | 1        | 6    |
| revenge | 2        | 77   |
| ranger  | 2        | 7    |
| orange  | 2        | 3    |
| binge   | 2        | 1    |
+---------+----------+------+
5 rows in set (0.00 sec)
  4. Find suggestions for ravge (a one-letter misspelling of range):
MySQL [(none)]> CALL SUGGEST('ravge','movies');
+---------+----------+------+
| suggest | distance | docs |
+---------+----------+------+
| rave    | 1        | 2    |
| grave   | 2        | 8    |
| raven   | 2        | 7    |
| brave   | 2        | 4    |
| raver   | 2        | 2    |
+---------+----------+------+
5 rows in set (0.01 sec)
  5. Find suggestions for ranve (a one-letter misspelling of range):
MySQL [(none)]> CALL SUGGEST('ranve','movies');
+---------+----------+------+
| suggest | distance | docs |
+---------+----------+------+
| range   | 1        | 6    |
| france  | 2        | 33   |
| randy   | 2        | 16   |
| ranch   | 2        | 8    |
| ranger  | 2        | 7    |
+---------+----------+------+
5 rows in set (0.00 sec)
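
For reference, the distance column is consistent with a plain Levenshtein edit distance: range is a single substitution away from all three typos, so it would rank first in each case if it were ever considered as a candidate. A minimal sketch, assuming unit-cost Levenshtein distance (the helper name levenshtein is illustrative, not a Manticore function):

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the letters match)
            ))
        prev = cur
    return prev[-1]


# Each of the three typos is one edit away from "range".
for typo in ("rvnge", "ravge", "ranve"):
    print(typo, "->", "range", levenshtein(typo, "range"))
```

So the ranking step is not the problem; for ravge, range simply never reaches the candidate list.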

The config for the index is reportedly this:

index movies
{
    type            = plain
    path            = /var/lib/manticore/data/movies
    source          = movies
    min_infix_len   = 3
}

@Maximusya
Author

FYI: the same issue was reported at http://sphinxsearch.com/forum/view.html?id=16301

@Maximusya
Author

The issue also reproduces on the latest 3.2.0 e526a01@191017 release Docker image.

@tomatolog
Contributor

I will look into the issue and report whether it comes from a bug in the code or from wrong default settings of CALL SUGGEST.

@geseq

geseq commented Feb 25, 2020

Any update on this?

@manticoresearch
Contributor

CALL SUGGEST('ravge', 'movies'); does not suggest range
yet both CALL SUGGEST('rvnge', 'movies'); and CALL SUGGEST('ranve', 'movies'); do suggest range.

Internally, it works like this:

  • "range" gets converted into "ran ang nge"
  • "ravge" => "rav avg vge" - neither of the trigrams matches with "ran ang nge"
  • "rvnge" => "rvn vng nge" - "nge" matches
  • "ranve" => "ran anv nve" - "ran" matches
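
The trigram pre-filter described above can be reproduced in a few lines. This is a sketch assuming unpadded, overlapping trigrams and a "share at least one trigram" rule, as in the comment; Manticore's real implementation may differ in details, and the function name trigrams is hypothetical:

```python
def trigrams(word: str) -> set[str]:
    """Overlapping 3-letter substrings without padding, e.g. 'range' -> {'ran', 'ang', 'nge'}."""
    return {word[i:i + 3] for i in range(len(word) - 2)}


target = trigrams("range")
for typo in ("ravge", "rvnge", "ranve"):
    shared = sorted(trigrams(typo) & target)
    # Under this rule, 'range' is only a candidate if at least one trigram is shared.
    print(typo, shared or "no shared trigrams")
```

A single substitution in the middle (third) letter of a five-letter word falls inside all three of its trigrams, which is why ravge loses every match with range, while rvnge (second letter changed) and ranve (fourth letter changed) each keep one intact trigram.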

So to some extent this is expected behaviour; however, I agree it's more of a bug, as "ravge" should match "range". We might find time to improve the algorithm in the future, but right now it's not in our near-term plans (unless, of course, you or your company is ready to sponsor the development: https://manticoresearch.com/services/).

@tomatolog
Contributor

We could switch to 2-character n-gram (bigram) generation for the word passed to SUGGEST when the result is empty, or when a new CALL SUGGEST option forces this behavior.
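
The proposed fallback can be sketched like this. It is a hypothetical illustration, not Manticore code: ngrams, candidates and the force_bigrams flag are made-up names (force_bigrams stands in for the "new suggest query option"), and the distance-based ranking of candidates is left out:

```python
def ngrams(word: str, n: int) -> set[str]:
    """Overlapping n-letter substrings of a word, without padding."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def candidates(query: str, vocabulary: list[str], force_bigrams: bool = False) -> list[str]:
    """Hypothetical pre-filter: trigrams first, bigrams if nothing matched (or if forced)."""
    sizes = (2,) if force_bigrams else (3, 2)
    for n in sizes:
        grams = ngrams(query, n)
        found = [w for w in vocabulary if grams & ngrams(w, n)]
        if found:
            return found
    return []


vocab = ["range", "rave", "grave", "binge"]
print(candidates("ravge", ["range"]))                  # ['range'] via the bigram fallback
print(candidates("ravge", vocab))                      # ['rave', 'grave']: trigram 'rav' hits
print(candidates("ravge", vocab, force_bigrams=True))  # ['range', 'rave', 'grave', 'binge']
```

Under this model, the "result is empty" trigger alone would not rescue ravge on the movies index, because its trigram rav already matches other words (the returned rave, grave, raven, brave and raver all contain it); only the forced option would bring range in. The trade-off is that bigrams are far less selective: even binge qualifies through the shared bigram ge, so the ranking step has many more candidates to sort through.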

@pbabkin

pbabkin commented Feb 26, 2023

we could switch to 2symbol ngram generation

Any update on this? Are there any settings/params we could specify on our own for this?

I faced the same issue; maybe this will be useful. Changing the index setting "morphology" to a lemmatizer does not change the situation.

drop table if exists t;
create table t(f text) min_infix_len='2' morphology='stem_ru';
insert into t(f) values('опель');
insert into t(f) values('поло');
insert into t(f) values('гольф');
insert into t(f) values('полоса');
/* these work as expected */
CALL SUGGEST('опель', 't');
CALL SUGGEST('пель', 't');
CALL SUGGEST('опел', 't');
CALL SUGGEST('апел', 't');
CALL SUGGEST('опелк', 't');
/* but these do not suggest 'опель' */
CALL SUGGEST('опэль', 't');
CALL SUGGEST('ополь', 't');
CALL SUGGEST('опуль', 't');
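
The Russian examples fit the same pattern. A sketch applying the unpadded-trigram rule from the explanation above (it ignores morphology='stem_ru', which may change the indexed forms, and trigrams is again a hypothetical helper): every query reported as working shares at least one trigram with опель, while each failing one replaces the middle letter е and therefore shares none.

```python
def trigrams(word: str) -> set[str]:
    """Overlapping 3-letter substrings without padding; works the same for Cyrillic text."""
    return {word[i:i + 3] for i in range(len(word) - 2)}


target = trigrams("опель")  # {'опе', 'пел', 'ель'}
working = ["опель", "пель", "опел", "апел", "опелк"]
failing = ["опэль", "ополь", "опуль"]  # middle letter 'е' replaced in each
for q in working + failing:
    shared = sorted(trigrams(q) & target)
    print(q, shared or "no shared trigrams")
```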

@sanikolaev
Collaborator

Any update on this? Are there any settings/params we could specify on our own for this?

Unfortunately not. There have been more pressing issues to deal with. We'll try our best to prioritize it for the next release, but I can't make any promises.

@tomatolog
Contributor

This should be fixed in ec19c5b.


7 participants