Percolate query doesn't support cyrillic tokenization #194

Closed
yharahuts opened this issue May 17, 2019 · 2 comments

yharahuts commented May 17, 2019

Hello!

Using the Manticore 3.0.0 95141ec@190509 release, I ran into this issue: when using CALL PQ with Cyrillic (Russian) text, the search query is not stemmed.

I have the following PQ index:

            
| id   | query            | tags | filters |
|  813 | test             |      |         |
|  814 | проверка         |      |         |
            

When I call PQ with Latin text, everything works as expected, including searching with a different word form:

            
CALL PQ ('pq_test', 'testing', 0 AS docs_json );
+------+
| id   |
+------+
|  813 |
+------+
            

But when I search with Cyrillic text, the stored query is found only on an exact match:

            
CALL PQ ('pq_test', 'проверка', 0 AS docs_json );
+------+
| id   |
+------+
|  814 |
+------+
            

And when I search with another word form, it returns an empty set:

            
CALL PQ ('pq_test', 'проверки', 0 AS docs_json );
Empty set (0.00 sec)
            

In my config I have a base index:

            
index __base {
    min_stemming_len    = 2
    min_word_len        = 1
    index_exact_words   = 1
    min_infix_len       = 2
    expand_keywords     = 1

    docinfo         = extern
    dict            = keywords
    mlock           = 0
    morphology      = stem_ru

    blend_chars = -
    ignore_chars = U+0021, U+0023..U+002C, U+002F, U+003A..U+0040, U+0060, U+2019, U+00AB, U+00BB, U+0027
    charset_table = 0..9, _, A..Z->a..z, a..z,  \
            U+410..U+418->U+430..U+438,  \
            U+41A..U+42A->U+43A..U+44A,  \
            U+42C->U+44C,  \
            U+42E..U+42F->U+44E..U+44F,  \
            U+404->U+454,  \
            U+406->U+438,  \
            U+456->U+438,  \
            U+407->U+438,  \
            U+457->U+438,  \
            U+439->U+438,  \
            U+419->U+438,  \
            U+42B->U+438,  \
            U+44B->U+438,  \
            U+490->U+433,  \
            U+491->U+433,  \
            U+401->U+435,  \
            U+451->U+435,  \
            U+42D->U+435,  \
            U+44D->U+435,  \
            U+404->U+435,  \
            U+454->U+435,  \
            U+430..U+438,  \
            U+43A..U+44A,  \
            U+44C,         \
            U+44E..U+44F
}
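
The charset_table in this config is long enough that overlapping rules are easy to miss. As a rough aid for checking it outside the server, here is a minimal Python sketch that parses rules of the forms used in this config (`A..Z->a..z`, `U+410..U+418->U+430..U+438`, single code points) and reports any source code point that is mapped to two different targets. The names `parse_charset_table` and `fold` are made up for this illustration and are not part of Manticore. The parser covers only the rule forms seen in this issue, also accepts the spelling `U 410` in case the plus sign was lost when pasting, and its handling of unmapped characters (replacing them with a space) is an assumption about tokenizer behaviour, not a copy of it.

```python
import re

_CODEPOINT = re.compile(r"U[+ ]?([0-9A-Fa-f]+)")


def _cp(token):
    """Turn 'U+410', 'U 410' or a single literal character into an int."""
    token = token.strip()
    match = _CODEPOINT.fullmatch(token)
    if match:
        return int(match.group(1), 16)
    if len(token) == 1:
        return ord(token)
    raise ValueError(f"unrecognised code point: {token!r}")


def _span(text):
    """Parse 'X' or 'X..Y' into an inclusive (start, end) pair."""
    lo, dots, hi = text.partition("..")
    start = _cp(lo)
    end = _cp(hi) if dots else start
    if end < start:
        raise ValueError(f"descending range: {text!r}")
    return start, end


def parse_charset_table(table):
    """Return (mapping, conflicts) for a charset_table-style string.

    mapping:   source code point -> target code point
    conflicts: (source, first_target, later_target) for sources mapped twice
               to different targets. The sketch keeps the first mapping; that
               is an arbitrary choice, not a statement about the server.
    """
    mapping, conflicts = {}, []
    # Backslashes are only line continuations here, so drop them.
    for rule in table.replace("\\", " ").split(","):
        rule = rule.strip()
        if not rule:
            continue
        src, arrow, dst = rule.partition("->")
        s0, s1 = _span(src)
        d0, d1 = _span(dst) if arrow else (s0, s1)  # no arrow: identity rule
        if s1 - s0 != d1 - d0:
            raise ValueError(f"range lengths differ: {rule!r}")
        for offset in range(s1 - s0 + 1):
            source, target = s0 + offset, d0 + offset
            if source in mapping and mapping[source] != target:
                conflicts.append((source, mapping[source], target))
            else:
                mapping[source] = target
    return mapping, conflicts


def fold(text, mapping):
    """Apply the mapping; characters not in the table become a space."""
    return "".join(
        chr(mapping[ord(ch)]) if ord(ch) in mapping else " " for ch in text
    )
```

Passing the charset_table value from this config to `parse_charset_table` would flag `U+404`, which appears once as `U+404->U+454` and again as `U+404->U+435`. Which of the two the server keeps is not something this sketch can tell you.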
            

And the PQ index, which extends it:

            
index pq_test : __base {
    path            = /var/lib/manticore/data/pq_test
    type            = percolate
    rt_field = Content
}
            

I tried to debug this using CALL KEYWORDS, but it fails:

            
call keywords( 'слово', 'pq_test' );
ERROR 1064 (42000): not implemented
            

Any ideas?

@tomatolog
Contributor

To use CALL KEYWORDS on the index, you could declare your base index as a template:

index base1
{
	type = template
...
index pq_test : base1
{  
    type            = percolate  

I tested your case and can see an issue here that I'm going to investigate. I'll let you know when it is fixed.

@tomatolog
Contributor

I've just fixed this in commit 6b8c424: the percolate index now handles the stemmers / morphology option.
