GitHub#109 ⁃ Adding wordform for "quaranta" prevents "quaranta*" matching exact word #109

Closed
malaire opened this Issue Aug 23, 2018 · 4 comments

malaire commented Aug 23, 2018

Manticore Search version: 2.7.1
OS version: Debian Stretch
Build version: manticore_2.7.1-180704-458e9c6-release-stemmer.stretch_amd64-bin.deb

Adding a wordform prevents quaranta* from matching quaranta, even though *quaranta* and *quaranta still match quaranta:

WITHOUT wordform quaranta > 40:

*quaranta* matches quaranta, quarantaquattro
*quaranta  matches quaranta
quaranta*  matches quaranta, quarantaquattro

WITH wordform quaranta > 40:

*quaranta* matches quaranta, quarantaquattro
*quaranta  matches quaranta
quaranta*  matches quarantaquattro

Note the difference in the last query, quaranta*.

Index configuration:

index SongSearch
{
  type                   = plain
  source                 = SongSearch
  path                   = /var/lib/manticore/data/SongSearch
  dict                   = keywords
  morphology             = none
  wordforms              = /etc/sphinxsearch/wordforms.txt
  min_word_len           = 1
  min_infix_len          = 2
  index_exact_words      = 1
  preopen                = 1
}

Changelog: fixed matches of prefixes at query for index with only wordforms and index_exact_words=1

malaire commented Aug 27, 2018

Minimal testcase (with english words this time):

Index definitions:

index test
{
  type                   = rt
  path                   = /var/lib/manticore/data/test
  rt_field               = content
  rt_attr_uint           = dummy
  min_infix_len          = 2
  index_exact_words      = 1
}
index test2 : test
{
  path                   = /var/lib/manticore/data/test2
  wordforms              = /etc/sphinxsearch/test2_wordforms.txt
}

test2_wordforms.txt:

forty > 40

Queries to reproduce:

MySQL> INSERT INTO test VALUES(1, 'forty', 1);
MySQL> INSERT INTO test VALUES(2, 'fortyfour', 1);
MySQL> INSERT INTO test2 VALUES(1, 'forty', 1);
MySQL> INSERT INTO test2 VALUES(2, 'fortyfour', 1);

MySQL> SELECT id FROM test WHERE MATCH('forty*');
+------+
| id   |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)

MySQL> SELECT id FROM test2 WHERE MATCH('forty*');
+------+
| id   |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

airolg added the "in backlog" label Aug 30, 2018

airolg commented Aug 30, 2018

Added to backlog, thank you

tomatolog commented Feb 21, 2019

It seems that some kind of clash between index_exact_words=1 and having only wordforms set (no morphology) causes the exact forms of tokens not to be stored.

If I add morphology = stem_ru, which does not affect any of these tokens, the exact forms are stored correctly and the query SELECT id FROM test2 WHERE MATCH('forty*') returns both documents.

I'm going to investigate and fix the issue. I'll let you know once it is fixed.
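
The observation above suggests a possible stopgap for builds that do not yet include a fix. The fragment below is only a sketch of that idea, applied to the test2 index from the minimal testcase. It assumes, as reported in this comment, that stem_ru leaves the indexed (non-Russian) tokens unchanged. It has not been verified beyond what the comment states, and documents already in the index would presumably need to be re-inserted before the setting takes effect.

```
index test2 : test
{
  path                   = /var/lib/manticore/data/test2
  wordforms              = /etc/sphinxsearch/test2_wordforms.txt
  # Assumption taken from the comment above: enabling any morphology
  # processor makes the exact forms get stored alongside the wordforms.
  # stem_ru is used only because it should not alter English tokens.
  morphology             = stem_ru
}
```

With this in place, SELECT id FROM test2 WHERE MATCH('forty*') should return both documents, as described in the comment.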

githubmanticore changed the title from "Adding wordform for "quaranta" prevents "quaranta*" matching exact word" to "GitHub#109 ⁃ Adding wordform for "quaranta" prevents "quaranta*" matching exact word" Feb 22, 2019

githubmanticore added the "bug" label Feb 22, 2019

githubmanticore commented Feb 22, 2019

➤ Stan commented:

I've just pushed the fix in 0721696. You need to rebuild searchd so that prefix queries match exact forms on an index that has only wordforms and index_exact_words=1.
