GitHub#109 ⁃ Adding wordform for "quaranta" prevents "quaranta*" matching exact word #109

Closed
malaire opened this issue Aug 23, 2018 · 4 comments

@malaire malaire commented Aug 23, 2018

Manticore Search version: 2.7.1
OS version: Debian Stretch
Build version: manticore_2.7.1-180704-458e9c6-release-stemmer.stretch_amd64-bin.deb

Adding a wordform prevents quaranta* from matching quaranta, even though *quaranta* and *quaranta still match quaranta:

WITHOUT wordform quaranta > 40:

*quaranta* matches quaranta, quarantaquattro
*quaranta  matches quaranta
quaranta*  matches quaranta, quarantaquattro

WITH wordform quaranta > 40:

*quaranta* matches quaranta, quarantaquattro
*quaranta  matches quaranta
quaranta*  matches quarantaquattro

Note the difference in the last query, quaranta*.
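
For reference, the expected wildcard behaviour (the WITHOUT case above) can be reproduced with a small Python sketch. This is only a toy model using shell-style matching from the standard library, not how Manticore evaluates queries; the helper name `expected_matches` and the two-word list are illustrative assumptions.

```python
from fnmatch import fnmatchcase

# The two indexed words from the report above.
WORDS = ["quaranta", "quarantaquattro"]

def expected_matches(pattern, words=WORDS):
    """Return the words that a shell-style wildcard pattern matches (toy model)."""
    return [w for w in words if fnmatchcase(w, pattern)]

for pattern in ("*quaranta*", "*quaranta", "quaranta*"):
    print(f"{pattern:<11} matches {', '.join(expected_matches(pattern))}")
```

In this model quaranta* matches both words, which is what the index without the wordform returns and what the index with the wordform fails to return.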

Index configuration:

index SongSearch
{
  type                   = plain
  source                 = SongSearch
  path                   = /var/lib/manticore/data/SongSearch
  dict                   = keywords
  morphology             = none
  wordforms              = /etc/sphinxsearch/wordforms.txt
  min_word_len           = 1
  min_infix_len          = 2
  index_exact_words      = 1
  preopen                = 1
}

Changelog: fixed matches of prefixes at query for index with only wordforms and index_exact_words=1

@malaire malaire commented Aug 27, 2018

Minimal test case (with English words this time):

Index definitions:

index test
{
  type                   = rt
  path                   = /var/lib/manticore/data/test
  rt_field               = content
  rt_attr_uint           = dummy
  min_infix_len          = 2
  index_exact_words      = 1
}
index test2 : test
{
  path                   = /var/lib/manticore/data/test2
  wordforms              = /etc/sphinxsearch/test2_wordforms.txt
}

test2_wordforms.txt:

forty > 40

Queries to reproduce:

MySQL> INSERT INTO test VALUES(1, 'forty', 1);
MySQL> INSERT INTO test VALUES(2, 'fortyfour', 1);
MySQL> INSERT INTO test2 VALUES(1, 'forty', 1);
MySQL> INSERT INTO test2 VALUES(2, 'fortyfour', 1);

MySQL> SELECT id FROM test WHERE MATCH('forty*');
+------+
| id   |
+------+
|    1 |
|    2 |
+------+
2 rows in set (0.00 sec)

MySQL> SELECT id FROM test2 WHERE MATCH('forty*');
+------+
| id   |
+------+
|    2 |
+------+
1 row in set (0.00 sec)

@airolg airolg commented Aug 30, 2018

Added to backlog, thank you

@tomatolog tomatolog commented Feb 21, 2019

It seems that some kind of clash between index_exact_words=1 and having only wordforms set causes the exact forms of tokens not to be stored.

If I add morphology = stem_ru, which affects no tokens here, the exact forms of tokens are stored correctly, and the query SELECT id FROM test2 WHERE MATCH('forty*') returns both documents.

I'm going to investigate and fix the issue. I'll let you know when the fix is ready.
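
To illustrate why a missing exact form would produce this result, here is a rough Python sketch. It is a toy model built on assumptions, not Manticore's actual indexing code: the names `build_dictionary`, `prefix_search`, and the `store_exact_forms` flag are hypothetical, and the real dictionary format differs.

```python
# Toy model only: names and data structures are assumptions for illustration.
WORDFORMS = {"forty": "40"}          # mirrors test2_wordforms.txt
DOCS = {1: "forty", 2: "fortyfour"}  # mirrors the inserted rows

def build_dictionary(docs, wordforms, store_exact_forms):
    """Map each stored keyword to the set of document ids containing it."""
    dictionary = {}
    for doc_id, text in docs.items():
        for token in text.split():
            # The wordform-mapped keyword is always stored.
            keywords = {wordforms.get(token, token)}
            if store_exact_forms:
                # Also keep the original, unmapped token.
                keywords.add(token)
            for keyword in keywords:
                dictionary.setdefault(keyword, set()).add(doc_id)
    return dictionary

def prefix_search(dictionary, prefix):
    """Expand prefix* against the stored keywords and collect matching ids."""
    ids = set()
    for keyword, doc_ids in dictionary.items():
        if keyword.startswith(prefix):
            ids |= doc_ids
    return sorted(ids)

print(prefix_search(build_dictionary(DOCS, WORDFORMS, False), "forty"))
print(prefix_search(build_dictionary(DOCS, WORDFORMS, True), "forty"))
```

Without exact forms, document 1 is stored only as 40, so expanding forty* finds only fortyfour. With exact forms kept, forty is also in the dictionary and both documents are found.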

@githubmanticore githubmanticore changed the title Adding wordform for "quaranta" prevents "quaranta*" matching exact word GitHub#109 ⁃ Adding wordform for "quaranta" prevents "quaranta*" matching exact word Feb 22, 2019

@githubmanticore githubmanticore commented Feb 22, 2019

➤ Stan commented:

I've just pushed the fix 0721696. You have to rebuild your searchd so that prefix queries match exact forms for an index with only wordforms and index_exact_words=1.
