Use of stems in finder index in contrast to fulltext search #17085

Closed
StefanLindner opened this issue Jul 12, 2017 · 7 comments

@StefanLindner

StefanLindner commented Jul 12, 2017

Steps to reproduce the issue

  1. Install German as a second language.
  2. Make German the default language.
  3. Create two articles. Place the word "Heiterkeit" in the first one and "Heiterung" in the second one. Both words have the stem "heiter".
  4. Configure the finder indexer to use stemming with the Snowball stemmer.
  5. Install a stemmer implementation for PHP.
  6. Build the finder index.
  7. Type "Heiter" into finder's search field. You will see "Heiterkeit" and "Heiterung" as suggestions.
  8. If you search for "Heiterkeit", you will also find the article with "Heiterung".
  9. But if you search for "Heiter", you will not find either of the articles created above.
  10. If you use a fulltext search instead (Joomla's search component) and search for "heiter", you will find both.
  11. If you use a fulltext search and search for "Heiterkeit", you will find only one article, not the one containing "Heiterung".

Expected result

Searching with finder should find both articles when searching for the stem "Heiter".
Searching with finder should not find "Heiterung" when searching for "Heiterkeit".

Actual result

No results for search

System information (as much as possible)

Joomla only.

Additional comments

Is this the intended behavior of the finder component? In other words, is it not a bug but by design? If it is not the intended behavior, the fix could be done with one line of code in the file administrator/components/com_finder/helpers/indexer/query.php

Short summary:

Finder Index

search for "Heiter": no results
search for "Heiterkeit": finds "Heiterkeit" and "Heiterung"; both words are fully marked as results

Search

search for "Heiter": finds "Heiterkeit" and "Heiterung"; in both words only "Heiter" is marked as the result
search for "Heiterkeit": finds only "Heiterkeit"

@brianteeman
Contributor

install a stemmer implementation for php

What stemmer did you install?

@StefanLindner
Author

For PHP 5 we use pecl's stem from https://pecl.php.net

pecl install stem

For PHP 7 pecl no longer provides a stem implementation. In this case we build it like this:

mkdir /tmp/stem && cd /tmp/stem && \
curl -s -L http://pecl.php.net/get/stem-1.5.1.tgz | tar xzf - && \
cd stem-1.5.1 && \
curl -s -L "https://bugs.php.net/patch-display.php?bug_id=71091&patch=feature-support-php5-and-php7&revision=1457366472&download=1" | patch -p1 && \
sed "s/821813b35d88263c9b8b43b5202ca67a/$(md5sum stem.c | cut -d ' ' -f1)/" ../package.xml > package.xml && \
yes yes | pecl install package.xml && \
cd /tmp && \
rm -Rf stem && \
echo "extension=stem.so" > /etc/php.d/stem.ini
This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/17085.

@Hackwar
Member

Hackwar commented May 21, 2018

#20391 will fix this (partially). Can this issue be assigned to me, so that I can include this in one of the future improvements for com_finder?

@brianteeman
Contributor

@Hackwar we can only assign people who are part of the maintainers team - limitation of github :(

@Hackwar
Member

Hackwar commented Jun 3, 2018

After some consideration, I would say that this is the way it is supposed to work. In this specific situation, I expect that the stemmer identified the "-er" of "Heiter" as a suffix that can be removed, so the stemmed query no longer matches the stem "heiter" stored in the index and it won't find any results. Thus I would say that this is intended behavior.
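
To make the explanation above concrete, here is a minimal sketch in Python. It is not the com_finder code and not the real Snowball German algorithm: `toy_stem` and its three-suffix list are assumptions made up for illustration, chosen only so that "Heiter" loses its "-er" as suspected. Under that assumption, a stem-keyed index and a plain substring search reproduce the results listed in the short summary of the report.

```python
# Toy suffix list: an assumption for illustration, NOT the real Snowball rules.
SUFFIXES = ("keit", "ung", "er")


def toy_stem(word: str) -> str:
    """Lowercase the word and strip at most one known suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters of the word after stripping.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# The two test articles from the report, reduced to their key word.
ARTICLES = {1: "Heiterkeit", 2: "Heiterung"}


def build_stem_index(articles):
    """Map each stem to the set of article ids that contain it."""
    index = {}
    for article_id, text in articles.items():
        for token in text.split():
            index.setdefault(toy_stem(token), set()).add(article_id)
    return index


def stem_search(index, query):
    """Finder-like lookup: the query is stemmed too, then matched exactly."""
    return sorted(index.get(toy_stem(query), set()))


def substring_search(articles, query):
    """Fulltext-like lookup: case-insensitive substring match."""
    needle = query.lower()
    return sorted(aid for aid, text in articles.items() if needle in text.lower())


INDEX = build_stem_index(ARTICLES)
```

With these definitions, `stem_search(INDEX, "Heiter")` looks up the stem "heit", which is not in the index, while `stem_search(INDEX, "Heiterkeit")` looks up "heiter" and returns both articles. `substring_search` behaves the other way round, because it never stems the query.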

@joomla-cms-bot

Set to "closed" on behalf of @franz-wohlkoenig by The JTracker Application at issues.joomla.org/joomla-cms/17085

@ghost

ghost commented Jun 4, 2018

Closed as expected behaviour, as stated above. The issue can always be reopened for ongoing discussion.

