Error when PyStemmer is not installed and stemming is still enabled #827

romanchyla opened this Issue · 2 comments



Originally on 2011-09-30

I don't know the cause yet, but when the global index (in the demo site) has stemming enabled and PyStemmer is not installed, the index contains only stemmed values, not the original tokens.

To reproduce:

  1. load demo records
  2. search for ellis [0 hits]
  3. search for elli [11 hits]


For comparison, with stemming deactivated:

  1. load demo records
  2. change configuration of global index (deactivate stemming)
  3. search for ellis [11 hits]

The issue is solved by installing PyStemmer, but PyStemmer is only a recommended dependency, not a required one.

I have to find out what is doing the stemming instead, and why it is not also indexing the original words.


Originally on 2011-09-30

When PyStemmer is not installed, Invenio falls back on a pure-Python implementation of the Porter stemming algorithm for English, and still applies stemming if required.

When stemming is enabled on an index, only the stemmed word is stored, not the original one. Therefore I don't see anything going wrong in what you mention above.

If you actually disable stemming (regardless of the status of the installation of PyStemmer), then the original term will instead be stored in the index...
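To make the behaviour described above concrete, here is a minimal sketch, not Invenio's actual code. It assumes PyStemmer's `Stemmer.Stemmer("english").stemWord()` when the package is importable, and otherwise uses `fallback_stem`, a hypothetical toy that implements only step 1a of the Porter algorithm. `build_index` is likewise a made-up helper for illustration.

```python
try:
    import Stemmer  # PyStemmer: optional dependency
    _snowball = Stemmer.Stemmer("english")
except ImportError:
    _snowball = None


def fallback_stem(word):
    """Hypothetical stand-in for a pure-Python Porter stemmer (step 1a only)."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # ellis -> elli
    return word


def stem(word):
    """Use PyStemmer when available, otherwise the fallback."""
    if _snowball is not None:
        return _snowball.stemWord(word)
    return fallback_stem(word)


def build_index(records, stemming=True, stemmer=stem):
    """Map each term to the set of record ids containing it.

    With stemming on, only the stemmed form is stored, never the original.
    """
    index = {}
    for recid, text in records.items():
        for token in text.lower().split():
            term = stemmer(token) if stemming else token
            index.setdefault(term, set()).add(recid)
    return index


records = {1: "Ellis J", 2: "Ellis N"}
stemmed = build_index(records, stemming=True, stemmer=fallback_stem)
plain = build_index(records, stemming=False)
```

With stemming on, `stemmed` has an entry for `elli` but none for `ellis`; with stemming off, `plain` stores `ellis` as typed.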



Originally on 2011-09-30

Actually, Ludmila pointed out to me that you are doing high-level searching (when replying, I had in mind only low-level fiddling with the indexing tables). So indeed, what you describe suggests there might be a mismatch between what the indexing engine does with respect to stemming when PyStemmer is not installed and what the search engine does in the same situation. It may well be that the search engine simply assumes no stemming when PyStemmer is missing, which is wrong with respect to the indexing layer.

I will look more into this...
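If that hypothesis holds, the symptom can be reproduced with a sketch like the one below. Everything here is hypothetical: `toy_stem` is a simplistic stand-in for the fallback stemmer, and `search` with its `stem_query` flag only models the suspected behaviour, not the real search engine.

```python
def toy_stem(word):
    """Hypothetical stemmer: strips one trailing 's' (but not 'ss')."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


# An index built with stemming enabled holds only stemmed terms.
index = {"elli": {1, 2, 3}}


def search(index, term, stem_query):
    """Look up a term, optionally stemming the query first."""
    key = term.lower()
    if stem_query:
        key = toy_stem(key)
    return index.get(key, set())


# Search side skips stemming: the raw query never matches the stemmed index.
hits_unstemmed = search(index, "ellis", stem_query=False)

# Search side stems the query the same way as the indexer: records are found.
hits_stemmed = search(index, "ellis", stem_query=True)
```

Whatever stemmer the indexer falls back on, the query has to go through the same one; otherwise `ellis` finds nothing while `elli` matches.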
