Originally on 2011-09-30
I don't know the cause yet, but when the global index (in the demo site) has stemming enabled and PyStemmer is not installed, the index contains only stemmed values, not the original tokens.
Installing PyStemmer solves the issue, but PyStemmer is only a recommended dependency, not a required one.
I have to find out what is doing the stemming instead, and why it is not also indexing the original words.
When PyStemmer is not installed, Invenio falls back on a pure-Python implementation of the Porter stemming algorithm for English, and still applies stemming if required.
When stemming is enabled on an index, only the stemmed word is stored, not the original one. Therefore I don't see anything going wrong in what you mention above.
If you actually disable stemming (regardless of the status of the installation of PyStemmer), then the original term will instead be stored in the index...
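To illustrate the behaviour described above, here is a minimal sketch of an index that stores either the stemmed term or the original one, never both. It is not Invenio code: `toy_stem` is a hypothetical stand-in that only strips two suffixes (it is not the Porter algorithm), and `build_index` is an assumed name for illustration.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer (NOT the Porter algorithm)."""
    for suffix in ("ing", "s"):
        # Only strip when enough of the word is left over.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_index(docs, stemming=True):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            # With stemming on, only the stem is stored, not the token.
            term = toy_stem(token) if stemming else token
            index.setdefault(term, set()).add(doc_id)
    return index


docs = {1: "indexing engines", 2: "search engine"}
stemmed_index = build_index(docs, stemming=True)
plain_index = build_index(docs, stemming=False)

print(sorted(stemmed_index))  # stems only
print(sorted(plain_index))    # original tokens only
```

In this sketch the stemmed index has no entry for "indexing" at all, which matches the expectation that enabling stemming replaces the original token rather than adding to it.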
Actually, Ludmila pointed out to me that you are doing a high-level search (when replying I had in mind only low-level fiddling with the indexing tables). So yes, what you describe suggests there might be a mismatch between what the indexing engine does WRT stemming when PyStemmer is not installed vs. what the search engine does WRT stemming in the same situation. It may well be that the search engine simply assumes no stemming when PyStemmer is absent (which is wrong WRT the indexing layer...).
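The suspected mismatch can be sketched as follows. This is only an illustration of the hypothesis, not a confirmed diagnosis and not Invenio code: `toy_stem`, `index_documents` and `search` are hypothetical names, and the `stem_query` flag stands in for whatever decision the search engine makes when PyStemmer is absent.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer (NOT the Porter algorithm)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def index_documents(docs):
    """Indexing layer: always stems, as the fallback stemmer would."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(toy_stem(token), set()).add(doc_id)
    return index


def search(index, query, stem_query):
    """Search layer: stems the query only if stem_query is true."""
    term = query.lower()
    if stem_query:
        term = toy_stem(term)
    return index.get(term, set())


index = index_documents({1: "indexing engines"})

# Query left unstemmed: it cannot match the stemmed index entry.
missed = search(index, "indexing", stem_query=False)
# Query stemmed the same way as the index: the record is found.
found = search(index, "indexing", stem_query=True)

print(missed, found)
```

If the two layers disagree in this way, a search for the original word finds nothing even though the record was indexed, which would be consistent with the symptom reported at the top of this ticket.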
I will look more into this...