WebSearch: use index-time word breaking information during search time as well #915

Closed
tiborsimko opened this Issue · 1 comment

1 participant

@tiborsimko
Owner

Originally on 2012-02-27

On the demo site, searching for "spectrum." produces a warning:

No exact match found for spectrum., using spectrum instead...

followed by two hits.

Since the dot is stripped from indexed terms at index time (see CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS, CFG_BIBINDEX_CHARS_PUNCTUATION and friends), the search engine should not need to look for the dotted version at search time.

The purpose of this ticket is to take advantage of CFG_BIBINDEX_CHARS_PUNCTUATION and friends at search time as well. That is, if a character is stripped during indexing, it should also be stripped at search time when looking for words (but not for phrases or regexps). We can amend search_unit_in_bibwords so that incoming search terms are washed the same way as during indexing.
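To make the idea concrete, here is a minimal sketch of such washing. It is not the actual search_unit_in_bibwords code: the helper name wash_search_term is hypothetical, the punctuation class is only an illustrative stand-in for whatever CFG_BIBINDEX_CHARS_PUNCTUATION is set to on a given site, and the match type letters ("w" for word, "a" for phrase, "r" for regexp) are assumed. It also assumes that punctuation is stripped from the edges of a term only, which may not match the indexer exactly.

```python
import re

# Assumption: illustrative stand-in for CFG_BIBINDEX_CHARS_PUNCTUATION;
# the real value comes from the site configuration.
PUNCTUATION_CLASS = r'[.,:;?!"]'

# Matches a run of punctuation at the very start or the very end of a term.
_re_edge_punctuation = re.compile(
    r'^%s+|%s+$' % (PUNCTUATION_CLASS, PUNCTUATION_CLASS))


def wash_search_term(term, match_type="w"):
    """Hypothetical helper: strip edge punctuation from word queries only.

    Phrase ("a") and regexp ("r") queries are returned untouched,
    as the ticket asks.
    """
    if match_type != "w":
        return term
    return _re_edge_punctuation.sub("", term)


if __name__ == "__main__":
    print(wash_search_term("spectrum."))        # word query, dot removed
    print(wash_search_term("spectrum.", "a"))   # phrase query, left as is
```

With washing like this applied before the word lookup, a query for "spectrum." would be looked up as "spectrum" directly, so the "No exact match found" fallback would not be triggered for it.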

Note that this may also concern stemming, stopwords and the like, but we have another ticket that takes care of centralising indexing configuration, so further improvements could be dealt with there. See #852.

@tiborsimko
Owner

#852 is implemented, but unless some service needs this for master, I propose to tackle this via elasticsearch in next.

@tiborsimko tiborsimko closed this