I just sent the following to a customer who was having trouble searching for journals matching "plos" and "ieee". I suggest that the tweaks below become the default in release versions (we already use them as the default for EPServices builds):
The key is to understand that extract_words (archives/jdb/cfg/cfg.d/indexing.pl) is used to both (a) index each item and (b) process search terms.
In particular:
# remove trailing lowercase 's'
$word =~ s/s$//;
# lowercase the word if it contains at least 1 lowercase letter
if( $word =~ m/[a-z]/ )
{
    $word = lc $word;
}
So to take your examples:
PLoS
"PLoS" will be indexed as "plos" (no trailing lowercase 's' but there is at least 1 lowercase letter)
When you search for...
"PLoS" will search for "plos" and match the index
"plos" will search for "plo" (trailing lowercase 's' removed) and not match the index
"PLos" will search for "plo" (trailing lowercase 's' removed, then lowercased) and not match
"PLOS" will search for "PLOS" (no lowercase letters) and not match
"ploS" will search for "plos" and match
IEEE
"IEEE" will be indexed as "IEEE" (no trailing lowercase 's', no lowercase letters)
When you search for...
"IEEE" will search for "IEEE" and match the index
"ieee" will search for "ieee" and not match
"ieEe" will search for "ieee" and not match
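If you want to check the cases above for yourself, here is a minimal standalone sketch that applies only the two steps quoted earlier, in the same order. The name normalise_current is made up for illustration, and the real extract_words does more than this, so treat it as a way to reproduce the behaviour rather than as the actual indexing code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): applies only the two steps
# quoted from indexing.pl, in their current order.
sub normalise_current
{
    my( $word ) = @_;

    # remove trailing lowercase 's'
    $word =~ s/s$//;

    # lowercase the word if it contains at least 1 lowercase letter
    if( $word =~ m/[a-z]/ )
    {
        $word = lc $word;
    }

    return $word;
}

# Each entry: the title as stored, then the search terms to try against it.
my @cases = (
    [ "PLoS", [ qw( PLoS plos PLos PLOS ploS ) ] ],
    [ "IEEE", [ qw( IEEE ieee ieEe ) ] ],
);

foreach my $case ( @cases )
{
    my( $title, $searches ) = @{$case};
    my $indexed = normalise_current( $title );
    foreach my $search ( @{$searches} )
    {
        my $term = normalise_current( $search );
        my $result = $term eq $indexed ? "match" : "no match";
        print "\"$search\" searches for \"$term\" against \"$indexed\": $result\n";
    }
}
```

Run as is, this reports a match only for "PLoS" and "ploS" against the PLoS title, and only for "IEEE" against the IEEE title.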
What I would suggest is that you change indexing.pl as follows:
# ALWAYS lowercase the word
$word = lc $word;
# remove trailing lowercase 's' AFTER lowercasing the word
$word =~ s/s$//;
You'd then need to update the index (bin/epadmin reindex).
You would then find that ALL the search examples above match: every variant of "PLoS" is both indexed and searched as "plo", and every variant of "IEEE" as "ieee".
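The same kind of standalone check can be run against the suggested order (lowercase first, then strip the trailing 's'). Again, normalise_proposed is a made-up name and this only models the two lines in question, not the whole of extract_words.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of EPrints): the two steps in the
# suggested order.
sub normalise_proposed
{
    my( $word ) = @_;

    # ALWAYS lowercase the word
    $word = lc $word;

    # remove trailing lowercase 's' AFTER lowercasing the word
    $word =~ s/s$//;

    return $word;
}

# Each entry: the title as stored, then the search terms to try against it.
my @cases = (
    [ "PLoS", [ qw( PLoS plos PLos PLOS ploS ) ] ],
    [ "IEEE", [ qw( IEEE ieee ieEe ) ] ],
);

my $failures = 0;
foreach my $case ( @cases )
{
    my( $title, $searches ) = @{$case};
    my $indexed = normalise_proposed( $title );
    foreach my $search ( @{$searches} )
    {
        my $term = normalise_proposed( $search );
        next if $term eq $indexed;
        $failures++;
        print "\"$search\" searches for \"$term\" but the index has \"$indexed\"\n";
    }
}
print "search terms that failed to match: $failures\n";
```

Because indexing and searching go through the same function, it does not matter that "PLoS" is now stored as "plo" rather than "plos"; what matters is that both sides end up with the same string.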