Suggested change to cfg.d/indexing.pl #146

Open
drtjmb opened this issue Oct 22, 2013 · 0 comments

I just sent the following to a customer who was having trouble searching for journals matching "plos" and "ieee". I suggest that the tweaks below become the default in release versions (we already use them as the default for EPServices builds):

The key is to understand that extract_words (archives/jdb/cfg/cfg.d/indexing.pl) is used both to (a) index each item and (b) process search terms, so a search only matches when both sides normalise to the same string.

In particular:

    # remove trailing lowercase 's'
    $word =~ s/s$//;

    # lowercase the word if it contains at least 1 lowercase letter
    if( $word =~ m/[a-z]/ )
    {
        $word = lc $word;
    }

So to take your examples:

  1. PLoS

"PLoS" will be indexed as "plos" (no trailing lowercase 's' but there is at least 1 lowercase letter)

When you search for...

"PLoS" will search for "plos" and match the index
"plos" will search for "plo" (trailing lowercase 's' removed) and not match the index
"PLos" will search for "plo" and not match
"PLOS" will search for "PLOS" (no lowercase letters) and not match
"ploS" - will search for "plos" and match

  2. IEEE

"IEEE" will be indexed as "IEEE" (no trailing lowercase 's', no lowercase letters)

When you search for...

"IEEE" will search for "IEEE" and match the index
"ieee" will search for "ieee" and not match
"ieEe" will search for "ieee" and not match
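
The two tables above can be reproduced with a small standalone script. This is only a sketch: `normalise_current` is a hypothetical helper that contains just the two steps quoted from indexing.pl, not the full extract_words, and the `%index` hash stands in for the real search index.

```perl
use strict;
use warnings;

# Hypothetical helper: only the two normalisation steps quoted above,
# in their current order. Not the real extract_words.
sub normalise_current
{
    my( $word ) = @_;

    # remove trailing lowercase 's'
    $word =~ s/s$//;

    # lowercase the word if it contains at least 1 lowercase letter
    if( $word =~ m/[a-z]/ )
    {
        $word = lc $word;
    }

    return $word;
}

# Stand-in for the index: the normalised forms of the two journal names.
my %index = map { normalise_current( $_ ) => 1 } ( "PLoS", "IEEE" );

# Run each search variant through the same normalisation and look it up.
for my $query ( qw( PLoS plos PLos PLOS ploS IEEE ieee ieEe ) )
{
    my $term = normalise_current( $query );
    printf "%-4s -> %-4s %s\n", $query, $term,
        $index{$term} ? "match" : "no match";
}
```

Only "PLoS", "ploS" and "IEEE" report a match, which is the behaviour described above.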

What I would suggest is that you change indexing.pl as follows:

    # ALWAYS lowercase the word
    $word = lc $word;

    # remove trailing lowercase 's' AFTER lowercasing the word
    $word =~ s/s$//;

You'd then need to update the index (bin/epadmin reindex), because the existing entries were generated with the old rules.

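You would then find that ALL the search examples above will match: "PLoS" is indexed as "plo" and "IEEE" as "ieee", and every search variant normalises to the same term.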
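
As a quick check of the suggested ordering, here is a minimal sketch. `normalise_proposed` is a hypothetical helper holding only the two reordered steps, not the full extract_words, and the script assumes the index has been rebuilt with the same rules.

```perl
use strict;
use warnings;

# Hypothetical helper: the two steps in the suggested order.
sub normalise_proposed
{
    my( $word ) = @_;

    # ALWAYS lowercase the word
    $word = lc $word;

    # remove trailing lowercase 's' AFTER lowercasing the word
    $word =~ s/s$//;

    return $word;
}

# Each journal name with the search variants from the examples above.
my %variants = (
    "PLoS" => [ qw( PLoS plos PLos PLOS ploS ) ],
    "IEEE" => [ qw( IEEE ieee ieEe ) ],
);

for my $name ( sort keys %variants )
{
    my $indexed = normalise_proposed( $name );
    for my $query ( @{ $variants{$name} } )
    {
        my $term = normalise_proposed( $query );
        printf "%-4s -> %-4s %s\n", $query, $term,
            $term eq $indexed ? "match" : "no match";
    }
}
```

Every variant reports a match, because case is discarded before the trailing 's' is stripped on both the indexing and the search side.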
