Added Solr partial string match configuration
Steps on how to configure the Solr server to return partial string matches on keyword searches.
sammyjava committed Mar 17, 2020
1 parent 01c458f commit 603d8fe
Showing 1 changed file with 48 additions and 0 deletions.
48 changes: 48 additions & 0 deletions docs/webapp/keyword-search/index.rst
@@ -97,3 +97,51 @@ Solr
See :doc:`/system-requirements/software/solr` for details on how to install Solr.

.. index:: keyword search, quick search, search, Solr, Lucene

Solr Partial String Match Configuration
---------------------------------------
In its default configuration, Solr does not match partial search terms. For example, a gene named *REVOLUTA* is
returned in the search results for the search term "REVOLUTA" but not for the search term "REV". To have Solr return
partial string matches, you must edit its configuration on the Solr server:

1. ADD the following to /var/solr/data/[mine]-search/conf/managed-schema. (This example implements it for hits
against Gene.primaryIdentifier and Gene.secondaryIdentifier.)

```xml
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="50"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="gene_primaryidentifier" type="text_ngram" indexed="true" stored="true"/>
<field name="gene_secondaryidentifier" type="text_ngram" indexed="true" stored="true"/>
```

2. REMOVE the original gene_primaryidentifier and gene_secondaryidentifier field definitions, which appear earlier in
the file. They look like this:

```xml
<field name="gene_primaryidentifier" type="analyzed_string" multiValued="true" indexed="true" required="false" stored="false"/>
<field name="gene_secondaryidentifier" type="analyzed_string" multiValued="true" indexed="true" required="false" stored="false"/>
```

3. RESTART Solr to load the new config, e.g. under systemd:

```
$ systemctl restart solr
```

4. REBUILD the search index by running the create-search-index postprocess:

```
./gradlew postprocess -Pprocess=create-search-index
```
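
Once the index has been rebuilt, one way to check the result directly against Solr is to query its standard
``select`` handler. The following is a minimal sketch, not part of InterMine: the host and port
(``localhost:8983``), the core name ``mymine-search`` and the helper names ``select_url`` and ``count_hits`` are
assumptions made for illustration, so substitute the values for your own server.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def select_url(core, field, term, base="http://localhost:8983/solr"):
    """Build a Solr select URL that searches one field for one term.

    The default base URL is an assumption; adjust it to your Solr server.
    """
    params = urlencode({"q": f"{field}:{term}", "rows": 5, "wt": "json"})
    return f"{base}/{quote(core)}/select?{params}"


def count_hits(url):
    """Fetch the URL and return the number of matching documents."""
    with urlopen(url) as response:
        return json.load(response)["response"]["numFound"]


# Hypothetical usage (requires a running Solr server):
#   url = select_url("mymine-search", "gene_primaryidentifier", "REV")
#   print(count_hits(url))
```

If the new field type is active and the index has been rebuilt, a partial term such as "REV" should report at least
one hit for a mine that contains *REVOLUTA*.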

Your keyword search will now return results on partial matches for the attributes that you configured in
Solr (Gene.primaryIdentifier and Gene.secondaryIdentifier in this example).
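
To illustrate why this works, the sketch below imitates the two analyzers of the ``text_ngram`` field type in plain
Python. It is a simplified model, not Solr's implementation: the function names are hypothetical, and a match is
assumed to mean that any query term equals one of the indexed terms. At index time each whitespace-separated token is
expanded into all of its substrings of length 1 to 50 and lowercased; at query time the search term is only
lowercased, so "REV" is found among the indexed substrings of "REVOLUTA".

```python
def ngrams(token, min_size=1, max_size=50):
    """Return every substring of token with a length from min_size to max_size."""
    grams = set()
    for size in range(min_size, min(max_size, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def index_terms(value):
    """Model the index analyzer: whitespace split, n-grams, then lowercase."""
    terms = set()
    for token in value.split():
        terms.update(gram.lower() for gram in ngrams(token))
    return terms


def query_terms(query):
    """Model the query analyzer: whitespace split, then lowercase (no n-grams)."""
    return [token.lower() for token in query.split()]


def matches(value, query):
    """Return True if any query term equals one of the indexed terms."""
    indexed = index_terms(value)
    return any(term in indexed for term in query_terms(query))


print(matches("REVOLUTA", "REV"))   # True
print(matches("REVOLUTA", "REVX"))  # False
```

Because every substring is indexed, the index grows considerably with ``minGramSize="1"``; raising the minimum gram
size is a possible trade-off if index size becomes a concern.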
