Solr in Blacklight

Setting up Solr

Blacklight uses Solr as its "search engine". More information about Solr is available at the Solr web site (http://lucene.apache.org/solr/).

There are three sections to this document:

  • Getting Solr
  • Configuring Solr
    • schema.xml
    • solrconfig.xml
  • SolrMARC

Getting Solr

Blacklight distributes a pre-configured version of Solr (with the Jetty container) as blacklight-jetty.

You can also use an existing Solr index (with some minor modifications). If you want to start from a new version of Solr, follow the directions in the Solr tutorial.

You should now have a usable copy of Solr.

Configuring Solr

schema.xml

Between schema.xml and solrconfig.xml, you can change and tune search behavior by following the directions on the Solr wiki. Solr comes with example schema and solrconfig files, which you can use as a starting point for configuring your local Solr application.

Blacklight expects a uniqueKey field within your Solr index, traditionally called id. The name of the unique key field can be configured in your application's SolrDocument.

Blacklight community "best practices"

Solr uses a schema.xml file to define document fields (among other things). These fields store data for searching and for result display. You can find the example/solr/conf/schema.xml file in the Solr distribution you just downloaded and uncompressed.

Documentation about the Solr schema.xml file is available at http://wiki.apache.org/solr/SchemaXml.

The default schema.xml file comes with some preset fields made to work with the example data. If you don't already have a schema.xml set up, we recommend using a simplified "fields" section like this:

    <fields>
        <field name="id" type="string" indexed="true" stored="true" required="true" />
        <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
        <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
        <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
        <dynamicField name="*_i"  type="sint"    indexed="true"  stored="true"/>
        <dynamicField name="*_s"  type="string"  indexed="true"  stored="true" multiValued="true"/>
        <dynamicField name="*_l"  type="slong"   indexed="true"  stored="true"/>
        <dynamicField name="*_t"  type="text"    indexed="true"  stored="true" multiValued="true"/>
        <dynamicField name="*_b"  type="boolean" indexed="true"  stored="true"/>
        <dynamicField name="*_f"  type="sfloat"  indexed="true"  stored="true"/>
        <dynamicField name="*_d"  type="sdouble" indexed="true"  stored="true"/>
        <dynamicField name="*_dt" type="date"    indexed="true"  stored="true"/>
        <dynamicField name="random*" type="random" />
        <dynamicField name="*_facet" type="string" indexed="true" stored="true" multiValued="true" />
        <dynamicField name="*_display" type="string" indexed="false" stored="true" />
    </fields>

Additionally, replace all of the tags after the "fields" section and before the </schema> tag with this:

    <uniqueKey>id</uniqueKey>
    <defaultSearchField>text</defaultSearchField>
    <solrQueryParser defaultOperator="OR"/>
    <copyField source="*_facet" dest="text"/>

Now you have a basic schema.xml file ready. Other fields can be specified, including a primary document title (title_display) and format (format), but these are easily configured in your application's CatalogController.

Fields that are "indexed" are searchable.

Fields that are "stored" can be viewed/displayed from the Solr search results.

The fields with asterisks ('*') in their names are "dynamic" fields. These allow you to create arbitrary fields at index time.

The *_facet field can be used for creating your facets. When you index, simply define a field with _facet on the end, for example category_facet.

The *_display field can be used for storing text that doesn't need to be indexed. An example would be the raw MARC for a record's detail view, stored as raw_marc_display.

For text that will be queried (and possibly displayed), use a *_t field for tokenized text (text broken into pieces/words) or a *_s field for queries that should exactly match the field contents, for example description_t or url_s.
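
As a rough illustration of how these dynamic field suffixes are used at index time, the sketch below shows a document in Solr's XML update format. It assumes the simplified "fields" section above is in place; the id and all field values are made up for the example.

```xml
<!-- Illustrative update message; the id and all values are hypothetical. -->
<add>
  <doc>
    <!-- the uniqueKey field, required by the schema above -->
    <field name="id">doc-001</field>
    <!-- matches *_facet: untokenized string, usable as a facet -->
    <field name="category_facet">Maps</field>
    <!-- matches *_t: tokenized text, searchable word by word -->
    <field name="description_t">Hand-drawn survey map of the harbor</field>
    <!-- matches *_s: string, matched only as an exact value -->
    <field name="url_s">http://example.org/items/doc-001</field>
    <!-- matches *_display: stored for display, not indexed -->
    <field name="raw_marc_display">(raw MARC record goes here)</field>
  </doc>
</add>
```

None of these field names needs its own entry in schema.xml; each one is matched by the dynamicField pattern for its suffix.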

The Blacklight application is generic enough to work with any Solr schema, but to manipulate the search results and single record displays, you'll need to know the stored fields in your indexed documents.

For more information, refer to the Solr documentation: http://wiki.apache.org/solr/SchemaXml

solrconfig.xml

Solr uses the solrconfig.xml file to define searching configurations, set cache options, etc. You can find the example/solr/conf/solrconfig.xml file in the distribution directory you just uncompressed.

Documentation about the solrconfig.xml file is available at http://wiki.apache.org/solr/SolrConfigXml.

Blacklight expects two request handlers to be defined: one to handle general search requests and one to handle single-document lookup. The names of these request handlers are configurable; out of the box they are called "search" and "document", respectively.

Solr Search Request Handlers

When Blacklight does a collection search, it sends a request to a Solr request handler named "search". The most important settings in this handler definition are the "fl" param (field list) and the facet params.

The "fl" param specifies which fields are returned in a Solr response. The facet related params set up the faceting mechanism.

Find out more about the basic params: http://wiki.apache.org/solr/DisMaxRequestHandler

Find out more about the faceting params: http://wiki.apache.org/solr/SimpleFacetParameters

How the "fl" param works in Blacklight's request handlers

Blacklight comes with a set of "default" views for rendering each document in a search results page. These views simply loop through all of the fields returned for each document in the Solr response. The "fl" (field list) param tells Solr which fields to include in the documents in the response, and these are the fields rendered in the Blacklight default views. Thus, the fields you want rendered must be specified in "fl". Note that only "stored" fields are available: if you want a field to be rendered in the results, it must be "stored" per its field definition in schema.xml.

The "fl" parameter definition in the "search" handler looks like this:

    <str name="fl">id,score,author_display,(....lots of other fields)</str>

You may also use an asterisk plus "score":

    <str name="fl">*,score</str>

How the facet params work in Blacklight's request handlers

In the search results view, Blacklight will look into the Solr response for facets. If you specify any facet.field params in your "search" handler, they will automatically get displayed in the facets list:

    <str name="facet.field">format</str>
    <str name="facet.field">language_facet</str>
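
Facet settings can also be tuned for a single field. The fragment below is a sketch, not part of the Blacklight defaults: it uses Solr's per-field parameter form, f.<field>.facet.<param>, to override the handler-wide facet settings for one facet. The values 20 and 5 are arbitrary.

```xml
<!-- Hypothetical entries for the "defaults" list of the "search" handler. -->
<str name="facet.field">format</str>
<str name="facet.field">language_facet</str>
<!-- return up to 20 values for language_facet only -->
<str name="f.language_facet.facet.limit">20</str>
<!-- hide format values that match fewer than 5 documents -->
<str name="f.format.facet.mincount">5</str>
```

Fields without a per-field override keep the general facet.limit and facet.mincount values set in the handler.
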

Blacklight's "search" request handler: for search results

When Blacklight displays a list of search results, it uses a Solr request handler named "search." Thus, the field list (fl param) for the "search" request handler should be tailored to what will be displayed in a search results page. Generally, this will not include fields containing a large quantity of text. The facet param should contain the facets to be displayed with the search results.

    <requestHandler name="search" class="solr.SearchHandler" >
        <lst name="defaults">
            <str name="defType">dismax</str>
            <str name="echoParams">explicit</str>
            <!-- list fields to be returned in the "fl" param -->
            <str name="fl">*,score</str>

            <str name="facet">on</str>
            <str name="facet.mincount">1</str>
            <str name="facet.limit">10</str>

            <!-- list fields to be displayed as facets here. -->
            <str name="facet.field">format</str>
            <str name="facet.field">language_facet</str>

            <str name="q.alt">*:*</str>
        </lst>
    </requestHandler>
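
Because this handler sets defType to dismax, it can also carry the DisMax query parameters described on the Solr wiki page linked earlier. The following is a minimal sketch of two of them, qf and pf. The field names title_t and author_t and the boost values are assumptions for illustration, not Blacklight defaults.

```xml
<!-- Hypothetical entries for the "defaults" list of the "search" handler.
     title_t and author_t are illustrative field names. -->
<!-- qf: the fields to search, with optional per-field boosts -->
<str name="qf">title_t^10 author_t^5 text</str>
<!-- pf: fields in which a match on the whole phrase earns an extra boost -->
<str name="pf">title_t^20</str>
```

Every field named in qf or pf must be indexed in schema.xml, since these parameters control searching rather than display.
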

Blacklight's "document" request handler: for a single record

When Blacklight displays a single record, it uses a Solr request handler named "document". The "document" handler doesn't necessarily need to be different from the "search" handler, but it can be used to control which fields are available when displaying a single document. In the example below, no faceting is set (facets are not displayed with a single record) and the "rows" param is set to 1 (since there will only be a single record). The field list ("fl" param) can also include fields containing large text values if they are wanted for record display. It is acceptable to include large amounts of data because this handler should only be used to query for one document:

    <requestHandler name="document" class="solr.SearchHandler">
        <lst name="defaults">
            <str name="echoParams">explicit</str>
            <str name="fl">*</str>
            <str name="rows">1</str>
            <!-- use id=blah instead of q=id:blah -->
            <str name="q">{!raw f=id v=$id}</str>
        </lst>
    </requestHandler>

A Solr query for a single record might look like this: http://(yourSolrBaseUrl)/solr/select?id=my_doc_id&qt=document

Blacklight Solr Schema and Solrconfig File Templates

Blacklight provides schema.xml and solrconfig.xml files as starting points:

https://github.com/projectblacklight/blacklight-jetty/blob/master/solr/blacklight-core/conf/schema.xml

https://github.com/projectblacklight/blacklight-jetty/blob/master/solr/blacklight-core/conf/solrconfig.xml

SolrMARC: from MARC data to Solr documents

The SolrMARC project is designed to create a Solr index from raw MARC data. It can be configured easily and used with the basic parsing and indexing supplied. It is also readily customized for a site's unique requirements.

The project software and documentation are available at http://code.google.com/p/solrmarc

Blacklight comes with an embedded SolrMARC, with a default configuration that matches the default Blacklight setup, and provides rake tasks to index documents with SolrMARC according to your app's environment. There is no need to install or configure SolrMARC manually. From your application's home directory, simply run:

  rake solr:marc:index:info

to see the available options. Run rake solr:marc:index to perform the indexing. Like all rake tasks, this uses your 'development' environment by default; add "RAILS_ENV=production" to index instead into the Solr instance you've labelled production in your config/solr.yml file.

The SolrMARC config files are in your app's config/SolrMarc directory; you can edit them there for local configuration.

If you'd like to use a different or more recent version of SolrMarc.jar, you can put it in your app at ./solr_marc/SolrMarc.jar, and the built-in rake tasks will use your local SolrMarc.jar instead of the one bundled with Blacklight.
