Solr River Plugin for ElasticSearch

The Solr River plugin allows you to import data from Apache Solr into elasticsearch.

Deprecation warning: rivers are deprecated in elasticsearch, so the Solr River plugin is no longer maintained. Read this article for more information on the deprecation of rivers.

To install the latest version of the plugin, run: bin/plugin install river-solr -url http://bit.ly/1qzA7lB

Depending on the elasticsearch version you're running, you can instead copy and paste the URL of a specific plugin version from the table below.

Versions

| Solr River Plugin | Elasticsearch    |
|-------------------|------------------|
| master            | 1.3.x -> 1.4.x   |
| 2.1               | 1.3.x -> 1.4.x   |
| 2.0               | 1.0.x -> 1.2.x   |
| 1.1               | 0.90.x           |
| 1.0.4             | 0.90.0           |
| 1.0.3             | 0.20.0 -> 0.20.6 |
| 1.0.2             | 0.20.0 -> 0.20.6 |
| 1.0.1             | 0.19.3 -> 0.19.12 |
| 1.0.0             | 0.19.3 -> 0.19.12 |

Getting Started

The Solr River queries a running Solr instance and indexes the returned documents in elasticsearch. It retrieves documents through HTTP GET requests using Solr's JSON response writer (SolrJ is no longer used to communicate with Solr).

The common Solr query parameters (q, fq, fl, qt and rows) are supported; see the full list of river parameters below.
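
As a rough illustration of the kind of request described above, the following Java sketch builds a Solr GET URL asking for one page of results in JSON (wt=json). The /select path, the start-based paging and the class and method names are assumptions made for illustration, not taken from the plugin's code; a custom qt may route the query to a different request handler.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds a Solr query URL similar to the ones the river sends over HTTP GET.
// The /select path and start/rows paging are assumptions, not the plugin's actual code.
public class SolrSelectUrl {

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static String build(String solrUrl, String q, List<String> fq,
                               List<String> fl, int start, int rows) {
        StringJoiner params = new StringJoiner("&");
        params.add("q=" + enc(q));
        for (String filter : fq) {
            params.add("fq=" + enc(filter));               // each filter query is its own fq parameter
        }
        if (!fl.isEmpty()) {
            params.add("fl=" + enc(String.join(",", fl))); // field list is comma separated
        }
        params.add("start=" + start);                      // offset of the first document in this page
        params.add("rows=" + rows);                        // page size
        params.add("wt=json");                             // use Solr's JSON response writer
        String base = solrUrl.endsWith("/") ? solrUrl : solrUrl + "/";
        return base + "select?" + params;
    }
}
```

Under the paging assumption, the next page would be requested by increasing start by rows until Solr returns fewer than rows documents.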

The Solr River is not meant to keep Solr and elasticsearch in sync. That's why, by default, it automatically deletes itself on completion, so that it doesn't start up again at every node restart. This behaviour can be disabled by setting the close_on_completion parameter to false.

Installation

Here is how you can create the river and index data from Solr, providing just the Solr URL and the query to execute:

curl -XPUT localhost:9200/_river/solr_river/_meta -d '
{
    "type" : "solr",
    "solr" : {
        "url" : "http://localhost:8080/solr/",
        "q" : "*:*"
    }
}'
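
If you register the river from Java rather than with curl, a minimal sketch using the JDK's java.net.http client (Java 11 or later) could look like the following. It sends the same PUT to the _river/solr_river/_meta endpoint with the same minimal river definition; the class and method names are illustrative, and localhost:9200 is simply the address used in the curl example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative sketch: registers the Solr river through the JDK HTTP client instead of curl.
public class RegisterSolrRiver {

    // Same minimal river definition as the curl example.
    public static final String META =
            "{\"type\":\"solr\",\"solr\":{\"url\":\"http://localhost:8080/solr/\",\"q\":\"*:*\"}}";

    // Builds the PUT request to <esUrl>/_river/<riverName>/_meta without sending it.
    public static HttpRequest buildRequest(String esUrl, String riverName, String metaJson) {
        return HttpRequest.newBuilder(URI.create(esUrl + "/_river/" + riverName + "/_meta"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(metaJson))
                .build();
    }

    // Sends the request and returns the HTTP status code followed by the response body.
    public static String register(String esUrl, String riverName, String metaJson)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(esUrl, riverName, metaJson), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }
}
```

Calling RegisterSolrRiver.register("http://localhost:9200", "solr_river", RegisterSolrRiver.META) is then equivalent to the curl command above.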

All parameters are optional. The following example lists every supported parameter, each set to the default value that applies when the parameter is not present.

{
    "type" : "solr",
    "close_on_completion" : "true",
    "solr" : {
        "url" : "http://localhost:8983/solr/",
        "q" : "*:*",
        "fq" : "",
        "fl" : "",
        "qt" : "",
        "uniqueKey" : "id",
        "rows" : 10
    },
    "index" : {
        "index" : "solr",
        "type" : "import",
        "bulk_size" : 100,
        "max_concurrent_bulk" : 10,
        "mapping" : "",
        "settings": ""
    }
}

The fq and fl parameters can be provided as either an array or a single value.
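
The following sketch shows one way a value that may be either a single string or an array can be normalized into a list before the query is built. It assumes the river settings have been parsed into plain Java objects, with JSON arrays as java.util.List; treating a missing value or the empty-string default as "no value" is also an assumption, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: normalizes an fq or fl setting (single value or array) into a list.
public class MultiValueParam {

    public static List<String> toList(Object value) {
        if (value == null) {
            return Collections.emptyList();           // parameter not provided
        }
        if (value instanceof Collection<?>) {          // JSON array, assumed parsed as a List
            List<String> out = new ArrayList<>();
            for (Object item : (Collection<?>) value) {
                String s = String.valueOf(item);
                if (!s.isEmpty()) {
                    out.add(s);
                }
            }
            return out;
        }
        String s = value.toString();                   // single value
        return s.isEmpty() ? Collections.emptyList() : List.of(s);
    }
}
```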

You can provide your own mapping and index settings while creating the river; they are used if the river needs to create the index.

The index is created if it doesn't already exist; otherwise the documents are added to the existing index with the configured name.

The documents are indexed using the bulk API. You can control the size of each bulk request through bulk_size (default 100) and the maximum number of concurrent bulk requests through max_concurrent_bulk (default 10). Once that limit is reached, indexing slows down and waits for one of the in-flight bulk requests to complete; no documents are lost.
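
The throttling described above follows a common pattern: a semaphore with one permit per in-flight bulk request, acquired before a bulk is dispatched and released when it completes, so the producer blocks instead of dropping documents. The sketch below shows that pattern in plain Java. It is not the plugin's implementation; the class name is hypothetical and the bulkSender callback stands in for the actual elasticsearch bulk request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Sketch of bulk_size / max_concurrent_bulk style throttling (not the plugin's code).
public class ThrottledBulkIndexer {
    private final int bulkSize;
    private final Semaphore inFlight;                   // one permit per concurrent bulk
    private final Consumer<List<String>> bulkSender;    // hypothetical: sends one bulk request
    private final ExecutorService executor = Executors.newCachedThreadPool();
    private List<String> pending = new ArrayList<>();

    public ThrottledBulkIndexer(int bulkSize, int maxConcurrentBulk,
                                Consumer<List<String>> bulkSender) {
        this.bulkSize = bulkSize;
        this.inFlight = new Semaphore(maxConcurrentBulk);
        this.bulkSender = bulkSender;
    }

    // Buffers a document and dispatches a bulk once bulkSize documents are pending.
    public void add(String doc) {
        pending.add(doc);
        if (pending.size() >= bulkSize) {
            flush();
        }
    }

    // Dispatches pending documents; blocks while maxConcurrentBulk bulks are in flight.
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        pending = new ArrayList<>();
        inFlight.acquireUninterruptibly();              // wait here instead of dropping documents
        executor.execute(() -> {
            try {
                bulkSender.accept(batch);
            } finally {
                inFlight.release();                     // lets the next bulk go out
            }
        });
    }

    // Sends the last partial bulk and waits (up to one minute) for all bulks to finish.
    public void close() {
        flush();
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Blocking the producer, rather than queueing bulks without limit, keeps memory bounded while still guaranteeing that every buffered document is eventually sent.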

Transform documents

Since version 1.0.3 it's possible to transform the documents via scripting. The feature works the same way as the scripted update API: the script and its parameters are specified within the transform section when registering the river, like this:

{
    "type" : "solr",
    "solr" : {
        "url" : "http://localhost:8983/solr/",
        "q" : "*:*"
    },
    "index" : {
        "index" : "solr",
        "type" : "import"
    },
    "transform" : {
        "script" : "ctx._source.counter += count",
        "params" : {
            "count" : 4
        }
    }
}

The example above increments the counter field of every document by 4 right before it is indexed in elasticsearch. Note that dynamic scripting must be enabled for this to work.
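
To make the effect of the script concrete, here is a plain-Java sketch of what ctx._source.counter += count does to a single document's source, modeled as a Map. It only mimics the script's effect for illustration (the river runs the script through elasticsearch's scripting support), the class name is hypothetical, and it assumes every document has a numeric counter field.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative equivalent of the transform script "ctx._source.counter += count".
public class CounterTransform {

    // Returns a copy of the document source with counter incremented by params.count.
    // Assumes both values are numeric; a missing counter field throws a NullPointerException.
    public static Map<String, Object> apply(Map<String, Object> source, Map<String, Object> params) {
        Map<String, Object> result = new HashMap<>(source);
        long count = ((Number) params.get("count")).longValue();
        long counter = ((Number) result.get("counter")).longValue();
        result.put("counter", counter + count);
        return result;
    }
}
```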

Limitations

  • only stored fields can be retrieved from Solr, and therefore only stored fields are indexed in elasticsearch
  • the river is not meant to keep elasticsearch in sync with Solr, only to import data once. You can, however, register the river multiple times to import different sets of documents, even from different Solr instances.
  • it's recommended to create the mapping based on the existing Solr schema, so that the correct text analysis is applied while importing the documents. In the future there might be an option to generate the mapping automatically from the Solr schema.

License

This software is licensed under the Apache 2 license, quoted below.

Copyright 2015 Luca Cavanna

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.