Updating README and converting to markdown
awead committed Oct 4, 2013
1 parent 82071cc commit a222cf3
Showing 2 changed files with 252 additions and 249 deletions.
252 changes: 252 additions & 0 deletions README.md
@@ -0,0 +1,252 @@
# solrizer

[![Build Status](https://travis-ci.org/projecthydra/solrizer.png?branch=master)](https://travis-ci.org/projecthydra/solrizer)
[![Gem Version](https://badge.fury.io/rb/solrizer.png)](http://badge.fury.io/rb/solrizer)

A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from
the command line, or as a JMS listener.

Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
data source and write solr documents into a solr instance, you need to use an implementation-specific gem, such as
[solrizer-fedora](https://github.com/projecthydra/solrizer-fedora), which provides the mechanics for reading from a
fedora repository and writing to a solr instance.


## Installation

The gem is hosted on [rubygems.org](http://rubygems.org/gems/solrizer). The best way to manage the gems for your project
is to use bundler. Create a Gemfile in the root of your application and include the following:


    source "http://rubygems.org"
    gem 'solrizer'

Then:

    bundle install

## Usage

### Fire up the console:

The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.

Start up a console and load solrizer:

    > irb
    > require "rubygems"
    > require "solrizer"

### Field Mapper

The `FieldMapper` maps term names and values to Solr fields, based on the term's data type and any index_as options.
Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr
[schema.xml](https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml).

More information on the conventions followed for the dynamic solr fields is on the
[wiki page](https://github.com/projecthydra/hydra-head/wiki/Solr-Schema).

To examine all of Solrizer's field names, open up a ruby console:


    > require 'solrizer'
    => true
    > default_mapper = Solrizer::FieldMapper.new
    => #<Solrizer::FieldMapper:0x007fb47a273770 @id_field="id">
    > default_mapper.solr_name("foo", :searchable, type: :string)
    => "foo_teim"
    > default_mapper.solr_name("foo", :searchable, type: :date)
    => "foo_dtim"
    > default_mapper.solr_name("foo", :searchable, type: :integer)
    => "foo_iim"
    > default_mapper.solr_name("foo", :facetable, type: :string)
    => "foo_sim"
    > default_mapper.solr_name("foo", :facetable, type: :integer)
    => "foo_sim"
    > default_mapper.solr_name("foo", :sortable, type: :string)
    => "foo_si"
    > default_mapper.solr_name("foo", :displayable, type: :string)
    => "foo_ssm"
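
The suffixes in these results appear to follow a pattern: a code for the data type, then `s` if the field is stored, `i` if it is indexed, and `m` if it is multivalued. The following is a minimal plain-Ruby sketch of that pattern, inferred from the examples in this README rather than taken from Solrizer's source. It does not require the gem, and `TYPE_CODES`, `sketch_suffix` and `sketch_solr_name` are hypothetical names used only for illustration.

```ruby
# Sketch only: reproduces the suffix pattern seen in this README's examples.
# None of these names are part of Solrizer's API.
TYPE_CODES = { string: "s", text: "te", integer: "i", date: "dt" }.freeze

# Build a dynamic-field suffix such as "_tesim" from a type and a list of flags.
def sketch_suffix(type, *flags)
  code = TYPE_CODES.fetch(type)
  code += "s" if flags.include?(:stored)
  code += "i" if flags.include?(:indexed)
  code += "m" if flags.include?(:multivalued)
  "_#{code}"
end

# Append the suffix to a term name.
def sketch_solr_name(field, type, *flags)
  "#{field}#{sketch_suffix(type, *flags)}"
end

puts sketch_solr_name("foo", :text, :indexed, :multivalued)            # prints foo_teim
puts sketch_solr_name("title", :text, :stored, :indexed, :multivalued) # prints title_tesim
puts sketch_solr_name("foo", :string, :indexed)                        # prints foo_si
```

Reading a suffix back the other way is a quick check on what a field does: `foo_ssm` is a string that is stored and multivalued but not indexed, which matches `:displayable`.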

### Default indexing strategies

    > solr_doc = Hash.new
    > Solrizer.insert_field(solr_doc, 'title', 'whatever', :stored_searchable)
    => {"title_tesim"=>["whatever"]}

    > Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
    => {"pub_date_si"=>"Nov 2012", "pub_date_ssm"=>["Nov 2012"]}

### Indexing dates

as a date:

    > solr_doc = {}
    > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
    => {"pub_date_dtim"=>["2012-11-07T00:00:00Z"]}

or as a string:

    > solr_doc = {}
    > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
    => {"pub_date_dti"=>"2012-11-07T00:00:00Z", "pub_date_ssm"=>["2012-11-07"]}

or a string that is stored as a date:

    > solr_doc = {}
    > Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
    => {"pub_date_dtsim"=>["2013-01-29T00:00:00Z"]}
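
The date-typed fields in these results all use the `YYYY-MM-DDThh:mm:ssZ` form. As a rough illustration of how a `Date` or a parseable string could be turned into that form, here is a sketch that uses only Ruby's standard `date` library. `sketch_solr_date` is a hypothetical helper, not a Solrizer method, and this is not necessarily how the gem performs the conversion.

```ruby
require "date"

# Sketch only: format a Date (or a string Date.parse understands) as the
# UTC timestamp string seen in the examples. Assumes the value has no
# time-of-day or zone information, so the time part is always midnight.
def sketch_solr_date(value)
  date = value.is_a?(String) ? Date.parse(value) : value
  date.strftime("%Y-%m-%dT%H:%M:%SZ")
end

puts sketch_solr_date(Date.parse("Nov 7th 2012")) # prints 2012-11-07T00:00:00Z
puts sketch_solr_date("Jan 29th 2013")            # prints 2013-01-29T00:00:00Z
```

A `Time` or `DateTime` in a non-UTC zone would need converting to UTC before formatting; the sketch deliberately leaves that out.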

### Custom indexing strategies

#### Create your own index descriptor

    > solr_doc = {}
    > displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
    > Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
    => {"some_count_isi"=>"45"}

#### Override the defaults

We can override the default indexing methods within `Solrizer::DefaultDescriptors`

Here's the default behavior:

    > solr_doc = {}
    > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
    => {"title_sim"=>["foobar"]}

But let's override that by redefining `:facetable`

    module Solrizer
      module DefaultDescriptors
        def self.facetable
          Descriptor.new(:string, :indexed, :stored)
        end
      end
    end

Now, `:facetable` will return something different:

    > solr_doc = {}
    > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
    => {"title_ssi"=>"foobar"}

#### Creating your own indexers

    module MyMappers
      def self.mapper_one
        Solrizer::Descriptor.new(:string, :indexed, :stored)
      end
    end

Now, set Solrizer's field mapper to use our new module:

    > solr_doc = {}
    > Solrizer::FieldMapper.descriptors = [MyMappers]
    => [MyMappers]
    > Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
    => {"title_ssi"=>"foobar"}
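
One way to picture what happens when a symbol such as `:mapper_one` is passed in: the name is looked up as a module-level method on each registered descriptor module in turn. The sketch below models that idea in plain Ruby without the gem. It is an assumption about the general mechanism, not a copy of Solrizer's code, and `SketchDescriptor`, `SketchDefaults`, `SketchCustom` and `sketch_resolve` are all hypothetical names.

```ruby
# Sketch only: a stand-in for a descriptor, holding a type and its flags.
SketchDescriptor = Struct.new(:type, :flags)

# Stand-in for a module of built-in index types.
module SketchDefaults
  def self.facetable
    SketchDescriptor.new(:string, [:indexed, :multivalued])
  end
end

# Stand-in for an application-defined module such as MyMappers.
module SketchCustom
  def self.mapper_one
    SketchDescriptor.new(:string, [:indexed, :stored])
  end
end

# Return the descriptor from the first module that defines the given name.
def sketch_resolve(name, modules)
  owner = modules.find { |mod| mod.respond_to?(name) }
  raise ArgumentError, "unknown index type #{name.inspect}" unless owner
  owner.public_send(name)
end

descriptor = sketch_resolve(:mapper_one, [SketchCustom, SketchDefaults])
puts descriptor.flags.inspect # prints [:indexed, :stored]
```

Under this model, the order of the modules decides which definition wins when two modules define the same name, and a list containing only the custom module would no longer resolve the built-in names.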

### Using OM

    t.main_title(:index_as=>[:facetable], :path=>"title", :label=>"title") { ... }

You may also pass a `Descriptor` instance in `:index_as` instead of a symbol:

    indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
    t.main_title(:index_as=>[indexer], :path=>"title", :label=>"title") { ... }

### Extractor and Extractor Mixins

`Solrizer::Extractor` provides utilities for extracting solr fields from objects or inserting solr fields into documents:

    > extractor = Solrizer::Extractor.new
    > solr_doc = Hash.new
    > extractor.format_node_value(["foo ","\n bar"])
    => "foo bar"
    > extractor.insert_solr_field_value(solr_doc, "foo", "bar")
    => {"foo"=>"bar"}
    > extractor.insert_solr_field_value(solr_doc, "foo", "baz")
    => {"foo"=>["bar", "baz"]}
    > extractor.insert_solr_field_value(solr_doc, "boo", "hoo")
    => {"foo"=>["bar", "baz"], "boo"=>"hoo"}
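
The session above shows two behaviours worth noting: node values are trimmed and joined with single spaces, and a field holds a bare value until a second value arrives, at which point it becomes an array. The following plain-Ruby sketch mimics that observable behaviour without the gem. The `sketch_` method names are hypothetical, and the real methods may handle whitespace and edge cases differently.

```ruby
# Sketch only: strip each fragment and join the results with a single space.
def sketch_format_node_value(values)
  values.map(&:strip).join(" ")
end

# Sketch only: store the first value as-is, and promote the field to an
# array once a second value is inserted under the same name.
def sketch_insert_field_value(doc, name, value)
  existing = doc[name]
  doc[name] = existing.nil? ? value : Array(existing) + [value]
  doc
end

puts sketch_format_node_value(["foo ", "\n bar"]) # prints foo bar

doc = {}
sketch_insert_field_value(doc, "foo", "bar")
sketch_insert_field_value(doc, "foo", "baz")
sketch_insert_field_value(doc, "boo", "hoo")
puts doc == { "foo" => ["bar", "baz"], "boo" => "hoo" } # prints true
```

Code that reads such a document back has to cope with both shapes; wrapping the lookup in `Array(doc["foo"])` is a common way to do that.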

#### Solrizer provides some default mixins:

`Solrizer::HTML::Extractor` provides the `html_to_solr` method and `Solrizer::XML::Extractor` provides the `xml_to_solr` method:

    > Solrizer::XML::Extractor
    > extractor = Solrizer::Extractor.new
    > xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
    > extractor.xml_to_solr(xml)
    => {:foo_tesim=>"bar", :bar_tesim=>"baz"}

#### Solrizer::XML::TerminologyBasedSolrizer

Another powerful mixin for use with classes that include the `OM::XML::Document` module is
`Solrizer::XML::TerminologyBasedSolrizer`. The methods provided by this module offer a robust way of mapping
terms to solr fields via OM terminologies. A notable example can be found in `ActiveFedora::NokogiriDatastream`.

## JMS Listener for Hydra Rails Applications

### The executables: solrizer and solrizerd

The solrizer gem provides two executables:

* `solrizer` is a stomp consumer which listens for `fedora.apim.update` messages and solrizes (or de-solrizes) objects accordingly.
* `solrizerd` is a wrapper script that spawns a daemonized version of `solrizer` and handles start|stop|restart|status requests.

### Usage

`solrizerd` is invoked as follows:

    solrizerd command --hydra_home PATH [options]

The commands are as follows:

* `start`: start an instance of the application
* `stop`: stop all instances of the application
* `restart`: stop all instances and restart them afterwards
* `status`: show status (PID) of application instances

Required parameters:

`--hydra_home`: the path to your hydra rails application's root directory. Solrizerd needs this in order to load all your models and corresponding terminologies.

The options:

* `-p`, `--port`: Stomp port (default: 61613)
* `-o`, `--host`: Host to connect to (default: localhost)
* `-u`, `--user`: User name for stomp listener
* `-w`, `--password`: Password for stomp listener
* `-d`, `--destination`: Topic to listen to (default: /topic/fedora.apim.update)
* `-h`, `--help`: Display this screen

Note:

Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.

## Note on Patches/Pull Requests

* Fork the project.
* Make your feature addition or bug fix.
* Add tests for it. This is important so I don't break it in a
future version unintentionally.
* Commit, do not mess with rake file, version, or history.
(if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
* Send me a pull request. Bonus points for topic branches.

## Acknowledgments

### Technical Lead

Matt Zumwalt ([MediaShelf](http://yourmediashelf.com))

### Thanks to

* Douglas Kim, who created the initial code base for Solrizer.
* Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
* Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.

## Copyright

Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.
