Permalink
Browse files

Updating README and converting to markdown

  • Loading branch information...
awead committed Oct 4, 2013
1 parent 82071cc commit a222cf32a70fe672b143872e99565c295145ab1c
Showing with 252 additions and 249 deletions.
  1. +252 −0 README.md
  2. +0 −249 README.textile
View
252 README.md
@@ -0,0 +1,252 @@
+# solrizer
+
+[![Build Status](https://travis-ci.org/projecthydra/solrizer.png?branch=master)](https://travis-ci.org/projecthydra/solrizer)
+[![Gem Version](https://badge.fury.io/rb/solrizer.png)](http://badge.fury.io/rb/solrizer)
+
+A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from
+the command line, or as a JMS listener.
+
+Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
+data source and write solr documents into a solr instance, you need to use an implementation specific gem, such as
+"solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a
+fedora repository and writing to a solr instance.
+
+
+## Installation
+
+The gem is hosted on [rubygems.org](http://rubygems.org/gems/solrizer). The best way to manage the gems for your project
+is to use bundler. Create a Gemfile in the root of your application and include the following:
+
+
+ source "http://rubygems.org"
+ gem 'solrizer'
+
+Then:
+
+ bundle install
+
+## Usage
+
+### Fire up the console:
+
+The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
+
+Start up a console and load solrizer:
+
+ > irb
+ > require "rubygems"
+ > require "solrizer"
+
+### Field Mapper
+
+The `FieldMapper` maps term names and values to Solr fields, based on the term's data type and any index_as options.
+Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr
+[schema.xml](https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml).
+
+More information on the conventions followed for the dynamic solr fields is on the
+[wiki page](https://github.com/projecthydra/hydra-head/wiki/Solr-Schema).
+
+To examine all of Solrizer's field names, open up a ruby console:
+
+
+ > require 'solrizer'
+ => true
+ > default_mapper = Solrizer::FieldMapper.new
+ => #<Solrizer::FieldMapper:0x007fb47a273770 @id_field="id">
+ > default_mapper.solr_name("foo",:searchable, type: :string)
+ => "foo_teim"
+ > default_mapper.solr_name("foo",:searchable, type: :date)
+ => "foo_dtim"
+ > default_mapper.solr_name("foo",:searchable, type: :integer)
+ => "foo_iim"
+ > default_mapper.solr_name("foo",:facetable, type: :string)
+ => "foo_sim"
+ > default_mapper.solr_name("foo",:facetable, type: :integer)
+ => "foo_sim"
+ > default_mapper.solr_name("foo",:sortable, type: :string)
+ => "foo_si"
+ > default_mapper.solr_name("foo",:displayable, type: :string)
+ => "foo_ssm"
+
+### Default indexing strategies
+
+ > solr_doc = Hash.new
+ > Solrizer.insert_field(solr_doc, 'title', 'whatever', :stored_searchable)
+ => {"title_tesim"=>["whatever"]}
+
+ > Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
+ => {"pub_date_si"=>"Nov 2012", "pub_date_ssm"=>["Nov 2012"]}
+
+### Indexing dates
+
+as a date:
+
+ > solr_doc = {}
+ > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
+ => {"pub_date_dtim"=>["2012-11-07T00:00:00Z"]}
+
+or as a string:
+
+ > solr_doc = {}
+ > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
+ => {"pub_date_dti"=>"2012-11-07T00:00:00Z", "pub_date_ssm"=>["2012-11-07"]}
+
+or a string that is stored as a date:
+
+ > solr_doc = {}
+ > Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
+ => {"pub_date_dtsim"=>["2013-01-29T00:00:00Z"]}
+
+### Custom indexing strategies
+
+#### Create your own index descriptor
+
+ > solr_doc = {}
+ > displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
+ > Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
+ => {"some_count_isi"=>"45"}
+
+#### Override the defaults
+
+We can override the default indexing methods within `Solrizer::DefaultDescriptors`
+
+Here's the default behavior:
+
+ > solr_doc = {}
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
+ => {"title_sim"=>["foobar"]}
+
+But let's override that by redefining `:facetable`
+
+ module Solrizer
+ module DefaultDescriptors
+ def self.facetable
+ Descriptor.new(:string, :indexed, :stored)
+ end
+ end
+ end
+
+Now, `:facetable` will return something different:
+
+ > solr_doc = {}
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
+ => {"title_ssi"=>"foobar"}
+
+#### Creating your own indexers
+
+ module MyMappers
+ def self.mapper_one
+ Solrizer::Descriptor.new(:string, :indexed, :stored)
+ end
+ end
+
+Now, set Solrizer's field mapper to use our new module:
+
+ > solr_doc = {}
+ > Solrizer::FieldMapper.descriptors = [MyMappers]
+ => [MyMappers]
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
+ => {"title_ssi"=>"foobar"}
+
+### Using OM
+
+ t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
+
+But now you may also pass an Descriptor instance if that works for you:
+
+ indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
+ t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
+
+### Extractor and Extractor Mixins
+
+Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
+
+ > extractor = Solrizer::Extractor.new
+ > solr_doc = Hash.new
+ > extractor.format_node_value(["foo ","\n bar"])
+ => "foo bar"
+ > extractor.insert_solr_field_value(solr_doc, "foo","bar")
+ => {"foo"=>"bar"}
+ > extractor.insert_solr_field_value(solr_doc,"foo","baz")
+ => {"foo"=>["bar", "baz"]}
+ > extractor.insert_solr_field_value(solr_doc, "boo","hoo")
+ => {"foo"=>["bar", "baz"], "boo"=>"hoo"}
+
+#### Solrizer provides some default mixins:
+
+`Solrizer::HTML::Extractor` provides html_to_solr method and `Solrizer::XML::Extractor` provides xml_to_solr method:
+
+ > Solrizer::XML::Extractor
+ > extractor = Solrizer::Extractor.new
+ > xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
+ > extractor.xml_to_solr(xml)
+ => {:foo_tesim=>"bar", :bar_tesim=>"baz"}
+
+#### Solrizer::XML::TerminologyBasedSolrizer
+
+Another powerful mixin for use with classes that include the `OM::XML::Document` module is
+`Solrizer::XML::TerminologyBasedSolrizer`. The methods provided by this module map provides a robust way of mapping
+terms and solr fields via om terminologies. A notable example can be found in `ActiveFedora::NokogiriDatatstream`.
+
+## JMS Listener for Hydra Rails Applications
+
+### The executables: solrizer and solrizerd
+
+The solrizer gem provides two executables:
+
+ * solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
+ * solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
+
+### Usage
+
+The usage for solrizerd is as follows:
+
+ solrizerd command --hydra_home PATH [options]
+
+The commands are as follows:
+ * start start an instance of the application
+ * stop stop all instances of the application
+ * restart stop all instances and restart them afterwards
+ * status show status (PID) of application instances
+
+Required parameters:
+
+--hydra_home: this is the path to your hydra rails applications' root directory. Solrizerd needs this in order to load all your models and corresponding terminoligies.
+
+The options:
+ * -p, --port Stomp port 61613
+ * -o, --host Host to connect to localhost
+ * -u, --user User name for stomp listener
+ * -w, --password Password for stomp listener
+ * -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
+ * -h, --help Display this screen
+
+Note:
+
+Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
+
+## Note on Patches/Pull Requests
+
+* Fork the project.
+* Make your feature addition or bug fix.
+* Add tests for it. This is important so I don't break it in a
+ future version unintentionally.
+* Commit, do not mess with rake file, version, or history.
+ (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
+* Send me a pull request. Bonus points for topic branches.
+
+## Acknowledgments
+
+### Technical Lead
+
+Matt Zumwalt ("MediaShelf":http://yourmediashelf.com)
+
+### Thanks to
+
+* Douglas Kim, who created the initial code base for Solrizer.
+* Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
+* Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
+
+## Copyright
+
+Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.
Oops, something went wrong.

0 comments on commit a222cf3

Please sign in to comment.