diff --git a/README.md b/README.md new file mode 100644 index 0000000..2a0cf6d --- /dev/null +++ b/README.md @@ -0,0 +1,252 @@ +# solrizer + +[![Build Status](https://travis-ci.org/projecthydra/solrizer.png?branch=master)](https://travis-ci.org/projecthydra/solrizer) +[![Gem Version](https://badge.fury.io/rb/solrizer.png)](http://badge.fury.io/rb/solrizer) + +A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from +the command line, or as a JMS listener. + +Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a +data source and write solr documents into a solr instance, you need to use an implementation-specific gem, such as +[solrizer-fedora](https://github.com/projecthydra/solrizer-fedora), which provides the mechanics for reading from a +fedora repository and writing to a solr instance. + + +## Installation + +The gem is hosted on [rubygems.org](http://rubygems.org/gems/solrizer). The best way to manage the gems for your project +is to use bundler. Create a Gemfile in the root of your application and include the following: + + + source "http://rubygems.org" + gem 'solrizer' + +Then: + + bundle install + +## Usage + +### Fire up the console: + +The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer. + +Start up a console and load solrizer: + + > irb + > require "rubygems" + > require "solrizer" + +### Field Mapper + +The `FieldMapper` maps term names and values to Solr fields, based on the term's data type and any index_as options. +Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr +[schema.xml](https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml). 
+ +More information on the conventions followed for the dynamic solr fields is on the +[wiki page](https://github.com/projecthydra/hydra-head/wiki/Solr-Schema). + +To examine all of Solrizer's field names, open up a ruby console: + + + > require 'solrizer' + => true + > default_mapper = Solrizer::FieldMapper.new + => # + > default_mapper.solr_name("foo",:searchable, type: :string) + => "foo_teim" + > default_mapper.solr_name("foo",:searchable, type: :date) + => "foo_dtim" + > default_mapper.solr_name("foo",:searchable, type: :integer) + => "foo_iim" + > default_mapper.solr_name("foo",:facetable, type: :string) + => "foo_sim" + > default_mapper.solr_name("foo",:facetable, type: :integer) + => "foo_sim" + > default_mapper.solr_name("foo",:sortable, type: :string) + => "foo_si" + > default_mapper.solr_name("foo",:displayable, type: :string) + => "foo_ssm" + +### Default indexing strategies + + > solr_doc = Hash.new + > Solrizer.insert_field(solr_doc, 'title', 'whatever', :stored_searchable) + => {"title_tesim"=>["whatever"]} + + > Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable) + => {"pub_date_si"=>"Nov 2012", "pub_date_ssm"=>["Nov 2012"]} + +### Indexing dates + +as a date: + + > solr_doc = {} + > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable) + => {"pub_date_dtim"=>["2012-11-07T00:00:00Z"]} + +or as a string: + + > solr_doc = {} + > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable) + => {"pub_date_dti"=>"2012-11-07T00:00:00Z", "pub_date_ssm"=>["2012-11-07"]} + +or a string that is stored as a date: + + > solr_doc = {} + > Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable) + => {"pub_date_dtsim"=>["2013-01-29T00:00:00Z"]} + +### Custom indexing strategies + +#### Create your own index descriptor + + > solr_doc = {} + > displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored) + > Solrizer.insert_field(solr_doc, 
'some_count', 45, displearchable) + => {"some_count_isi"=>"45"} + +#### Override the defaults + +We can override the default indexing methods within `Solrizer::DefaultDescriptors`. + +Here's the default behavior: + + > solr_doc = {} + > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable) + => {"title_sim"=>["foobar"]} + +But let's override that by redefining `:facetable`: + + module Solrizer + module DefaultDescriptors + def self.facetable + Descriptor.new(:string, :indexed, :stored) + end + end + end + +Now, `:facetable` will return something different: + + > solr_doc = {} + > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable) + => {"title_ssi"=>"foobar"} + +#### Creating your own indexers + + module MyMappers + def self.mapper_one + Solrizer::Descriptor.new(:string, :indexed, :stored) + end + end + +Now, set Solrizer's field mapper to use our new module: + + > solr_doc = {} + > Solrizer::FieldMapper.descriptors = [MyMappers] + => [MyMappers] + > Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one) + => {"title_ssi"=>"foobar"} + +### Using OM + + t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... } + +But now you may also pass a Descriptor instance if that works for you: + + indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored) + t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... 
} + +### Extractor and Extractor Mixins + +Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents: + + > extractor = Solrizer::Extractor.new + > solr_doc = Hash.new + > extractor.format_node_value(["foo ","\n bar"]) + => "foo bar" + > extractor.insert_solr_field_value(solr_doc, "foo","bar") + => {"foo"=>"bar"} + > extractor.insert_solr_field_value(solr_doc,"foo","baz") + => {"foo"=>["bar", "baz"]} + > extractor.insert_solr_field_value(solr_doc, "boo","hoo") + => {"foo"=>["bar", "baz"], "boo"=>"hoo"} + +#### Solrizer provides some default mixins: + +`Solrizer::HTML::Extractor` provides the `html_to_solr` method and `Solrizer::XML::Extractor` provides the `xml_to_solr` method: + + > Solrizer::XML::Extractor + > extractor = Solrizer::Extractor.new + > xml = "<fields><foo>bar</foo><bar>baz</bar></fields>" + > extractor.xml_to_solr(xml) + => {:foo_tesim=>"bar", :bar_tesim=>"baz"} + +#### Solrizer::XML::TerminologyBasedSolrizer + +Another powerful mixin for use with classes that include the `OM::XML::Document` module is +`Solrizer::XML::TerminologyBasedSolrizer`. The methods provided by this module provide a robust way of mapping +terms and solr fields via OM terminologies. A notable example can be found in `ActiveFedora::NokogiriDatastream`. + +## JMS Listener for Hydra Rails Applications + +### The executables: solrizer and solrizerd + +The solrizer gem provides two executables: + + * solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly. + * solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests. 
+ +### Usage + +The usage for solrizerd is as follows: + + solrizerd command --hydra_home PATH [options] + +The commands are as follows: + * start start an instance of the application + * stop stop all instances of the application + * restart stop all instances and restart them afterwards + * status show status (PID) of application instances + +Required parameters: + +--hydra_home: this is the path to your hydra rails application's root directory. Solrizerd needs this in order to load all your models and corresponding terminologies. + +The options: + * -p, --port Stomp port (default: 61613) + * -o, --host Host to connect to (default: localhost) + * -u, --user User name for stomp listener + * -w, --password Password for stomp listener + * -d, --destination Topic to listen to (default: /topic/fedora.apim.update) + * -h, --help Display this screen + +Note: + +Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs. + +## Note on Patches/Pull Requests + +* Fork the project. +* Make your feature addition or bug fix. +* Add tests for it. This is important so I don't break it in a + future version unintentionally. +* Commit, do not mess with rake file, version, or history. + (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull) +* Send me a pull request. Bonus points for topic branches. + +## Acknowledgments + +### Technical Lead + +Matt Zumwalt ([MediaShelf](http://yourmediashelf.com)) + +### Thanks to + +* Douglas Kim, who created the initial code base for Solrizer. +* Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks. +* Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer. + +## Copyright + +Copyright (c) 2010 Matt Zumwalt. See LICENSE for details. 
diff --git a/README.textile b/README.textile deleted file mode 100644 index 50e1c0d..0000000 --- a/README.textile +++ /dev/null @@ -1,249 +0,0 @@ -h1. solrizer - -A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from the command line, or as a JMS listener. - -Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a -datasource and write solr documents into a solr instance, you need to use an implementation specific gem, such as -"solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a fedora repository and writing to a solr instance. - - -h2. Installation - -The gem is hosted on rubygems.org. The best way to manage the gems for your project is to use bundler. Create a Gemfile in the root of your application and include the following: - -
-source "http://rubygems.org"
-
-gem 'solrizer'
-
- -Then: - -
bundle install
- -h2. Usage - -h3. Fire up the console: - -The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer. - -Start up a console and load solrizer: - -
-irb
-require "rubygems"
-require "solrizer"
-
- - -h3. Field Mapper - -The FieldMapper maps term names and values to Solr fields, based on the term’s data type and any index_as options. Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr schema.xml file. A copy of that is available : - https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml - -More information on the conventions followed for the dynamic solr fields is here: - https://github.com/projecthydra/hydra-head/wiki/Solr-Schema - -
-default_mapper = Solrizer::FieldMapper.new
-
-# some of the default mappings in solrizer
-default_mapper.solr_name("foo",:string,:searchable)     # returns foo_tesim
-default_mapper.solr_name("foo",:date,:searchable)       # returns foo_dtsim
-default_mapper.solr_name("foo",:integer,:searchable     # returns foo_isim
-default_mapper.solr_name("foo",:string,:facetable)      # returns foo_sim
-default_mapper.solr_name("foo",:integer,:facetable)     # returns foo_iim
-default_mapper.solr_name("foo",:string,:sortable)       # returns foo_si
-default_mapper.solr_name("foo",:string,:displayable)    # returns foo_ssm
-
- -## Using default indexing strategies - -
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'title', 'whatever', :searchable) 
-=> {"title_tesim"=>["whatever"]}
-
-Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable) 
-=> {"title_tesim"=>["whatever"], "pub_date_ssi"=>["Nov 2012"], "pub_date_ssm"=>["Nov 2012"]}
-
- -h5. You can also index dates -
-# as a date
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
-=> {"pub_date_dtsi"=>["2012-11-07T00:00:00Z"]}
-
-# or as a string
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
-=> {"pub_date_ssi"=>["2012-11-07"], "pub_date_ssm"=>["2012-11-07"]}
-
-# or a string that is stored as a date
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
-=> {"pub_date_dtsi"=>["2013-01-29T00:00:00Z"]}
-
- - -h3. Using a custom indexing strategy -All you have to do is create your own index descriptor: -
-solr_doc = {}
-displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
-Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
-{"some_count_isi"=>["45"]}
-
- -h5. Changing the behavior of a default descriptor - -Simply override the methods within Solrizer::DefaultDescriptors -
-# before
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
-=> {"title_sim"=>["foobar"]}
-
-# redefine facetable:
-module Solrizer
-  module DefaultDescriptors
-    def self.facetable
-      Descriptor.new(:string, :indexed, :stored)
-    end
-  end
-end
-
-# after
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
-=> {"title_ssi"=>["foobar"]}
-
- - -h3. Creating your own Indexers -
-module MyMappers
-  def self.mapper_one
-    Solrizer::Descriptor.new(:string, :indexed, :stored)
-  end
-end
-
-solr_doc = {}
-
-Solrizer::FieldMapper.descriptors = [MyMappers]
-=> [MyMappers]
-
-Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
-=> {"title_ssi"=>["foobar"]}
-
- -h3. Using OM -Same as it ever was: -
-    t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
-
- -But now you may also pass an Descriptor instance if that works for you: -
-    indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
-    t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
-
-
- -h3. Extractor and Extractor Mixins - -Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents: - -
-extractor = Solrizer::Extractor.new
-
-extractor.format_node_value(["foo     ","\n      bar"])                     # returns "foo bar"
-
-solr_doc = Hash.new
-extractor.insert_solr_field_value(solr_doc, "foo","bar")         # solr_doc is now {"foo" => ["bar"]}
-extractor.insert_solr_field_value(solr_doc,"foo","baz")    # solr_doc is now {"foo" => ["bar","baz"]}
-extractor.insert_solr_field_value(solr_doc, "boo","hoo")         # solr_doc is now {"foo" => ["bar","baz"], "boo" => ["hoo"]}
-
- -h4. Solrizer provides some default mixins: - -* Solrizer::HTML::Extractor -=> provides html_to_solr method -* Solrizer::XML::Extractor -=> provides xml_to_solr method - -
-xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
-
-extractor.xml_to_solr(xml)      # returns {:foo_tesim=>"bar", :bar_tesim=>"baz"}
-
- -h4. Solrizer::XML::TerminologyBasedSolrizer - -Another powerful mixin for use with classes that include the OM::XML::Document module is Solrizer::XML::TerminologyBasedSolrizer. -The methods provided by this module map provides a robust way of mapping terms and solr fields via om terminologies. A notable example -can be found in ActiveFedora::NokogiriDatatstream. - - -h2. JMS Listener for Hydra Rails Applications - -h3. The executables: solrizer and solrizerd - -The solrizer gem provides two executables: - - * solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly. - * solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests. - -h3. Usage - -The usage for solrizerd is as follows: - -
- solrizerd command --hydra_home PATH [options] 
-
- -The commands are as follows: - * start start an instance of the application - * stop stop all instances of the application - * restart stop all instances and restart them afterwards - * status show status (PID) of application instances - -Required parameters: - ---hydra_home: this is the path to your hydra rails applications' root directory. Solrizerd needs this in order to load all your models and corresponding terminoligies. - -The options: - * -p, --port Stomp port 61613 - * -o, --host Host to connect to localhost - * -u, --user User name for stomp listener - * -w, --password Password for stomp listener - * -d, --destination Topic to listen to (default: /topic/fedora.apim.update) - * -h, --help Display this screen - -Note: - -Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs. - - -h2. Note on Patches/Pull Requests - -* Fork the project. -* Make your feature addition or bug fix. -* Add tests for it. This is important so I don't break it in a - future version unintentionally. -* Commit, do not mess with rake file, version, or history. - (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull) -* Send me a pull request. Bonus points for topic branches. - -h2. Acknowledgements - -Technical Lead: Matt Zumwalt ("MediaShelf":http://yourmediashelf.com) - -Thanks to - -Douglas Kim, who created the initial code base for Solrizer. -Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks. -Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer. - -h2. Copyright - -Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.