Skip to content
Browse files

Updating README and converting to markdown

  • Loading branch information...
1 parent 82071cc commit a222cf32a70fe672b143872e99565c295145ab1c @awead awead committed Oct 4, 2013
Showing with 252 additions and 249 deletions.
  1. +252 −0 README.md
  2. +0 −249 README.textile
View
252 README.md
@@ -0,0 +1,252 @@
+# solrizer
+
+[![Build Status](https://travis-ci.org/projecthydra/solrizer.png?branch=master)](https://travis-ci.org/projecthydra/solrizer)
+[![Gem Version](https://badge.fury.io/rb/solrizer.png)](http://badge.fury.io/rb/solrizer)
+
+A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from
+the command line, or as a JMS listener.
+
+Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
+data source and write solr documents into a solr instance, you need to use an implementation specific gem, such as
+"solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a
+fedora repository and writing to a solr instance.
+
+
+## Installation
+
+The gem is hosted on [rubygems.org](http://rubygems.org/gems/solrizer). The best way to manage the gems for your project
+is to use bundler. Create a Gemfile in the root of your application and include the following:
+
+
+ source "http://rubygems.org"
+ gem 'solrizer'
+
+Then:
+
+ bundle install
+
+## Usage
+
+### Fire up the console:
+
+The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
+
+Start up a console and load solrizer:
+
+ > irb
+ > require "rubygems"
+ > require "solrizer"
+
+### Field Mapper
+
+The `FieldMapper` maps term names and values to Solr fields, based on the term's data type and any index_as options.
+Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr
+[schema.xml](https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml).
+
+More information on the conventions followed for the dynamic solr fields is on the
+[wiki page](https://github.com/projecthydra/hydra-head/wiki/Solr-Schema).
+
+To examine all of Solrizer's field names, open up a ruby console:
+
+
+ > require 'solrizer'
+ => true
+ > default_mapper = Solrizer::FieldMapper.new
+ => #<Solrizer::FieldMapper:0x007fb47a273770 @id_field="id">
+ > default_mapper.solr_name("foo",:searchable, type: :string)
+ => "foo_teim"
+ > default_mapper.solr_name("foo",:searchable, type: :date)
+ => "foo_dtim"
+ > default_mapper.solr_name("foo",:searchable, type: :integer)
+ => "foo_iim"
+ > default_mapper.solr_name("foo",:facetable, type: :string)
+ => "foo_sim"
+ > default_mapper.solr_name("foo",:facetable, type: :integer)
+ => "foo_sim"
+ > default_mapper.solr_name("foo",:sortable, type: :string)
+ => "foo_si"
+ > default_mapper.solr_name("foo",:displayable, type: :string)
+ => "foo_ssm"
+
+### Default indexing strategies
+
+ > solr_doc = Hash.new
+ > Solrizer.insert_field(solr_doc, 'title', 'whatever', :stored_searchable)
+ => {"title_tesim"=>["whatever"]}
+
+ > Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
+ => {"pub_date_si"=>"Nov 2012", "pub_date_ssm"=>["Nov 2012"]}
+
+### Indexing dates
+
+as a date:
+
+ > solr_doc = {}
+ > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
+ => {"pub_date_dtim"=>["2012-11-07T00:00:00Z"]}
+
+or as a string:
+
+ > solr_doc = {}
+ > Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
+ => {"pub_date_dti"=>"2012-11-07T00:00:00Z", "pub_date_ssm"=>["2012-11-07"]}
+
+or a string that is stored as a date:
+
+ > solr_doc = {}
+ > Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
+ => {"pub_date_dtsim"=>["2013-01-29T00:00:00Z"]}
+
+### Custom indexing strategies
+
+#### Create your own index descriptor
+
+ > solr_doc = {}
+ > displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
+ > Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
+ => {"some_count_isi"=>"45"}
+
+#### Override the defaults
+
+We can override the default indexing methods within `Solrizer::DefaultDescriptors`
+
+Here's the default behavior:
+
+ > solr_doc = {}
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
+ => {"title_sim"=>["foobar"]}
+
+But let's override that by redefining `:facetable`
+
+ module Solrizer
+ module DefaultDescriptors
+ def self.facetable
+ Descriptor.new(:string, :indexed, :stored)
+ end
+ end
+ end
+
+Now, `:facetable` will return something different:
+
+ > solr_doc = {}
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
+ => {"title_ssi"=>"foobar"}
+
+#### Creating your own indexers
+
+ module MyMappers
+ def self.mapper_one
+ Solrizer::Descriptor.new(:string, :indexed, :stored)
+ end
+ end
+
+Now, set Solrizer's field mapper to use our new module:
+
+ > solr_doc = {}
+ > Solrizer::FieldMapper.descriptors = [MyMappers]
+ => [MyMappers]
+ > Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
+ => {"title_ssi"=>"foobar"}
+
+### Using OM
+
+ t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
+
+But now you may also pass an Descriptor instance if that works for you:
+
+ indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
+ t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
+
+### Extractor and Extractor Mixins
+
+Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
+
+ > extractor = Solrizer::Extractor.new
+ > solr_doc = Hash.new
+ > extractor.format_node_value(["foo ","\n bar"])
+ => "foo bar"
+ > extractor.insert_solr_field_value(solr_doc, "foo","bar")
+ => {"foo"=>"bar"}
+ > extractor.insert_solr_field_value(solr_doc,"foo","baz")
+ => {"foo"=>["bar", "baz"]}
+ > extractor.insert_solr_field_value(solr_doc, "boo","hoo")
+ => {"foo"=>["bar", "baz"], "boo"=>"hoo"}
+
+#### Solrizer provides some default mixins:
+
+`Solrizer::HTML::Extractor` provides html_to_solr method and `Solrizer::XML::Extractor` provides xml_to_solr method:
+
+ > Solrizer::XML::Extractor
+ > extractor = Solrizer::Extractor.new
+ > xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
+ > extractor.xml_to_solr(xml)
+ => {:foo_tesim=>"bar", :bar_tesim=>"baz"}
+
+#### Solrizer::XML::TerminologyBasedSolrizer
+
+Another powerful mixin for use with classes that include the `OM::XML::Document` module is
+`Solrizer::XML::TerminologyBasedSolrizer`. The methods provided by this module map provides a robust way of mapping
+terms and solr fields via om terminologies. A notable example can be found in `ActiveFedora::NokogiriDatatstream`.
+
+## JMS Listener for Hydra Rails Applications
+
+### The executables: solrizer and solrizerd
+
+The solrizer gem provides two executables:
+
+ * solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
+ * solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
+
+### Usage
+
+The usage for solrizerd is as follows:
+
+ solrizerd command --hydra_home PATH [options]
+
+The commands are as follows:
+ * start start an instance of the application
+ * stop stop all instances of the application
+ * restart stop all instances and restart them afterwards
+ * status show status (PID) of application instances
+
+Required parameters:
+
+--hydra_home: this is the path to your hydra rails applications' root directory. Solrizerd needs this in order to load all your models and corresponding terminoligies.
+
+The options:
+ * -p, --port Stomp port 61613
+ * -o, --host Host to connect to localhost
+ * -u, --user User name for stomp listener
+ * -w, --password Password for stomp listener
+ * -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
+ * -h, --help Display this screen
+
+Note:
+
+Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
+
+## Note on Patches/Pull Requests
+
+* Fork the project.
+* Make your feature addition or bug fix.
+* Add tests for it. This is important so I don't break it in a
+ future version unintentionally.
+* Commit, do not mess with rake file, version, or history.
+ (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
+* Send me a pull request. Bonus points for topic branches.
+
+## Acknowledgments
+
+### Technical Lead
+
+Matt Zumwalt ("MediaShelf":http://yourmediashelf.com)
+
+### Thanks to
+
+* Douglas Kim, who created the initial code base for Solrizer.
+* Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
+* Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
+
+## Copyright
+
+Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.
View
249 README.textile
@@ -1,249 +0,0 @@
-h1. solrizer
-
-A lightweight, configurable tool for indexing metadata into solr. Can be triggered from within your application, from the command line, or as a JMS listener.
-
-Solrizer provides the baseline and structures for the process of solrizing. In order to actually read objects from a
-datasource and write solr documents into a solr instance, you need to use an implementation specific gem, such as
-"solrizer-fedora":https://github.com/projecthydra/solrizer-fedora, which provides the mechanics for reading from a fedora repository and writing to a solr instance.
-
-
-h2. Installation
-
-The gem is hosted on rubygems.org. The best way to manage the gems for your project is to use bundler. Create a Gemfile in the root of your application and include the following:
-
-<pre>
-source "http://rubygems.org"
-
-gem 'solrizer'
-</pre>
-
-Then:
-
-<pre>bundle install</pre>
-
-h2. Usage
-
-h3. Fire up the console:
-
-The code snippets in the following sections can be cut/pasted into your console, giving you the opportunity to play with Solrizer.
-
-Start up a console and load solrizer:
-
-<pre>
-irb
-require "rubygems"
-require "solrizer"
-</pre>
-
-
-h3. Field Mapper
-
-The FieldMapper maps term names and values to Solr fields, based on the term’s data type and any index_as options. Solrizer comes with default mappings to dynamic field types defined in the Hydra Solr schema.xml file. A copy of that is available :
- https://github.com/projecthydra/hydra-head/blob/master/hydra-core/lib/generators/hydra/templates/solr_conf/conf/schema.xml
-
-More information on the conventions followed for the dynamic solr fields is here:
- https://github.com/projecthydra/hydra-head/wiki/Solr-Schema
-
-<pre>
-default_mapper = Solrizer::FieldMapper.new
-
-# some of the default mappings in solrizer
-default_mapper.solr_name("foo",:string,:searchable) # returns foo_tesim
-default_mapper.solr_name("foo",:date,:searchable) # returns foo_dtsim
-default_mapper.solr_name("foo",:integer,:searchable # returns foo_isim
-default_mapper.solr_name("foo",:string,:facetable) # returns foo_sim
-default_mapper.solr_name("foo",:integer,:facetable) # returns foo_iim
-default_mapper.solr_name("foo",:string,:sortable) # returns foo_si
-default_mapper.solr_name("foo",:string,:displayable) # returns foo_ssm
-</pre>
-
-## Using default indexing strategies
-
-<pre>
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'title', 'whatever', :searchable)
-=> {"title_tesim"=>["whatever"]}
-
-Solrizer.insert_field(solr_doc, 'pub_date', 'Nov 2012', :sortable, :displayable)
-=> {"title_tesim"=>["whatever"], "pub_date_ssi"=>["Nov 2012"], "pub_date_ssm"=>["Nov 2012"]}
-</pre>
-
-h5. You can also index dates
-<pre>
-# as a date
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :searchable)
-=> {"pub_date_dtsi"=>["2012-11-07T00:00:00Z"]}
-
-# or as a string
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'pub_date', Date.parse('Nov 7th 2012'), :sortable, :displayable)
-=> {"pub_date_ssi"=>["2012-11-07"], "pub_date_ssm"=>["2012-11-07"]}
-
-# or a string that is stored as a date
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'pub_date', 'Jan 29th 2013', :dateable)
-=> {"pub_date_dtsi"=>["2013-01-29T00:00:00Z"]}
-</pre>
-
-
-h3. Using a custom indexing strategy
-All you have to do is create your own index descriptor:
-<pre>
-solr_doc = {}
-displearchable = Solrizer::Descriptor.new(:integer, :indexed, :stored)
-Solrizer.insert_field(solr_doc, 'some_count', 45, displearchable)
-{"some_count_isi"=>["45"]}
-</pre>
-
-h5. Changing the behavior of a default descriptor
-
-Simply override the methods within Solrizer::DefaultDescriptors
-<pre>
-# before
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
-=> {"title_sim"=>["foobar"]}
-
-# redefine facetable:
-module Solrizer
- module DefaultDescriptors
- def self.facetable
- Descriptor.new(:string, :indexed, :stored)
- end
- end
-end
-
-# after
-solr_doc = {}
-Solrizer.insert_field(solr_doc, 'title', 'foobar', :facetable)
-=> {"title_ssi"=>["foobar"]}
-</pre>
-
-
-h3. Creating your own Indexers
-<pre>
-module MyMappers
- def self.mapper_one
- Solrizer::Descriptor.new(:string, :indexed, :stored)
- end
-end
-
-solr_doc = {}
-
-Solrizer::FieldMapper.descriptors = [MyMappers]
-=> [MyMappers]
-
-Solrizer.insert_field(solr_doc, 'title', 'foobar', :mapper_one)
-=> {"title_ssi"=>["foobar"]}
-</pre>
-
-h3. Using OM
-Same as it ever was:
-<pre>
- t.main_title(:index_as=>[:facetable],:path=>"title", :label=>"title") { ... }
-</pre>
-
-But now you may also pass an Descriptor instance if that works for you:
-<pre>
- indexer = Solrizer::Descriptor.new(:integer, :indexed, :stored)
- t.main_title(:index_as=>[indexer],:path=>"title", :label=>"title") { ... }
-
-</pre>
-
-h3. Extractor and Extractor Mixins
-
-Solrizer::Extractor provides utilities for extracting solr fields from objects or inserting solr fields into documents:
-
-<pre>
-extractor = Solrizer::Extractor.new
-
-extractor.format_node_value(["foo ","\n bar"]) # returns "foo bar"
-
-solr_doc = Hash.new
-extractor.insert_solr_field_value(solr_doc, "foo","bar") # solr_doc is now {"foo" => ["bar"]}
-extractor.insert_solr_field_value(solr_doc,"foo","baz") # solr_doc is now {"foo" => ["bar","baz"]}
-extractor.insert_solr_field_value(solr_doc, "boo","hoo") # solr_doc is now {"foo" => ["bar","baz"], "boo" => ["hoo"]}
-</pre>
-
-h4. Solrizer provides some default mixins:
-
-* Solrizer::HTML::Extractor -=> provides html_to_solr method
-* Solrizer::XML::Extractor -=> provides xml_to_solr method
-
-<pre>
-xml = "<fields><foo>bar</foo><bar>baz</bar></fields>"
-
-extractor.xml_to_solr(xml) # returns {:foo_tesim=>"bar", :bar_tesim=>"baz"}
-</pre>
-
-h4. Solrizer::XML::TerminologyBasedSolrizer
-
-Another powerful mixin for use with classes that include the OM::XML::Document module is Solrizer::XML::TerminologyBasedSolrizer.
-The methods provided by this module map provides a robust way of mapping terms and solr fields via om terminologies. A notable example
-can be found in ActiveFedora::NokogiriDatatstream.
-
-
-h2. JMS Listener for Hydra Rails Applications
-
-h3. The executables: solrizer and solrizerd
-
-The solrizer gem provides two executables:
-
- * solrizer is a stomp consumer which listens for fedora.apim.updates and solrizes (or de-solrizes) objects accordingly.
- * solrizerd is a wrapper script that spawns a daemonized version of solrizer and handles start|stop|restart|status requests.
-
-h3. Usage
-
-The usage for solrizerd is as follows:
-
-<pre>
- solrizerd command --hydra_home PATH [options]
-</pre>
-
-The commands are as follows:
- * start start an instance of the application
- * stop stop all instances of the application
- * restart stop all instances and restart them afterwards
- * status show status (PID) of application instances
-
-Required parameters:
-
---hydra_home: this is the path to your hydra rails applications' root directory. Solrizerd needs this in order to load all your models and corresponding terminoligies.
-
-The options:
- * -p, --port Stomp port 61613
- * -o, --host Host to connect to localhost
- * -u, --user User name for stomp listener
- * -w, --password Password for stomp listener
- * -d, --destination Topic to listen to (default: /topic/fedora.apim.update)
- * -h, --help Display this screen
-
-Note:
-
-Since the solrizer script must fire up your hydra rails application, it must have all the gems installed that your hydra instance needs.
-
-
-h2. Note on Patches/Pull Requests
-
-* Fork the project.
-* Make your feature addition or bug fix.
-* Add tests for it. This is important so I don't break it in a
- future version unintentionally.
-* Commit, do not mess with rake file, version, or history.
- (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
-* Send me a pull request. Bonus points for topic branches.
-
-h2. Acknowledgements
-
-Technical Lead: Matt Zumwalt ("MediaShelf":http://yourmediashelf.com)
-
-Thanks to
-
-Douglas Kim, who created the initial code base for Solrizer.
-Chris Fitzpatrick, who patiently ran the first prototype through its paces for weeks.
-Bess Sadler, who created the JMS integration for Solrizer, generously served as a sounding board for numerous design issues around solr indexing, and pushes the technology forward with the skill of a true engineer.
-
-h2. Copyright
-
-Copyright (c) 2010 Matt Zumwalt. See LICENSE for details.

0 comments on commit a222cf3

Please sign in to comment.
Something went wrong with that request. Please try again.