Sunspot 2.0 README

johnf edited this page Aug 12, 2012 · 7 revisions

Sunspot 2.0

The purpose of this document is to provide a framework for the development of Sunspot 2.0 using the README-driven development model. Sunspot 2.0 does not yet exist, but this document aims to describe the functionality that we hope to achieve when we build it.

Installation

Rails 3

Just add Sunspot to your Gemfile:

group :production do
  gem 'sunspot-client' # The gem-packaged Solr server isn't appropriate for production
end

group :test, :development do
  gem 'sunspot' # Meta-package of sunspot-client and sunspot-server
end

Rails 2

Install the gem from your shell:

sudo gem install sunspot
# To install the optional packaged Solr server (recommended for development):
sudo gem install sunspot_solr

Then add the dependency to your environment.rb:

config.gem 'sunspot'

Note: The sunspot_rails gem no longer needs to be installed. As of version 2.0, Sunspot will automatically include Rails integration if you load it in a Rails environment; it still works fine in a non-Rails environment as well.

Configuration and running

Sunspot can be run without any explicit configuration, but if you're using Rails, the easiest way to maintain a consistent configuration across your team is to use the built-in generator to add configurations to your project:

script/rails generate sunspot

This will create the following new files in the current directory:

config/sunspot.yml
solr/conf/schema.xml
solr/conf/solrconfig.xml
solr/data/.gitignore

The sunspot.yml file is used for application-level configuration of Sunspot. This includes a URL at which Sunspot can access Solr; if the hostname of this URL is localhost or 127.0.0.1, the sunspot-solr executable will also use the configuration you've specified to start up the bundled Solr instance. An example sunspot.yml might look like this:

production:
  solr: http://solr.my-host.com/solr
development:
  solr:
    url: http://localhost:8982/solr
    max_memory: 1024M
    # TK more configuration options here

The files in the solr/conf directory are used directly by the bundled Solr instance when you run it locally. You probably won't need to change them, but if you need advanced customization of Solr's behavior, these files are where you can do that. The solr/data directory contains your actual Solr index on disk, and is thus excluded from Git for you.

To start Solr in your development environment, simply run:

sunspot solr start # sunspot_solr gem must be installed

If you run this from the root of a Rails project, Sunspot will detect that and use your config/sunspot.yml if it's present.

Setting up your classes for search

Sunspot is designed to index and search Ruby objects that are persisted to a separate primary data store. Sunspot supports ActiveRecord, DataMapper, Mongoid, and MongoMapper [TK what else?] out of the box; it's quite easy to add support for other persistence layers. See the documentation for Sunspot::Adapter.

Configuring a model class for search primarily consists of defining which fields Sunspot should index, and setting those fields up with various options. Fields do not need to correspond to database columns; Sunspot will happily index the return value of any method your object responds to.

The basics

The examples in this README all assume we're building a straightforward blogging platform. Let's start with a simple configuration for our Post model.

class Post < ActiveRecord::Base
  include Sunspot::Searchable

  index :body
  index :blog_id
end

If your field names correspond to database columns (or predefined fields in Mongoid, etc.), Sunspot will infer the Solr field type from the column type. String columns are inferred as fulltext fields; other types are inferred as attribute fields of the corresponding type. Fulltext fields and attribute fields are quite different in their properties and usage.

Fulltext fields always have the type fulltext, and are used for keyword search. Solr breaks apart the data from fulltext fields into individual words, and when a fulltext search is performed, documents are matched against search terms on a word-by-word basis.

Attribute fields, on the other hand, are scalar data, and are indexed as-is without any analysis. Attribute fields can have several scalar types: string, integer, long, float, double, date, time, and boolean are the main ones. You can think of attribute fields as equivalent to columns in a database: they can be used for filtering search results to a certain scope (e.g. only return results with a blog_id of 1); ordering results; and faceting, a topic we will cover in more depth later in this README.
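
The difference between the two kinds of field can be illustrated with a toy model in plain Ruby. This is not how Solr is implemented: real analysis uses configurable tokenizers and filters. The sketch assumes a simple lowercase, alphanumeric tokenizer and assumes that every query word must appear in the field. All method names are invented for the example.

```ruby
# Toy model only: Solr's real analysis chain is configurable and far richer.

# Split text into lowercase words, roughly what happens to a fulltext field.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Fulltext matching: compare word by word (assumes all query words are required).
def fulltext_match?(field_value, query)
  indexed_words = tokenize(field_value)
  tokenize(query).all? { |word| indexed_words.include?(word) }
end

# Attribute matching: the value is indexed as-is, so only an exact match counts.
def attribute_match?(field_value, expected)
  field_value == expected
end

body = 'Searching Ruby objects with Solr'

puts fulltext_match?(body, 'ruby solr')   # true: both words occur in the body
puts attribute_match?(body, 'ruby solr')  # false: not the exact value
puts attribute_match?(1, 1)               # true: e.g. a blog_id of 1
```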

Populating data

The above example uses the simplest method of populating fields; Sunspot will simply call the method named by the field, and index the return value if it's non-nil. If you wish to give the field a different name from the method that populates it, use the :using option:

class Post < ActiveRecord::Base
  index :my_blog_id, :using => :blog_id
end

This will populate a my_blog_id field in Solr using the return value of the Post#blog_id method.
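
The population rule described so far (call the method named by the field, or the one named by :using, and skip nil values) can be sketched in plain Ruby. `FieldSource`, `document_for`, and `PlainPost` are names invented for this illustration; they are not Sunspot classes, and the real implementation may look quite different.

```ruby
# Hypothetical sketch of the population rule; not Sunspot's implementation.
FieldSource = Struct.new(:name, :using) do
  # The method to call defaults to the field name unless :using is given.
  def value_for(object)
    object.public_send(using || name)
  end
end

# Build a hash of field name => value, leaving out fields whose value is nil.
def document_for(object, fields)
  fields.each_with_object({}) do |field, document|
    value = field.value_for(object)
    document[field.name] = value unless value.nil?
  end
end

# Stand-in for a model; a plain Struct rather than ActiveRecord.
PlainPost = Struct.new(:body, :blog_id)

fields = [FieldSource.new(:body), FieldSource.new(:my_blog_id, :blog_id)]

p document_for(PlainPost.new('Hello', 7), fields)  # both fields are present
p document_for(PlainPost.new(nil, 7), fields)      # the nil body is left out
```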

If you wish to populate a field with data that is not defined by a method on your model class, you can pass a block to the field definition; the block is evaluated in the context of the model instance, and the return value is indexed. For instance, perhaps we wish to index the number of comments on a given post:

class Post < ActiveRecord::Base
  has_many :comments

  index(:comments_count, :as => :integer) { comments.count }
end

In this case, since the field does not correspond directly to a database column, we must explicitly specify the field type. If Sunspot cannot infer the field type and no type is specified, it will treat the field as fulltext.
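
A minimal sketch of these two ideas, under stated assumptions: the block is run with `instance_exec` so that the model instance is `self`, and the field type is resolved by preferring an explicit type, then the column type, then fulltext. `BlockField`, `resolve_type`, and `CommentedPost` are hypothetical names used only here.

```ruby
# Hypothetical sketch; BlockField and resolve_type are invented names.

# A block-backed field: the block runs with the model instance as self,
# so it can call the instance's methods directly.
BlockField = Struct.new(:name, :block) do
  def value_for(object)
    object.instance_exec(&block)
  end
end

# Type resolution in the order the text describes: an explicit type wins,
# then the column type (string columns become fulltext), else fulltext.
def resolve_type(explicit_type, column_type)
  return explicit_type if explicit_type
  return :fulltext if column_type.nil? || column_type == :string

  column_type
end

# Stand-in for a model with an association; not ActiveRecord.
class CommentedPost
  attr_reader :comments

  def initialize(comments)
    @comments = comments
  end
end

field = BlockField.new(:comments_count, proc { comments.count })
puts field.value_for(CommentedPost.new(%w[first second]))  # 2

puts resolve_type(:integer, nil)  # integer: explicit type, no column
puts resolve_type(nil, :string)   # fulltext: inferred from a string column
puts resolve_type(nil, :integer)  # integer: inferred from the column
puts resolve_type(nil, nil)       # fulltext: nothing to infer from
```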

Reference fields

A special type of attribute field is a reference field. These are fields that hold references to other persistent objects; they're particularly useful for faceting. For example, instead of our blog_id field above, we might simply index blog as a reference field:

class Post < ActiveRecord::Base
  belongs_to :blog

  index :blog
end

Now instead of working with an integer when using this field, we'll be working with actual Blog objects.
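
One way to picture this, purely as an assumption for illustration: the index keeps only the referenced object's primary key, and the key is resolved back to the object when the field is read. This README does not specify how reference fields are stored, so `ExampleBlog`, `BLOG_STORE`, `dump_reference`, and `load_reference` below are all hypothetical.

```ruby
# Assumption for illustration: a reference field is stored as the referenced
# object's primary key and resolved back to the object when read.
ExampleBlog = Struct.new(:id, :name)

# Stand-in for the primary data store, keyed by primary key.
BLOG_STORE = {
  1 => ExampleBlog.new(1, 'Ruby Notes'),
  2 => ExampleBlog.new(2, 'Search Diaries')
}.freeze

# What gets indexed for the reference: just the key.
def dump_reference(object)
  object.id
end

# What the application sees when reading the field back: the object itself.
def load_reference(id, store)
  store.fetch(id)
end

indexed = dump_reference(BLOG_STORE[2])
puts indexed                                   # 2
puts load_reference(indexed, BLOG_STORE).name  # Search Diaries
```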

Indexing data from associations

Reference fields can also be passed a block, which is an easy way to index data from associated objects. Again, Sunspot will attempt to infer the field's type from the column type in the associated object.

class Post < ActiveRecord::Base
  belongs_to :blog

  index :blog do
    index :name
  end
end

Trie fields

TK

Universal field options

The following options are available on all fields:

:stored
By default, Sunspot does not add field data to Solr in a way that allows Solr to return that field data in search results; instead, Sunspot only stores the object's class name and primary key, and uses that information from the search result to load the original object out of the primary database. You can override this behavior on a per-field basis to instruct Solr to return the field data in search results; in certain cases, this can allow you to bypass looking up the original objects in the database altogether, giving you a performance boost.
:as
**Advanced.** Usually, Sunspot constructs an internal field name for your fields based on the field type and options you've set; Sunspot's built-in Solr schema is set up to follow the same naming conventions. In certain cases, such as legacy schemas or for functionality not supported by Sunspot, you may want to override this and directly set the field name that will be used internally.
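
The trade-off behind :stored can be sketched in plain Ruby. By default a search hit carries only a class name and primary key, so reading any attribute costs a lookup in the primary data store; with stored fields, the value can come straight from the search result. `SearchHit`, `StoredPost`, and `DATABASE` are invented for this illustration and do not describe Sunspot's actual classes.

```ruby
# Hypothetical sketch of the trade-off behind :stored.
StoredPost = Struct.new(:id, :title)

# Stand-in for the primary database, keyed by class name and primary key.
DATABASE = { 'StoredPost' => { 1 => StoredPost.new(1, 'Hello Solr') } }.freeze

# A search hit always carries the class name and primary key; stored_values
# holds any fields that were indexed with :stored.
SearchHit = Struct.new(:class_name, :primary_key, :stored_values) do
  # Default path: one lookup in the primary data store per hit.
  def load_object
    DATABASE.fetch(class_name).fetch(primary_key)
  end

  # Stored path: answer from the search result alone.
  def stored(field)
    stored_values.fetch(field)
  end
end

plain_hit  = SearchHit.new('StoredPost', 1, {})
stored_hit = SearchHit.new('StoredPost', 1, { title: 'Hello Solr' })

puts plain_hit.load_object.title  # Hello Solr, loaded from the database
puts stored_hit.stored(:title)    # Hello Solr, no database lookup needed
```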

Fulltext field options

TK

Attribute field options

TK

Adding your objects to Solr

TK

Full reindexing

$ sunspot reindex

TK

Automatic lifecycle indexing

TK

Indexing vs. committing

TK

Searching

TK

Fulltext Search

TK

Boosting and phrase fields

TK

Highlighting

TK

More like this

TK

Scoping results

Post.search do
  where(:blog_id => 1)
  where(:comments_count).gt(0)
end

TK

Ordering results

Faceting

TK

Field facets

TK

Query Facets

TK

Geospatial search

TK