outoftime edited this page May 17, 2011 · 7 revisions

Sunspot 2.0

The purpose of this document is to provide a framework for the development of Sunspot 2.0 using the README-driven development model. Sunspot 2.0 does not yet exist, but this document aims to describe the functionality that we hope to achieve when we build it.

Installation

Rails 3

Just add Sunspot to your Gemfile:

group :production do
  gem 'sunspot-client' # The gem-packaged Solr server isn't appropriate for production
end

group :test, :development do
  gem 'sunspot' # Meta-package of sunspot-client and sunspot-server
end

Rails 2

Install the gem from your shell:

sudo gem install sunspot
# To install the optional packaged Solr server (recommended for development):
sudo gem install sunspot_solr

Then add the dependency to your environment.rb:

    config.gem 'sunspot'

Note: The sunspot_rails gem no longer needs to be installed. As of version 2.0, Sunspot will automatically include Rails integration if you load it in a Rails environment; it still works fine in a non-Rails environment as well.

Configuration and running

Sunspot can be run without any explicit configuration, but if you're using Rails, the easiest way to maintain a consistent configuration across your team is to use the built-in generator to add configurations to your project:

    script/rails generate sunspot

This will create the following new files in the current directory:

config/sunspot.yml
solr/conf/schema.xml
solr/conf/solrconfig.xml
solr/data/.gitignore

The sunspot.yml file is used for application-level configuration of Sunspot. This includes a URL at which Sunspot can access Solr; if the hostname of this URL is localhost or 127.0.0.1, the sunspot-solr executable will also use the configuration you've specified to start up the bundled Solr instance. An example sunspot.yml might look like this:

    production:
      solr: http://solr.my-host.com/solr
    development:
      solr:
        url: http://localhost:8982/solr
        max_memory: 1024M
        # TK more configuration options here
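
As a rough illustration of the localhost rule described above, the following sketch parses a configuration like the example and decides whether the bundled Solr instance would be started for a given environment. It is not Sunspot's implementation: the `solr_url` and `local_solr?` helpers are hypothetical names, and the sketch assumes the `solr` key may hold either a bare URL or a hash with a `url` key, as in the example. Only Ruby's standard `yaml` and `uri` libraries are used.

```ruby
require 'yaml'
require 'uri'

# Example configuration, mirroring the sunspot.yml shown above.
CONFIG = YAML.load(<<~YML)
  production:
    solr: http://solr.my-host.com/solr
  development:
    solr:
      url: http://localhost:8982/solr
      max_memory: 1024M
YML

# Hypothetical helper: the `solr` key may be a bare URL string
# or a hash with a `url` key (an assumption based on the example).
def solr_url(config, environment)
  solr = config.fetch(environment).fetch('solr')
  solr.is_a?(Hash) ? solr.fetch('url') : solr
end

# Hypothetical helper: true when the URL points at this machine,
# i.e. when the bundled Solr instance would be started locally.
def local_solr?(url)
  %w[localhost 127.0.0.1].include?(URI.parse(url).host)
end

puts local_solr?(solr_url(CONFIG, 'development')) # true
puts local_solr?(solr_url(CONFIG, 'production'))  # false
```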

The files in the solr/conf directory are used directly by the bundled Solr instance when you run it locally. You probably won't need to change them, but if you need advanced customization of Solr's behavior, these files are where you can do that. The solr/data directory contains your actual Solr index on disk, and is thus excluded from Git for you.

To start Solr in your development environment, simply run:

sunspot solr start # sunspot_solr gem must be installed

If you run this from the root of a Rails project, Sunspot will detect that and use your config/sunspot.yml if it's present.

Setting up your classes for search

Sunspot is designed to index and search Ruby objects that are persisted to a separate primary data store. Sunspot supports ActiveRecord, DataMapper, Mongoid, and MongoMapper [TK what else?] out of the box; it's quite easy to add support for other persistence layers. See the documentation for Sunspot::Adapter.

Configuring a model class for search primarily consists of defining which fields Sunspot should index, and setting those fields up with various options. Fields do not need to correspond to database columns; Sunspot will happily index the return value of any method your object responds to.

The basics

The examples in this README all assume we're building a straightforward blogging platform. Let's start with a simple configuration for our Post model.

    class Post < ActiveRecord::Base
      include Sunspot::Searchable

      searchable do
        fulltext :title
        fulltext :body
        integer :blog_id
      end
    end

The searchable block is where all Sunspot configuration is performed. Here we have three fields: the title and body fulltext fields, and the blog_id integer field. These fields exemplify the two basic field types in Sunspot: fulltext fields and attribute fields.

Fulltext fields always have the type fulltext, and are used for keyword search. Solr breaks apart the data from fulltext fields into individual words, and when a fulltext search is performed, documents are matched against search terms on a word-by-word basis.

Attribute fields, on the other hand, are scalar data, and are indexed as-is without any analysis. Attribute fields can have several scalar types: string, integer, long, float, double, date, time, and boolean are the main ones. You can think of attribute fields as equivalent to columns in a database: they can be used for filtering search results to a certain scope (e.g. only return results with a blog_id of 1); ordering results; and faceting, a topic we will cover in more depth later in this README.
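
To make the distinction concrete, here is a deliberately simplified sketch in plain Ruby, with no Solr involved. It assumes the crudest possible analysis (lowercasing and splitting on non-word characters) and requires every search term to match; Solr's real analyzers and matching rules are configurable and considerably more sophisticated. The `tokenize`, `fulltext_match?`, and `attribute_match?` helpers are hypothetical names used only for illustration.

```ruby
# Crude stand-in for Solr's text analysis: lowercase, then split into words.
def tokenize(text)
  text.downcase.scan(/\w+/)
end

# Fulltext matching in this sketch: every search term must appear
# among the words of the field.
def fulltext_match?(field_value, query)
  words = tokenize(field_value)
  tokenize(query).all? { |term| words.include?(term) }
end

# Attribute matching: the indexed value is compared as-is, without analysis.
def attribute_match?(field_value, value)
  field_value == value
end

post = { body: 'Solr makes full-text search easy', blog_id: 1 }

puts fulltext_match?(post[:body], 'search solr')  # true: both words occur
puts fulltext_match?(post[:body], 'sear')         # false: not a whole word
puts attribute_match?(post[:blog_id], 1)          # true: exact value
puts attribute_match?(post[:body], 'search solr') # false: no word-level matching
```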

Populating data

The above example uses the simplest method of populating fields; Sunspot will simply call the method named by the field, and index the return value if it's non-nil. If you wish to give the field a different name from the method that populates it, use the :using option:

    searchable do
      integer :my_blog_id, :using => :blog_id
    end

This will populate a my_blog_id field in Solr using the return value of the Post#blog_id method.

If you wish to populate a field with data that is not defined by a method on your model class, you can pass a block to the field definition; the block is evaluated in the context of the model instance, and the return value is indexed. For instance, perhaps we wish to index the number of comments on a given post:

    searchable do
      integer(:comments_count) { comments.count }
    end
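
The three ways of populating a field (the method named by the field, the :using option, and a block) could be resolved along the following lines. This is a sketch of the behaviour described above rather than Sunspot's actual code: `FieldDefinition` and `value_for` are hypothetical names, and `Post` here is a plain Struct standing in for a model.

```ruby
# Hypothetical stand-in for one field declared inside a searchable block.
class FieldDefinition
  attr_reader :name

  def initialize(name, options = {}, &block)
    @name = name
    @source = options[:using] || name # method to call when no block is given
    @block = block
  end

  # A block is evaluated in the context of the model instance;
  # otherwise the source method is called on the instance.
  def value_for(model)
    @block ? model.instance_eval(&@block) : model.public_send(@source)
  end
end

# Plain Struct standing in for the Post model.
Post = Struct.new(:blog_id, :comments)

fields = [
  FieldDefinition.new(:blog_id),
  FieldDefinition.new(:my_blog_id, :using => :blog_id),
  FieldDefinition.new(:comments_count) { comments.count }
]

post = Post.new(7, ['First!', 'Nice post'])

# Build the data to index, skipping nil values as described above.
document = fields.each_with_object({}) do |field, doc|
  value = field.value_for(post)
  doc[field.name] = value unless value.nil?
end

p document
```

Both blog_id and my_blog_id end up with the same value because they read from the same method; only the name under which the value is indexed differs.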

Reference fields

A special type of attribute field is a reference field. These are fields that hold references to other persistent objects; they're particularly useful for faceting. For example, instead of our blog_id field above, we might simply index blog as a reference field:

    searchable do
      reference :blog
    end

Now instead of working with an integer when using this field, we'll be working with actual Blog objects.
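
One plausible way to picture a reference field is to index the referenced object's class name and primary key, then look the object up again when the value comes back from a search. Whether reference fields will actually be encoded this way is an assumption; `dump_reference` and `load_reference` are hypothetical helpers, and `Blog` below is an in-memory stand-in for a persistent model.

```ruby
# Hypothetical in-memory stand-in for a persistent Blog model.
class Blog
  attr_reader :id, :name

  @records = {}

  class << self
    def create(id, name)
      @records[id] = new(id, name)
    end

    def find(id)
      @records.fetch(id)
    end
  end

  def initialize(id, name)
    @id = id
    @name = name
  end
end

# Assumed encoding for the indexed value: class name plus primary key.
def dump_reference(object)
  "#{object.class.name} #{object.id}"
end

# Turn an indexed value back into the object it refers to.
def load_reference(value)
  class_name, id = value.split(' ', 2)
  Object.const_get(class_name).find(Integer(id))
end

blog = Blog.create(1, 'Ruby Notes')
indexed = dump_reference(blog)

puts indexed                      # Blog 1
puts load_reference(indexed).name # Ruby Notes
```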

Trie fields

TK

Universal field options

The following options are available on all fields:

:stored
By default, Sunspot does not add field data to Solr in a way that allows Solr to return that field data in search results; instead, Sunspot only stores the object's class name and primary key, and uses that information from the search result to load the original object out of the primary database. You can override this behavior on a per-field basis to instruct Solr to return the field data in search results; in certain cases, this can allow you to bypass looking up the original objects in the database altogether, giving you a performance boost.
:as
**Advanced.** Usually, Sunspot constructs an internal field name for your fields based on the field type and options you've set; Sunspot's built-in Solr schema is set up to follow the same naming conventions. In certain cases, such as legacy schemas or for functionality not supported by Sunspot, you may want to override this and directly set the field name that will be used internally.
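
The interaction between the generated internal name and the :as override can be sketched as follows. The suffix table is illustrative only, modelled loosely on the dynamic-field naming of earlier Sunspot releases; the actual 2.0 conventions are not specified in this document, and `internal_field_name` is a hypothetical helper.

```ruby
# Illustrative type suffixes only; not taken from Sunspot 2.0.
TYPE_SUFFIXES = {
  :fulltext => 'text',
  :string   => 's',
  :integer  => 'i',
  :float    => 'f',
  :boolean  => 'b'
}.freeze

# Hypothetical helper: derive the Solr field name from the field's name,
# type, and options, unless :as overrides it outright.
def internal_field_name(name, type, options = {})
  return options[:as].to_s if options[:as]

  suffix = TYPE_SUFFIXES.fetch(type)
  suffix += 's' if options[:stored] # assumed marker for stored fields
  "#{name}_#{suffix}"
end

puts internal_field_name(:blog_id, :integer)                      # blog_id_i
puts internal_field_name(:title, :string, :stored => true)        # title_ss
puts internal_field_name(:blog_id, :integer, :as => :legacy_blog) # legacy_blog
```

Deriving names from type and options is what lets a single built-in schema serve any model; :as simply bypasses that derivation for a field whose name is fixed elsewhere, such as in a legacy schema.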

Fulltext field options

TK

Attribute field options

TK

Adding your objects to Solr

TK

Full reindexing

    $ sunspot reindex

TK

Automatic lifecycle indexing

TK

Indexing vs. committing

TK

Searching

TK

Fulltext Search

TK

Boosting and phrase fields

TK

Highlighting

TK

More like this

TK

Scoping results

Post.search do
  where(:blog_id => 1)
  where(:comments_count).gt(0)
end

TK

Ordering results

Faceting

TK

Field facets

TK

Query Facets

TK

Geospatial search

TK