Drilling down with facets

s01ipsist edited this page Apr 10, 2013 · 2 revisions

Solr’s powerful faceting feature allows easy construction of drill-down search/browse interfaces. The purpose of faceting is to determine, given a query, how many documents would be matched if a further restriction were placed on that query. Facets take two main forms: field facets (including the extra functionality for time fields), and query facets.

Field Facets

The simplest and most widely useful type of facet that Solr provides is the field facet. Given a query and a field, a field facet will determine all of the values for that field which are contained by a document matching the query; further, it will determine how many documents match each value.
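
To make the counting concrete, here is a minimal plain-Ruby sketch of what a field facet computes. It does not talk to Solr or use Sunspot; the `field_facet` helper and the sample posts are hypothetical and exist only for illustration.

```ruby
# Plain-Ruby sketch of what a field facet computes; no Solr involved.
# The helper name and the sample documents are hypothetical.
def field_facet(documents, field)
  counts = Hash.new(0)
  documents.each do |doc|
    # A multi-valued field contributes one count per distinct value.
    Array(doc[field]).uniq.each { |value| counts[value] += 1 }
  end
  counts
end

posts = [
  { :blog_id => 1, :category_ids => [1, 2] },
  { :blog_id => 1, :category_ids => [2] },
  { :blog_id => 2, :category_ids => [3] }
]

# The "query": restrict to Blog 1, then facet on category_ids.
matching = posts.select { |post| post[:blog_id] == 1 }
facet_counts = field_facet(matching, :category_ids)
```

Here `facet_counts` maps each category ID found in the matching documents to the number of matching documents that carry it. Category 3 does not appear at all, because no document in Blog 1 has it.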

Sunspot uses the facet method in the DSL to specify field facets. The facet method can take one or more field name arguments, followed optionally by a hash of options (which will be applied to the facets for all of the specified fields). Let’s take a simple example, a single field facet with no options:


Sunspot.search(Post) do
  with(:blog_id, 1)
  facet(:category_ids)
end

The facet call tells Solr, “return a list of all of the category IDs that documents in Blog 1 have, with a count of documents matching each category ID”. See Working with search for details on how to access that information from the Search object.

Two things are worth noting about the example above. First, the scope of the search matters for faceting: the facet returns only the category IDs contained by documents in Blog 1. Second, faceting is not limited to the (paginated) set of documents the search returns; facets are calculated over all documents matching the search scope.
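
The difference between the paginated results and the facet scope can be sketched in plain Ruby. This is only an illustration under assumed sample data (the posts and the page size of 2 are hypothetical); Solr does this work server-side.

```ruby
# Hypothetical sample data: five posts in Blog 1, one in Blog 2.
posts = [
  { :id => 1, :blog_id => 1, :category_ids => [1] },
  { :id => 2, :blog_id => 1, :category_ids => [1, 2] },
  { :id => 3, :blog_id => 1, :category_ids => [2] },
  { :id => 4, :blog_id => 1, :category_ids => [2] },
  { :id => 5, :blog_id => 1, :category_ids => [3] },
  { :id => 6, :blog_id => 2, :category_ids => [4] }
]

# The search scope: every post in Blog 1.
in_scope = posts.select { |post| post[:blog_id] == 1 }

# The paginated results: page 1 at 2 posts per page.
page = in_scope.first(2)

# Facet counts are tallied over the whole scope, not just the page.
facet_counts = Hash.new(0)
in_scope.each do |post|
  post[:category_ids].each { |id| facet_counts[id] += 1 }
end
```

Although `page` holds only two posts, `facet_counts` covers all five posts in Blog 1, and category 4 (which only occurs in Blog 2) is never counted.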

Date Facets

Solr's date faceting parameters are documented at http://wiki.apache.org/solr/SimpleFacetParameters#Date_Faceting_Parameters

Date facets aren't explicitly supported by Sunspot currently (2.0), but they are a good example of how you can manipulate Solr parameters manually.

Given a declaration of


class Entry
  ...
  searchable do
    ...
    time :updated_at
  end
end

I can then find the daily counts of entries updated between April 1 and today (April 10), including only days with at least one update:


@search = Sunspot.search(Entry) do
  fulltext '*:*'
  adjust_solr_params do |params|
    params['facet'] = 'true'
    params['facet.date'] = 'updated_at_d'
    params['facet.date.start'] = '2013-04-01T00:00:00Z'
    params['facet.date.end'] = 'NOW'
    params['facet.date.gap'] = '+1DAY'
    params['f.updated_at_d.facet.mincount'] = '1'
  end
  paginate :page => 1, :per_page => 0 # return no documents; we only want the facets
end

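# Keys are ISO 8601 date strings; the hash also carries 'gap', 'start' and 'end' metadata.
@search.facet_response['facet_dates']['updated_at_d'].sort.each do |date_s, count|
  next if ['gap', 'start', 'end'].include?(date_s) # discard meta values returned in the hash
  date = date_s.to_time # String#to_time is provided by ActiveSupport
  puts "#{date} - #{count}"
end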
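
The parsing step can be tried without a running Solr instance. The sketch below works on a hand-written hash shaped like the `facet_dates` entry described above; the sample values and the `daily_counts` helper are hypothetical, and it uses `Time.iso8601` from Ruby's standard library rather than ActiveSupport's `to_time`.

```ruby
require 'time'

# Turn one facet_dates entry into a sorted list of [Time, count] pairs.
# Hypothetical helper, not part of Sunspot.
def daily_counts(date_facet)
  meta_keys = %w[gap start end] # metadata Solr returns alongside the counts
  date_facet
    .reject { |key, _| meta_keys.include?(key) }
    .map { |date_s, count| [Time.iso8601(date_s), count] }
    .sort_by { |time, _| time }
end

# Hand-written sample shaped like the response; values are made up.
sample = {
  '2013-04-03T00:00:00Z' => 2,
  '2013-04-01T00:00:00Z' => 5,
  'gap'   => '+1DAY',
  'start' => '2013-04-01T00:00:00Z',
  'end'   => '2013-04-11T00:00:00Z'
}

daily_counts(sample).each do |time, count|
  puts "#{time.strftime('%Y-%m-%d')} - #{count}"
end
```

Filtering out the metadata keys before parsing means each remaining key is a date string, so the sort can operate on real `Time` values instead of relying on string order.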