# Sunspot

[![Build Status](http://travis-ci.org/sunspot/sunspot.png)](http://travis-ci.org/sunspot/sunspot)

Sunspot is a Ruby library for expressive, powerful interaction with the Solr
search engine. Sunspot is built on top of the RSolr library, which
provides a low-level interface for Solr interaction; Sunspot provides a simple,
intuitive, expressive DSL backed by powerful features for indexing objects and
searching for them.

Sunspot is designed to be easily plugged into any ORM, or even non-database-backed
objects such as the filesystem.

This README provides a high-level overview; class-by-class and
method-by-method documentation is available in the [API
reference](http://sunspot.github.com/sunspot/docs/).

## Quickstart with Rails 3

Add to Gemfile:

```ruby
gem 'sunspot_rails'
gem 'sunspot_solr' # optional pre-packaged Solr distribution for use in development
```

Bundle it!

```bash
bundle install
```

Generate a default configuration file:

```bash
rails generate sunspot_rails:install
```

If `sunspot_solr` was installed, start the packaged Solr distribution
with:

```bash
bundle exec rake sunspot:solr:start # or sunspot:solr:run to start in foreground
```

## Setting Up Objects

Add a `searchable` block to the objects you wish to index.

```ruby
class Post < ActiveRecord::Base
  searchable do
    text :title, :body
    text :comments do
      comments.map { |comment| comment.body }
    end

    boolean :featured
    integer :blog_id
    integer :author_id
    integer :category_ids, :multiple => true
    double :average_rating
    time :published_at
    time :expired_at

    # Strip a leading article ("a", "an", "the") so titles sort naturally
    string :sort_title do
      title.downcase.sub(/\A(an?|the)\s+/, '')
    end
  end
end
```

`text` fields will be full-text searchable. Other fields (e.g.,
`integer` and `string`) can be used to scope queries.
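Because the `sort_title` block is plain Ruby run against each record, its normalization can be checked without Solr. The sketch below assumes the intent is to drop a single leading "a", "an", or "the" only when it is a whole word followed by whitespace, so titles such as "Another Slice" keep their first word intact; `sort_title_for` is a hypothetical helper, not part of Sunspot.

```ruby
# Hypothetical helper mirroring a sort_title-style normalization:
# downcase the title, then strip one leading article ("a", "an", or
# "the") only when it is a whole word followed by whitespace.
def sort_title_for(title)
  title.downcase.sub(/\A(an?|the)\s+/, '')
end

p sort_title_for("The Best Pizza")  # => "best pizza"
p sort_title_for("An Ode to Crust") # => "ode to crust"
p sort_title_for("Another Slice")   # => "another slice"
```

Anchoring on `\A` and requiring trailing whitespace keeps words that merely start with "a", "an", or "the" (like "another" or "theater") unchanged.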

## Searching Objects

```ruby
Post.search do
  fulltext 'best pizza'

  with :blog_id, 1
  with(:published_at).less_than Time.now
  order_by :published_at, :desc
  paginate :page => 2, :per_page => 15
  facet :category_ids, :author_id
end
```

## Search In Depth

Given the `Post` class set up in the earlier steps:

### Full Text

```ruby
# All posts with a `text` field (:title, :body, or :comments) containing 'pizza'
Post.search { fulltext 'pizza' }

# Posts with pizza, scored higher if pizza appears in the title
Post.search do
  fulltext 'pizza' do
    boost_fields :title => 2.0
  end
end

# Posts with pizza, scored higher if featured
Post.search do
  fulltext 'pizza' do
    boost(2.0) { with(:featured, true) }
  end
end

# Posts with pizza *only* in the title
Post.search do
  fulltext 'pizza' do
    fields(:title)
  end
end

# Posts with pizza in the title (boosted) or in the body (not boosted)
Post.search do
  fulltext 'pizza' do
    fields(:body, :title => 2.0)
  end
end
```

#### Phrases

Solr allows searching for phrases: search terms that are close together.

In the default query parser used by Sunspot (dismax), phrase searches
are represented as a double-quoted group of words.

```ruby
# Posts with the exact phrase "great pizza"
Post.search do
  fulltext '"great pizza"'
end
```

If specified, **query_phrase_slop** sets the number of words that may
appear between the words in a phrase.

```ruby
# One word can appear between the words in the phrase, so "great big pizza"
# also matches, in addition to "great pizza"
Post.search do
  fulltext '"great pizza"' do
    query_phrase_slop 1
  end
end
```

##### Phrase Boosts

Phrase boosts add boost to terms that appear in close proximity;
the terms do not *have* to appear in a phrase, but if they do, the
document will score more highly.

```ruby
# Matches documents with "great" and "pizza", and scores documents more
# highly if the terms appear in a phrase in the title field
Post.search do
  fulltext 'great pizza' do
    phrase_fields :title => 2.0
  end
end

# Matches documents with "great" and "pizza", and scores documents more
# highly if the terms appear in a phrase (or with one word between them)
# in the title field
Post.search do
  fulltext 'great pizza' do
    phrase_fields :title => 2.0
    phrase_slop 1
  end
end
```

### Scoping (Scalar Fields)

Fields not defined as `text` (e.g., `integer`, `boolean`, `time`,
etc.) can be used to scope (restrict) queries before full-text
matching is performed.

#### Positive Restrictions

```ruby
# Posts with a blog_id of 1
Post.search do
  with(:blog_id, 1)
end

# Posts with an average rating between 3.0 and 5.0
Post.search do
  with(:average_rating, 3.0..5.0)
end

# Posts with a category of 1, 3, or 5
Post.search do
  with(:category_ids, [1, 3, 5])
end

# Posts published since a week ago
Post.search do
  with(:published_at).greater_than(1.week.ago)
end
```
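As a rough mental model, the restrictions above select documents the way the plain-Ruby filters below select in-memory hashes. This is an analogy only (Solr evaluates filters against its index, and the sample data is made up), but it highlights one detail: for a `:multiple => true` field such as `category_ids`, an array restriction matches a post when *any* of its values is in the list.

```ruby
require 'time'

# Made-up sample posts standing in for indexed Solr documents.
POSTS = [
  { :id => 1, :blog_id => 1, :average_rating => 4.5, :category_ids => [2, 3],
    :published_at => Time.parse("2011-11-01") },
  { :id => 2, :blog_id => 2, :average_rating => 2.0, :category_ids => [1],
    :published_at => Time.parse("2011-10-01") },
  { :id => 3, :blog_id => 1, :average_rating => 2.9, :category_ids => [5, 6],
    :published_at => Time.parse("2011-11-05") }
]

# with(:blog_id, 1): exact match on a single-valued field
by_blog = POSTS.select { |post| post[:blog_id] == 1 }

# with(:average_rating, 3.0..5.0): range, inclusive at both ends
by_rating = POSTS.select { |post| (3.0..5.0).cover?(post[:average_rating]) }

# with(:category_ids, [1, 3, 5]) on a :multiple field: matches when ANY
# of the post's values appears in the list
by_category = POSTS.select { |post| (post[:category_ids] & [1, 3, 5]).any? }

# with(:published_at).greater_than(cutoff): strictly after the cutoff
# (a fixed date here instead of ActiveSupport's 1.week.ago)
cutoff = Time.parse("2011-10-15")
recent = POSTS.select { |post| post[:published_at] > cutoff }

p by_blog.map { |post| post[:id] }     # => [1, 3]
p by_rating.map { |post| post[:id] }   # => [1]
p by_category.map { |post| post[:id] } # => [1, 2, 3]
p recent.map { |post| post[:id] }      # => [1, 3]
```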

#### Negative Restrictions

```ruby
# Posts not in category 1 or 3
Post.search do
  without(:category_ids, [1, 3])
end

# All examples in "positive" also work negated using `without`
```

#### Disjunctions and Conjunctions

```ruby
# Posts that do not have an expired time or have not yet expired
Post.search do
  any_of do
    with(:expired_at).greater_than(Time.now)
    with(:expired_at, nil)
  end
end
```

```ruby
# Posts with blog_id 1 and author_id 2
Post.search do
  all_of do
    with(:blog_id, 1)
    with(:author_id, 2)
  end
end
```

Disjunctions and conjunctions may be nested:

```ruby
Post.search do
  any_of do
    with(:blog_id, 1)
    all_of do
      with(:blog_id, 2)
      with(:category_ids, 3)
    end
  end
end
```
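Read as boolean logic, `any_of` behaves as OR and `all_of` as AND over the restrictions they contain. The sketch below restates the two `any_of` examples above as plain-Ruby predicates over made-up data; it illustrates the matching semantics only and is not how Sunspot builds its Solr query.

```ruby
# Made-up sample posts standing in for indexed Solr documents.
posts = [
  { :id => 1, :blog_id => 1, :category_ids => [7],    :expired_at => nil },
  { :id => 2, :blog_id => 2, :category_ids => [3, 8], :expired_at => Time.now - 3600 },
  { :id => 3, :blog_id => 2, :category_ids => [9],    :expired_at => Time.now + 3600 }
]

now = Time.now

# any_of { with(:expired_at).greater_than(now); with(:expired_at, nil) }
not_expired = posts.select do |post|
  post[:expired_at].nil? || post[:expired_at] > now
end

# any_of { with(:blog_id, 1); all_of { with(:blog_id, 2); with(:category_ids, 3) } }
nested = posts.select do |post|
  post[:blog_id] == 1 ||
    (post[:blog_id] == 2 && post[:category_ids].include?(3))
end

p not_expired.map { |post| post[:id] } # => [1, 3]
p nested.map { |post| post[:id] }      # => [1, 2]
```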
258
259
#### Combined with Full-Text
260
261
Scopes/restrictions can be combined with full-text searching. The
262
scope/restriction pares down the objects that are searched for the
263
full-text term.
264
265
```ruby
266
# Posts with blog_id 1 and 'pizza' in the title
267
Post.search do
268
with(:blog_id, 1)
269
fulltext("pizza")
270
end
271
```
272
273
### Pagination
274
275
**All results from Solr are paginated**
276
277
The results array that is returned has methods mixed in that allow it to
278
operate seamlessly with common pagination libraries like will\_paginate
279
and kaminari.
280
281
By default, Sunspot requests the first 30 results from Solr.
282
283
```ruby
284
search = Post.search do
285
fulltext "pizza"
286
end
287
288
# Imagine there are 60 *total* results (at 30 results/page, that is two pages)
289
results = search.results # => Array with 30 Post elements
290
291
search.total # => 60
292
293
results.total_pages # => 2
294
results.first_page? # => true
295
results.last_page? # => false
296
results.previous_page # => nil
297
results.next_page # => 2
298
results.out_of_bounds? # => false
299
results.offset # => 0
300
```
301
302
To retrieve the next page of results, recreate the search and use the
303
`paginate` method.
304
305
```ruby
306
search = Post.search do
307
fulltext "pizza"
308
paginate :page => 2
309
end
310
311
# Again, imagine there are 60 total results; this is the second page
312
results = search.results # => Array with 30 Post elements
313
314
search.total # => 60
315
316
results.total_pages # => 2
317
results.first_page? # => false
318
results.last_page? # => true
319
results.previous_page # => 1
320
results.next_page # => nil
321
results.out_of_bounds? # => false
322
results.offset # => 30
323
```
324
325
A custom number of results per page can be specified with the
326
`:per_page` option to `paginate`:
327
328
```ruby
329
search = Post.search do
330
fulltext "pizza"
331
paginate :page => 1, :per_page => 50
332
end
333
```
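The pagination helpers shown above all follow from three numbers: the page, the page size, and the total hit count. The sketch below reproduces that arithmetic in plain Ruby. `PageInfo` is a hypothetical struct, not Sunspot's implementation, and treating any page past the last one as `out_of_bounds?` is an assumption about what that method reports.

```ruby
# Hypothetical value object reproducing the pagination arithmetic.
PageInfo = Struct.new(:page, :per_page, :total) do
  # Round up so a partial final page still counts as a page
  def total_pages
    (total.to_f / per_page).ceil
  end

  # Number of results skipped before this page starts
  def offset
    (page - 1) * per_page
  end

  def first_page?
    page == 1
  end

  def last_page?
    page >= total_pages
  end

  def previous_page
    page > 1 ? page - 1 : nil
  end

  def next_page
    page < total_pages ? page + 1 : nil
  end

  def out_of_bounds?
    page > total_pages
  end
end

first  = PageInfo.new(1, 30, 60)
second = PageInfo.new(2, 30, 60)
p [first.total_pages, first.offset, first.previous_page, first.next_page]
# => [2, 0, nil, 2]
p [second.offset, second.first_page?, second.last_page?, second.next_page]
# => [30, false, true, nil]
```

With `:per_page => 50` and the same 60 results, the same arithmetic gives two pages, the second holding the remaining 10.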
334
335
### Faceting
336
337
Faceting is a feature of Solr that determines the number of documents
338
that match a given search *and* an additional criterion. This allows you
339
to build powerful drill-down interfaces for search.
340
341
Each facet returns zero or more rows, each of which represents a
342
particular criterion conjoined with the actual query being performed.
343
For **field facets**, each row represents a particular value for a given
344
field. For **query facets**, each row represents an arbitrary scope; the
345
facet itself is just a means of logically grouping the scopes.
346
347
#### Field Facets
348
349
```ruby
350
# Posts that match 'pizza' returning counts for each :author_id
351
search = Post.search do
352
fulltext "pizza"
353
facet :author_id
354
end
355
356
search.facet(:author_id).rows.each do |facet|
357
puts "Author #{facet.value} has #{facet.count} pizza posts!"
358
end
359
```
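Conceptually, a field facet counts the matching documents for each distinct value of the field. The plain-Ruby sketch below computes that kind of count over a made-up list of `author_id` values, ordering rows by count; Solr does the real work inside the index, so this only illustrates what `facet.value` and `facet.count` represent.

```ruby
# Made-up author_id values for posts that matched the full-text query.
matching_author_ids = [3, 1, 3, 2, 3, 1]

# Count matches per distinct value, like the rows of a field facet.
counts = Hash.new(0)
matching_author_ids.each { |author_id| counts[author_id] += 1 }

# Order rows by count, highest first (ties broken by value).
rows = counts.sort_by { |value, count| [-count, value] }

rows.each do |value, count|
  puts "Author #{value} has #{count} pizza posts!"
end
# Author 3 has 3 pizza posts!
# Author 1 has 2 pizza posts!
# Author 2 has 1 pizza posts!
```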
360
361
#### Query Facets
362
363
```ruby
364
# Posts faceted by ranges of average ratings
365
Post.search do
366
facet(:average_rating) do
367
row(1.0..2.0) do
368
with(:average_rating, 1.0..2.0)
369
end
370
row(2.0..3.0) do
371
with(:average_rating, 2.0..3.0)
372
end
373
row(3.0..4.0) do
374
with(:average_rating, 3.0..4.0)
375
end
376
row(4.0..5.0) do
377
with(:average_rating, 4.0..5.0)
378
end
379
end
380
end
381
382
# e.g.,
383
# Number of posts with rating withing 1.0..2.0: 2
384
# Number of posts with rating withing 2.0..3.0: 1
385
search.facet(:average_rating).rows.each do |facet|
386
puts "Number of posts with rating withing #{facet.value}: #{facet.count}"
387
end
388
```
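A query facet evaluates one restriction per row and counts the matches for each row independently. The sketch below mimics that over made-up ratings in plain Ruby. It assumes the `with` ranges above behave as inclusive ranges, like Ruby's `1.0..2.0`; under that assumption a rating of exactly 2.0 lands in both the 1.0..2.0 and 2.0..3.0 rows, so the row counts can add up to more than the number of matching posts.

```ruby
# Made-up average ratings for posts matching the search.
ratings = [1.2, 1.5, 2.0, 3.5, 4.0]

# One row per range, mirroring the row(...) { with(...) } blocks above.
ranges = [1.0..2.0, 2.0..3.0, 3.0..4.0, 4.0..5.0]

# Each row is counted on its own, so boundary values can appear twice.
rows = ranges.map do |range|
  [range, ratings.count { |rating| range.cover?(rating) }]
end

rows.each do |range, count|
  puts "Number of posts with rating within #{range}: #{count}"
end
# Number of posts with rating within 1.0..2.0: 3
# Number of posts with rating within 2.0..3.0: 1
# Number of posts with rating within 3.0..4.0: 2
# Number of posts with rating within 4.0..5.0: 1
```

Using half-open ranges such as `1.0...2.0` in the sketch would make each rating fall into exactly one row.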
389
390
### Ordering
391
392
By default, Sunspot orders results by "score": the Solr-determined
Nov 9, 2011
393
relevancy metric. Sorting can be customized with the `order_by` method:
394
395
```ruby
396
# Order by average rating, descending
397
Post.search do
398
fulltext("pizza")
399
order_by(:average_rating, :desc)
400
end
401
402
# Order by relevancy score and in the case of a tie, average rating
403
Post.search do
404
fulltext("pizza")
405
406
order_by(:score, :desc)
407
order_by(:average_rating, :desc)
408
end
409
410
# Randomized ordering
411
Post.search do
412
fulltext("pizza")
413
order_by(:random)
414
end
415
```
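With several `order_by` calls, as in the second example above, each later sort only breaks ties left by the earlier ones. The plain-Ruby sketch below applies that rule (score descending, then average rating descending) to made-up hits; Solr performs the real sort, so this is only an illustration.

```ruby
# Made-up hits: [id, relevancy score, average rating]
hits = [
  [1, 0.8, 3.5],
  [2, 1.2, 2.0],
  [3, 0.8, 4.5],
  [4, 1.2, 4.0]
]

# Sort by score descending; break ties by average rating descending.
ordered = hits.sort_by { |_id, score, rating| [-score, -rating] }

p ordered.map(&:first) # => [4, 2, 3, 1]
```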
416
417
### Geospatial
418
419
TODO
420
421
### Highlighting
422
423
Highlighting allows you to display snippets of the part of the document
424
that matched the query.
425
426
The fields you wish to highlight must be **stored**.
427
428
```ruby
429
class Post < ActiveRecord::Base
430
searchable do
431
# ...
432
text :body, :stored => true
433
end
434
end
435
```
436
437
Highlighting matches on the `body` field, for instance, can be acheived
438
like:
439
440
```ruby
441
search = Post.search do
442
fulltext "pizza" do
443
highlight :body
444
end
445
end
446
447
# Will output something similar to:
448
# Post #1
449
# I really love *pizza*
450
# *Pizza* is my favorite thing
451
# Post #2
452
# Pepperoni *pizza* is delicious
453
search.hits.each do |hit|
454
puts "Post ##{hit.primary_key}"
455
456
hit.highlights(:body).each do |highlight|
457
puts " " + highlight.format { |word| "*#{word}*" }
458
end
459
end
460
```
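As the example output suggests, `highlight.format` passes each matched word to the block and puts the block's return value in its place. The sketch below illustrates that idea in plain Ruby; the `[[hl]]`/`[[/hl]]` marker strings and the `format_snippet` helper are made up for this example and are not part of Sunspot.

```ruby
# Made-up markers wrapping each matched word in a raw snippet.
HL_START = "[[hl]]"
HL_END   = "[[/hl]]"
MATCH    = /#{Regexp.escape(HL_START)}(.*?)#{Regexp.escape(HL_END)}/

# Replace each marked match with whatever the caller's block returns;
# without a block, fall back to the bare matched word.
def format_snippet(snippet)
  snippet.gsub(MATCH) do
    word = Regexp.last_match(1)
    block_given? ? yield(word) : word
  end
end

snippet = "Pepperoni [[hl]]pizza[[/hl]] is delicious"
puts format_snippet(snippet) { |word| "*#{word}*" }       # Pepperoni *pizza* is delicious
puts format_snippet(snippet) { |word| "<em>#{word}</em>" } # Pepperoni <em>pizza</em> is delicious
puts format_snippet(snippet)                               # Pepperoni pizza is delicious
```

Keeping the markup choice in the caller's block lets the same snippet be rendered as plain text, HTML, or anything else.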

### Functions

TODO

### More Like This

Sunspot can extract related items using `more_like_this`. When searching
for similar items, you can pass a block with the following options:

* `fields :field_1[, :field_2, ...]`
* `minimum_term_frequency ##`
* `minimum_document_frequency ##`
* `minimum_word_length ##`
* `maximum_word_length ##`
* `maximum_query_terms ##`
* `boost_by_relevance true/false`

```ruby
class Post < ActiveRecord::Base
  searchable do
    # The :more_like_this option must be set to true
    text :body, :more_like_this => true
  end
end

post = Post.first

results = Sunspot.more_like_this(post) do
  fields :body
  minimum_term_frequency 5
end
```

## Indexing In Depth

TODO

### Index-Time Boosts

To specify that a field should be boosted in relation to other fields for
all queries, you can specify the boost at index time:

```ruby
class Post < ActiveRecord::Base
  searchable do
    text :title, :boost => 5.0
    text :body
  end
end
```

### Stored Fields

Stored fields keep an original (untokenized/unanalyzed) version of their
contents in Solr.

Stored fields allow data to be retrieved without also hitting the
underlying database (usually an SQL server). They are also required for
highlighting and more-like-this queries.

Stored fields come at some performance cost in the Solr index, so use
them wisely.

```ruby
class Post < ActiveRecord::Base
  searchable do
    text :body, :stored => true
  end
end

# Retrieving stored contents without hitting the database
Post.search.hits.each do |hit|
  puts hit.stored(:body)
end
```
537
538
## Hits vs. Results
539
540
Sunspot simply stores the type and primary key of objects in Solr.
541
When results are retrieved, those primary keys are used to load the
542
actual object (usually from an SQL database).
543
544
```ruby
545
# Using #results pulls in the records from the object-relational
546
# mapper (e.g., ActiveRecord + a SQL server)
547
Post.search.results.each do |result|
548
puts result.body
549
end
550
```
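Because Solr hands back only types and primary keys, loading results means fetching the matching records and putting them back into Solr's relevance order (a plain `WHERE id IN (...)` query makes no ordering guarantee). The sketch below imitates that step with a hash standing in for the database table; it illustrates the idea under those assumptions and is not Sunspot's actual loading code. `load_in_hit_order` and `POSTS_TABLE` are hypothetical names.

```ruby
# Hypothetical stand-in for the posts table, keyed by primary key.
POSTS_TABLE = {
  1 => "Post about deep-dish pizza",
  2 => "Post about pasta",
  5 => "Post about pizza ovens"
}

# Primary keys in the order Solr ranked them (made up for this example).
hit_ids = [5, 1, 9]

# Fetch all rows in one lookup, then restore Solr's order. In this
# sketch, ids with no matching row (e.g. deleted since indexing) are dropped.
def load_in_hit_order(ids, table)
  found = table.select { |id, _| ids.include?(id) }
  ids.map { |id| found[id] }.compact
end

results = load_in_hit_order(hit_ids, POSTS_TABLE)
results.each { |body| puts body }
# Post about pizza ovens
# Post about deep-dish pizza
```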
551
552
To access information about the results without querying the underlying
553
database, use `hits`:
554
555
```ruby
556
# Using #hits gives back all information requested from Solr, but does
557
# not load the object from the object-relational mapper
558
Post.search.hits.each do |hit|
559
puts hit.stored(:body)
560
end
561
```
562
563
If you need both the result (ORM-loaded object) and `Hit` (e.g., for
564
faceting, highlighting, etc...), you can use the convenience method
565
`each_hit_with_result`:
566
567
```ruby
568
Post.search.each_hit_with_result do |hit, result|
569
# ...
570
end
571
```
573
## Reindexing Objects
574
575
If you are using Rails, objects are automatically indexed to Solr as a
576
part of the `save` callbacks.
577
578
If you make a change to the object's "schema" (code in the `searchable` block),
579
you must reindex all objects so the changes are reflected in Solr:
580
581
```bash
582
bundle exec rake sunspot:solr:reindex
583
584
# or, to be specific to a certain model with a certain batch size:
585
bundle exec rake sunspot:solr:reindex[500,Post] # some shells will require escaping [ with \[ and ] with \]
586
```
587
588
## Use Without Rails
589
590
TODO
591
592
## Manually Adjusting Solr Parameters
593
594
To add or modify parameters sent to Solr, use `adjust_solr_params`:
595
596
```ruby
597
Post.search do
598
adjust_solr_params do |params|
599
params[:q] += " AND something_s:more"
600
end
601
end
602
```
603
604
## Session Proxies
605
606
TODO
607
608
## Type Reference
609
610
TODO
611
612
## Development
613
614
### Running Tests
615
616
#### sunspot
617
618
Install the required gem dependencies:
619
620
```bash
621
cd /path/to/sunspot/sunspot
622
bundle install
623
```
624
625
Start a Solr instance on port 8983:
626
627
```bash
628
bundle exec sunspot-solr start -p 8983
629
# or `bundle exec sunspot-solr run -p 8983` to run in foreground
630
```
631
632
Run the tests:
633
634
```bash
635
bundle exec rake spec
636
```
637
638
If desired, stop the Solr instance:
639
640
```bash
641
bundle exec sunspot-solr stop
642
```
643
644
#### sunspot\_rails
645
646
Install the gem dependencies for `sunspot`:
647
648
```bash
649
cd /path/to/sunspot/sunspot
650
bundle install
651
```
652
653
Start a Solr instance on port 8983:
654
655
```bash
656
bundle exec sunspot-solr start -p 8983
657
# or `bundle exec sunspot-solr run -p 8983` to run in foreground
658
```
659
660
Navigate to the `sunspot_rails` directory:
661
662
```bash
663
cd ../sunspot_rails
664
```
665
666
Run the tests:
667
668
```bash
669
rake spec # all Rails versions
670
rake spec RAILS=3.1.1 # specific Rails version only
671
```
672
673
If desired, stop the Solr instance:
674
675
```bash
676
cd ../sunspot
677
bundle exec sunspot-solr stop
678
```
679
680
### Generating Documentation
681
682
Install the `yard` and `redcarpet` gems:
683
684
```bash
685
$ gem install yard redcarpet
686
```
687
688
Uninstall the `rdiscount` gem, if installed:
689
690
```bash
691
$ gem uninstall rdiscount
692
```
693
694
Generate the documentation from topmost directory:
695
696
```bash
697
$ yardoc -o docs */lib/**/*.rb - README.md
698
```
699
700
## Tutorials and Articles
701
702
* [Full Text Searching with Solr and Sunspot](http://collectiveidea.com/blog/archives/2011/03/08/full-text-searching-with-solr-and-sunspot/) (Collective Idea)
703
* [Full-text search in Rails with Sunspot](http://tech.favoritemedium.com/2010/01/full-text-search-in-rails-with-sunspot.html) (Tropical Software Observations)
704
* [Sunspot Full-text Search for Rails/Ruby](http://therailworld.com/posts/23-Sunspot-Full-text-Search-for-Rails-Ruby) (The Rail World)
705
* [A Few Sunspot Tips](http://blog.trydionel.com/2009/11/19/a-few-sunspot-tips/) (spiral_code)
706
* [Sunspot: A Solr-Powered Search Engine for Ruby](http://www.linux-mag.com/id/7341) (Linux Magazine)
707
* [Sunspot Showed Me the Light](http://bennyfreshness.com/2010/05/sunspot-helped-me-see-the-light/) (ben koonse)
708
* [RubyGems.org — A case study in upgrading to full-text search](http://blog.websolr.com/post/3505903537/rubygems-search-upgrade-1) (Websolr)
709
* [How to Implement Spatial Search with Sunspot and Solr](http://codequest.eu/articles/how-to-implement-spatial-search-with-sunspot-and-solr) (Code Quest)
710
* [Sunspot 1.2 with Spatial Solr Plugin 2.0](http://joelmats.wordpress.com/2011/02/23/getting-sunspot-1-2-with-spatial-solr-plugin-2-0-to-work/) (joelmats)
711
* [rails3 + heroku + sunspot : madness](http://anhaminha.tumblr.com/post/632682537/rails3-heroku-sunspot-madness) (anhaminha)
712
* [How to get full text search working with Sunspot](http://cookbook.hobocentral.net/recipes/57-how-to-get-full-text-search) (Hobo Cookbook)
713
* [Full text search with Sunspot in Rails](http://hemju.com/2011/01/04/full-text-search-with-sunspot-in-rail/) (hemju)
714
* [Using Sunspot for Free-Text Search with Redis](http://masonoise.wordpress.com/2010/02/06/using-sunspot-for-free-text-search-with-redis/) (While I Pondered...)
715
* [Fuzzy searching in SOLR with Sunspot](http://www.pipetodevnull.com/past/2010/8/5/fuzzy_searching_in_solr_with_sunspot/) (pipe :to => /dev/null)
716
* [Default scope with Sunspot](http://www.cloudspace.com/blog/2010/01/15/default-scope-with-sunspot/) (Cloudspace)
717
* [Index External Models with Sunspot/Solr](http://www.medihack.org/2011/03/19/index-external-models-with-sunspotsolr/) (Medihack)
718
* [Chef recipe for Sunspot in production](http://gist.github.com/336403)
719
* [Testing with Sunspot and Cucumber](http://collectiveidea.com/blog/archives/2011/05/25/testing-with-sunspot-and-cucumber/) (Collective Idea)
720
* [Cucumber and Sunspot](http://opensoul.org/2010/4/7/cucumber-and-sunspot) (opensoul.org)
721
* [Testing Sunspot with Cucumber](http://blog.trydionel.com/2010/02/06/testing-sunspot-with-cucumber/) (spiral_code)
722
* [Running cucumber features with sunspot_rails](http://blog.kabisa.nl/2010/02/03/running-cucumber-features-with-sunspot_rails) (Kabisa Blog)
723
* [Testing Sunspot with Test::Unit](http://timcowlishaw.co.uk/post/3179661158/testing-sunspot-with-test-unit) (Type Slowly)
724
* [How To Use Twitter Lists to Determine Influence](http://www.untitledstartup.com/2010/01/how-to-use-twitter-lists-to-determine-influence/) (Untitled Startup)
725
* [Sunspot Quickstart](http://wiki.websolr.com/index.php/Sunspot_Quickstart) (WebSolr)
726
* [Solr, and Sunspot](http://www.kuahyeow.com/2009/08/solr-and-sunspot.html) (YT!)
727
* [The Saga of the Switch](http://mrb.github.com/2010/04/08/the-saga-of-the-switch.html) (mrb -- includes comparison of Sunspot and Ultrasphinx)
728
* [Conditional Indexing with Sunspot](http://mikepackdev.com/blog_posts/19-conditional-indexing-with-sunspot) (mikepack)
729
730
## License
731
732
Sunspot is distributed under the MIT License, copyright (c) 2008-2009 Mat Brown