diff --git a/details.html b/details.html new file mode 100644 index 00000000..2ddb9b83 --- /dev/null +++ b/details.html @@ -0,0 +1,250 @@ + + +
+ + + + + + + + + + + + + + + + + + + + ++A semantic text search engine does not operate on huge blobs of text, but instead on smaller, highly categorized text amounts. For example, on varchar database fields. +
++If your data isn't categorized well (like text from a book), then you should instead choose a full-text search engine, like +Sphinx +or +Solr (Lucene). +
++Often, full-text search engines are misused by letting them loose on highly categorized (semantic) text. +
++Picky helps your users find data which in a full-text search engine would be buried in a heap of results. Also, it lets them do so with a Google-y single search field. +
++Sure, the word "peter" is found most often in document #7, but the user actually just wants documents by someone with the surname "Peter", and not everything related to peters. +
++Picky helps them refine the search by way of a comfortable interface to get exactly what they want. +
++Full-Text search engines do one thing especially well: Making full (i.e. uncategorized heaps of) text searchable. +
++For small, highly categorized text, we simply need new ideas. Picky is one of them. +
++Ok, that was my elevator pitch ;) +
++Using a real +telephone search +as an example. +
+ ++This was at the fantastic +EuRuKo 2010 +Conference in +beautiful +Krakow. +
++It's fast enough and the high level really helped understanding it as it evolved. There are some parts that have been written in pedal-to-the-metal C code. +
++This depends on many factors, but generally we recommend using Picky with a maximum of 150 million data points, i.e. words (we used it there). +The range under 20 million is probably best. Your mileage may vary, of course, depending on how many partial indexes you use etc. +
++See the +use case +in the enterprise section. +
++Indexing is not too fast, and I'd be glad if it were faster. However, you get the full power of Ruby and fully customizable indexing. +
++Glad you asked. But first, read this +Wikipedia entry about octopuses. +Also, +a movie. +Finished? I think that sums it up pretty well. And it's cuuute, don't you think? :) +
++But don't call him that. He likes to be called "Octor the Destroyer". +
++Mainly me, +Florian Hanke, +but I also had +excellent help +from friends and coworkers. +
++I'd have preferred an MIT license. In the end it was a compromise between my former employer and me. +
++Wiki Roadmap +
++There aren't many +real +Ruby search engines. Just more or less elegant adapters for existing ones. I found two real ones: +
++Whistlepig +by William Morgan. +"Whistlepig is a minimalist real-time full-text search". +
++Ion +by Rico Sta. Cruz. +A Ruby search engine based on a Redis backend. +
++We're always glad for help requests, feedback, single-page scripts, project battle stories: +
++Share it in the mailing list +
++For quick info updates, +follow Picky +on twitter. +
++You might also find excellent Pickyists on IRC in +#picky +who can also help. +
+
+
+
+This webpage & the images on it have been designed by me,
+Florian Hanke
+@hanke
+aka "Flöre", or "Floere" – if you have personal feedback on anything, I'm pleased to hear it.
+
This is the one page help document for Picky.
+ +Search for things using your browser (use ⌘F).
+ +Edit typos directly in the github page of a section using the edit button.
+It's All Ruby. You'll never feel powerless. Look at your index data anytime.
+Creating an example app to get you up and running fast, Servers or Clients.
+ +Generating them:
+ + + +More info on the applications:
+ + +How to integrate Picky in:
+ +How data is cut into little pieces for the index and when searching.
+ + +How the data is stored and what you can do with Indexes.
+ +Configuring an index:
+ + + +How does data get into an index?
+ +How is the data categorized?
+ +How is the data prepared?
+ + + +Getting at the data:
+ + + +There are four different store types:
+ + + +Advanced topics:
+ +How to configure a search interface over an index (or multiple).
+ + + +What options does a user have when searching?
+ + + +Advanced topics:
+ +When you need a slice over a category's data.
+ +What a picky search returns.
+ + +We include a JavaScript library to make writing snazzy interfaces easier – see the options.
+A bit of thanks!
+ +Never forget this: Picky is all Ruby, all the time!
+ +Even though we only describe examples of classic and Sinatra style servers, Picky can be included directly in Rails, as a client or server. Or in DRb. Or in your simple script without HTTP. Anywhere you like, as long as it's Ruby, really.
+ +To drive the point home, remember that Picky is mainly two pieces working together: An index, and a search interface on indexes.
+ +The index normally has a source, knows how to tokenize data, and has a few data categories. And the search interface normally knows how to tokenize incoming queries. That's it (copy and run in a script):
+ +require 'picky'
+
+Person = Struct.new :id, :first, :last
+
+index = Picky::Index.new :people do
+ source { People.all }
+ indexing splits_text_on: /[\s-]/
+ category :first
+ category :last
+end
+index.add Person.new(1, 'Florian', 'Hanke')
+index.add Person.new(2, 'Peter', 'Mayer-Miller')
+
+people = Picky::Search.new index do
+ searching splits_text_on: /[\s,-]/
+end
+
+results = people.search 'Miller'
+p results.ids # => [2]
+
+
+You can put these pieces anywhere, independently.
+Picky tries its best to be transparent so you can go have a look if something goes wrong. It wants you to never feel powerless.
+ +All the indexes can be viewed in the /index directory of the project. They are waiting for you to inspect their JSONy goodness.
+Should anything not work with your search, you can investigate how it is indexed by viewing the actual index files (remember, they are in readable JSON) and change your indexing parameters accordingly.
You can also log as much data as you want to help you improve your search application until it's working perfectly.
+ +Picky offers a few generators to have a running server and client up in 5 minutes. So you can either get started right away
+ +or, run gem install
+ +gem install picky-generators
+
+
+and simply enter
+ +picky generate
+
+
+This will raise a Picky::Generators::NotFoundException and show you the possibilities.
The "All In One" Client/Server might be interesting for Heroku projects, as it is a bit complicated to set up two servers that interact with each other.
+Currently, Picky offers two generated example projects that you can adapt to your project: Separate Client and Server (recommended) and All In One.
+ +If this is your first time using Picky, we suggest starting out with these, even if you already have a project in which you want to integrate Picky.
+The server is generated with
+ +picky generate server target_directory
+
+
+and generates a full Sinatra server that you can try immediately. Just follow the instructions.
+All In One is actually a single Sinatra server containing the Server AND the client. This server is generated with
+ +picky generate all_in_one target_directory
+
+
+and generates a full Sinatra Picky server and client in one that you can try immediately. Just follow the instructions.
+Picky currently offers an example Sinatra client that you can adapt to your project (or look at it to get a feeling for how to use Picky in Rails).
+This client is generated with
+ +picky generate client target_directory
+
+
+and generates a full Sinatra Picky client (including Javascript etc.) that you can try immediately. Just follow the instructions.
+ +Picky, from version 3.0 onwards, is designed to run anywhere, in anything. An octopus has eight legs, remember?
+ +This means you can have a Picky server running in a DRb instance if you want to. Or in irb, for example.
+ +We do run and test the Picky server in two styles, Classic and Sinatra.
+ +But don't let that stop you from just using it in a class or just a script. This is a perfectly ok way to use Picky:
+ +require 'picky'
+
+include Picky # So we don't have to type Picky:: everywhere.
+
+books_index = Index.new(:books) do
+ source Sources::CSV.new(:title, :author, file: 'library.csv')
+ category :title
+ category :author
+end
+
+books_index.index
+books_index.reload
+
+books = Search.new books_index do
+ boost [:title, :author] => +2
+end
+
+results = books.search "test"
+results = books.search "alan turing"
+
+require 'pp'
+pp results.to_hash
+
+
+More Ruby, more power to you!
+A Sinatra server is usually just a single file. In Picky, it is a top-level file named
+ +app.rb
+
+
+We recommend using the modular Sinatra style as opposed to the classic style. It's possible to write a Picky server in the classic style, but using the modular style offers more options.
+ +require 'sinatra/base'
+require 'picky'
+
+class BookSearch < Sinatra::Application
+
+  # Picky:: is spelled out here since this file does not `include Picky`.
+  books_index = Picky::Index.new(:books) do
+    source { Book.order("isbn ASC") }
+    category :title
+    category :author
+  end
+
+  books = Picky::Search.new books_index do
+    boost [:title, :author] => +2
+  end
+
+  # The route block is a closure, so it can see the `books` search above.
+  get '/books' do
+    results = books.search params[:query],
+                           params[:ids] || 20,
+                           params[:offset] || 0
+    results.to_json
+  end
+
+end
+
+
+This is already a complete Sinatra server.
+The Sinatra Picky server uses the same routing as Sinatra (of course). More information on Sinatra routing.
+ +If you use the server with the picky client software (provided with the picky-client gem), you should return JSON from the Sinatra get.
+Just call to_json on the returned results to get the results in JSON format.
get '/books' do
+ results = books.search params[:query], params[:ids] || 20, params[:offset] || 0
+ results.to_json
+end
+
+
+The above example search can be called using, for example, curl:
curl 'localhost:8080/books?query=test'
+
+TODO Update this section.
+ +This is one way to do it:
+ +require 'logger'
+
+MyLogger = Logger.new "log/search.log"
+
+# ...
+
+get '/books' do
+ results = books.search "test"
+ MyLogger.info results
+ results.to_json
+end
+
+
+or set it up in separate files for different environments:
+ +require "logging/#{PICKY_ENVIRONMENT}"
+
+
+Note that this is not Rack logging, but Picky search engine logging. The resulting file can be used with the picky-statistics gem.
+The All In One server is a Sinatra server and a Sinatra client rolled in one.
+ +It's best to just generate one and look at it:
+ +picky generate all_in_one all_in_one_test
+
+
+and then follow the instructions.
+ +When would you use an All In One server? One place is Heroku, since it is a bit more complicated to set up two servers that interact with each other.
+ +It's nice for small convenient searches. For production setups we recommend using a separate server to make everything separately cacheable etc.
+ +How do you integrate Picky in…?
+There are two basic ways to integrate Picky in Rails:
+ +The advantage of the first setup is that you don't need to manage an external server. However, having a separate search server is much cleaner: You don't need to load the indexes on Rails startup as you just leave the search server running separately.
+If you just want a small search engine inside your Rails app, this is the way to go.
+ +In config/initializers/picky.rb, add the following (with lots of comments to help you):
# Set the Picky logger.
+#
+Picky.logger = Picky::Loggers::Silent.new
+# Picky.logger = Picky::Loggers::Concise.new
+# Picky.logger = Picky::Loggers::Verbose.new
+
+# Set up an index and store it in a constant.
+#
+BooksIndex = Picky::Index.new :books do
+ # Our keys are usually integers.
+ #
+ key_format :to_i
+ # key_format :to_s # From eg. Redis they are strings.
+ # key_format ... (whatever method needs to be called on
+ # the id of what you are indexing)
+
+ # Some indexing options to start with.
+ # Please see: http://florianhanke.com/picky/documentation.html#tokenizing
+ # on what the options are.
+ #
+ indexing removes_characters: /[^a-z0-9\s\/\-\_\:\"\&\.]/i,
+ stopwords: /\b(and|the|of|it|in|for)\b/i,
+ splits_text_on: /[\s\/\-\_\:\"\&\/]/,
+ rejects_token_if: lambda { |token| token.size < 2 }
+
+ # Define categories on your data.
+ #
+ # They have a lot of options, see:
+ # http://florianhanke.com/picky/documentation.html#indexes-categories
+ #
+ category :title
+ category :subtitle
+ category :author
+ category :isbn,
+ :partial => Picky::Partial::None.new # Only full matches
+end
+
+# BookSearch is the search interface
+# on the books index. More info here:
+# http://florianhanke.com/picky/documentation.html#search
+#
+BookSearch = Picky::Search.new BooksIndex
+
+# We are explicitly indexing the book data.
+#
+Book.all.each { |book| BooksIndex.add book }
+
+
+That's already a nice setup. Whenever Rails starts up, this will add all books to the index.
+ +From anywhere (if you have multiple, call Picky::Indexes.index
to index all).
Ok, this sets up the index and the indexing. What about the model?
+ +In the model, here app/models/book.rb, add this:
# Two callbacks.
+#
+after_save :picky_index
+after_destroy :picky_index
+
+# Updates the Picky index.
+#
+def picky_index
+ if destroyed?
+ BooksIndex.remove id
+ else
+ BooksIndex.replace self
+ end
+end
+
+
+I actually recommend using after_commit, but it did not work at the time of writing.
Now, in the controller, you need to return some results to the user.
+ +# GET /books/search
+#
+def search
+ results = BookSearch.search params[:query], params[:ids] || 20, params[:offset] || 0
+
+ # Render nicely as a partial.
+ #
+ results = results.to_hash
+ results.extend Picky::Convenience
+ results.populate_with Book do |book|
+ render_to_string :partial => "book", :object => book
+ end
+
+ respond_to do |format|
+ format.html do
+ render :text => "Book result ids: #{results.ids.to_s}"
+ end
+ format.json do
+ render :text => results.to_json
+ end
+ end
+end
+
+
+The first line executes the search using the query params. You can try this using curl:
curl 'http://127.0.0.1:4567/books/search?query=test'
+
+
+The next few lines use the results as a hash, and populate the results with data loaded from the database, rendering a book partial.
+ +Then, we respond to HTML requests with a simple web page, or respond to JSON requests with the results rendered in JSON.
+ +As you can see, you can do whatever you want with the results. You could use this in an API, or send simple text to the user, or...
+ +TODO Using the Picky client JavaScript.
+TODO
+TODO Reloading indexes live
+ +TODO Prepending the current user to filter
+ +# Prepends the current user filter to
+# the current query.
+#
+query = "user:#{current_user.id} #{params[:query]}"
+
+TODO
+ +TODO Also mention Padrino.
+TODO
+TODO
+ +The indexing method in an Index describes how index data is handled.
+The searching method in a Search describes how queries are handled.
+This is where you use these options:
+ +Picky::Index.new :books do
+ indexing options_hash_or_tokenizer
+end
+
+Search.new *indexes do
+ searching options_hash_or_tokenizer
+end
+
+
+Both take either an options hash, your hand-rolled tokenizer, or a Picky::Tokenizer instance initialized with the options hash.
Picky by default goes through the following list, in order:
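+ +substitutes_characters_with: a substituter that responds to #substitute(text) #=> substituted text
+normalizes_words: [[/matching_regexp/, 'replace match \1']]
+max_words: the default is Infinity (Don't go there, ok?).
+rejects_token_if: ->(token){ token == 'hello' }
+case_sensitive: true or false, false is default.
+stems_with: a stemmer that responds to stem(text) that returns stemmed text.
+You pass the above options into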
+ +Search.new *indexes do
+ searching options_hash
+end
+
+
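+To get a feeling for what a few of these options do to a text, here is a rough plain-Ruby sketch. It is an approximation under stated assumptions, not Picky's implementation: the helper sketch_tokenize, the order of its steps, and the sample regexps are all made up for illustration.

```ruby
# Rough plain-Ruby approximation of a few tokenizer options.
# Hypothetical helper; Picky's real tokenizer does more than this.
def sketch_tokenize(text, removes_characters:, stopwords:, splits_text_on:, rejects_token_if:)
  text = text.downcase                     # case_sensitive: false is the default
  text = text.gsub(removes_characters, '') # drop unwanted characters
  text = text.gsub(stopwords, '')          # drop stopwords
  tokens = text.split(splits_text_on)      # cut the text into tokens
  # Splitting leaves empty strings behind; drop them and any rejected tokens.
  tokens.reject { |token| token.empty? || rejects_token_if.call(token) }
end

tokens = sketch_tokenize(
  'The Lord of the Rings: A Tale!',
  removes_characters: /[^a-z0-9\s:\-]/i,
  stopwords:          /\b(and|the|of|it|in|for)\b/i,
  splits_text_on:     /[\s:\-]/,
  rejects_token_if:   ->(token) { token.size < 2 }
)
p tokens # => ["lord", "rings", "tale"]
```

+The same options hash shape is what you would hand to indexing or searching; the sketch only makes the individual steps visible.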
+You can provide your own tokenizer:
+ +Search.new books_index do
+ searching MyTokenizer.new
+end
+
+
+TODO Update what the tokenizer needs to return.
+ +The tokenizer needs to respond to the method #tokenize(text), returning a Picky::Query::Tokens object. If you have an array of tokens, e.g. [:my, :nice, :tokens],
+you can pass it into Picky::Query::Tokens.process(my_tokens) to get the tokens and return these.
rake 'try[text,some_index,some_category]'
(some_index
, some_category
optional) tells you how a given text is indexed.
It needs to be programmed in a performance efficient way if you want your search engine to be fast.
+Even though you usually provide options (see below), you can provide your own:
+ +Picky::Index.new :books do
+ indexing MyTokenizer.new
+end
+
+
+The tokenizer must respond to tokenize(text) and return [tokens, words], where tokens is an Array of processed tokens and words is an Array of words that represent the original words in the query (or as close as possible to the original words).
+It is also possible to return [tokens], where tokens is the Array of processed query words. (Picky will then just use the tokens as words.)
A very simple tokenizer that just splits the input on commas:
+ +class MyTokenizer
+ def tokenize text
+ tokens = text.split ','
+ [tokens]
+ end
+end
+
+MyTokenizer.new.tokenize "Hello, world!" # => [["Hello", " world!"]]
+
+Picky::Index.new :books do
+ indexing MyTokenizer.new
+end
+
+
+The same can be achieved with this:
+ +Picky::Index.new :books do
+ indexing splits_text_on: ','
+end
+
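+The two-element return form, [tokens, words], could look like the following sketch. DowncasingTokenizer is a hypothetical class for illustration; it runs without Picky and only shows the shape of the return value.

```ruby
# Hypothetical tokenizer: tokens are downcased, words keep the original spelling.
class DowncasingTokenizer
  def tokenize(text)
    words  = text.split(/\s+/)     # the words as the user typed them
    tokens = words.map(&:downcase) # the processed form used for matching
    [tokens, words]
  end
end

tokens, words = DowncasingTokenizer.new.tokenize('Alan Turing')
p tokens # => ["alan", "turing"]
p words  # => ["Alan", "Turing"]
```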
+Usually, you use the same options for indexing and searching:
+ +tokenizer_options = { ... }
+
+index = Picky::Index.new :example do
+ indexing tokenizer_options
+end
+
+Search.new index do
+ searching tokenizer_options
+end
+
+
+However, consider this example.
+Let's say your data has lots of words in it that look like this: all-data-are-tokenized-by-dashes.
+And people would search for them using spaces to keep words apart: searching for data.
+In this case it's a good idea to split the data and the query differently.
+Split the data on dashes, and queries on \s:
index = Picky::Index.new :example do
+ indexing splits_text_on: /-/
+end
+
+Search.new index do
+ searching splits_text_on: /\s/
+end
+
+
+The rule number one to remember when tokenizing is: +Tokenized query text needs to match the text that is in the index.
+ +So both the index and the query need to tokenize to the same string:
+ +all-data-are-tokenized-by-dashes
=> ["all", "data", "are", "tokenized", "by", "dashes"]
searching for data
=> ["searching", "for", "data"]
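+In plain Ruby, without Picky, the rule can be checked like this. The snippet is only a sketch: it uses String#split where Picky would use its tokenizers, and the variable names are made up.

```ruby
data  = 'all-data-are-tokenized-by-dashes'
query = 'searching for data'

index_tokens = data.split(/-/)   # how the index splits its text
query_tokens = query.split(/\s/) # how the search splits the query

p index_tokens                # => ["all", "data", "are", "tokenized", "by", "dashes"]
p query_tokens                # => ["searching", "for", "data"]
p index_tokens & query_tokens # => ["data"]
```

+Only tokens that come out identical on both sides can match, here "data".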
Either look in the /index directory (the "prepared" files are the tokenized data), or use Picky's try rake task:
$ rake try[test]
+"test" is saved in the Picky::Indexes index as ["test"]
+"test" as a search will be tokenized as ["test"]
+
+
+You can tell Picky which index, or even category to use:
+ +$ rake try[test,books]
+$ rake try[test,books,title]
+
+
+Indexes do three things:
+ +Picky offers a choice of four index types:
+ +This is how they look in code:
+ +books_memory_index = Index.new :books do
+ # Configuration goes here.
+end
+
+books_redis_index = Index.new :books do
+ backend Backends::Redis.new
+ # Configuration goes here.
+end
+
+
+Both save the preprocessed data from the data source in the /index directory so you can go look if the data is preprocessed correctly.
+Indexes are then used in a Search interface.
Searching over one index:
+ +books = Search.new books_index
+
+
+Searching over multiple indexes:
+ +media = Search.new books_index, dvd_index, mp3_index
+
+
+The resulting ids should be from the same id space to be useful – or the ids should be exclusive, such that eg. a book id does not collide with a dvd id.
+The in-memory index saves its indexes as files transparently in the form of JSON files that reside in the /index
directory.
When the server is started, they are loaded into memory. As soon as the server is stopped, the indexes are deleted from memory.
+ +Indexing regenerates the JSON index files, which can then be reloaded into memory, even in the running server (see below).
+The Redis index saves its indexes in the Redis server on the default port, using database 15.
+ +When the server is started, it connects to the Redis server and uses the indexes in the key-value store.
+ +Indexing regenerates the indexes in the Redis server – you do not have to restart the server running Picky.
+TODO
+TODO
+If you don't have access to your indexes directly, like so
+ +books_index = Index.new(:books) do
+ # ...
+end
+
+books_index.do_something_with_the_index
+
+
+and for example you'd like to access the index from a rake task, you can use
+ +Picky::Indexes
+
+
+to get all indexes.
+ +To get a single index use
+ +Picky::Indexes[:index_name]
+
+
+and to get a single category of an index, use
+ +Picky::Indexes[:index_name][:category_name]
+
+
+That's it.
+This is all you can do to configure an index:
+ +books_index = Index.new :books do
+ source { Book.order("isbn ASC") }
+
+ indexing removes_characters: /[^a-z0-9\s\:\"\&\.\|]/i, # Default: nil
+ stopwords: /\b(and|the|or|on|of|in)\b/i, # Default: nil
+ splits_text_on: /[\s\/\-\_\:\"\&\/]/, # Default: /\s/
+ removes_characters_after_splitting: /[\.]/, # Default: nil
+ normalizes_words: [[/\$(\w+)/i, '\1 dollars']], # Default: nil
+ rejects_token_if: lambda { |token| token == :blurf }, # Default: nil
+ case_sensitive: true, # Default: false
+ substitutes_characters_with: Picky::CharacterSubstituters::WestEuropean.new, # Default: nil
+ stems_with: Lingua::Stemmer.new # Default: nil
+
+ category :id
+ category :title,
+ partial: Partial::Substring.new(:from => 1),
+ similarity: Similarity::DoubleMetaphone.new(2),
+ qualifiers: [:t, :title, :titre]
+ category :author,
+ partial: Partial::Substring.new(:from => -2)
+ category :year,
+ partial: Partial::None.new,
+ qualifiers: [:y, :year, :annee]
+
+ result_identifier 'boooookies'
+end
+
+
+Usually you won't need to configure all that.
+ +But if your boss comes in the door and asks why X is not found… you know. And you can improve the search engine relatively quickly and painlessly.
+ +More power to you.
+Data sources define where the data for an index comes from. There are explicit data sources and implicit data sources.
+Explicit data sources are mentioned in the index definition using the #source
method.
You define them on an index:
+ +Index.new :books do
+ source Book.all # Loads the data instantly.
+end
+
+Index.new :books do
+ source { Book.all } # Loads on indexing. Preferred.
+end
+
+
+Or even on a single category:
+ +Index.new :books do
+ category :title,
+ source: lambda { Book.all }
+end
+
+
+TODO more explanation how index sources and single category sources might work together.
+ +Explicit data sources must respond to #each, for example, an Array.
+Picky supports any data source as long as it supports #each
.
See under Flexible Sources how you can use this.
+ +In short. Model:
+ +class Monkey
+ attr_reader :id, :name, :color
+ def initialize id, name, color
+ @id, @name, @color = id, name, color
+ end
+end
+
+
+The data:
+ +monkeys = [
+ Monkey.new(1, 'pete', 'red'),
+ Monkey.new(2, 'joey', 'green'),
+ Monkey.new(3, 'hans', 'blue')
+]
+
+
+Setting the array as a source
+ +Index::Memory.new :monkeys do
+ source { monkeys }
+ category :name
+ category :couleur, :from => :color # The couleur category will take its data from the #color method.
+end
+
+If you define the source directly in the index block, it will be evaluated instantly:
+ +Index::Memory.new :books do
+ source Book.order('title ASC')
+end
+
+
+This works with ActiveRecord and other similar ORMs since Book.order
returns a proxy object that will only be evaluated when the server is indexing.
For example, this would instantly get the records, since #all
is a kicker method:
Index::Memory.new :books do
+ source Book.all # Not the best idea.
+end
+
+
+In this case, it is better to give the source
method a block:
Index::Memory.new :books do
+ source { Book.all }
+end
+
+
+This block will be executed as soon as the indexing is running, but not earlier.
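+The difference between the two forms can be shown in plain Ruby. This is a sketch under assumptions: LazySourceDemo is a hypothetical stand-in for an index, and the load_books lambda stands in for a database query.

```ruby
# Hypothetical stand-in for an index: it only stores the source it is given.
class LazySourceDemo
  def initialize(source = nil, &block)
    @source = source
    @block  = block
  end

  # Called when "indexing": a block source is only evaluated now.
  def each(&consumer)
    (@block ? @block.call : @source).each(&consumer)
  end
end

loads = 0
load_books = -> { loads += 1; ['Moby Dick', 'Dune'] }

eager = LazySourceDemo.new(load_books.call)    # data is loaded right here
lazy  = LazySourceDemo.new { load_books.call } # nothing is loaded yet
loads_after_definition = loads                 # only the eager source has loaded

titles = []
lazy.each { |title| titles << title }          # the block runs now

p loads_after_definition # => 1
p loads                  # => 2
p titles                 # => ["Moby Dick", "Dune"]
```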
+Implicit data sources are not mentioned in the index definition, but rather, the data is added (or removed) via realtime methods on an index, like #add, #<<, #unshift, #remove, #replace, and a special form, #replace_from.
So, you don't define them on an index or category as in the explicit data source, but instead add to either like so:
+ +index = Index.new :books do
+ category :example
+end
+
+Book = Struct.new :id, :example
+index.add Book.new(1, "Hello!")
+index.add Book.new(2, "World!")
+
+
+Or to a specific category:
+ +index[:example].add Book.new(3, "Only add to a single category")
+
+Currently, there are six methods to change an index:
+ +#add: Adds the thing to the end of the index (even if already there). index.add thing
+#<<: Adds the thing to the end of the index (shows up last in results). index << thing
+#unshift: Adds the thing to the beginning of the index (shows up first in results). index.unshift thing
+#remove: Removes the thing from the index (if there). index.remove thing
+#replace: Replaces the thing in the index (if there, otherwise like #add). Equal to #remove followed by #add. index.replace thing
+#replace_from: Pass in a Hash. Replaces the thing in the index (if there, otherwise like #add). Equal to #remove followed by #add. index.replace_from id: 1, example: "Hello, I am Hash!"
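+The ordering rules described above can be played through with a toy model. ToyIndex is a hypothetical class that only tracks ids in result order; it is not Picky's index and ignores categories and tokens entirely.

```ruby
# Hypothetical toy model of the ordering semantics; not Picky's index.
class ToyIndex
  attr_reader :ids

  def initialize
    @ids = []
  end

  # Appended: shows up last in results.
  def add(id)
    @ids << id
    self
  end
  alias_method :<<, :add

  # Prepended: shows up first in results.
  def unshift(id)
    @ids.unshift id
    self
  end

  # Does nothing if the id is not there.
  def remove(id)
    @ids.delete id
    self
  end

  # Equal to remove followed by add.
  def replace(id)
    remove(id).add(id)
  end
end

index = ToyIndex.new
index.add(1).add(2)
index.unshift 3
index.replace 1
index.remove 2
p index.ids # => [3, 1]
```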
See Tokenizing for tokenizer options.
+ +Categories – usually what other search engines call fields – define categorized data. For example, book data might have a title
, an author
and an isbn
.
So you define that:
+ +Index.new :books do
+ source { Book.order('author DESC') }
+
+ category :title
+ category :author
+ category :isbn
+end
+
+
+(The example assumes that a Book has readers for title, author, and isbn.)
+This already works and a search will return categorized results. For example, a search for "Alan Tur" might categorize both words as author, but it might also at the same time categorize both as title. Or one as title and the other as author.
That's a great starting point. So how can I customize the categories?
+The partial option defines if a word is also found when it is only partially entered. So, Picky will be found when typing Pic.
+The default partial marker is *, so entering Pic* will force Pic to be looked for in the partial index.
+The last word in a query is always partial, by default. If you want to force a non partial search on the last query word, use " as in last query word would be "partial", but here partial would not be searched in the partial index.
+By default, the partial marker is * and the non-partial marker is ". You change the markers by setting
+Picky::Query::Token.partial_character = '\*'
+Picky::Query::Token.no_partial_character = '"'
+You define the partial option like this:
+ +category :some, partial: (some generator which generates partial words)
+
+
+The Picky default is
+ +category :some, partial: Picky::Partial::Substring.new(from: -3)
+
+
+You get this one by defining no partial option:
+ +category :some
+
+
+The option Partial::Substring.new(from: 1) makes every prefix of a word findable.
+So the word Picky would be findable by entering Picky, Pick, Pic, Pi, or P.
If you don't want any partial finds to occur, use:
+ +category :some, partial: Partial::None.new
+
+There are four built-in partial options. All examples use "hello" as the token.
+ +Partial::None.new: Generates no partials, using * will use exact word matching.
+Partial::Postfix.new(from: startpos): Generates all postfixes.
+from: 1 # => ["hello", "hell", "hel", "he", "h"]
+from: 4 # => ["hello", "hell"]
+Partial::Substring.new(from: startpos, to: endpos): Generates substring partials. to: -1 is set by default.
+from: 1 # => ["hello", "hell", "hel", "he", "h"]
+from: 4 # => ["hello", "hell"]
+from: 1, to: -2 # => ["hell", "hel", "he", "h"]
+from: 4, to: -2 # => ["hell"]
+Partial::Infix.new(min: minlength, max: maxlength): Generates infix partials. max: -1 is set by default.
+min: 1 # => ["hello", "hell", "ello", "hel", "ell", "llo", "he", "el", "ll", "lo", "h", "e", "l", "l", "o"]
+min: 4 # => ["hello", "hell", "ello"]
+min: 1, max: -2 # => ["hell", "ello", "hel", "ell", "llo", "he", "el", "ll", "lo", "h", "e", "l", "l", "o"]
+min: 4, max: -2 # => ["hell", "ello"]
+The general rule is: The more tokens are generated from a token, the larger your index will be. Ask yourself whether you really need an infix partial index.
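+The substring examples listed above can be reproduced with a few lines of plain Ruby. This is a hypothetical re-implementation for illustration, not Picky's code; in particular, treating negative positions as counting from the end of the token is an assumption.

```ruby
# Hypothetical re-implementation for illustration. Positions are 1-based;
# negative positions are assumed to count from the end of the token.
def substring_partials(token, from: 1, to: -1)
  first = from < 0 ? token.length + from + 1 : from
  last  = to < 0 ? token.length + to + 1 : to
  # Walk from the longest allowed prefix down to the shortest.
  last.downto(first).map { |length| token[0, length] }
end

p substring_partials('hello', from: 1)         # => ["hello", "hell", "hel", "he", "h"]
p substring_partials('hello', from: 4)         # => ["hello", "hell"]
p substring_partials('hello', from: 1, to: -2) # => ["hell", "hel", "he", "h"]
p substring_partials('hello', from: 4, to: -2) # => ["hell"]
```

+Each extra partial is one more entry in the index, which is why a small from value costs more space than a large one.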
+You can also pass in your own partial generators. How?
+ +Implement an object which has a single method #each_partial(token, &block)
. That method should yield all partials for a given token. Want to implement a (probably useless) random partial search? No problem.
Example:
+ +You need an alphabetic index search. If somebody searches for a name, it should only be found if typed as a whole. But you'd also like to find it when just entering a
, for Andy
, Albert
, etc.
class AlphabeticIndexPartial
+ def each_partial token, &block
+ [token[0], token].each &block
+ end
+end
+
+
+This will result in "A" and "Andy" being in the index for "Andy".
+ +Pretty straightforward, right?
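+To see what such a generator yields, you can call it yourself. The class is repeated here so the snippet runs on its own; that Picky calls #each_partial with a block in just this way is an assumption based on the description above.

```ruby
# The generator from the example, repeated so this snippet is self-contained.
class AlphabeticIndexPartial
  def each_partial(token, &block)
    [token[0], token].each(&block)
  end
end

# Collect everything the generator yields for one token.
partials = []
AlphabeticIndexPartial.new.each_partial('Andy') { |partial| partials << partial }
p partials # => ["A", "Andy"]
```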
+The weight option defines how strongly a word is weighed. By default, Picky rates a word according to the logarithm of its occurrence. This means that a word that occurs more often will be weighed slightly higher.
+ +You define a weight option like this:
+ +category :some, weight: MyWeights.new
+
+
+The default is Weights::Logarithmic.new
.
You can also pass in your own weight generators. See this article to learn more.
+ +If you don't want Picky to calculate weights for your indexed entries, you can use constant or dynamic weights.
+ +With 0.0 as a constant weight:
+ +category :some, weight: Weights::Constant.new # Returns 0.0 for all results.
+
+
+With 3.14 as a constant weight:
+ +category :some, weight: Weights::Constant.new(3.14) # Returns 3.14 for all results.
+
+
+Or with a dynamically calculated weight:
+ +Weights::Dynamic.new do |str_or_sym|
+ str_or_sym.length # Uses the length of the string or symbol as weight.
+end
+
+
+You almost never need to define weights. More often than not, you can fiddle with boosting combinations of categories, via the boost method in searches.
Usually it is preferable to boost specific search results, say "florian hanke" mapped to [:first_name, :last_name], but sometimes you want a specific category boosted wherever it occurs.
+ +For example, the title in a movie search engine would need to be boosted in all searches in which it occurs. Do this:
+ +category :title, weight: Weights::Logarithmic.new(+1)
+
+
+This adds +1 to all weights. Why the logarithmic? By default, Picky weighs categories using the logarithm of occurrences. So the default would be:
+ +category :title, weight: Weights::Logarithmic.new # The default.
+
+
+The Logarithmic initializer accepts a constant to be added to the result. Adding the constant +1 is like multiplying the number of occurrences by Math::E before taking the logarithm (e is Euler's number). If you don't understand, don't worry, just know that by adding a constant you multiply the occurrences by a certain value.
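+The arithmetic behind this can be checked directly in Ruby. The sketch assumes a natural logarithm (Math.log) and a made-up occurrence count of 20; it does not use Picky.

```ruby
occurrences = 20 # made-up number of occurrences of a word

boosted  = Math.log(occurrences) + 1       # logarithmic weight plus the constant +1
expanded = Math.log(occurrences * Math::E) # log(n * e) equals log(n) + 1

puts boosted.round(6)
puts expanded.round(6)
puts (boosted - expanded).abs < 1e-9
```

+Both lines print the same value, so a constant added to the weight acts like a fixed factor on the occurrences.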
In short:
+* Use weight
on the index, if you need a category to be boosted everywhere, wherever it occurs
+* Use boosting if you need to boost specific combinations of categories only for a specific search.
+The similarity option defines if a word is also found when it is typed wrong, or close to another word. So, "Picky" might be already found when typing "Pocky~" (Picky will search for similar words when you use the tilde, ~).
+ +You define a similarity option like this:
+ +category :some, similarity: Similarity::None.new
+
+
+(This is also the default)
+ +There are several built-in similarity options, like
+ +category :some, similarity: Similarity::Soundex.new
+category :this, similarity: Similarity::Metaphone.new
+category :that, similarity: Similarity::DoubleMetaphone.new
+
+
+You can also pass in your own similarity generators. See this article to learn more.
+Usually, when you search for title:wizard you will only find books with "wizard" in their title.
+Maybe your client would like to be able to only enter t:wizard. In that case you would use this option:
category :some, qualifier: "t"
+
+
+Or if you'd like more to match:
+ +category :some,
+ qualifiers: ["t", "title", "titulo"]
+
+
+(This matches "t", "title", and also the Italian "titulo")
+ +Picky will warn you if on one index the qualifiers are ambiguous (Picky will assume that the last "t" for example is the one you want to use).
+ +This means that:
+ +category :some, qualifier: "t"
+category :other, qualifier: "t"
+
+
+Picky will assume that if you enter t:bla
, you want to search in the other
category.
Searching in multiple categories can also be done. If you have:
+ +category :some, :qualifier => 's'
+category :other, :qualifier => 'o'
+
+
+Then searching with s,o:bla
will search for bla
in both :some
and :other
. Neat, eh?
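As a rough illustration of what happens to a query like `s,o:bla`, here is a plain-Ruby sketch. The `QUALIFIERS` mapping and `parse_token` are hypothetical and only mimic the behaviour described above; Picky's own parsing may work differently.

```ruby
# Hypothetical qualifier-to-category mapping, as it might result from
# category :some, qualifier: 's' and category :other, qualifier: 'o'.
QUALIFIERS = { 's' => :some, 'o' => :other }

# Splits "s,o:bla" into the categories to search in and the search text.
def parse_token(token)
  qualifiers, text = token.split(':', 2)
  return [nil, token] if text.nil? # no qualifier given: search everywhere

  categories = qualifiers.split(',').map { |q| QUALIFIERS[q] }.compact
  [categories, text]
end

p parse_token('s,o:bla')
p parse_token('bla')
```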
Usually, the categories will take their data from the reader or field that is the same as their name.
+ +Sometimes, though, the model does not have the right names. Say you have an Italian book model, Libro
. But you still want to use English category names.
Index.new :books do
+ source { Libro.order('autore DESC') }
+
+ category :title, :from => :titulo
+ category :author, :from => :autore
+ category :isbn
+end
+
+
+You can also populate the index at runtime (eg. with index.add
) using a lambda. The required argument inside the lambda is the object being added to the index.
Index.new :books do
+ category :authors, :from => lambda { |book| book.authors.map(&:name) }
+end
+
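The difference between a symbol and a lambda for `from` can be sketched in plain Ruby. The `extract` helper and the `Book`/`Author` structs are hypothetical and only illustrate the idea; this is not Picky's implementation.

```ruby
Book   = Struct.new(:id, :titulo, :authors)
Author = Struct.new(:name)

# Hypothetical helper: reads the value for a category from an object,
# either by calling the named method or by calling a lambda with the object.
def extract(object, from)
  from.respond_to?(:call) ? from.call(object) : object.public_send(from)
end

book = Book.new(1, 'Il nome della rosa', [Author.new('Umberto Eco')])

p extract(book, :titulo)
p extract(book, lambda { |b| b.authors.map(&:name) })
```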
+You will almost never need to use this, as the key format will usually be the same for all categories, which is when you would define it on the index, like so.
+ +But if you need to, use it the same way as on the index.
+ +Index.new "books" do
+ category :title,
+ :key_format => :to_s
+end
+
+You will almost never need to use this, as the source will usually be the same for all categories, which is when you would define it on the index.
+ +But if you need to, use it the same way as on the index.
+ +Index.new :books do
+ category :title,
+ source: some_source
+end
+
+Set this option to false
when you give Picky already tokenized data (an Array, or generally an Enumerator).
Index.new :people do
+ category :names, tokenize: false
+end
+
+
+And Person has a method #names
which returns this array:
class Person
+
+ def names
+ ['estaban', 'julio', 'ricardo', 'montoya', 'larosa', 'ramirez']
+ end
+
+end
+
+
+Then Picky will simply use the tokens in that array without (pre-)processing them. Of course, this means you need to do all the tokenizing work. If you leave the tokens in uppercase formatting, then nothing will be found, unless you set the Search to be case-sensitive, for example.
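If you tokenize yourself, a minimal sketch of the work you take over could look like this. `name_tokens` is a hypothetical helper, not part of Picky; it downcases and splits on whitespace so that lowercase queries can match.

```ruby
# Hypothetical helper: downcases a full name and splits it on whitespace.
def name_tokens(full_name)
  full_name.downcase.split
end

p name_tokens('Esteban Julio Ricardo Montoya')
```

A `Person#names` method could return the result of such a helper instead of a hardcoded array.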
+Users can use some special features when searching. They are:
+* something* (By default, the last word is implicitly partial)
+* "something" (The quotes make the query on this word explicitly non-partial)
+* something~ (The tilde makes this word eligible for similarity search)
+* title:something (Picky will only search in the category designated as title, in each index of the search)
+* title,author:something (Picky will search in title and author categories, in each index of the search)
+* year:1999…2012 (Picky will search all values in a Ruby Range: (1999..2012))
+These options can be combined (e.g. title,author:funky~"): this will try to find words similar to funky (like "fonky"), but no partials of them (like "fonk"), in both title and author.
Non-partial will win over partial, if you use both, as in test*"
.
Also note that these special characters need to survive tokenizing, so make sure your tokenizer does not remove any of *":,-
.
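To make the range syntax concrete, here is a plain-Ruby sketch of how a token such as `1999…2012` maps onto a Ruby Range. `expand_range` is a hypothetical helper; how Picky handles ranges internally is not shown here.

```ruby
# Hypothetical helper: expands "1999…2012" into the values it covers.
def expand_range(text)
  first, last = text.split('…', 2)
  return [text] if last.nil? # not a range, keep the single value

  (first.to_i..last.to_i).map(&:to_s)
end

p expand_range('1999…2002')
p expand_range('1999')
```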
By default, the indexed data points to keys that are integers, or, put differently, keys that are formatted using to_i
.
If you are indexing keys that are strings, use to_s
– a good example are MongoDB BSON keys, or UUID keys.
The key_format
method lets you define the format:
Index.new :books do
+ key_format :to_s
+end
+
+
+The Picky::Sources
already set this correctly. However, if you use an #each
source that supplies Picky with symbol ids, you should tell it what format the keys are in, eg. key_format :to_s
.
By default, an index is identified by its name in the results. This index is identified by :books
:
Index.new :books do
+ # ...
+end
+
+
+This index is identified by media
in the results:
Index.new :books do
+ # ...
+ result_identifier 'media'
+end
+
+
+You still refer to it as :books
in e.g. Rake tasks, Picky::Indexes[:books].reload
. The result_identifier
option is just for the results.
Indexing can be done programmatically, at any time. Even while the server is running.
+ +Indexing all indexes is done with
+ +Picky::Indexes.index
+
+
+Indexing a single index can be done either with
+ +Picky::Indexes[:index_name].index
+
+
+or
+ +index_instance.index
+
+
+Indexing a single category of an index can be done either with
+ +Picky::Indexes[:index_name][:category_name].index
+
+
+or
+ +category_instance.index
+
+Loading (or reloading) your indexes in a running application is possible.
+ +Loading all indexes is done with
+ +Picky::Indexes.load
+
+
+Loading a single index can be done either with
+ +Picky::Indexes[:index_name].load
+
+
+or
+ +index_instance.load
+
+
+Loading a single category of an index can be done either with
+ +Picky::Indexes[:index_name][:category_name].load
+
+
+or
+ +category_instance.load
+
+To communicate with your server using signals:
+ +books_index = Index.new(:books) do
+ # ...
+end
+
+Signal.trap("USR1") do
+ books_index.reindex
+end
+
+
+This reindexes the books_index when you call
+ +kill -USR1 <server_process_id>
+
+
+You can refer to the index like so if you want to define the trap somewhere else:
+ +Signal.trap("USR1") do
+ Picky::Indexes[:books].reindex
+end
+
+Reindexing your indexes is just indexing followed by reloading (see above).
+ +Reindexing all indexes is done with
+ +Picky::Indexes.reindex
+
+
+Reindexing a single index can be done either with
+ +Picky::Indexes[:index_name].reindex
+
+
+or
+ +index_instance.reindex
+
+
+Reindexing a single category of an index can be done either with
+ +Picky::Indexes[:index_name][:category_name].reindex
+
+
+or
+ +category_instance.reindex
+
+
+Picky offers a Search
interface for the indexes. You instantiate it as follows:
Just searching over one index:
+ +books = Search.new books_index # searching over one index
+
+
+Searching over multiple indexes:
+ +media = Search.new books_index, dvd_index, mp3_index
+
+
+Such an instance can then search over all its indexes and returns a Picky::Results
object:
results = media.search "query", # the query text
+ 20, # number of ids
+ 0 # offset (for pagination)
+
+
+Please see the part about Results to know more about that.
+You use a block to set search options:
+ +media = Search.new books_index, dvd_index, mp3_index do
+ searching tokenizer_options_or_tokenizer
+ boost [:title, :author] => +2,
+ [:author, :title] => -1
+end
+
+See Tokenizing for tokenizer options.
+The boost
option defines what combinations to boost.
This is unlike boosting in most other search engines, where you can only boost a given field. I've found it much more useful to boost combinations.
+ +For example, you have an index of addresses. The usual case is that someone is looking for a street and a number. So if Picky encounters that combination (in that order), it should promote the results containing that combination to a more prominent spot. On the other hand, if Picky encounters a street number followed by a street name, which is unlikely to be a search for an address (where I come from), you might want to demote that result.
+ +So let's boost street, streetnumber
, while at the same time deboosting streetnumber, street
:
addresses = Picky::Search.new address_index do
+ boost [:street, :streetnumber] => +2,
+ [:streetnumber, :street] => -1
+end
+
+
+If you still want to boost a single category, check out the category weight option. For example:
+ +Picky::Index.new :addresses do
+ category :street, weight: Picky::Weights::Logarithmic.new(+4)
+ category :streetnumber
+end
+
+
+This boosts the weight of the street category for all searches using the index with this category. So whenever the street category is found in results, it will boost these.
+Picky combines consecutive categories in searches for boosting. So if you search for "star wars empire strikes back", when you defined [:title] => +1
, then that boosting is applied.
Why? In earlier versions of Picky we found that boosting specific combinations is less useful than boosting a specific order of categories.
+ +Let me give you an example from a movie search engine. Instead of having to say boost [:title] => +1, [:title, :title] => +1, [:title, :title, :title] => +1
, it is far more useful to say "If you find any number of title words in a row, boost it". So, when searching for "star wars empire strikes back 1980", it is less important that the query contains 5 title words than that it contains a title followed by a release year. So in this particular case, a boost defined by [:title, :release_year] => +3
would be applied.
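The combining of consecutive categories can be sketched in plain Ruby. `BOOSTS`, `combine` and `boost_for` are hypothetical names used for illustration only; they mirror the behaviour described above, not Picky's actual code.

```ruby
# Boosts keyed by category combination (values taken from the text above).
BOOSTS = {
  [:title]                => 1,
  [:title, :release_year] => 3
}

# Collapses runs of the same category:
# [:title, :title, :release_year] -> [:title, :release_year].
def combine(categories)
  categories.chunk_while { |a, b| a == b }.map(&:first)
end

# Looks up the boost for the combined categories (0 if none is defined).
def boost_for(categories)
  BOOSTS.fetch(combine(categories), 0)
end

query_categories = [:title] * 5 + [:release_year]

p combine(query_categories)
p boost_for(query_categories)
p boost_for([:title, :title])
```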
There's a full blog post devoted to this topic.
+ +In short, an ignore :name
option makes the Search throw away (ignore) any tokens (words) that map to the category name
.
Let's say we have a search defined:
+ +names = Picky::Search.new name_index do
+ ignore :first_name
+end
+
+
+Now, if Picky finds the tokens "florian hanke" in both :first_name, :last_name
and :last_name, :last_name
, then it will throw away the solutions for :first_name
("florian" will be thrown away) leaving only "hanke", since that is a last name. The [:last_name, :last_name]
combinations will be left alone – ie. if "florian" and "hanke" are both found in last_name
.
The ignore
option also takes arrays. If you give it an array, it will throw away all solutions where that order of categories occurs.
Let's say you want to throw away results where last name is found before first name, because your search form is in order: [first_name last_name]
.
names = Picky::Search.new name_index do
+ ignore [:last_name, :first_name]
+end
+
+
+So if somebody searches for "peter paul han" (each a last name as well as a first name), and Picky finds the following combinations:
+ +[:first_name, :first_name, :first_name]
+[:last_name, :first_name, :last_name]
+[:first_name, :last_name, :first_name]
+[:last_name, :first_name, :first_name]
+[:last_name, :last_name, :first_name]
+
+
+then the combinations
+ +[:last_name, :first_name, :first_name]
+[:last_name, :last_name, :first_name]
+
+
+will be thrown away, since they are in the order [:last_name, :first_name]
. Note that [:last_name, :first_name, :last_name]
is not thrown away since it is last-first-last.
This is the opposite of the ignore
option above.
Almost. The only
option only takes arrays. If you give it an array, it will keep only solutions where that order of categories occurs.
Let's say you want to keep only results where first name is found before last name, because your search form is in order: [first_name last_name]
.
names = Picky::Search.new name_index do
+ only [:first_name, :last_name]
+end
+
+
+So if somebody searches for "peter paul han" (each a last name as well as a first name), and Picky finds the following combinations:
+ +[:first_name, :first_name, :last_name]
+[:last_name, :first_name, :last_name]
+[:first_name, :last_name, :first_name]
+[:last_name, :first_name, :first_name]
+[:last_name, :last_name, :first_name]
+
+
+then only the combination
+ +[:first_name, :first_name, :last_name]
+
+
+will be kept, since it is the only one where first comes before last, in that order.
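Both the ignore and the only examples above can be reproduced with a small plain-Ruby sketch. It assumes that consecutive equal categories are collapsed before comparing, which matches the examples but is an assumption about the mechanism; `combine`, `apply_ignore` and `apply_only` are hypothetical helpers.

```ruby
# Collapses runs of the same category:
# [:last_name, :last_name, :first_name] -> [:last_name, :first_name].
def combine(categories)
  categories.chunk_while { |a, b| a == b }.map(&:first)
end

# Sketch of ignore: drop combinations that collapse to the given order.
def apply_ignore(combinations, order)
  combinations.reject { |c| combine(c) == order }
end

# Sketch of only: keep just the combinations that collapse to the given order.
def apply_only(combinations, order)
  combinations.select { |c| combine(c) == order }
end

combinations = [
  [:first_name, :first_name, :last_name],
  [:last_name,  :first_name, :last_name],
  [:first_name, :last_name,  :first_name],
  [:last_name,  :first_name, :first_name],
  [:last_name,  :last_name,  :first_name]
]

p apply_only(combinations, [:first_name, :last_name])
p apply_ignore(combinations, [:last_name, :first_name])
```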
+There's a full blog post devoted to this topic.
+ +In short, the ignore_unassigned_tokens true/false
option makes Picky be very lenient with your queries. Usually, if one of the search words is not found, say in a query "aston martin cockadoodledoo", Picky will return an empty result set, because "cockadoodledoo" is not in any index, in a car search, for example.
By ignoring the "cockadoodledoo" that can't be assigned sensibly, you will still get results.
+ +This could be used in a search for advertisements that are shown next to the results.
+ +If you've defined an ads search like so:
+ +ads_search = Search.new cars_index do
+ ignore_unassigned_tokens true
+end
+
+
+then even if Picky does not find anything for "aston martin cockadoodledoo", it will find an ad, simply ignoring the unassigned token.
+The max_allocations(integer)
option cuts off calculation of allocations.
What does this mean? Say you have code like:
+ +phone_search = Search.new phonebook do
+ max_allocations 1
+end
+
+
+And someone searches for "peter thomas".
+ +Picky then generates all possible allocations and sorts them.
+ +It might get
+ +[first_name, last_name]
[last_name, first_name]
[first_name, first_name]
with the first allocation being the most probable one.
+ +So, with max_allocations 1
it will only use the topmost one and throw away all the others.
It will only go through the first one and calculate only results for that one. This can be used to speed up Picky in case of exploding amounts of allocations.
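Conceptually, the cut-off works like the following plain-Ruby sketch. The `Allocation` struct, its scores and `limit_allocations` are made up for illustration.

```ruby
Allocation = Struct.new(:categories, :score)

# Hypothetical helper: sorts allocations by descending score and keeps at
# most max of them; only those would have their results calculated.
def limit_allocations(allocations, max)
  allocations.sort_by { |allocation| -allocation.score }.first(max)
end

allocations = [
  Allocation.new([:last_name, :first_name],  1.5),
  Allocation.new([:first_name, :last_name],  3.2),
  Allocation.new([:first_name, :first_name], 0.4)
]

p limit_allocations(allocations, 1).map(&:categories)
```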
+The terminate_early(integer)
or terminate_early(with_extra_allocations: integer)
option stops Picky from calculating all ids of all allocations.
However, this will also return a wrong total.
+ +So, important note: Only use when you don't display a total. Or you want to fool your users (not recommended).
+ +Examples:
+ +Stop as soon as you have calculated enough ids for the allocation.
+ +phone_search = Search.new phonebook do
+ terminate_early # The default uses 0.
+end
+
+
+Stop as soon as you have calculated enough ids for the allocation, and then calculate 3 allocations more (for example, to show to the user).
+ +phone_search = Search.new phonebook do
+ terminate_early 3
+end
+
+
+There's also a hash form to be more explicit. So the next coder knows what it does. (However, us cool Picky hackers know ;) )
+ +phone_search = Search.new phonebook do
+ terminate_early with_extra_allocations: 5
+end
+
+
+This option speeds up Picky if you don't need a correct total.
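Why the total becomes wrong can be seen in a plain-Ruby sketch. `process` is a hypothetical helper that walks over allocations (arrays of ids, best first) and stops early; it only illustrates the idea and is not how Picky is implemented.

```ruby
# Hypothetical helper: stops calculating once amount + offset ids have been
# seen, then processes at most extra further allocations.
# Returns the allocations that were actually processed.
def process(allocations, amount, offset = 0, extra = 0)
  needed    = amount + offset
  seen      = 0
  remaining = nil # extra allocations still allowed once we have enough ids
  processed = []

  allocations.each do |ids|
    if remaining
      break if remaining.zero?
      remaining -= 1
    end
    processed << ids
    seen += ids.size
    remaining ||= extra if seen >= needed
  end

  processed
end

allocations = [[1, 2, 3], [4, 5], [6], [7]]

p process(allocations, 2)                     # stops after the first allocation
p process(allocations, 2, 0, 1)               # one extra allocation
p process(allocations, 2).flatten.size        # the total now seen
p allocations.flatten.size                    # the real total
```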
+ +Results are returned by the Search
instance.
books = Search.new books_index do
+ searching splits_text_on: /[\s,]/
+ boost [:title, :author] => +2
+end
+
+results = books.search "test"
+
+p results # Returns results in log form.
+p results.to_hash # Returns results as a hash.
+p results.to_json # Returns results as JSON.
+
+If no sorting is defined, Picky results will be sorted in the order of the data provided by the data source.
+ +However, you can sort the results any way you want.
+You can define an arbitrary sorting on results by calling Results#sort_by
.
+It takes a block with a single parameter: The stored id of a result item.
This example looks up a result item via id and then takes the priority of the item to sort the results.
+ +results.sort_by { |id| MyResultItemsHash[id].priority }
+
+
+The results are only sorted within their allocation.
+If you, for example, searched for Peter
, and Picky allocated results in first_name
and last_name
, then each allocation's results would be sorted.
Picky is optimized: it only sorts results which are actually visible. So if Picky looks for the first 20 results, and the first allocation already has more than 20 results in it (say, 100), then it will only sort the 100 results of the first allocation. It will still calculate all other allocations, but not sort them.
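Sorting within allocations can be pictured with this plain-Ruby sketch. `sort_within` is a hypothetical helper; allocations are modelled as plain arrays of ids.

```ruby
# Hypothetical helper: sorts the ids of each allocation separately,
# leaving the order of the allocations themselves untouched.
def sort_within(allocations, &block)
  allocations.map { |ids| ids.sort_by(&block) }
end

priority    = { 1 => 10, 2 => 100, 3 => 5, 4 => 1 }
allocations = [[2, 1, 3], [4]]

p sort_within(allocations) { |id| priority[id] }
```

Note that id 4 has the lowest sort value but stays in the second allocation.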
+If you do not call Results#sort_by, then sorting incurs no costs.
+ +sort_hash = {
+ 1 => 10, # important
+ 2 => 100 # not so important
+}
+results.sort_by { |id| sort_hash[id] }
+
+
+Note that in Ruby, a lower value => more to the front (the higher up in Picky).
+TODO Update with latest logging style and ideas on how to separately log searches.
+ +Picky results can be logged wherever you want.
+ +A Picky Sinatra server logs whatever to wherever you want:
+ +MyLogger = Logger.new "log/search.log"
+
+# ...
+
+get '/books' do
+ results = books.search "test"
+ MyLogger.info results
+ results.to_json
+end
+
+
+or set it up in separate files for different environments:
+ +require "logging/#{PICKY_ENVIRONMENT}"
+
+
+A Picky classic server logs to the logger defined with the Picky.logger=
writer.
Set it up in a separate logging.rb
file (or directly in the app/application.rb
file).
Picky.logger = Picky::Loggers::Concise.new STDOUT
+
+
+and the Picky classic server will log the results into it, if it is defined.
+ +Why in a separate file? So that you can have different logging for different environments.
+ +More power to you.
+ +Here's the Wikipedia entry on facets. I fell asleep after about 5 words. Twice.
+ +In Picky, categories are explicit slices over your index data. Picky facets are implicit slices over your category data.
+ +What does "implicit" mean here?
+ +It means that you didn't explicitly say, "My data is shoes, and I have these four brands: Nike, Adidas, Puma, and Vibram".
+ +No, instead you told Picky that your data is shoes, and there is a category "brand". Let's make this simple:
+ +index = Picky::Index.new :shoes do
+ category :brand
+ category :name
+ category :type
+end
+
+index.add Shoe.new(1, 'nike', 'zoom', 'sports')
+index.add Shoe.new(2, 'adidas', 'speed', 'sports')
+index.add Shoe.new(3, 'nike', 'barefoot', 'casual')
+
+
+With this data in mind, let's look at the possibilities:
+Index facets are very straightforward.
+ +You ask the index for facets and it will give you all the facets it has and how many results there are within:
+ +index.facets :brand # => { 'nike' => 2, 'adidas' => 1 }
+
+
+The category type is a good candidate for facets, too:
+ +index.facets :type # => { 'sports' => 2, 'casual' => 1 }
+
+
+What are the options?
+ +at_least
: index.facets :brand, at_least: 2 # => { 'nike' => 2 }
counts
: index.facets :brand, counts: false # => ['nike', 'adidas']
index.facets :brand, at_least: 2, counts: false # => ['nike']
at_least
only gives you facets which occur at least n times and counts
tells the facets method whether you want counts with the facets or not. If counts are omitted, you'll get an Array
of facets instead of a Hash
.
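If it helps, the two options can be mimicked in plain Ruby over the shoe data from above. The `facets` method below is a hypothetical stand-in operating on an array of structs, not Picky's `Index#facets`.

```ruby
Shoe = Struct.new(:id, :brand, :name, :type)

SHOES = [
  Shoe.new(1, 'nike',   'zoom',     'sports'),
  Shoe.new(2, 'adidas', 'speed',    'sports'),
  Shoe.new(3, 'nike',   'barefoot', 'casual')
]

# Hypothetical stand-in for facets: counts the values of a category,
# optionally dropping rare values and/or the counts.
def facets(items, category, at_least: 1, counts: true)
  tally = items.each_with_object(Hash.new(0)) { |item, h| h[item[category]] += 1 }
  tally = tally.select { |_, count| count >= at_least }
  counts ? tally : tally.keys
end

p facets(SHOES, :brand)
p facets(SHOES, :brand, at_least: 2)
p facets(SHOES, :brand, counts: false)
```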
Pretty straightforward, right?
+ +Search facets are quite similar:
+Search facets work similarly to index facets. In fact, you can use them in the same way:
+ +search_interface.facets :brand # => { 'nike' => 2, 'adidas' => 1 }
+search_interface.facets :type # => { 'sports' => 2, 'casual' => 1 }
+search_interface.facets :brand, at_least: 2 # => { 'nike' => 2 }
+search_interface.facets :brand, counts: false # => ['nike', 'adidas']
+search_interface.facets :brand, at_least: 2, counts: false # => ['nike']
+
+
+However search facets are more powerful, as you can also filter the facets with a filter query option:
+ +shoes.facets :brand, filter: 'some filter query'
+
+
+What does that mean?
+ +Usually you want to use multiple facets in your interface. For example, a customer might already have filtered results by type "sports" because they are only interested in sports shoes. Now you'd like to show them the remaining brands, so that they can filter on the remaining facets.
+ +How do you do this?
+ +Let's say we have an index as above, and a search interface to the index:
+ +shoes = Picky::Search.new index
+
+
+If the customer has already filtered for sports, you simply pass the query to the filter
option:
shoes.facets :brand, filter: 'type:sports' # => { 'nike' => 1, 'adidas' => 1 }
+
+
+This will give you only 1 "nike" facet. If the customer filtered for "casual":
+ +shoes.facets :brand, filter: 'type:casual' # => { 'nike' => 1 }
+
+
+then we'd only get the casual nike facet (from that one "barefoot" shoe Picky loves so much).
+ +As said, filtering works like the query string passed to Picky. So if the customer has filtered for brand "nike" and type "sports", you'd get:
+ +shoes.facets :brand, filter: 'brand:nike type:sports' # => { 'nike' => 1 }
+shoes.facets :name, filter: 'brand:nike type:sports' # => { 'zoom' => 1 }
+
+
+Playing with it is fun :)
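The effect of a filter can be imitated in plain Ruby: first narrow the data down, then count. In this sketch the filter is a hash of category to value rather than a Picky query string, and `filtered_facets` is a hypothetical helper.

```ruby
Shoe = Struct.new(:id, :brand, :name, :type)

SHOES = [
  Shoe.new(1, 'nike',   'zoom',     'sports'),
  Shoe.new(2, 'adidas', 'speed',    'sports'),
  Shoe.new(3, 'nike',   'barefoot', 'casual')
]

# Hypothetical stand-in: keeps only items matching every filter entry,
# then counts the values of the requested category.
def filtered_facets(items, category, filter = {})
  matching = items.select { |item| filter.all? { |cat, value| item[cat] == value } }
  matching.each_with_object(Hash.new(0)) { |item, h| h[item[category]] += 1 }
end

p filtered_facets(SHOES, :brand, type: 'sports')
p filtered_facets(SHOES, :name, brand: 'nike', type: 'sports')
```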
+ +See below for testing and performance tips.
+Let's say we have an index with some data:
+ +index = Picky::Index.new :people do
+ category :name
+ category :surname
+end
+
+person = Struct.new :id, :name, :surname
+index.add person.new(1, 'tom', 'hanke')
+index.add person.new(2, 'kaspar', 'schiess')
+index.add person.new(3, 'florian', 'hanke')
+
+
+This is how you test facets:
+# We should find two surname facets.
+#
+index.facets(:surname).should == {
+ 'hanke' => 2, # hanke occurs twice
+ 'schiess' => 1 # schiess occurs once
+}
+
+# Only one occurs at least twice.
+#
+index.facets(:surname, at_least: 2).should == {
+ 'hanke' => 2
+}
+
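+# A search interface over the index, used as finder below.
+#
+finder = Picky::Search.new index
+
+# Passing in no filter query just returns the facets.
+#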
+finder.facets(:surname).should == {
+ 'hanke' => 2,
+ 'schiess' => 1
+}
+
+# A filter query narrows the facets down.
+#
+finder.facets(:name, filter: 'surname:hanke').should == {
+ 'tom' => 1,
+ 'florian' => 1
+}
+
+# It allows explicit partial matches.
+#
+finder.facets(:name, filter: 'surname:hank*').should == {
+ 'tom' => 1,
+ 'florian' => 1
+}
+
+Two rules:
+ +A good example for a meaningful use of facets would be brands of shoes. There aren't many different brands (usually fewer than 100).
+ +So this facet query
+ +finder.facets(:brand, filter: 'type:sports')
+
+
+does not return thousands of facets.
+ +Should you find yourself in a position where you have to use a facet query on uncontrolled data, eg. user entered data, you might want to cache the results:
+ +category = :name
+filter = 'age_bracket:40'
+
+some_cache[[category, filter]] ||= finder.facets(category, filter: filter)
+
+
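Such a cache can be as simple as a hash keyed by category and filter. The `FacetCache` class below is a hypothetical wrapper for illustration; the block stands in for an expensive `finder.facets` call.

```ruby
# Hypothetical cache wrapper: computes facets for a [category, filter]
# pair once and reuses the stored result afterwards.
class FacetCache
  def initialize(&compute)
    @compute = compute
    @cache   = {}
  end

  def facets(category, filter)
    @cache[[category, filter]] ||= @compute.call(category, filter)
  end
end

calls = 0
cache = FacetCache.new do |category, filter|
  calls += 1 # stands in for an expensive finder.facets call
  { 'peter' => 2 }
end

cache.facets(:name, 'age_bracket:40')
cache.facets(:name, 'age_bracket:40')

puts calls # the block ran only once
```

Keep in mind that a cache like this never expires, so you would need to clear it after reindexing.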
+Picky offers a standard HTML interface that works well with its JavaScript. Render this into your HTML (needs the picky-client
gem):
Picky::Helper.cached_interface
+
+
+Adding a JS interface (written in jQuery for brevity):
+ +$(document).ready(function() {
+ pickyClient = new PickyClient({
+ // A full query displays the rendered results.
+ //
+ full: '/search/full',
+
+ // More options...
+
+ });
+});
+
+
+See the options described and listed below.
+ +The variable pickyClient has the following functions:
+ +// Params are params for the controller action. Full is either true or false.
+//
+pickyClient.insert(query, params, full);
+
+// Resends the last query.
+//
+pickyClient.resend();
+
+// If not given a query, will use query from the URL (needs history.js).
+//
+pickyClient.insertFromURL(overrideQuery);
+
+
+When creating the client itself, you have many more options, as described here:
+Search options are about configuring the search itself.
+ +There are four different callbacks that you can use. The part after the ||
describes the default, which is an empty function.
The beforeInsert
is executed before a call to pickyClient.insert
. Use this to sanitize queries coming from URLs:
var beforeInsertCallback = config.beforeInsert || function(query) { };
+
+
+The before
is executed before a call to the server. Use this to add any filters you might have from radio buttons or other interface elements:
var beforeCallback = config.before || function(query, params) { };
+
+
+The success
is executed just after a successful response. Use this to modify returned results before Picky renders them:
var successCallback = config.success || function(data, query) { };
+
+
+The after
callback is called just after Picky has finished rendering results – use it to make any changes to the interface (like update an advertisement or similar).
var afterCallback = config.after || function(data, query) { };
+
+
+This will cause the interface to search even if the input field is empty:
+ +var searchOnEmpty = config.searchOnEmpty || false;
+
+
+If you want to tell the server you need more than 0 live search results, use liveResults
:
var liveResults = config.liveResults || 0;
+
+
+If the live results need to be rendered, set this to be true. Usually used when full results need to be rendered even for live searches (search as you type):
+ +var liveRendered = config.liveRendered || false;
+
+
+After each keystroke, Picky waits for a designated interval (default is 180ms) for the next keystroke. If no key is hit, it will send a "live" query to the search server. This option lets you change that interval time:
+ +var liveSearchTimerInterval = config.liveSearchInterval || 180;
+
+
+You can completely exchange the backend used to make calls to the server – in this case I trust you to read the JS code of Picky yourself:
+ +var backends = config.backends;
+
+With these options, you can change the text that is displayed in the interface.
+ +These options can be locale dependent.
+ +Qualifiers are used when you have a category that uses a different qualifier name than the category. That is, if you have a category in the index that is named differently from its qualifiers. Eg. category :application, qualifiers: ['app']
. You'd then have to tell the Picky interface to map the category correctly to a qualifier.
qualifiers: {
+ en:{
+ application: 'app'
+ }
+},
+
+
+Remember that you only need this if you do funky stuff. Keep to the defaults and you'll be fine.
+ +Explanations are the small headings over allocations (grouped results). Picky just writes "with author soandso" – if you want a better explanation, use the explanations option:
+ +explanations: {
+ en:{
+ title: 'titled',
+ author: 'written by',
+ year: 'published in',
+ publisher: 'published by',
+ subjects: 'with subjects'
+ }
+}
+
+
+Picky would now write "written by soandso", making it much nicer to read.
+ +Choices describe the choices that are given to a user when Picky would like to know what the user was searching for. This is done when Picky gets too many results in too many allocations, i.e. when it is very unclear what the user was looking for.
+ +An example for choices would be:
+ +choices: {
+ en:{
+ 'title': {
+ format: "Called <strong>%1$s</strong>",
+ filter: function(text) { return text.toUpperCase(); },
+ ignoreSingle: true
+ },
+ 'author': 'Written by %1$s',
+ 'subjects': 'Being about %1$s',
+ 'publisher': 'Published by %1$s',
+ 'author,title': 'Called %1$s, written by %2$s',
+ 'title,author': 'Called %2$s, written by %1$s',
+ 'title,subjects': 'Called %1$s, about %2$s',
+ 'author,subjects': '%1$s who wrote about %2$s'
+ }
+},
+
+
+ +Was the user just looking for a title? (Displayed as e.g. "Called ULYSSES", because of the filter and format.) Or was he looking for an author? (Displayed as "Written by Ulysses".)
+ +Multicategory combinations are possible. If the user searches for Ulysses Joyce, then Picky will most likely ask if this is a title and an author: "Called Ulysses, written by Joyce".
+ +This is a much nicer way to ask the user, don't you think?
+ +The nonPartial option describes which categories should not show ellipses …
behind the text if the user searched for it in a partial way. Use this when the categories are not partially findable on the server.
nonPartial: ['year', 'id']
+
+
+When searching for "1977", this will result in the text being "written in 1977" instead of "written in 1977…", where the ellipses don't make much sense.
+ +The groups option describes how to group the choices in a text. Play with this to see the effects (I know, am tired ;) ).
+ +groups: ['title', 'author'];
+
+There are quite a few selector options – you only need those if you heavily customise the interface. You tell Picky where to find the div containing the results or the search form etc.
+ +The selector that contains the search input and the result:
+ +var enclosingSelector = config['enclosingSelector'] || '.picky';
+
+
+The selector that describes the form the input field is in:
+ +var formSelector = config['formSelector'] || (enclosingSelector + ' form');
+
+
+The formSelector
(short fs
) is used to find the input etc.:
config['input'] = $(config['inputSelector'] || (fs + ' input[type=search]'));
+config['reset'] = $(config['resetSelector'] || (fs + ' div.reset'));
+config['button'] = $(config['buttonSelector'] || (fs + ' input[type=button]'));
+config['counter'] = $(config['counterSelector'] || (fs + ' div.status'));
+
+
+The enclosingSelector
(short es
) is used to find the results
config['results'] = $(config['resultsSelector'] || (es + ' div.results'));
+config['noResults'] = $(config['noResultsSelector'] || (es + ' div.no_results'));
+config['moreSelector'] = config['moreSelector'] ||
+ es + ' div.results div.addination:last';
+
+
+The moreSelector refers to the clickable "more results" pagination/addination.
+ +The result allocations are selected by these options:
+ +config['allocations'] = $(config['allocationsSelector'] ||
+ (es + ' .allocations'));
+config['shownAllocations'] = config['allocations'].find('.shown');
+config['showMoreAllocations'] = config['allocations'].find('.more');
+config['hiddenAllocations'] = config['allocations'].find('.hidden');
+config['maxSuggestions'] = config['maxSuggestions'] || 3;
+
+
+Results rendering is controlled by:
+ +config['results'] = $(config['resultsSelector'] ||
+ (enclosingSelector + ' div.results'));
+config['resultsDivider'] = config['resultsDivider'] || '';
+config['nonPartial'] = config['nonPartial'] || [];
+ // e.g. ['category1', 'category2']
+config['wrapResults'] = config['wrapResults'] || '<ol></ol>';
+
+
+The option wrapResults
refers to what the results are wrapped in, by default <ol></ol>
.
Thanks to whoever made the Sinatra README page for the inspiration.
+ ++Don't be deceived by the cute appearance! Picky is a tough little bastard. +
++
+Codewise, Picky is tested by around 900 unit and integration specs. Code size is extremely small (~4000 loc), making it easy to maintain. +
++Picky uses Ruby and in the case described used +Unicorn. +We bombarded three servers, running a Unicorn with 8 children each, using ab. +
++After filling the Unicorn request buffers, Picky predictably went down, but recuperated like fresh spring dew after the torrent was over. +
++In the mentioned project, Picky has been running without fail now for more than a year. It has been shown to be totally maintenance free. +
++Picky scales very well with your data. +
++We recommend setting up a shared /index directory where one Picky indexes, and the others reload the indexes after restarting. +
++Then, add the server to your proxy of choice, +nginx, +varnish +etc. +
++We've added two servers to the first one like this, without problems. We just needed to add the new server to the varnish front end round robin list. +
++Picky is a combinatorial search engine. Meaning: All the data needs to be available for combining. +
++Admittedly, regarding data, +if +you have exponential data growth, we do not recommend using Picky. +
++If your data is slowly growing (linearly), we recommend using Picky. +
++Most people think – since it is written in Ruby – it must be slow. +
++That attitude usually stops after seeing it in production or trying the example (in +getting started). +
++Lots of consideration went into writing performant Ruby code. For example, each request has a very small memory footprint. +
++Around 100 specs test for possible performance reductions. Critical parts have been rewritten in C, giving it the edge it needs. +
++From the ground up, Picky is designed to be very flexible. +
++Search requests can be routed to any combination of indexes. If you need customized tokenizers etc. you can easily implement them. +
++Also, it supports quite a few data sources, and new data sources can be easily added. +
++Picky was used in a telephone search engine, where it was possible to search for address, phone number, names, organization, and many other search features. +
++Around 150 Million data points. A data point is a phone number, or a name, or an address. +
++Imagine a table with 10 million records, with 15 varchar fields each. +
++Almost all data categories (exceptions: zipcode, partially: phone numbers) used a full partial search. Five categories were indexed also using a moderately configured (phonetic) similarity index. +
++Memory usage was a bit higher than expected, since phone numbers are unique, requiring space. Also, we used a lot of partial indexes, increasing the memory need greatly. The indexes in each server needed 10 GB. +
++Using more complicated than average queries, Picky on 3 virtual servers, each with 8 processes on 2 cores each (totalling 6 cores), peaked at 120 requests per second. +
++With relatively simple queries, Picky peaked at about 500-1000 requests per second (around 1.3 billion requests per month). +
++We noticed a high variability of answer times, probably due to the combinatorial nature of Picky. From 0.00001 seconds up to 2 seconds (in extremely hard and rare cases). Due to this fact, using Unicorn paid off very much. +
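Why answer times can vary so much is easier to see with a toy model. The following plain Ruby sketch is a simplification and not Picky's actual algorithm: it assumes each query token may match several categories, and that every assignment of categories to tokens is a candidate. The tokens and categories are made up.

```ruby
# Made-up candidate categories for each query token.
candidates = {
  'peter'  => [:first_name, :surname],
  'zurich' => [:city, :surname, :street]
}

tokens = %w[peter zurich]

# Every way of assigning one category to each token.
first, *rest = tokens.map { |token| candidates.fetch(token) }
allocations = first.product(*rest)

allocations.each do |categories|
  pairs = tokens.zip(categories)
  puts pairs.map { |token, category| "#{category}:#{token}" }.join(' ')
end
puts "#{allocations.size} combinations" # prints "6 combinations"
```

The number of combinations is the product of the candidate counts, so a query with many ambiguous tokens has far more work to do than a single unambiguous word.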
++We had a rather complicated join to combine the data for use in Picky. Also, a lot of the data needed to be cleaned and prepared. +
++Using a single server with 2 processors, we needed a bit more than an hour to prepare the indexes for Picky. +
++We were not too happy about the speed, but running the indexing process every night at least did not disturb normal operations. +
++Also, we made it almost twice as fast by having Picky use all the available processors. +
++Picky is relatively memory hungry. This is in part due to its non-specific index (useable with any data). +
++Indexing performance could be better when using lots of partial indexes. It cannot (yet) be used if indexes need to be up to date instantly. +
++On the plus side, it is extremely fast and stable. +
++It offers a unique user interface that people were quick to learn and use. +
++Also, we noted that requests by our management were very easy to implement, even when they seemed very hard to build in at first sight. This is due to Picky's very modular nature. +
++(In the words of +Roger Braun +from +the original +and the +Japanese German Dictionary +this use case is about) +
++Picky is used to find dictionary entries in +WaDokuJT, +the largest Japanese-German dictionary. +
++There are around 250K entries in the WaDokuJT file, with around 5 main fields. +
++The file is only 60 MB in size, but the field for the German version of the entries has a lot of internal structure that would in other cases often be modeled with database relations. These relations have a lot of semantic information and have to be remodeled in the indexing step. +
++All categories are indexed with full partial search. Also, several virtual fields exist that are created at indexing time, like the romaji field, which is generated directly from the Japanese characters. +
++In the future it is planned to add more virtual fields like headwords, place names etc. Picky makes this very easy, as you can just write these virtual fields with standard Ruby code. +
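A virtual field can be as simple as a method on the object being indexed. The sketch below is plain Ruby with a tiny, made-up transliteration table; it is not WaDokuJT's actual romaji conversion, and the Entry struct and its field names are hypothetical.

```ruby
# Tiny, made-up transliteration table; a real one would cover
# the whole kana syllabary.
ROMAJI = { 'ね' => 'ne', 'こ' => 'ko', 'い' => 'i', 'ぬ' => 'nu' }.freeze

Entry = Struct.new(:id, :japanese, :german) do
  # Virtual field: computed from the Japanese text, not stored
  # in the source file. Unknown characters are kept as they are.
  def romaji
    japanese.each_char.map { |char| ROMAJI.fetch(char, char) }.join
  end
end

entry = Entry.new(1, 'ねこ', 'Katze')
puts entry.romaji # prints "neko"
```

Since any object that responds to the category names can be added to an index, an object like this could presumably be indexed with a romaji category next to the stored fields.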
++Queries are often just one-word searches and not very complex. Picky can usually serve the request in under a millisecond. A "like %"-based SQLite search on the same data took around 2 to 3 seconds. +
++Indexing is very fast, too. The server is an Xserve3,1 with a 2.26 GHz Quad-Core Xeon. +
+edv@rokuhara:~/picky_speed_test$ time bundle exec rake index
Loaded picky with environment 'development' in /Users/[…]
peed_test on Ruby 1.9.2.
Application loaded.
[…]
real 8m39.234s
user 11m46.977s
sys 1m11.744s
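The timing output also hints at how much of the indexing ran in parallel. This small sketch parses the three durations printed above and divides CPU time by wall-clock time; reading the ratio as "average busy cores" is an interpretation, not something the output states.

```ruby
# Convert a duration as printed by `time` (e.g. "8m39.234s")
# into seconds.
def seconds(duration)
  minutes, secs = duration.match(/\A(\d+)m([\d.]+)s\z/).captures
  minutes.to_i * 60 + secs.to_f
end

real = seconds('8m39.234s')
user = seconds('11m46.977s')
sys  = seconds('1m11.744s')

# CPU time divided by wall-clock time: roughly the average
# number of cores kept busy during the run.
busy_cores = (user + sys) / real
puts busy_cores.round(2) # prints 1.5
```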
++Using Picky is one of the things that makes wadoku.eu good and easy to use. Having just one search field instead of the usual "advanced search" is great and we expect it to be a great advantage. This still has to be tested by our users, though. +
++Having your search completely separated from your database design is a huge relief and makes it easy to change and optimize both search and database functions separately. +
++Picky can also serve as a lightweight search API for third party services that want to use our data without any additional work on our part. +
++These are a few simple examples that will get you into Picky quickly. +
++
gem install picky
+
++
# Copy this into a Ruby file "objectsearch.rb", then:
# ruby objectsearch.rb
require 'picky'
# Create an index which is saved into './index' when you
# run index.dump(). Note that :id is implied - every input
# object must supply an :id!
#
index = Picky::Index.new :people do
  category :age
  category :name
end
# Define a data input class. Any object that responds to
# :id, :age, :name can be added to the index.
#
Person = Struct.new :id, :age, :name
# Add some data objects to the index.
# IDs can be any unique string or integer.
#
index.add Person.new(1, 34, 'Florian is the author of picky')
index.add Person.new(2, 77, 'Floris is related to Florian')
# Create a search interface object.
#
people = Picky::Search.new index
# Do a search and remember the results.
#
results = people.search 'floris'
# Show the results.
#
p results.ids # => [2]
+
++
gem install picky
+
++Copy the CSV data into ./people.csv: +
++
id,name,age
1,Florian Hanke,37
2,Kaspar Schiess,36
+
++Then run this code: +
++
# Copy this into a Ruby file "csvsearch.rb", then:
# ruby csvsearch.rb
require 'picky'
require 'csv'
require 'ostruct'
require 'fileutils'
# Prepare CSV data.
#
options = {
  headers: true,
  header_converters: ->(header) { header.to_sym }
}
csv = CSV.open('./people.csv', options)
  .to_a
  .map { |row| OpenStruct.new row.to_hash }
# Define an index.
#
data = Picky::Index.new :people do
  source { csv }
  category :age
  category :name
end
# The index is saved into './index'.
#
data.index
# Create a search interface object.
#
people = Picky::Search.new data
# Do a search and remember the results.
#
results = people.search 'age:36'
# Show the results.
#
p results.ids # => ["2"]
+
++
gem install picky
+
++Copy the poem "The Raven" by E. A. Poe into ./story.txt. Here +is the ending (that's enough to run the example): +
++
And the raven, never flitting, still is sitting, still is sitting
On the pallid bust of Pallas just above my chamber door;
And his eyes have all the seeming of a demon's that is dreaming,
And the lamp-light o'er him streaming throws his shadow on the floor;
And my soul from out that shadow that lies floating on the floor
Shall be lifted - nevermore!
+
++Then run this code: +
++
# Copy this into a Ruby file "document_search.rb", then:
# ruby document_search.rb
require 'picky'
# Define an index.
#
data = Picky::Index.new :people do
  # Only keep alpha/blank characters.
  indexing removes_characters: /[^\p{Alpha}\p{Blank}]/i
  # Only index full words.
  category :text, partial: Picky::Partial::None.new
end
# Define a data input class. Any object that responds to
# :id, :text can be added to the index.
#
Document = Struct.new :id, :text
# Add some data objects to the index.
# IDs can be any unique string or integer.
#
File.open 'story.txt' do |story|
  data.add Document.new(1, story.read)
end
# Create a search interface object.
#
people = Picky::Search.new data
# Do three searches and remember the results.
#
found = people.search 'nevermore'
only_full_words = people.search 'nevermor'
not_found = people.search 'peter'
# Show the results.
#
p found.ids # => [1]
p only_full_words.ids # => []
p not_found.ids # => []
+
+
+Adding to an in-memory index using
+Index#add
+will not automatically
+store the index data on permanent storage.
+
+To write the index to disk, call
+Index#dump
+.
+
+To load the index from disk, call
+Index#load
+.
+
+
gem install picky
+
++Then run this code: +
++
# Copy this into a Ruby file "dump_load_search.rb", then:
# ruby dump_load_search.rb
require 'fileutils'
require 'picky'
# Make Picky be quiet.
#
Picky.logger = Picky::Loggers::Silent.new
# Create an index.
# Note that :id is implied - every input
# object must supply an :id!
#
index = Picky::Index.new :people do
  category :age
  category :name
end
# Define a data input class. Any object that responds to
# :id, :age, :name can be added to the index.
#
Person = Struct.new :id, :age, :name
# Add some data objects to the index.
# IDs can be any unique string or integer.
#
index.add Person.new(1, 34, 'Florian is the author of picky')
index.add Person.new(2, 77, 'Floris is related to Florian')
# Create a search interface object to search the index.
#
people = Picky::Search.new index
# The index data is saved into './index' when you
# run index.dump().
# But you still find results.
#
index.dump
p people.search('flori').ids # => [2, 1]
# Clearing the index will empty it in memory.
#
index.clear
p people.search('flori').ids # => []
# Loading the index will fill the index again.
#
index.load
p people.search('flori').ids # => [2, 1]
+
++
gem install picky
+
++Then run this code: +
++
# Copy this into a Ruby file "inspection.rb", then:
# ruby inspection.rb
require 'picky'
# Create an index which is saved into './index' when you
# run index.dump(). Note that :id is implied - every input
# object must supply an :id!
#
index = Picky::Index.new :people do
  category :age
  category :name
end
# Define a data input class. Any object that responds to
# :id, :age, :name can be added to the index.
#
Person = Struct.new :id, :age, :name
# Add some data objects to the index.
# IDs can be any unique string or integer.
#
index.add Person.new(1, 34, 'Florian is the author of picky')
index.add Person.new(2, 77, 'Floris is related to Florian')
# Look at pieces of the index.
#
puts index[:name].exact.inverted
puts index[:name].exact.weights
puts index[:name].partial.inverted
puts index[:name].partial.weights
+
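To make the terms "inverted" and "weights" more concrete, here is a rough model in plain Ruby that does not use Picky. It assumes an inverted index maps each token to the ids of the objects containing it, and that weights are the natural logarithm of the number of ids; both the structure and the rounding are assumptions, and the ordering of ids in a real index may differ.

```ruby
# The same two people as in the example above.
people = {
  1 => 'Florian is the author of picky',
  2 => 'Floris is related to Florian'
}

# Exact inverted index: token => ids of the objects containing it.
inverted = Hash.new { |hash, token| hash[token] = [] }
people.each do |id, name|
  name.downcase.split.uniq.each { |token| inverted[token] << id }
end

# Weights (assumed logarithmic): tokens found in more objects
# get a higher weight.
weights = inverted.transform_values { |ids| Math.log(ids.size).round(3) }

p inverted['florian'] # prints [1, 2]
p inverted['floris']  # prints [2]
p weights['florian']  # prints 0.693
```

A partial index would, in the same spirit, hold shortened forms of each token (such as flori) pointing at the same ids.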
++Picky is a standalone search server currently offering an HTTP interface (returning JSON) and a nice in-code Ruby configuration (no huge XML files). +Also: +
splits_text_on: /[\s\/\-\"\&]/
+pick*
+pecky~
+napoleon, title:war
+{ [:title, :author] => +3, [:isbn, :author] => -5 }
+get %r{/books} do
books.search params[:query]
end
++… and much more. +
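What the splitting pattern from the list above does to a query can be tried out in plain Ruby, without Picky. The helper tokenize is hypothetical and only illustrates the regular expression; Picky's own tokenizer does more than this.

```ruby
# The splitting pattern from the feature list above.
SPLIT_PATTERN = /[\s\/\-\"\&]/

# Hypothetical helper: split a query into tokens and drop the
# empty strings left between adjacent separators.
def tokenize(query)
  query.split(SPLIT_PATTERN).reject(&:empty?)
end

p tokenize('war & peace/tolstoy "classic" russian-lit')
# prints ["war", "peace", "tolstoy", "classic", "russian", "lit"]
```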
+ +gem install picky
++There's a whole section devoted to getting started with Picky! +See +here +for getting on your feet quickly :) +
++Offers a Ruby client that connects to the Server's JSON interface and provides a clever (in a helpful way) and easily configurable Javascript interface. +
++Including the interface in your views is as easy as +
= Picky::Client.interface
+
+
+and adding the
+picky.min.js
+file.
+
+This is it: + +
++This is how you search for books with title similar to lyterature that were published in 2002. +
++The gem also contains useful methods to render results into the JSON that is sent to the Javascript interface, extracting just the ids, and much more. +
+gem install picky-client
++There's a whole section devoted to getting started with the Picky client! +See +here +and scroll down to the client part on the left. +
++Clam is a simple log parser that starts an in-gem webapp and shows you relevant statistics about your app. +
++This is part of its interface: + +
++It shows you +
+and other things. +
+gem install picky-statistics
+Then run:
+picky stats path/to/pickys/log/search.log
+
+If you are in the Picky server directory, the path is probably
+log/search.log
+.
+
You should see a message like:
+Clam, Picky's friend, is looking at Picky's logfile.
... and showing results on port 4567.
++Go to +localhost:4567 +and you should see the statistics interface in its full glory! +
+
+More params are available, just enter
+picky
+in the console.
+
+Suckerfish is an introspective system that allows you to look at parameters of a running Picky Server. Not only that, it allows you to change parameters on the fly. +
++This is part of its interface: + +
++Currently, you have the possibility to modify how the server handles query input. +
++Hit "Update Server now" and see the effects instantly on all queries. +
++Note: Only works with multiprocessing servers, like Unicorn. +
+gem install picky-live
+
+Add this to
+app/application.rb
+in the server:
+
%r{/admin} => LiveParameters.new
+Start the server and enter:
+picky live
+You should see a message like:
+Suckerfish has sucked onto Picky at localhost:8080/admin.
Sinatra has taken the stage on port 4568...
++Go to +localhost:4568 +and fiddle with the parameters (server needs to be running). +
+
+More params are available, just enter
+picky
+in the console.
+
+Need a search engine in a Ruby script and got 3 minutes? +
gem install picky
+
++and copy paste +
#!/usr/bin/env ruby
require 'picky'
# Create an index which is saved into './index' when you
# run index.dump(). Note that :id is implied - every input
# object must supply an :id!
#
index = Picky::Index.new :people do
  category :age
  category :name
end
# Define a data input class. Any object that responds to
# :id, :age, :name can be added to the index.
#
Person = Struct.new :id, :age, :name
# Add some data objects to the index.
# IDs can be any unique string or integer.
#
index.add Person.new(1, 34, 'Florian is the author of picky')
index.add Person.new(2, 77, 'Floris is related to Florian')
# Create a search interface object.
#
people = Picky::Search.new index
# Do a search and remember the results.
#
results = people.search 'floris'
# Show the results.
#
p results.ids # => [2]
+
++Have fun changing the code to suit your needs :) +
++Need a search engine in a Sinatra server & web frontend and got 10 minutes? +
+gem install picky-generators
++This will also install the needed gems "picky" and "picky-client". +
++See the other +system requirements +if it doesn't run straightaway. +
+The server generates a library example, which you can run right away.
+# Generates a directory "app_name"
# with a new Picky default server project.
# Type "picky generate" to see other options.
picky generate server app_name
+cd app_name
bundle install
+rake index
+rake start
+curl localhost:8080/books?query=test
++Don't worry about the strange looking results! +The next part (client) will take care of them. +
++If you're interested anyway: +Results (Format & Structure) +
+The client generates an example app for the "library" example backend, using Sinatra.
+# Generates a directory "app_name"
# with a new Picky client Sinatra project.
picky generate client app_name
+cd app_name
bundle install
+unicorn -p 3000
++Go to +http://localhost:3000/ +and try the examples. +
++You're probably itching to change the example for +your own data. How do you do this? +
+In the server directory, just type
+rake todo
+and it will tell you where to change the server configuration.
++Go to +http://localhost:3000/configure +and the page will show you how to configure your app server. +
++That's it, congratulations! :) +
++
+I recommend +chruby +for installing and managing Ruby versions. +
+The big picture:
+ ++That's the basic setup. The things to remember are: +
+Together they are like a small A-Team, something like "Action Search Squad Alpha"! «We've got the results and are heading back to base now, Sir!» Bam! +
++Note that you don't need a Picky client. You could just as well use the results in your Python/Java/PHP app server (If you happen to write a client for one of these, please let me know). +
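As a rough idea of how little is needed to talk to the server directly, here is a sketch using only Ruby's standard library. It assumes the server from the getting started section is running on localhost:8080 with the /books route; the helper names search_uri and search are made up, and nothing is assumed about the structure of the returned JSON. The same few lines should translate to other languages.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build the search URL. Host, port and path are taken from the
# curl example in the getting started section.
def search_uri(query, host: 'localhost', port: 8080, path: '/books')
  URI::HTTP.build(
    host: host,
    port: port,
    path: path,
    query: URI.encode_www_form(query: query)
  )
end

# Fetch and parse the JSON results. The structure of the parsed
# hash is not assumed here; inspect it to see what is returned.
def search(query)
  JSON.parse(Net::HTTP.get(search_uri(query)))
end

puts search_uri('war & peace')
# prints http://localhost:8080/books?query=war+%26+peace

# With a running server:
#   p search('test')
```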
++Right here. I'm happy to help! +If something doesn't work, send/gist me your app/application.rb +and I'll look into it. +
++github (floere), +twitter (hanke), +mail (gmail) +
++There's a Wiki as well: +Picky Wiki +
++If you don't have the time or leisure to do it yourself, watch this: +
+ ++Note that the video was made with version 1.0.0. +
++In the latest version, instead of +
+picky project <server dir>
picky-client sinatra <client dir>
++the following commands are now used: +
+picky generate unicorn_server <server dir>
picky generate sinatra_client <client dir>
++A bit more wordy, but hopefully clearer about what they do. +
++We have released 4.0! Motto: "The +Sinatra +of search engines!" :) +
++Quite a few more features are planned. +
++See the +Issues List +to see what's – probably – going to happen next. +
++Server: Around 1300 specs, 100 of them complete integration specs. Many, many functional specs. +Other gems: Around 150 specs. +
++The server specs run in ~10 seconds. +
++Many people have worked on Picky. +
++See the +list of Contributors +to see who has helped Picky become what it is today. +
++If you want to contribute, see +the Wiki +under "Extending and contributing to Picky" to see how to add new data sources etc. +
++Or join us on IRC in #picky. +
++Picky has been mentioned in a few places. Check them out: +
+Using a real +telephone search +as an example. +
+ ++This was at the fantastic +EuRuKo 2010 +Conference in +beautiful +Krakow. +
++A challenge to think more about searching data and an introduction to Picky. +
+ ++This was at the Swiss special of the +Ruby User Group Berlin +Meetup in Berlin. +
++I'm looking for a Ruby conference in Europe (or near) during Spring 2011 to show Picky. +Send me a message if you're interested. +
++Doing the +Getting Started +part (see right column). +
+ + ++How to use Picky for a +geo search +part 1. +
+ + ++How to use Picky for a +geo search +part 2. +
+ + ++How to do command line searches with Picky. +
+ +Yet to come.
+