Indexes in Picky hold categorized data. You use Categories to define how Picky searches in the data.
A valid index needs a definition and one or more categories. An index without categories cannot be searched. The categories link the data (the what) in the source to how the data is searched.
To define a category on an index (and its data), use the category method.
In code (using Ruby 1.9 new style hashes):
books_index = Index::Memory.new(:books) do
  source some_source # some_source can be a Sources::DB, Sources::CSV etc.
  category :title, partial: Partial::Substring.new(from: 1)
  category :author,
           similarity: Similarity::DoubleMetaphone.new(3),
           partial: Partial::Substring.new(from: 1)
  category :isbn, tokenizer: IsbnTokenizer.new
end
This creates a new index, using the index method, which has some data source (not important for the example). That data source provides us with a title, an author, and an isbn for each entry.
For the title category, we tell Picky to find a title word even if it is only partially matched, so “Hob” will find “Hobbit”. Since it is partial from the first character, even a single “H” will find “hobbit”. Search for partial words using the asterisk *. The last word in a query is partially searched by default.
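To make the partial option more concrete, here is a minimal plain-Ruby sketch of what a substring-style partial index could store. It is not Picky's implementation: partial_prefixes and build_partial_index are hypothetical helpers, and the reading of from/to (positive values count characters from the start of the word, negative values count back from its end) is an assumption based on the examples in this page.

```ruby
# Hypothetical helper, not part of Picky: lists the prefixes of a word
# that a substring-style partial index would store.
# Assumption: positive positions count characters from the start,
# negative positions count back from the end (-1 is the whole word).
def partial_prefixes(word, from: -3, to: -1)
  to_length = ->(pos) { pos < 0 ? word.length + pos + 1 : pos }
  first = [to_length.call(from), 1].max
  last  = [to_length.call(to), word.length].min
  (first..last).map { |length| word[0, length] }
end

# A toy in-memory partial index: prefix => titles containing a word
# that starts with that prefix.
def build_partial_index(titles, from: 1)
  index = Hash.new { |hash, key| hash[key] = [] }
  titles.each do |title|
    title.downcase.split.each do |word|
      partial_prefixes(word, from: from).each do |prefix|
        index[prefix] << title unless index[prefix].include?(title)
      end
    end
  end
  index
end

index = build_partial_index(["The Hobbit", "Hamlet"])
p index["hob"] # => ["The Hobbit"]
p index["h"]   # => ["The Hobbit", "Hamlet"]
```

With from: 1 every prefix is stored, which is why even a single "h" matches. With from: -3 only the last three prefixes of each word would be stored, which keeps the index smaller but needs more typed characters before a partial match is found.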
For the author category, we also want Picky to find partial matches, and also phonetically similar matches. So “Solschenyzin” will also find “Solschenizyn”. Search for similar words using the tilde ~.
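The idea behind phonetic similarity can be sketched in plain Ruby. The example above uses DoubleMetaphone; the sketch below swaps in the simpler classic Soundex algorithm as a stand-in, and the soundex method is a hypothetical helper, not Picky's code. Two words count as similar when they reduce to the same code.

```ruby
# Letter => digit table of the classic Soundex algorithm.
SOUNDEX_CODES = {
  "bfpv" => "1", "cgjkqsxz" => "2", "dt" => "3",
  "l" => "4", "mn" => "5", "r" => "6"
}.flat_map { |letters, digit| letters.chars.map { |char| [char, digit] } }.to_h

# Hypothetical helper, not part of Picky: returns the four-character
# Soundex code of a word (first letter plus three digits).
def soundex(word)
  letters = word.downcase.gsub(/[^a-z]/, "").chars
  return "" if letters.empty?

  code = letters.first.upcase
  previous = SOUNDEX_CODES[letters.first]
  letters.drop(1).each do |char|
    next if char == "h" || char == "w" # h and w do not separate equal codes
    digit = SOUNDEX_CODES[char]
    code << digit if digit && digit != previous
    previous = digit # a vowel sets this to nil, so a repeated code counts again
  end
  code.ljust(4, "0")[0, 4]
end

p soundex("Solschenyzin") == soundex("Solschenizyn") # => true
```

Both spellings reduce to the same code, so a query for one could be matched against index entries stored under the other.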
The isbn category uses neither similarity nor partial search, since neither makes sense for an ISBN. Instead it uses a special tokenizer that defines how ISBNs are indexed. If you’re starting out with Picky you won’t need that yet.
Options of category
category defines both how data is indexed and how data is searched. The first argument is the identifier of the category. This identifier is used in the front end, but also to categorize query text. For example, “title:hobbit” will narrow the hobbit query to categories with the identifier :title.
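As a rough illustration of how query text such as “title:hobbit” can be split into an identifier and the text to search for, here is a minimal plain-Ruby sketch. split_qualifier is a hypothetical helper and does not show how Picky parses queries internally.

```ruby
# Hypothetical helper, not Picky's query parser: splits a query token
# into [identifier, text]. The identifier is nil if the token has no colon.
def split_qualifier(token)
  qualifier, text = token.split(":", 2)
  text.nil? ? [nil, qualifier] : [qualifier.to_sym, text]
end

p split_qualifier("title:hobbit") # => [:title, "hobbit"]
p split_qualifier("hobbit")       # => [nil, "hobbit"]
```

A token without an identifier would be searched in all categories, while one with an identifier is searched only in the matching categories.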
- partial: Partial::Substring.new(from: starting_char, to: ending_char). Default is Partial::Substring.new(from: -3, to: -1).
- similarity: Similarity::Soundex.new(similar_words_searched). Default is
- Weights::Dynamic.new. Default is
- key_format: How to format the ids/keys. If they are integers, like from a database, use :to_i, or nothing, as :to_i is the default. If they are strings, from Redis or similar, use :to_sym if you prefer Symbols. Note that Symbols are not garbage collected, and will use up more permanent memory. However, this can improve speed.
- backend: The backend to use. Default is Backends::Memory.new. Other options are:
- tokenizer: Give the category a specific tokenizer. Takes the same options as
- qualifiers: An array of qualifiers with which you can define which category you’d like to search, for example “title:hobbit” will search for hobbit in just title categories. Example: qualifiers: [:t, :titre, :title] (use it for example with multiple languages). Default is the name of the category.
- qualifier: Convenience option if you just need a single qualifier, see above. Example: qualifier: :title. Default is the name of the category.
- from: Take the data from the data category with this name. Example: You have a source Sources::CSV.new(:title, file:'some_file.csv') but you want the category to be called differently. Then you use from: category(:similar_title, :from => :title).
- source: Use a different source than the index uses. If you think you need that, there might be a better solution to your problem. Please post to the mailing list first with your application.rb :)
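The key_format option above names a method that is applied to each id. As a minimal plain-Ruby sketch of that idea (format_keys is a hypothetical helper, not part of Picky), the conversion could look like this:

```ruby
# Hypothetical helper, not part of Picky: converts raw ids by calling
# the method named by key_format on each of them.
def format_keys(raw_ids, key_format = :to_i)
  raw_ids.map { |id| id.public_send(key_format) }
end

p format_keys(["1", "2", "3"])      # => [1, 2, 3]
p format_keys(["a1", "b2"], :to_sym) # => [:a1, :b2]
```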