
+ moved the docs into the Picky main repo

1 parent 79d9a11 commit c763882d1d0517d32b95e0b9ee6e2bf2d3d8730f @floere committed Oct 27, 2012
2 web/build/stylesheets/specific.css
@@ -4,7 +4,7 @@
/* line 4, specific.css.sass */
.container_2 div.index {
width: 20em;
- float: right;
+/* float: left;*/
padding: 3em;
margin: 1em;
border: 2px solid #e0e0d0; }
0 web/source/documentation.html.textile → web/old/documentation.html.textile
File renamed without changes.
21 web/source/documentation.html.haml
@@ -0,0 +1,21 @@
+---
+title: Documentation
+---
+
+/ This file puts all sections together in a nice one page documentation.
+
+.container_2
+ .grid_1.index
+ = partial 'documentation/linkage'
+ .grid_1.help
+ /= partial 'documentation/help'
+ /= partial 'documentation/api'
+ = partial 'documentation/intro'
+ = partial 'documentation/generators'
+ = partial 'documentation/servers'
+ = partial 'documentation/tokenizing'
+ = partial 'documentation/indexes'
+ = partial 'documentation/search'
+ = partial 'documentation/results'
+ = partial 'documentation/facets'
+ = partial 'documentation/thanks'
7 web/source/documentation/_api.html.md
@@ -0,0 +1,7 @@
+## API Docs
+
+For documentation on how to configure Picky, see
+
+[Server API docs](doc/server/index.html) and [Client API docs](doc/client/index.html)
+
+and for a bit more info [the Wiki](http://github.com/floere/picky/wiki) in the [repository](http://github.com/floere/picky).
169 web/source/documentation/_facets.html.md
@@ -0,0 +1,169 @@
+## Facets
+
+Here's [the Wikipedia entry on facets](http://en.wikipedia.org/wiki/Faceted_classification). I fell asleep after about 5 words. Twice.
+
+In Picky, categories are explicit slices over your index data. Picky facets are implicit slices over your category data.
+
+What does "implicit" mean here?
+
+It means that you didn't explicitly say, "My data is shoes, and I have these four brands: Nike, Adidas, Puma, and Vibram".
+
+No, instead you told Picky that your data is shoes, and there is a category "brand". Let's make this simple:
+
+ index = Picky::Index.new :shoes do
+ category :brand
+ category :name
+ category :type
+ end
+
+ index.add Shoe.new(1, 'nike', 'zoom', 'sports')
+ index.add Shoe.new(2, 'adidas', 'speed', 'sports')
+ index.add Shoe.new(3, 'nike', 'barefoot', 'casual')
+
+With this data in mind, let's look at the possibilities:
+
+### Index facets
+
+Index facets are very straightforward.
+
+You ask the index for facets and it will give you all the facets it has and how many:
+
+ index.facets :brand # => { 'nike' => 2, 'adidas' => 1 }
+
+The category type is a good candidate also:
+
+ index.facets :type # => { 'sports' => 2, 'casual' => 1 }
+
+What are the options?
+
+* `at_least`: `index.facets :brand, at_least: 2 # => { 'nike' => 2 }`
+* `counts`: `index.facets :brand, counts: false # => ['nike', 'adidas']`
+* both options: `index.facets :brand, at_least: 2, counts: false # => ['nike']`
+
+`at_least` only gives you facets which occur at least n times and `counts` tells the facets method whether you want the counts with the facets or not.
+
+Pretty straightforward, right?
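To make the two options concrete, here is a plain-Ruby sketch that counts facets over an array of structs. It is not how Picky computes facets internally (Picky reads them from its index); `sketch_facets` and `SHOES` are hypothetical names used only for this illustration.

```ruby
# Illustration only: counts facets over plain structs, without Picky.
Shoe = Struct.new(:id, :brand, :name, :type)

SHOES = [
  Shoe.new(1, 'nike',   'zoom',     'sports'),
  Shoe.new(2, 'adidas', 'speed',    'sports'),
  Shoe.new(3, 'nike',   'barefoot', 'casual')
].freeze

# Hypothetical helper that mimics the at_least and counts options.
def sketch_facets(records, category, at_least: 1, counts: true)
  tally = Hash.new(0)
  records.each { |record| tally[record.public_send(category)] += 1 }
  tally = tally.select { |_, count| count >= at_least }
  counts ? tally : tally.keys
end

sketch_facets(SHOES, :brand)                 # => { 'nike' => 2, 'adidas' => 1 }
sketch_facets(SHOES, :brand, at_least: 2)    # => { 'nike' => 2 }
sketch_facets(SHOES, :brand, counts: false)  # => ['nike', 'adidas']
```

The sketch shows that `at_least` is just a threshold on the counts and that `counts: false` drops the numbers and keeps the keys.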
+
+Search facets are quite similar:
+
+### Search facets
+
+Search facets work the exact same way as index facets and you can use them in the same way:
+
+ search_interface.facets :brand # => { 'nike' => 2, 'adidas' => 1 }
+ search_interface.facets :type # => { 'sports' => 2, 'casual' => 1 }
+ search_interface.facets :brand, at_least: 2 # => { 'nike' => 2 }
+ search_interface.facets :brand, counts: false # => ['nike', 'adidas']
+ search_interface.facets :brand, at_least: 2, counts: false # => ['nike']
+
+However, you can also filter the facets with a filter query option.
+
+ shoes.facets :brand, filter: 'some filter query'
+
+What does that mean?
+
+Usually you want to use multiple facets in your interface.
+For example, a customer might already have filtered by type "sports" because they are only interested in sports shoes.
+Now you'd like to show them the remaining brands, so that they can filter on the remaining facets.
+
+How do you do this?
+
+Let's say we have an index as above, and a search interface to the index:
+
+ shoes = Picky::Search.new index
+
+Now, if the customer has already filtered for sports, you simply add the `filter` option:
+
+ shoes.facets :brand, filter: 'type:sports' # => { 'nike' => 1, 'adidas' => 1 }
+
+This will give you only 1 "nike" facet. If the customer filtered for "casual":
+
+ shoes.facets :brand, filter: 'type:casual' # => { 'nike' => 1 }
+
+then we'd only get the casual nike facet (from that one "barefoot" shoe).
+
+If the customer has filtered for brand "nike" and type "sports", you'd get:
+
+ shoes.facets :brand, filter: 'brand:nike type:sports' # => { 'nike' => 1 }
+ shoes.facets :name, filter: 'brand:nike type:sports' # => { 'zoom' => 1 }
+
+Playing with it is fun :)
+
+See below for testing and performance tips.
+
+### Testing How To
+
+Let's say we have an index with some data:
+
+ index = Picky::Index.new :people do
+ category :name
+ category :surname
+ end
+
+ person = Struct.new :id, :name, :surname
+ index.add person.new(1, 'tom', 'hanke')
+ index.add person.new(2, 'kaspar', 'schiess')
+ index.add person.new(3, 'florian', 'hanke')
+
+This is how you test facets:
+
+#### Index Facets
+
+ # We should find two surname facets.
+ #
+ index.facets(:surname).should == {
+ 'hanke' => 2, # hanke occurs twice
+ 'schiess' => 1 # schiess occurs once
+ }
+
+ # Only one occurs at least twice.
+ #
+ index.facets(:surname, at_least: 2).should == {
+ 'hanke' => 2
+ }
+
+#### Search Facets
+
+ finder = Picky::Search.new index
+
+ # Passing in no filter query just returns the facets.
+ #
+ finder.facets(:surname).should == {
+ 'hanke' => 2,
+ 'schiess' => 1
+ }
+
+ # A filter query narrows the facets down.
+ #
+ finder.facets(:name, filter: 'surname:hanke').should == {
+ 'tom' => 1,
+ 'florian' => 1
+ }
+
+ # It allows explicit partial matches.
+ #
+ finder.facets(:name, filter: 'surname:hank*').should == {
+ 'tom' => 1,
+ 'florian' => 1
+ }
+
+### Performance
+
+Two rules:
+
+1. Index facets are faster than filtered search facets. If you don't filter though, search facets are as fast as index facets.
+1. Only use facets on data which are a good fit for facets – where there aren't many facets to the data.
+
+A good example for a good fit would be brands of shoes.
+There aren't many different brands (usually fewer than 100).
+
+So this facet query
+
+ finder.facets(:brand, filter: 'type:sports')
+
+does not return thousands of facets.
+
+Should you find yourself in a position where you have to use a facet query on uncontrolled data, e.g. user-entered data, you might want to cache the results:
+
+ category = :name
+ filter = 'age_bracket:40'
+
+ some_cache[[category, filter]] ||= finder.facets(category, filter: filter)
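The caching pattern above can be tried without Picky. In the following sketch, `FakeFinder` is a hypothetical stand-in for a `Picky::Search` instance and a plain Hash plays the role of `some_cache`; both are assumptions made for the sake of a runnable example.

```ruby
# Illustration only: FakeFinder stands in for a Picky::Search instance
# so that the caching pattern can be run without Picky.
class FakeFinder
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Pretends to be an expensive filtered facet query.
  def facets(category, filter: nil)
    @calls += 1
    { "#{category} for #{filter}" => 1 }
  end
end

finder     = FakeFinder.new
some_cache = {}

category = :name
filter   = 'age_bracket:40'

2.times do
  # The [category, filter] array is the cache key; ||= only runs the query on a miss.
  some_cache[[category, filter]] ||= finder.facets(category, filter: filter)
end

finder.calls # => 1, the second lookup was served from the cache
```

Because the key contains both the category and the filter, a different filter gets its own cache entry. You would presumably want to clear such a cache whenever the index is rebuilt.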
49 web/source/documentation/_generators.html.md
@@ -0,0 +1,49 @@
+## Generators{#generators}
+
+Picky offers a few generators to have a running server and client up in 5 minutes. Please follow the [Getting Started](getting_started.html).
+
+Or, run gem install
+
+ gem install picky-generators
+
+and simply enter
+
+ picky generate
+
+This will raise a `Picky::Generators::NotFoundException` and show you the possibilities.
+
+The "All In One" Client/Server is interesting for Heroku projects, as it is a bit complicated to set up two servers that interact with each other.
+
+### Servers{#generators-servers}
+
+Currently, Picky offers two generated example projects that you can adapt to your project: *Separate Client and Server* (suggested) and *All In One*.
+
+If this is your first time with Picky, we suggest starting out with these, even if you already have a project where you want to integrate Picky.
+
+#### Sinatra{#generators-servers-sinatra}
+
+This server is generated with
+
+ picky generate server target_directory
+
+and generates a full Sinatra server that you can try immediately. Just follow the instructions.
+
+#### All In One{#generators-servers-allinone}
+
+All In One is actually a single Sinatra server containing the Server AND the client. This server is generated with
+
+ picky generate all_in_one target_directory
+
+and generates a full Sinatra Picky server and client that you can try immediately. Just follow the instructions.
+
+### Clients{#generators-clients}
+
+Picky currently offers an example Sinatra client that you can adapt for your project (or look at to see how to use Picky in Rails).
+
+#### Sinatra{#generators-clients-sinatra}
+
+This client is generated with
+
+ picky generate sinatra_client target_directory
+
+and generates a full Sinatra client (including Javascript etc.) that you can try immediately. Just follow the instructions.
5 web/source/documentation/_help.html.md
@@ -0,0 +1,5 @@
+### Help?
+
+If neither the docs, nor the Wiki, nor the single page help has helped, feel free to contact us [through the methods described on the about page](index.html).
+
+Best of success!
522 web/source/documentation/_indexes.html.md
@@ -0,0 +1,522 @@
+## Indexes{#indexes}
+
+Indexes do three things:
+
+* Define where the data comes from.
+* Define how data is handled before it enters the index.
+* Hold index categories.
+
+### Types{#indexes-types}
+
+Picky offers a choice of four index types:
+
+* Memory: Saves its indexes in JSON on disk and loads them into memory.
+* Redis: Saves its indexes in Redis.
+* SQLite: Saves its indexes in rows of a SQLite DB.
+* File: Saves its indexes in JSON in files.
+
+This is how they look in code:
+
+ books_memory_index = Index.new :books do
+ # Configuration goes here.
+ end
+
+ books_redis_index = Index.new :books do
+ backend Backends::Redis.new
+ # Configuration goes here.
+ end
+
+Both save the preprocessed data from the data source in the `/index` directory, so you can check whether the data was preprocessed correctly.
+
+Indexes are then used in a `Search` interface.
+
+Searching over one index:
+
+ books = Search.new books_index
+
+Searching over multiple indexes:
+
+ media = Search.new books_index, dvd_index, mp3_index
+
+The resulting ids should be from the same id space to be useful – or the ids should be exclusive, such that e.g. a book id does not collide with a dvd id.
+
+#### In-Memory / File-based{#indexes-types-memory}
+
+The in-memory index saves its indexes as files transparently in the form of JSON files that reside in the `/index` directory.
+
+When the server is started, they are loaded into memory. As soon as the server is stopped, the indexes are no longer in memory.
+
+Indexing regenerates the JSON index files, which can then be reloaded into memory, even in the running server (see below).
+
+#### Redis{#indexes-types-redis}
+
+The Redis index saves its indexes in the Redis server on the default port, using database 15.
+
+When the server is started, it connects to the Redis server and uses the indexes in the key-value store.
+
+Indexing regenerates the indexes in the Redis server – you do not have to restart the server for that.
+
+#### SQLite{#indexes-types-sqlite}
+
+TODO
+
+#### File{#indexes-types-file}
+
+TODO
+
+### Accessing{#indexes-acessing}
+
+If you don't have access to your indexes directly, like so
+
+ books_index = Index.new(:books) do
+ # ...
+ end
+
+ books_index.do_something_with_the_index
+
+and for example you'd like to access the index from a rake task, you can use
+
+ Picky::Indexes
+
+to get *all indexes*.
+
+To get a *single index* use
+
+ Picky::Indexes[:index_name]
+
+and to get a *single category*, use
+
+ Picky::Indexes[:index_name][:category_name]
+
+That's it.
+
+### Configuration{#indexes-configuration}
+
+This is all you can do to configure an index:
+
+ books_index = Index.new :books do
+ source { Book.order("isbn ASC") }
+
+ indexing removes_characters: /[^a-zA-Z0-9\s\:\"\&\.\|]/i, # Default: nil
+ stopwords: /\b(and|the|or|on|of|in)\b/i, # Default: nil
+ splits_text_on: /[\s\/\-\_\:\"\&\/]/, # Default: /\s/
+ removes_characters_after_splitting: /[\.]/, # Default: nil
+ normalizes_words: [[/\$(\w+)/i, '\1 dollars']], # Default: nil
+ rejects_token_if: lambda { |token| token == :blurf }, # Default: nil
+ case_sensitive: true, # Default: false
+ substitutes_characters_with: Picky::CharacterSubstituters::WestEuropean.new, # Default: nil
+ stems_with: Lingua::Stemmer.new # Default: nil
+
+ category :id
+ category :title,
+ partial: Partial::Substring.new(:from => 1),
+ similarity: Similarity::DoubleMetaphone.new(2),
+ qualifiers: [:t, :title, :titre]
+ category :author,
+ partial: Partial::Substring.new(:from => -2)
+ category :year,
+ partial: Partial::None.new,
+ qualifiers: [:y, :year, :annee]
+
+ result_identifier 'boooookies'
+ end
+
+Usually you don't need to configure all that.
+
+But if your boss comes in the door and asks why X is not found… you know. And you can improve the search engine relatively *quickly and painlessly*.
+
+More power to you.
+
+### Data Sources{#indexes-sources}
+
+Data sources define where the data for an index comes from.
+
+You define them on an *index*:
+
+ Index.new :books do
+ source Book.all # Loads the data instantly.
+ end
+
+ Index.new :books do
+ source { Book.all } # Loads on indexing. Preferred.
+ end
+
+Or even a *single category*:
+
+ Index.new :books do
+ category :title,
+ source: lambda { Book.all }
+ end
+
+At the moment there are two possibilities: [Objects responding to #each](#indexes-sources-each) and [Picky classic style sources](#indexes-sources-classic).
+
+#### Responding to #each{#indexes-sources-each}
+
+Picky supports any data source as long as it supports `#each`.
+
+See [under Flexible Sources](http://florianhanke.com/blog/2011/04/14/picky-two-point-two-point-oh.html) how you can use this.
+
+In short. Model:
+
+ class Monkey
+ attr_reader :id, :name, :color
+ def initialize id, name, color
+ @id, @name, @color = id, name, color
+ end
+ end
+
+The data:
+
+ monkeys = [
+ Monkey.new(1, 'pete', 'red'),
+ Monkey.new(2, 'joey', 'green'),
+ Monkey.new(3, 'hans', 'blue')
+ ]
+
+Setting the array as a source
+
+ Index::Memory.new :monkeys do
+ source { monkeys }
+ category :name
+ category :couleur, :from => :color # The couleur category will take its data from the #color method.
+ end
+
+#### Delayed{#indexes-sources-delayed}
+
+If you define the source directly in the index block, it will be evaluated instantly:
+
+ Index::Memory.new :books do
+ source Book.order('title ASC')
+ end
+
+This works with ActiveRecord and other similar ORMs since `Book.order` returns a proxy object that will only be evaluated when the server is indexing.
+
+For example, this would instantly get the records, since `#all` is a kicker method:
+
+ Index::Memory.new :books do
+ source Book.all # Not the best idea.
+ end
+
+In this case, you can give the `source` method a block:
+
+ Index::Memory.new :books do
+ source { Book.all }
+ end
+
+This block will be executed as soon as the indexing is running, but not earlier.
+
+#### Classic Style{#indexes-sources-classic}
+
+The classic style uses Picky's own `Picky::Sources` to load the data into the index.
+
+ Index.new :books do
+ source Sources::CSV.new(:title, :author, file: 'app/library.csv')
+ end
+
+Use this one if you want to use a simple CSV file.
+
+However, you could also use the built-in Ruby `CSV` class and use it as an `#each` source (see above).
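As a minimal sketch of the idea of wrapping Ruby's built-in `CSV` class in an object that responds to `#each`: the names `BookRow` and `CsvBooks` and the sample data are hypothetical, and the sketch only shows the shape such a source could take (one object per row, with an `id` and readers named after the categories).

```ruby
require 'csv'

# Illustration only: BookRow and CsvBooks are hypothetical names.
BookRow = Struct.new(:id, :title, :author)

class CsvBooks
  include Enumerable

  def initialize(csv_text)
    @csv_text = csv_text
  end

  # Yields one object per CSV row; each object has an id and
  # readers named after the categories.
  def each
    return enum_for(:each) unless block_given?

    CSV.parse(@csv_text, headers: true) do |row|
      yield BookRow.new(row['id'].to_i, row['title'], row['author'])
    end
  end
end

csv_text = <<~CSV
  id,title,author
  1,Computing Machinery,Alan Turing
  2,On Computable Numbers,Alan Turing
CSV

books = CsvBooks.new(csv_text)
books.map(&:title) # => ["Computing Machinery", "On Computable Numbers"]
```

An object like this could then presumably be handed to an index with `source { CsvBooks.new(File.read('app/library.csv')) }`, so that the file is only read when indexing runs.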
+
+ Index.new :books do
+ source Sources::DB.new('SELECT id, title, author, isbn13 as isbn FROM books', file: 'app/db.yml')
+ end
+
+Use this one if you want to use a database source with very custom SQL statements. If not, we suggest you use an ORM as an `#each` source (see above).
+
+### Indexing / Tokenizing{#indexes-indexing}
+
+See [Tokenizing](#tokenizing) for tokenizer options.
+
+### Categories{#indexes-categories}
+
+Categories – usually what other search engines call fields – define *categorized data*. For example, book data might have a `title`, an `author` and an `isbn`.
+
+So you define that:
+
+ Index.new :books do
+ source { Book.order('author DESC') }
+
+ category :title
+ category :author
+ category :isbn
+ end
+
+(The example assumes that a `Book` has readers for `title`, `author`, and `isbn`)
+
+This already works and a search will return categorized results. For example, a search for "Alan Tur" might categorize both words as `author`, but it might also at the same time categorize both as `title`. Or one as `title` and the other as `author`.
+
+That's a great starting point. So how can I customize the categories?
+
+#### Option partial{#indexes-categories-partial}
+
+The partial option defines if a word is also found when it is only *partially entered*. So, "Picky" might be already found when typing "Pic".
+
+You define this by this:
+
+ category :some, partial: Partial::Substring.new(from: -3)
+
+(This is also the default)
+The option `from: 1` makes a word findable by any of its prefixes, starting from the first character.
+
+If you don't want any partial finds to occur, use:
+
+ category :some, partial: Partial::None.new
+
+You can also pass in your own partial generators. See [this article](http://florianhanke.com/blog/2011/08/15/picky-30-its-all-ruby-part-1.html) to learn more.
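The effect of the `from` option can be sketched in plain Ruby. This is an illustration of my reading of the option (a negative value counts from the end of the word, a positive one from the start), not Picky's own generator; `sketch_substrings` is a hypothetical helper.

```ruby
# Illustration only: lists the prefixes under which a word would be
# findable for a given from value. Not Picky's actual implementation.
def sketch_substrings(word, from:)
  # A negative from counts from the end of the word, a positive one from the start.
  shortest = from.negative? ? word.length + from + 1 : from
  shortest = [[shortest, 1].max, word.length].min
  word.length.downto(shortest).map { |length| word[0, length] }
end

sketch_substrings('picky', from: -3) # => ["picky", "pick", "pic"]
sketch_substrings('picky', from: 1)  # => ["picky", "pick", "pic", "pi", "p"]
```

With `from: -3`, "Pic" is the shortest input that still finds "Picky". With `from: 1`, a single "p" is enough, at the cost of a larger index.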
+
+#### Option weights{#indexes-categories-weights}
+
+The weights option defines how strongly a word is weighted. By default, Picky rates a word according to the logarithm of its occurrence. This means that a word that occurs more often will be weighted slightly higher.
+
+You define this by this:
+
+ category :some, weights: MyWeights.new
+
+The default is `Weights::Logarithmic.new`.
+
+You can also pass in your own weights generators. See [this article](http://florianhanke.com/blog/2011/08/15/picky-30-its-all-ruby-part-1.html) to learn more.
+
+If you don't want Picky to calculate weights for your indexed entries, you can use constant or dynamic weights.
+
+With 0.0 as default weight:
+
+ category :some, weights: Weights::Constant.new # Returns 0.0 for all results.
+
+With 3.14 as set weight:
+
+ category :some, weights: Weights::Constant.new(3.14) # Returns 3.14 for all results.
+
+Or with a dynamically calculated weight:
+
+ Weights::Dynamic.new do |str_or_sym|
+ str_or_sym.length # Uses the length of the string or symbol as weight.
+ end
+
+You almost never need to use your own specific weights. More often than not, you can fiddle with boosting combinations of categories, via the `boost` method in searches.
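To get a feeling for why a logarithmic weight only rates frequent words "slightly" higher, here is a small sketch. It assumes the natural logarithm of the number of occurrences; Picky's exact formula and rounding may differ, and `sketch_log_weight` is a hypothetical name.

```ruby
# Illustration only: assumes the weight is the natural logarithm of the
# number of occurrences. Picky's exact formula and rounding may differ.
def sketch_log_weight(occurrences)
  Math.log(occurrences)
end

weights = [1, 10, 100, 1000].map { |n| sketch_log_weight(n).round(2) }
weights # => [0.0, 2.3, 4.61, 6.91]
```

Under this assumption, a word that occurs 1000 times gets only about three times the weight of a word that occurs 10 times.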
+
+#### Option similarity{#indexes-categories-similarity}
+
+The similarity option defines if a word is also found when it is typed wrong, or _close_ to another word. So, "Picky" might be already found when typing "Pocky~". (Picky will search for similar words when you use the tilde, ~)
+
+You define this by this:
+
+ category :some, similarity: Similarity::None.new
+
+(This is also the default)
+
+There are several built-in similarity options, like
+
+ category :some, similarity: Similarity::Soundex.new
+ category :this, similarity: Similarity::Metaphone.new
+ category :that, similarity: Similarity::DoubleMetaphone.new
+
+You can also pass in your own similarity generators. See [this article](http://florianhanke.com/blog/2011/08/15/picky-30-its-all-ruby-part-1.html) to learn more.
+
+#### Option qualifier/qualifiers (categorizing){#indexes-categories-qualifiers}
+
+Usually, when you search for `title:wizard` you will only find books with "wizard" in their title.
+
+Maybe your client would like to be able to only enter "t:wizard". In that case you would use this option:
+
+ category :some,
+ :qualifier => :t
+
+Or if you'd like more to match:
+
+ category :some,
+ qualifiers: [:t, :title, :titulo]
+
+(This matches "t", "title", and also the Spanish "titulo")
+
+Picky will warn you if on one index the qualifiers are ambiguous (Picky will assume that the last "t" for example is the one you want to use).
+
+This means that:
+
+ category :some, :qualifier => :t
+ category :other, :qualifier => :t
+
+Picky will assume that if you enter "t:bla", you want to search in the :other category.
+
+Searching in multiple categories can also be done. If you have:
+
+ category :some, :qualifier => :s
+ category :other, :qualifier => :o
+
+Then searching with "s,o:bla" will search for bla in both `:some` and `:other`. Neat, eh?
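The qualifier rules described in this section (last definition wins, several qualifiers separated by commas) can be sketched as a simple lookup table. This is plain Ruby for illustration, not Picky's query parser; `DEFINITIONS`, `QUALIFIER_TABLE` and `sketch_categorize` are hypothetical names.

```ruby
# Illustration only: a hypothetical qualifier table, built in definition
# order so that a later category overwrites an earlier one ("last one wins").
DEFINITIONS = [
  [:some,  [:s, :t]],
  [:other, [:o, :t]] # :t is ambiguous and ends up pointing at :other
].freeze

QUALIFIER_TABLE = DEFINITIONS.each_with_object({}) do |(category, qualifiers), table|
  qualifiers.each { |qualifier| table[qualifier] = category }
end.freeze

# Splits "s,o:bla" into the categories to search in and the query text.
def sketch_categorize(token)
  qualifiers, text = token.split(':', 2)
  return [[], token] if text.nil? # no qualifier given

  categories = qualifiers.split(',').map { |q| QUALIFIER_TABLE[q.to_sym] }.compact.uniq
  [categories, text]
end

sketch_categorize('t:bla')   # => [[:other], "bla"]
sketch_categorize('s,o:bla') # => [[:some, :other], "bla"]
sketch_categorize('bla')     # => [[], "bla"]
```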
+
+#### Option from{#indexes-categories-from}
+
+Usually, the categories will take their data from the reader or field that is the same as their name.
+
+Sometimes though, the model does not have the right names. Say you have an Italian book model, `Libro`, but you still want to use English category names.
+
+ Index.new :books do
+ source { Libro.order('autore DESC') }
+
+ category :title, :from => :titulo
+ category :author, :from => :autore
+ category :isbn
+ end
+
+#### Option key_format{#indexes-categories-keyformat}
+
+You almost never use this, as the key format will usually be the same for all categories, which is when you would define it on the index, [like so](#indexes-keyformat).
+
+But if you need to, use it as you would on the index.
+
+ Index.new :books do
+ category :title,
+ :key_format => :to_sym
+ end
+
+#### Option source{#indexes-categories-source}
+
+You almost never use this, as the source will usually be the same for all categories, which is when you would define it on the index, [like so](#indexes-sources).
+
+But if you need to, use it as you would on the index.
+
+ Index.new :books do
+ category :title,
+ source: some_source
+ end
+
+#### Searching{#indexes-categories-searching}
+
+Users can use some special features when searching. They are:
+
+* Partial: `something*` (By default, the last word is implicitly partial)
+* Non-Partial: `"something"` (The quotes make the query on this word explicitly non-partial)
+* Similarity: `something~` (The tilde makes this word eligible for similarity search)
+* Categorized: `title:something` (Picky will only search in the category designated as title, in each index of the search)
+* Multi-categorized: `title,author:something` (Picky will search in title _and_ author categories, in each index of the search)
+
+These options can be combined (e.g. `title,author:"funky~"`): This will try to find similar words to funky (like "fonky"), but no partials of them (like "fonk"), in both title and author.
+
+Non-partial will win over partial, if you use both, as in `"test*"`.
+
+Also note that these options need to make it through the [tokenizing](#tokenizing), so don't remove any of `*"~:,`.
+
+### Key Format (Format of the indexed Ids){#indexes-keyformat}
+
+By default, the indexed data points to keys that are integers, or in other words, keys that are formatted using `to_i`.
+
+If you are indexing keys that are strings, use `to_sym` – a good example are MongoDB BSON keys, or UUID keys.
+
+The `key_format` method lets you define the format:
+
+ Index.new :books do
+ key_format :to_sym
+ end
+
+The `Picky::Sources` already set this correctly. However, if you use an `#each` source that supplies Picky with symbol ids, you should tell it what format the keys are in, eg. `key_format :to_sym`.
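Why the key format matters for string ids can be seen with plain Ruby conversions. The following sketch only demonstrates what `to_i` and `to_sym` do to some made-up ids; `sketch_format_keys` is a hypothetical helper and says nothing about how Picky stores keys.

```ruby
# Illustration only: applies a key format by calling the named
# conversion method on each id.
def sketch_format_keys(ids, key_format)
  ids.map { |id| id.public_send(key_format) }
end

uuid_like_ids = %w[a3f1-77 b9c2-10 c0de-42]

sketch_format_keys(%w[1 2 3], :to_i)       # => [1, 2, 3]
sketch_format_keys(uuid_like_ids, :to_i)   # => [0, 0, 0], all ids collapse into one
sketch_format_keys(uuid_like_ids, :to_sym) # => [:"a3f1-77", :"b9c2-10", :"c0de-42"]
```

With the default `to_i`, ids that do not start with digits all become `0`, which is why string keys need `to_sym`.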
+
+### Identifying in Results{#indexes-results}
+
+By default, an index is identified by its *name* in the results. This index is identified by `:books`:
+
+ Index.new :books do
+ # ...
+ end
+
+This index is identified by `:media` in the results:
+
+ Index.new :books do
+ # ...
+ result_identifier :media
+ end
+
+You still refer to it as `:books` in e.g. Rake tasks, `Picky::Indexes[:books].reload`. It's just for the results.
+
+### Indexing{#indexes-indexing}
+
+Indexing can be done programmatically, at any time. Even while the server is running.
+
+Indexing *all indexes* is done with
+
+ Picky::Indexes.index
+
+Indexing a *single index* can be done either with
+
+ Picky::Indexes[:index_name].index
+
+or
+
+ index_instance.index
+
+Indexing a *single category* of an index can be done either with
+
+ Picky::Indexes[:index_name][:category_name].index
+
+or
+
+ category_instance.index
+
+### Loading{#indexes-reloading}
+
+Loading (or reloading) your indexes in a running application is possible.
+
+Loading *all indexes* is done with
+
+ Picky::Indexes.load
+
+Loading a *single index* can be done either with
+
+ Picky::Indexes[:index_name].load
+
+or
+
+ index_instance.load
+
+Loading a *single category* of an index can be done either with
+
+ Picky::Indexes[:index_name][:category_name].load
+
+or
+
+ category_instance.load
+
+#### Using signals{#indexes-reloading-signals}
+
+To communicate with your server using signals:
+
+ books_index = Index.new(:books) do
+ # ...
+ end
+
+ Signal.trap("USR1") do
+ books_index.reindex
+ end
+
+This reindexes the books_index when you call
+
+ kill -USR1 <server_process_id>
+
+You can refer to the index like so if you want to define the trap somewhere else:
+
+ Signal.trap("USR1") do
+ Picky::Indexes[:books].reindex
+ end
+
+### Reindexing{#indexes-reindexing}
+
+Reindexing your indexes is just indexing followed by reloading (see above).
+
+Reindexing *all indexes* is done with
+
+ Picky::Indexes.reindex
+
+Reindexing a *single index* can be done either with
+
+ Picky::Indexes[:index_name].reindex
+
+or
+
+ index_instance.reindex
+
+Reindexing a *single category* of an index can be done either with
+
+ Picky::Indexes[:index_name][:category_name].reindex
+
+or
+
+ category_instance.reindex
34 web/source/documentation/_intro.html.md
@@ -0,0 +1,34 @@
+## All Ruby{#allruby}
+
+Never forget this: *Picky is all Ruby, all the time*!
+
+Even though we only describe examples of classic and Sinatra style servers, Picky can be included directly in Rails, as a client or server. Or in DRb. Or in your simple script without HTTP. Anywhere you like, as long as it's Ruby, really.
+
+To drive the point home, remember that Picky is mainly two pieces working together: An index, and a search interface on indexes.
+
+The index normally has a source, knows how to tokenize data, and has a few data categories. And the search interface normally knows how to tokenize incoming queries. That's it:
+
+ index = Picky::Index.new :people do
+ source { People.all }
+ indexing splits_text_on: /[\s,-]/
+ category :first
+ category :last
+ category :age, partial: Picky::Partial::None.new
+ end
+
+ people = Picky::Search.new index do
+ searching splits_text_on: /[\s,-]/
+ end
+ results = people.search 'joe'
+ puts results
+
+You can put these pieces anywhere, independently.
+
+## Transparency{#transparency}
+
+Picky tries its best to be *transparent* so you can go have a look if something goes wrong. It wants you to *never feel powerless*.
+
+All the indexes can be viewed in the `/index` directory of the project. They are waiting for you to inspect their JSONy goodness.
+Should anything not work with your search, you can see how it is indexed in the actual indexes and change your indexing parameters accordingly.
+
+Since all is Ruby, you can log as much data as you want to help you improve your search application until it's working perfectly.
89 web/source/documentation/_linkage.html.md
@@ -0,0 +1,89 @@
+## Single Page Help Index
+
+This is the one page help document for Picky. Simply search for anything in this page.
+
+Edit typos directly in the [github page](#todo) using the edit button.
+
+### Getting started
+
+It's [All Ruby](#allruby). [Transparency](#transparency) matters.
+
+#### Generating an app
+
+[Generators](#generators)
+ * [Servers](#generators-servers)
+ * [Sinatra](#generators-servers-sinatra)
+ * [All In One](#generators-servers-allinone)
+ * [Clients](#generators-clients)
+ * [Sinatra](#generators-clients-sinatra)
+* [Servers / Applications](#servers)
+ * [Sinatra Style](#servers-sinatra)
+ * [Routing](#servers-sinatra-routing)
+ * [Logging](#servers-sinatra-logging)
+ * [All In One (Client + Server)](#servers-allinone)
+
+#### Integration in Rails/Sinatra etc.
+
+[Rails](#rails)
+[Sinatra](#sinatra)
+[DRb](#drb)
+[Ruby Script](#rubyscript)
+
+### Tokenizing
+
+[Tokenizing](#tokenizing)
+[Options](#tokenizing-options)
+[Tokenizer](#tokenizing-tokenizer)
+[Examples](#tokenizing-examples)
+[Notes](#tokenizing-notes)
+
+### Indexes
+
+[Indexes](#indexes)
+
+There are four different [types](#indexes-types):
+[Memory](#indexes-types-memory),
+[Redis](#indexes-types-redis),
+[SQLite](#indexes-types-sqlite), and
+[File](#indexes-types-file)
+
+[Accessing](#indexes-acessing)
+ * [Configuration](#indexes-configuration)
+ * [Data Sources](#indexes-sources)
+ * [Responding to #each](#indexes-sources-each)
+ * [Delayed](#indexes-sources-delayed)
+ * [Classic Style](#indexes-sources-classic)
+ * [Indexing / Tokenizing](#indexes-indexing)
+ * [Categories](#indexes-categories)
+ * [Option partial](#indexes-categories-partial)
+ * [Option weights](#indexes-categories-weights)
+ * [Option similarity](#indexes-categories-similarity)
+ * [Option qualifier / qualifiers (categorizing)](#indexes-categories-qualifiers)
+ * [Option from](#indexes-categories-from)
+ * [Option key_format](#indexes-categories-keyformat)
+ * [Option source](#indexes-categories-source)
+ * [Searching](#indexes-categories-searching)
+ * [Key Format (Format of the indexed Ids)](#indexes-keyformat)
+ * [Identifying in Results](#indexes-results)
+ * [Indexing](#indexes-indexing)
+ * [Reloading](#indexes-reloading)
+ * [Using signals](#indexes-reloading-signals)
+ * [Reindex](#indexes-reindexing)
+
+### Searching
+
+* [Search](#search)
+ * [Options](#search-options)
+ * [Searching / Tokenizing](#search-options-searching)
+ * [Boost](#search-options-boost)
+ * [Ignore Categories](#search-options-ignore)
+ * [Ignore Unassigned Tokens](#search-options-unassigned)
+ * [Maximum Allocations](#search-options-maxallocations)
+ * [Early Termination](#search-options-terminateearly)
+
+#### Results
+
+
+[Results](#results)
+ * [Logging](#results-logging)
+ * [Sorting](#results-sorting)
58 web/source/documentation/_results.html.md
@@ -0,0 +1,58 @@
+## Results{#results}
+
+Results are returned by the `Search` instance.
+
+ books = Search.new books_index do
+ searching splits_text_on: /[\s,]/
+ boost [:title, :author] => +2
+ end
+
+ results = books.search "test"
+
+ p results # Returns results in log form.
+ p results.to_hash # Returns results as a hash.
+ p results.to_json # Returns results as JSON.
+
+### Logging{#results-logging}
+
+TODO Update with latest logging style and ideas on how to separately log searches.
+
+Picky results can be logged wherever you want.
+
+A Picky Sinatra server logs whatever to wherever you want:
+
+ MyLogger = Logger.new "log/search.log"
+
+ # ...
+
+ get '/books' do
+ results = books.search "test"
+ MyLogger.info results
+ results.to_json
+ end
+
+or set it up in separate files for different environments:
+
+ require "logging/#{PICKY_ENVIRONMENT}"
+
+A Picky classic server logs to the logger defined with the `Picky.logger=` writer.
+
+Set it up in a separate `logging.rb` file (or directly in the `app/application.rb` file).
+
+ Picky.logger = Picky::Loggers::Concise.new STDOUT
+
+and the Picky classic server will log the results into it, if it is defined.
+
+Why in a separate file? So that you can have different logging for different environments.
+
+More power to you.
+
+### Sorting{#results-sorting}
+
+Picky results are always *sorted in the order of the data provided* by the data source.
+
+So if you need different sort orders you have to define two indexes.
+
+Why? This was a conscious design decision on my part. Usually, we do not need multiple sortings in a search application (I reckon around 95% of the cases). However, if you need it, you can.
+
+TODO Example that shows how to have different result sorting depending on the category a result is found.
148 web/source/documentation/_search.html.md
@@ -0,0 +1,148 @@
+## Search{#search}
+
+Picky offers a `Search` interface for the indexes. You instantiate it as follows.
+
+Just searching over one index:
+
+ books = Search.new books_index # searching over one index
+
+Searching over multiple indexes:
+
+ media = Search.new books_index, dvd_index, mp3_index
+
+Such an instance can then search over all its indexes and returns a `Picky::Results` object:
+
+ results = media.search "query", # the query text
+ 20, # number of ids
+ 0 # offset (for pagination)
+
+Please see the part about [Results](#results) to know more about that.
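+
+To picture how the number of ids and the offset work together, here is a minimal plain-Ruby sketch. The `page` helper is made up for illustration and is not part of Picky; it only mimics slicing an already sorted list of ids:
+
```ruby
# Hypothetical helper, not Picky's API: returns `count` ids starting at `offset`.
def page(ids, count = 20, offset = 0)
  ids[offset, count] || []
end

all_ids = (1..50).to_a
p page(all_ids)         # first page: ids 1 to 20
p page(all_ids, 20, 40) # third page: only 10 ids are left
```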
+
+### Options{#search-options}
+
+You use a block to set search options:
+
+ media = Search.new books_index, dvd_index, mp3_index do
+ searching tokenizer_options_or_tokenizer
+ boost [:title, :author] => +2,
+ [:author, :title] => -1
+ end
+
+#### Searching / Tokenizing{#search-options-searching}
+
+See [Tokenizing](#tokenizing) for tokenizer options.
+
+#### Boost{#search-options-boost}
+
+The `boost` option defines what combinations to boost.
+
+This is unlike boosting in most other search engines, where you can only boost a given field. I've found it much more useful to boost combinations.
+
+For example, you have an index of addresses. The usual case is that someone is looking for a street and a number. So if Picky encounters that combination (in that order), it should move these results to a more prominent spot.
+But if it thinks it's a street number, followed by a street, it is probably wrong, since usually you search for "Road 10", instead of "10 Road" (assuming this is the case where you come from).
+
+So let's boost `street, streetnumber`, while at the same time deboosting `streetnumber, street`:
+
+ addresses = Picky::Search.new address_index do
+ boost [:street, :streetnumber] => +2,
+ [:streetnumber, :street] => -1
+ end
+
+If you still want to boost a single category, check out the [category weights option](#indexes-categories-weights).
+For example:
+
+ Picky::Index.new :addresses do
+ category :street, weights: Picky::Weights::Logarithmic.new(+4)
+ category :streetnumber
+ end
+
+This boosts the weight of the street category alone.
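+
+If you want a mental model of combination boosting, this plain-Ruby sketch may help. It is an assumption-laden toy, not how Picky works internally: `BOOSTS` and `boosted_score` are invented names, and the base score is just a number passed in.
+
```ruby
# Toy model only, not Picky's internals: boosts are keyed by the exact
# order of categories, so [:street, :streetnumber] and
# [:streetnumber, :street] are two different entries.
BOOSTS = {
  [:street, :streetnumber] => +2,
  [:streetnumber, :street] => -1
}.freeze

# Adds the boost for this category combination (0 if none is defined).
def boosted_score(categories, base_score)
  base_score + BOOSTS.fetch(categories, 0)
end

p boosted_score([:street, :streetnumber], 3.0) # street first: boosted
p boosted_score([:streetnumber, :street], 3.0) # number first: deboosted
p boosted_score([:street, :street], 3.0)       # no entry: unchanged
```
+
+The point of the sketch is the hash key: it is the whole ordered combination, not a single field.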
+
+#### Ignore Categories{#search-options-ignore}
+
+There's a [full blog post](http://florianhanke.com/blog/2011/09/01/picky-case-study-location-based-ads.html) devoted to this topic.
+
+In short, the `ignore :category_name` option makes Picky throw away any result combinations that have the named category in it.
+
+If Picky finds the tokens "florian hanke" in both `:first_name, :last_name` and `:last_name, :last_name`, and we've instructed it to ignore `first_name`,
+
+ names = Picky::Search.new name_index do
+ ignore :first_name
+ end
+
+then it will throw away the solutions for `:first_name, :last_name` (eg. "Peter Miller") and only use `:last_name, :last_name` (eg. "Smith Miller").
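+
+Conceptually, ignoring a category is just a filter over the combinations. A minimal sketch in plain Ruby, assuming combinations are arrays of category names (`without_ignored` is a hypothetical helper, not a Picky method):
+
```ruby
# Hypothetical helper: drops every combination that contains an ignored category.
def without_ignored(combinations, ignored)
  combinations.reject do |categories|
    categories.any? { |category| ignored.include?(category) }
  end
end

combinations = [
  [:first_name, :last_name],
  [:last_name,  :last_name]
]

p without_ignored(combinations, [:first_name])
```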
+
+#### Ignore Unassigned Tokens{#search-options-unassigned}
+
+There's a [full blog post](http://florianhanke.com/blog/2011/09/05/picky-ignoring-unassigned-tokens.html) devoted to this topic.
+
+In short, the `ignore_unassigned_tokens true/false` option makes Picky be very lenient with your queries. Usually, if one of the search words is not found, say in a query "aston martin cockadoodledoo", Picky will return an empty result set, because "cockadoodledoo" is not in any index, in a car search, for example.
+
+By ignoring the "cockadoodledoo" that can't be assigned sensibly, you will still get results.
+
+This could be used in a search for advertisements that are shown next to the results.
+
+If you've defined an ads search like so:
+
+ ads_search = Search.new cars_index do
+ ignore_unassigned_tokens true
+ end
+
+then even if Picky does not find anything for "aston martin cockadoodledoo", it will find an ad, simply ignoring the unassigned token.
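+
+The difference between the strict and the lenient behaviour can be sketched in plain Ruby. Everything here is an assumption for illustration: `INDEXED` stands in for the words in a car index, and `assignable_tokens` is not a Picky method.
+
```ruby
require 'set'

# Stand-in for the words known to a (hypothetical) car index.
INDEXED = Set['aston', 'martin', 'db9']

# Strict mode: one unknown token empties the whole query.
# Lenient mode: unknown tokens are dropped, the rest is kept.
def assignable_tokens(query, ignore_unassigned: false)
  tokens     = query.split
  unassigned = tokens.reject { |token| INDEXED.include?(token) }
  return tokens if unassigned.empty?

  ignore_unassigned ? tokens - unassigned : []
end

p assignable_tokens("aston martin cockadoodledoo")
p assignable_tokens("aston martin cockadoodledoo", ignore_unassigned: true)
```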
+
+#### Maximum Allocations{#search-options-maxallocations}
+
+The `max_allocations(integer)` option cuts off calculation of allocations.
+
+What does this mean? Say you have code like:
+
+ phone_search = Search.new phonebook do
+ max_allocations 1
+ end
+
+And someone searches for "peter thomas".
+
+Picky then generates all possible allocations and sorts them.
+
+It might get
+
+* [first_name, last_name]
+* [last_name, first_name]
+* [first_name, first_name]
+* etc.
+
+with the first allocation being the most probable one.
+
+So, with `max_allocations 1` it will only use the topmost one and throw away all the others.
+
+It will only go through the first one and calculate only results for that one. This can be used to speed up Picky in case of exploding amounts of allocations.
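+
+Here is a rough plain-Ruby sketch of the idea, not Picky's actual algorithm. The `SCORES` table is invented so that there is something to sort by, and `ranked_allocations` is a hypothetical helper:
+
```ruby
CATEGORIES = [:first_name, :last_name].freeze

# Invented scores, only so the sketch has an order to sort by.
SCORES = {
  [:first_name, :last_name] => 3,
  [:last_name, :first_name] => 2
}.freeze

# Generates every category combination for the given number of tokens,
# sorts the most probable first, and optionally cuts the list off.
def ranked_allocations(token_count, max_allocations: nil)
  all    = CATEGORIES.repeated_permutation(token_count)
  ranked = all.sort_by { |allocation| -SCORES.fetch(allocation, 0) }
  max_allocations ? ranked.first(max_allocations) : ranked
end

p ranked_allocations(2).size                # all combinations for "peter thomas"
p ranked_allocations(2, max_allocations: 1) # only the topmost one
```
+
+With two categories and two tokens there are only four combinations, but the number grows quickly with more categories and longer queries, which is where the cutoff pays off.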
+
+#### Early Termination{#search-options-terminateearly}
+
+The `terminate_early(integer)` or `terminate_early(with_extra_allocations: integer)` option stops Picky from calculating all ids of all allocations.
+
+However, this will also return a wrong total.
+
+So, important note: Only use when you don't display a total.
+
+Examples:
+
+Stop as soon as you have calculated enough ids for the allocation.
+
+ phone_search = Search.new phonebook do
+ terminate_early # The default uses 0.
+ end
+
+Stop as soon as you have calculated enough ids for the allocation, and then calculate 3 allocations more (for example, to show to the user).
+
+ phone_search = Search.new phonebook do
+ terminate_early 3
+ end
+
+There's also a hash form to be more explicit. So the next coder knows what it does. (However, us cool Picky hackers _know_ ;) )
+
+ phone_search = Search.new phonebook do
+ terminate_early with_extra_allocations: 5
+ end
+
+This option speeds up Picky if you don't need a correct total.
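+
+Why the total ends up wrong can be seen in a small plain-Ruby sketch. It is a simplified model under my own assumptions (allocations are plain arrays of ids, best first), and `collect_ids` is a hypothetical helper, not Picky code:
+
```ruby
# Walks the allocations in order, collecting up to `wanted` ids.
# With `extra_allocations` set, it stops that many allocations after
# enough ids were found, so the running total stays incomplete.
def collect_ids(allocations, wanted, extra_allocations: nil)
  ids   = []
  total = 0
  extra = extra_allocations

  allocations.each do |allocation_ids|
    if extra && ids.size >= wanted
      break if extra <= 0
      extra -= 1
    end
    total += allocation_ids.size
    ids.concat(allocation_ids.first([wanted - ids.size, 0].max))
  end

  [ids, total]
end

allocations = [[1, 2, 3], [4, 5], [6], [7, 8]]

p collect_ids(allocations, 2)                       # no early termination
p collect_ids(allocations, 2, extra_allocations: 0) # stop right away
p collect_ids(allocations, 2, extra_allocations: 1) # look at one more
```
+
+The ids are the same in all three calls; only the total differs, because the skipped allocations were never counted.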
120 web/source/documentation/_servers.html.md
@@ -0,0 +1,120 @@
+## Servers / Applications{#servers}
+
+Picky, from version 3.0 onwards, is designed to run *anywhere*, *in anything*.
+
+This means you can have a Picky server running in a DRb instance if you want to. Or in irb, for example.
+
+We do run and test the Picky server in two styles, [Classic and Sinatra](#servers-classicvssinatra).
+
+But don't let that stop you from just using it in a class or just a script. This is a perfectly ok way to use Picky:
+
+ require 'picky'
+
+ include Picky # So we don't have to type Picky:: everywhere.
+
+ books_index = Index.new(:books) do
+ source Sources::CSV.new(:title, :author, file: 'library.csv')
+ category :title
+ category :author
+ end
+
+ books_index.index
+ books_index.reload
+
+ books = Search.new books_index do
+ boost [:title, :author] => +2
+ end
+
+ results = books.search "test"
+ results = books.search "alan turing"
+
+ require 'pp'
+ pp results.to_hash
+
+More *Ruby*, more *power* to you!
+
+### Sinatra Style{#servers-sinatra}
+
+A [Sinatra](http://sinatrarb.com) server is usually just a single file. In Picky, it is a top-level file named
+
+ app.rb
+
+We recommend using the [modular Sinatra style](http://www.sinatrarb.com/intro#Serving%20a%20Modular%20Application) as opposed to the [classic style](http://www.sinatrarb.com/intro#Using%20a%20Classic%20Style%20Application%20with%20a%20config.ru). It's possible to write a Picky server in the classic style, but using the modular style offers more options.
+
+ require 'sinatra/base'
+ require 'picky'
+
+ class BookSearch < Sinatra::Application
+
+ books_index = Picky::Index.new(:books) do
+ source { Book.order("isbn ASC") }
+ category :title
+ category :author
+ end
+
+ books = Picky::Search.new books_index do
+ boost [:title, :author] => +2
+ end
+
+ get '/books' do
+ results = books.search params[:query],
+ params[:ids] || 20,
+ params[:offset] || 0
+ results.to_json
+ end
+
+ end
+
+This is already a complete Sinatra server.
+
+#### Routing{#servers-sinatra-routing}
+
+The Sinatra Picky server uses the same routing as Sinatra (of course). [More information on Sinatra routing](http://www.sinatrarb.com/intro#Routes).
+
+If you use the server with the picky client software (provided with the picky-client gem), you should return JSON from the Sinatra `get`.
+Just call `to_json` on the returned results to get the results in JSON format.
+
+ get '/books' do
+ results = books.search params[:query], params[:ids] || 20, params[:offset] || 0
+ results.to_json
+ end
+
+The above example search can be called using, for example, `curl`:
+
+ curl 'localhost:8080/books?query=test'
+
+#### Logging{#servers-sinatra-logging}
+
+TODO Update this section.
+
+This is one way to do it:
+
+ require 'logger'
+
+ MyLogger = Logger.new "log/search.log"
+
+ # ...
+
+ get '/books' do
+ results = books.search "test"
+ MyLogger.info results
+ results.to_json
+ end
+
+or set it up in separate files for different environments:
+
+ require "logging/#{PICKY_ENVIRONMENT}"
+
+Note that this is not Rack logging, but Picky search engine logging. The resulting file can be used with the picky-statistics gem.
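+
+What such a per-environment logging file sets up is entirely up to you. As one possible sketch, using only Ruby's standard `Logger` (the `search_logger` helper and the choice of log levels are my assumptions, not something Picky prescribes):
+
```ruby
require 'logger'

# Hypothetical helper: builds a search logger whose verbosity depends
# on the environment name. Pass a file path or any IO object as `io`.
def search_logger(environment, io = $stdout)
  logger = Logger.new(io)
  logger.level = environment == 'production' ? Logger::INFO : Logger::DEBUG
  logger.formatter = lambda do |_severity, time, _progname, message|
    "#{time.strftime('%Y-%m-%d %H:%M:%S')} #{message}\n"
  end
  logger
end

logger = search_logger('development')
logger.debug 'only shown outside production'
logger.info  'query=test'
```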
+
+### All In One (Client + Server){#servers-allinone}
+
+The All In One server is a Sinatra server and a Sinatra client rolled in one.
+
+It's best to just generate one and look at it:
+
+ picky generate all_in_one all_in_one_test
+
+and then follow the instructions.
+
+When would you use an All In One server? One place is [Heroku](http://heroku.com), since it is a bit more complicated to set up two servers that interact with each other.
+
+It's nice for small convenient searches. For production setups we recommend using a separate server to make everything separately cacheable etc.
3 web/source/documentation/_thanks.html.md
@@ -0,0 +1,3 @@
+### Thanks!
+
+Thanks to whoever made the [Sinatra README page](http://www.sinatrarb.com/intro) for the inspiration.
135 web/source/documentation/_tokenizing.html.md
@@ -0,0 +1,135 @@
+## Tokenizing{#tokenizing}
+
+The `indexing` method in an `Index` describes how *index data* is handled.
+
+The `searching` method in a `Search` describes how *queries* are handled.
+
+This is where you use these options:
+
+ Picky::Index.new :books do
+ indexing options_hash_or_tokenizer
+ end
+
+ Search.new *indexes do
+ searching options_hash_or_tokenizer
+ end
+
+Both take either an options hash, your hand-rolled tokenizer, or a `Picky::Tokenizer` instance initialized with the options hash.
+
+### Options{#tokenizing-options}
+
+Picky by default goes through the following list, in order:
+
+1. *substitutes_characters_with*: A character substituter that responds to `#substitute(text) #=> substituted text`
+1. *removes_characters*: Regexp of characters to remove.
+1. *stopwords*: Regexp of stopwords to remove.
+1. *splits_text_on*: Regexp on where to split the query text, including category qualifiers.
+1. *removes_characters_after_splitting*: Regexp on which characters to remove after the splitting.
+1. *normalizes_words*: `[[/matching_regexp/, 'replace match \1']]`
+1. *max_words*: How many words will be passed into the core engine. Default: `Infinity` (Don't go there, ok?).
+1. *rejects_token_if*: `->(token){ token == 'hello' }`
+1. *case_sensitive*: `true` or `false`, `false` is default.
+1. *stems_with*: A stemmer, ie. an object that responds to `stem(text)` that returns stemmed text.
+
+You pass the above options into
+
+ Search.new *indexes do
+ searching options_hash
+ end
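+
+To get a feel for what "in order" means, here is a plain-Ruby sketch that applies a handful of the steps above one after the other. It is a simplified stand-in I wrote for illustration, not `Picky::Tokenizer` itself; only the option names are taken from the list, and several steps are left out.
+
```ruby
# Simplified stand-in for a tokenizer, handling a subset of the options
# in the same order as the list above.
def tokenize(text, options = {})
  text  = text.gsub(options[:removes_characters], '') if options[:removes_characters]
  text  = text.gsub(options[:stopwords], '')          if options[:stopwords]
  words = text.split(options.fetch(:splits_text_on, /\s/)).reject(&:empty?)
  words = words.first(options[:max_words])            if options[:max_words]
  words = words.reject(&options[:rejects_token_if])   if options[:rejects_token_if]
  words = words.map(&:downcase)                       unless options[:case_sensitive]
  words
end

options = {
  removes_characters: /[^a-z0-9\s]/i,
  stopwords:          /\b(and|the|of)\b/i,
  splits_text_on:     /\s/,
  max_words:          3
}

p tokenize("The Lord of the Rings!", options)
```
+
+Note that the order matters: characters are removed before the text is split, and stopwords are removed before `max_words` cuts off the list.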
+
+You can provide your own tokenizer:
+
+ Search.new books_index do
+ searching MyTokenizer.new
+ end
+
+TODO Update what the tokenizer needs to return.
+
+The tokenizer needs to respond to the method `#tokenize(text)`, returning a `Picky::Query::Tokens` object. If you have an array of tokens, e.g. `[:my, :nice, :tokens]`,
+you can pass it into `Picky::Query::Tokens.process(my_tokens)` to get the tokens and return these.
+
+`rake 'try[text,some_index,some_category]'` (`some_index`, `some_category` optional) tells you how a given text is indexed.
+
+It needs to be programmed in a performance efficient way if you want your search engine to be fast.
+
+### Tokenizer{#tokenizing-tokenizer}
+
+Even though you usually provide options (see above), you can provide your own:
+
+ Picky::Index.new :books do
+ indexing MyTokenizer.new
+ end
+
+The tokenizer must respond to `tokenize(text)` and return `[tokens, words]`, where `tokens` is an Array of processed tokens and `words` is an Array of words that represent the original words in the query (or as close as possible to the original words).
+
+It is also possible to return `[tokens]`, where tokens is the Array of processed query words. (Picky will then just use the tokens as words)
+
+#### Examples{#tokenizing-examples}
+
+A very simple tokenizer that just splits the input on commas:
+
+ class MyTokenizer
+ def tokenize text
+ tokens = text.split ','
+ [tokens]
+ end
+ end
+
+ MyTokenizer.new.tokenize "Hello, world!" # => [["Hello", " world!"]]
+
+ Picky::Index.new :books do
+ indexing MyTokenizer.new
+ end
+
+The same could have been achieved with this:
+
+ Picky::Index.new :books do
+ indexing splits_text_on: ','
+ end
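+
+For the `[tokens, words]` form described above, a tokenizer could look roughly like this. It is a sketch under my own assumptions (the class name is made up and I have not run it inside Picky); it only demonstrates the shape of the return value:
+
```ruby
# Hypothetical tokenizer: `words` keeps the original spelling,
# `tokens` holds the cleaned-up versions in the same order.
class DowncasingTokenizer
  def tokenize(text)
    words  = text.split
    tokens = words.map { |word| word.downcase.gsub(/[^a-z0-9]/, '') }
    [tokens, words]
  end
end

tokens, words = DowncasingTokenizer.new.tokenize("Hello, World!")
p tokens
p words
```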
+
+### Notes{#tokenizing-notes}
+
+Usually, you use the same options for indexing and searching:
+
+ tokenizer_options = { ... }
+
+ index = Picky::Index.new :example do
+ indexing tokenizer_options
+ end
+
+ Search.new index do
+ searching tokenizer_options
+ end
+
+However, consider this example.
+Let's say your data has lots of words in it that look like this: `all-data-are-tokenized-by-dashes`.
+And people would search for them using spaces to keep words apart: `searching for data`.
+In this case it's a good idea to split the data and the query differently.
+Split the data on dashes, and queries on `\s`:
+
+ index = Picky::Index.new :example do
+ indexing splits_text_on: /-/
+ end
+
+ Search.new index do
+ searching splits_text_on: /\s/
+ end
+
+The rule number one to remember when tokenizing is:
+*Tokenized query text needs to match the text that is in the index.*
+
+So both the index and the query need to tokenize to the same string:
+
+* `all-data-are-tokenized-by-dashes` => `["all", "data", "are", "tokenized", "by", "dashes"]`
+* `searching for data` => `["searching", "for", "data"]`
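+
+You can check the rule with nothing but Ruby's `String#split`. This is only a sanity check of the two regexps, not a Picky call:
+
```ruby
# Split the data on dashes and the query on whitespace,
# then see which query tokens also exist in the data.
index_tokens = "all-data-are-tokenized-by-dashes".split(/-/)
query_tokens = "searching for data".split(/\s/)

p index_tokens
p query_tokens
p query_tokens & index_tokens # tokens that would actually match
```
+
+Had the data been split on whitespace too, it would have stayed one long token and nothing would have matched.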
+
+Either look in the `/index` directory (the "prepared" files are the tokenized data), or use Picky's `try` rake task:
+
+ $ rake try[test]
+ "test" is saved in the Picky::Indexes index as ["test"]
+ "test" as a search will be tokenized as ["test"]
+
+You can tell Picky which index, or even category to use:
+
+ $ rake try[test,books]
+ $ rake try[test,books,title]
4 web/source/stylesheets/colors.css.sass
@@ -3,8 +3,8 @@ body
.container_2
- div.index
- :background-color white
+ // div.index
+ // :background-color white
.grid_1
:background-color #F5F5EA
17 web/source/stylesheets/specific.css.sass
@@ -2,15 +2,20 @@
:overflow hidden
div.index
- :width 20em
- :float right
- :padding 3em
- :margin 1em
- :border 2px solid #E0E0D0
+ :width 320px
+ // :width 20em
+ // :float right
+ :padding-right 2em
+ :padding-left 2em
+ // :margin 1em
+ // :border 2px solid #E0E0D0
h2, h3, h4, p, pre
:overflow auto
-
+
+ .grid_1.help
+ :width 590px
+
.header
:width 100%
:height 70px
