## Indexes{#indexes}

Indexes do three things:

* Define where the data comes from.
* Define how data is handled before it enters the index.
* Hold index categories.

### Types{#indexes-types}

Picky offers a choice of four index types:

* Memory: Saves its indexes in JSON on disk and loads them into memory.
* Redis: Saves its indexes in Redis.
* SQLite: Saves its indexes in rows of a SQLite DB.
* File: Saves its indexes in JSON in files.

This is how the Memory and Redis types look in code:

    books_memory_index = Index.new :books do
      # Configuration goes here.
    end

    books_redis_index = Index.new :books do
      backend Backends::Redis.new
      # Configuration goes here.
    end

Both save the preprocessed data from the data source in the `/index` directory, so you can check whether the data was preprocessed correctly.

Indexes are then used in a `Search` interface.

Searching over one index:

    books = Search.new books_index

Searching over multiple indexes:

    media = Search.new books_index, dvd_index, mp3_index

To be useful, the resulting ids should either come from the same id space or be mutually exclusive, so that, e.g., a book id does not collide with a DVD id.

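
To illustrate the id collision problem, here is a small plain-Ruby sketch. It does not use Picky at all; the constants and the `namespaced` helper are hypothetical, and prefixing ids with their type is just one possible convention, not something Picky does for you.

```ruby
# Hypothetical result ids as they might come back from two separate indexes.
BOOK_IDS = [1, 2, 3]
DVD_IDS  = [2, 3, 4]

# Ids present in both lists are ambiguous in a combined search:
# is 2 a book or a DVD?
def ambiguous_ids(book_ids, dvd_ids)
  book_ids & dvd_ids
end

# One possible convention (an assumption, not a Picky feature):
# prefix each id with its type so the combined id space is exclusive.
def namespaced(type, ids)
  ids.map { |id| "#{type}:#{id}" }
end

combined = namespaced(:book, BOOK_IDS) + namespaced(:dvd, DVD_IDS)
puts ambiguous_ids(BOOK_IDS, DVD_IDS).inspect # prints [2, 3]
puts combined.uniq.size == combined.size      # prints true
```

Whether string ids like these suit your setup depends on how your indexes and your application handle ids.
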
#### In-Memory / File-based{#indexes-types-memory}

The in-memory index transparently saves its indexes as JSON files in the `/index` directory.

When the server is started, they are loaded into memory. As soon as the server is stopped, the indexes are deleted from memory.

Indexing regenerates the JSON index files, which can then be reloaded into memory, even in the running server (see below).

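
The dump-then-load cycle described here can be pictured with a minimal plain-Ruby sketch. The token-to-ids hash, the file name, and the `dump_index` / `load_index` helpers are assumptions made for illustration; the actual file layout Picky writes is not described in this section.

```ruby
require 'json'
require 'tmpdir'

# Writes a toy inverted index (token => ids) to disk as JSON.
# This stands in for the "indexing" step.
def dump_index(index, path)
  File.write(path, JSON.generate(index))
end

# Reads the JSON file back into a Hash.
# This stands in for "server start" or a reload.
def load_index(path)
  JSON.parse(File.read(path))
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'books_title.json') # hypothetical file name
  dump_index({ 'picky' => [1, 3], 'ruby' => [2] }, path)
  in_memory = load_index(path)
  puts in_memory['picky'].inspect # prints [1, 3]
end
```

Because the files on disk are the source of truth, regenerating them and loading them again is enough to refresh the in-memory copy.
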
#### Redis{#indexes-types-redis}

The Redis index saves its indexes in the Redis server on the default port, using database 15.

When the server is started, it connects to the Redis server and uses the indexes in the key-value store.

Indexing regenerates the indexes in the Redis server – you do not have to restart the server running Picky.

#### SQLite{#indexes-types-sqlite}

TODO

#### File{#indexes-types-file}

TODO

### Accessing{#indexes-acessing}

If you don't have direct access to your indexes, as you would here

    books_index = Index.new(:books) do
      # ...
    end

    books_index.do_something_with_the_index

and you'd like to access the index from, for example, a rake task, you can use

    Picky::Indexes

to get *all indexes*.

To get a *single index* use

    Picky::Indexes[:index_name]

and to get a *single category* of an index, use

    Picky::Indexes[:index_name][:category_name]

That's it.

### Configuration{#indexes-configuration}

These are all the options you can use to configure an index:

    books_index = Index.new :books do
      source { Book.order("isbn ASC") }

      indexing removes_characters: /[^a-z0-9\s\:\"\&\.\|]/i, # Default: nil
               stopwords: /\b(and|the|or|on|of|in)\b/i, # Default: nil
               splits_text_on: /[\s\/\-\_\:\"\&\/]/, # Default: /\s/
               removes_characters_after_splitting: /[\.]/, # Default: nil
               normalizes_words: [[/\$(\w+)/i, '\1 dollars']], # Default: nil
               rejects_token_if: lambda { |token| token == :blurf }, # Default: nil
               case_sensitive: true, # Default: false
               substitutes_characters_with: Picky::CharacterSubstituters::WestEuropean.new, # Default: nil
               stems_with: Lingua::Stemmer.new # Default: nil

      category :id
      category :title,
               partial: Partial::Substring.new(:from => 1),
               similarity: Similarity::DoubleMetaphone.new(2),
               qualifiers: [:t, :title, :titre]
      category :author,
               partial: Partial::Substring.new(:from => -2)
      category :year,
               partial: Partial::None.new,
               qualifiers: [:y, :year, :annee]

      result_identifier 'boooookies'
    end

Usually you won't need to configure all that.

But if your boss comes in the door and asks why X is not found… you know. And you can improve the search engine relatively *quickly and painlessly*.

More power to you.

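
To get a feel for what some of the `indexing` options in the configuration example do to a piece of text, here is a rough plain-Ruby sketch. It reuses four of the regular expressions from that example, but `sketch_tokenize` is a hypothetical helper, and the order of the steps is an assumption made for illustration; Picky's tokenizer may apply them differently.

```ruby
# Regexps taken from the configuration example
# (the duplicated \/ in splits_text_on is dropped).
REMOVES_CHARACTERS = /[^a-z0-9\s\:\"\&\.\|]/i
STOPWORDS          = /\b(and|the|or|on|of|in)\b/i
SPLITS_TEXT_ON     = /[\s\/\-\_\:\"\&]/
REMOVES_AFTER      = /[\.]/

# Hypothetical helper: applies the options one after another.
# The step order is assumed, not taken from Picky.
def sketch_tokenize(text)
  text   = text.downcase                      # as with the default case_sensitive: false
  text   = text.gsub(REMOVES_CHARACTERS, '')  # removes_characters
  text   = text.gsub(STOPWORDS, '')           # stopwords
  tokens = text.split(SPLITS_TEXT_ON)         # splits_text_on
  tokens = tokens.map { |token| token.gsub(REMOVES_AFTER, '') } # removes_characters_after_splitting
  tokens.reject(&:empty?)
end

puts sketch_tokenize("The Lord of the Rings: Vol. 2!").inspect
# prints ["lord", "rings", "vol", "2"]
```

Running a problematic title through a sketch like this can help narrow down which option is responsible when something is not found.
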
### Data Sources{#indexes-sources}

Data sources define where the data for an index comes from. There are [explicit data sources](#indexes-sources-explicit) and [implicit data sources](#indexes-sources-implicit).

#### Explicit Data Sources{#indexes-sources-explicit}

Explicit data sources are mentioned in the index definition using the `#source` method.

You define them on an *index*:

    Index.new :books do
      source Book.all # Loads the data instantly.
    end

    Index.new :books do
      source { Book.all } # Loads on indexing. Preferred.
    end

Or even on a *single category*:

    Index.new :books do
      category :title,
               source: lambda { Book.all }
    end

TODO more explanation how index sources and single category sources might work together.

Explicit data sources must [respond to #each](#indexes-sources-each), for example, an Array.

##### Responding to #each{#indexes-sources-each}

Picky supports any data source as long as it supports `#each`.

See [Flexible Sources](http://florianhanke.com/blog/2011/04/14/picky-two-point-two-point-oh.html) for how you can use this.

In short. Model:

    class Monkey
      attr_reader :id, :name, :color
      def initialize id, name, color
        @id, @name, @color = id, name, color
      end
    end

The data:

    monkeys = [
      Monkey.new(1, 'pete', 'red'),
      Monkey.new(2, 'joey', 'green'),
      Monkey.new(3, 'hans', 'blue')
    ]

Setting the array as a source:

    Index::Memory.new :monkeys do
      source { monkeys }
      category :name
      category :couleur, :from => :color # The couleur category will take its data from the #color method.
    end

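
An Array is only the simplest case. As a sketch of a custom source, the class below wraps comma-separated text and yields one record per line from `#each`. `SketchMonkey` and `LineSource` are hypothetical names invented for this example; nothing here is part of Picky.

```ruby
# Record type: like Monkey, it exposes #id plus one reader per category.
SketchMonkey = Struct.new(:id, :name, :color)

# Hypothetical source: turns lines like "1,pete,red" into records.
class LineSource
  include Enumerable # optional convenience; the text only requires #each

  def initialize(text)
    @text = text
  end

  # Yields one SketchMonkey per line; returns an Enumerator without a block.
  def each
    return enum_for(:each) unless block_given?

    @text.each_line do |line|
      id, name, color = line.chomp.split(',')
      yield SketchMonkey.new(Integer(id), name, color)
    end
  end
end

source = LineSource.new("1,pete,red\n2,joey,green\n3,hans,blue\n")
source.each { |monkey| puts "#{monkey.id}: #{monkey.name} (#{monkey.color})" }
```

An instance of such a class could presumably be handed to `source { ... }` in the same way as the `monkeys` array, since it responds to `#each`.
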
##### Delayed{#indexes-sources-delayed}

If you pass the source directly as an argument in the index block, the argument is evaluated instantly:

    Index::Memory.new :books do
      source Book.order('title ASC')
    end

This still works with ActiveRecord and other similar ORMs, since `Book.order` returns a proxy object that will only be evaluated when the server is indexing.

In contrast, this would instantly get the records, since `#all` is a kicker method, meaning it runs the query right away:

    Index::Memory.new :books do
      source Book.all # Not the best idea.
    end

In this case, it is better to give the `source` method a block:

    Index::Memory.new :books do
      source { Book.all }
    end

This block will be executed as soon as the indexing is running, but not earlier.

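
The difference between passing a value and passing a block can be shown without Picky. In the sketch below, `SketchIndex` and `FakeBooks` are hypothetical stand-ins: the first stores whatever `source` receives, the second counts how often its data is loaded.

```ruby
# Hypothetical stand-in for an index; not Picky's implementation.
class SketchIndex
  # Accepts either a ready-made collection or a block kept for later.
  def source(collection = nil, &block)
    @source = block || collection
  end

  # Resolves the source only now, calling the block if one was given.
  def index!
    data = @source.respond_to?(:call) ? @source.call : @source
    data.to_a
  end
end

# Fake model that counts how often the "database" is hit.
class FakeBooks
  @loads = 0

  class << self
    attr_reader :loads

    def all
      @loads += 1
      ['Book 1', 'Book 2']
    end
  end
end

lazy = SketchIndex.new
lazy.source { FakeBooks.all }
puts FakeBooks.loads # prints 0: nothing is loaded when the index is defined
lazy.index!
puts FakeBooks.loads # prints 1: the block ran during indexing
```

With `lazy.source FakeBooks.all` instead, the counter would already be at 1 before `index!` is ever called, which is the behaviour the text advises against.
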
#### Implicit Data Sources{#indexes-sources-implicit}

Implicit data sources are not mentioned in the index definition. Instead, the data is added (or removed) via *realtime* methods on an index, like `#add`, `#<<`, `#unshift`, `#remove`, `#replace`, and a special form, `#replace_from`.

So, you *don't* define them on an index or category as in the explicit data source, but instead add to either like so:

    index = Index.new :books do
      category :example
    end

    Book = Struct.new :id, :example
    index.add Book.new(1, "Hello!")
    index.add Book.new(2, "World!")

Or to a specific category:

    index[:example].add Book.new(3, "Only add to a single category")

##### Methods to change index or category data{#indexes-sources-implicit-methods}

Currently, there are 6 methods to change an index:

* `#add`: Adds the thing to the end of the index (even if already there). `index.add thing`
* `#<<`: Adds the thing to the end of the index (shows up last in results). `index << thing`
* `#unshift`: Adds the thing to the beginning of the index (shows up first in results). `index.unshift thing`
* `#remove`: Removes the thing from the index (if there). `index.remove thing`
* `#replace`: Replaces the thing in the index (if there, otherwise like `#add`). Equal to `#remove` followed by `#add`. `index.replace thing`
* `#replace_from`: Pass in a Hash. Replaces the thing in the index (if there, otherwise like `#add`). Equal to `#remove` followed by `#add`. `index.replace_from id: 1, example: "Hello, I am Hash!"`

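
The ordering and replace semantics in this list can be sketched with a toy class. `ToyIndex` and `ToyThing` are hypothetical and only mirror the descriptions above; matching things by their `id` in `#remove` is an assumption, and none of this reflects Picky's internals.

```ruby
ToyThing = Struct.new(:id, :example)

# Toy index: an ordered list of things, first entry shown first.
class ToyIndex
  attr_reader :things

  def initialize
    @things = []
  end

  # Appends the thing, so it shows up last.
  def add(thing)
    @things.push(thing)
    self
  end
  alias << add

  # Prepends the thing, so it shows up first.
  def unshift(thing)
    @things.unshift(thing)
    self
  end

  # Removes any thing with the same id; does nothing if absent.
  def remove(thing)
    @things.reject! { |existing| existing.id == thing.id }
    self
  end

  # Equal to #remove followed by #add.
  def replace(thing)
    remove(thing)
    add(thing)
  end

  # Builds the thing from a Hash, then replaces it.
  def replace_from(hash)
    replace(ToyThing.new(hash[:id], hash[:example]))
  end
end

index = ToyIndex.new
index.add ToyThing.new(1, "Hello!")
index << ToyThing.new(2, "World!")
index.unshift ToyThing.new(3, "First!")
index.replace ToyThing.new(1, "Hello again!")
index.replace_from id: 2, example: "Hello, I am Hash!"
puts index.things.map(&:id).inspect # prints [3, 1, 2]
```

Note how a replaced thing moves to the end, which follows directly from `#replace` being `#remove` followed by `#add`.
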
### Indexing / Tokenizing{#indexes-indexing}

See [Tokenizing](#tokenizing) for tokenizer options.