## Indexes{#indexes}

Indexes do three things:

* Define where the data comes from.
* Define how data is handled before it enters the index.
* Hold index categories.

### Types{#indexes-types}

Picky offers a choice of four index types:

* Memory: Saves its indexes in JSON on disk and loads them into memory.
* Redis: Saves its indexes in Redis.
* SQLite: Saves its indexes in rows of a SQLite DB.
* File: Saves its indexes in JSON in files.

This is how the Memory and Redis types look in code:

    books_memory_index = Index.new :books do
      # Configuration goes here.
    end

    books_redis_index = Index.new :books do
      backend Backends::Redis.new
      # Configuration goes here.
    end

Both save the preprocessed data from the data source in the `/index` directory, so you can check there whether the data has been preprocessed correctly.

Indexes are then used in a `Search` interface.

Searching over one index:

    books = Search.new books_index

Searching over multiple indexes:

    media = Search.new books_index, dvd_index, mp3_index

The resulting ids should come from the same id space to be useful. Alternatively, the ids should be exclusive, so that, e.g., a book id does not collide with a dvd id.
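
One way to check the second condition before combining indexes is to compare the id sets of the underlying data. The following is a minimal plain-Ruby sketch and not a Picky feature; the `Record` struct, the sample data, and the `colliding_ids` helper are all made up for illustration.

```ruby
# Hypothetical sanity check, independent of Picky: find ids that appear in
# more than one data set that is going to be searched together.
Record = Struct.new(:id, :title)

books = [Record.new(1, 'Moby Dick'), Record.new(2, 'Dracula')]
dvds  = [Record.new(2, 'Jaws'),      Record.new(3, 'Alien')]

# Returns the ids shared by both collections. An empty array means the
# two id spaces are exclusive.
def colliding_ids(first, second)
  first.map(&:id) & second.map(&:id)
end

collisions = colliding_ids(books, dvds)
puts "Colliding ids: #{collisions.inspect}"
```

If the result is not empty, a search over both indexes could return the same id for a book and for a DVD, and the caller could not tell them apart.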

#### In-Memory / File-based{#indexes-types-memory}

The in-memory index transparently saves its indexes as JSON files in the `/index` directory.

When the server is started, they are loaded into memory. As soon as the server is stopped, the indexes are no longer held in memory.

Indexing regenerates the JSON index files, which can then be reloaded into memory, even in a running server (see below).

#### Redis{#indexes-types-redis}

The Redis index saves its indexes in the Redis server on the default port, using database 15.

When the server is started, it connects to the Redis server and uses the indexes in the key-value store.

Indexing regenerates the indexes in the Redis server, so you do not have to restart the server for that.

#### SQLite{#indexes-types-sqlite}

TODO

#### File{#indexes-types-file}

TODO

### Accessing{#indexes-acessing}

If you don't have access to your indexes directly, like so

    books_index = Index.new(:books) do
      # ...
    end

    books_index.do_something_with_the_index

and for example you'd like to access the index from a rake task, you can use

    Picky::Indexes

to get *all indexes*.

To get a *single index* use

    Picky::Indexes[:index_name]

and to get a *single category*, use

    Picky::Indexes[:index_name][:category_name]

That's it.

### Configuration{#indexes-configuration}

This is all you can do to configure an index:

    books_index = Index.new :books do
      source { Book.order("isbn ASC") }

      indexing removes_characters:                 /[^a-z0-9\s\:\"\&\.\|]/i,                       # Default: nil
               stopwords:                          /\b(and|the|or|on|of|in)\b/i,                   # Default: nil
               splits_text_on:                     /[\s\/\-\_\:\"\&\/]/,                           # Default: /\s/
               removes_characters_after_splitting: /[\.]/,                                         # Default: nil
               normalizes_words:                   [[/\$(\w+)/i, '\1 dollars']],                   # Default: nil
               rejects_token_if:                   lambda { |token| token == :blurf },             # Default: nil
               case_sensitive:                     true,                                           # Default: false
               substitutes_characters_with:        Picky::CharacterSubstituters::WestEuropean.new, # Default: nil
               stems_with:                         Lingua::Stemmer.new                             # Default: nil

      category :id
      category :title,
               partial:    Partial::Substring.new(:from => 1),
               similarity: Similarity::DoubleMetaphone.new(2),
               qualifiers: [:t, :title, :titre]
      category :author,
               partial: Partial::Substring.new(:from => -2)
      category :year,
               partial:    Partial::None.new,
               qualifiers: [:y, :year, :annee]

      result_identifier 'boooookies'
    end

Usually you don't need to configure all that.

But if your boss comes in the door and asks why X is not found… you know. And you can improve the search engine relatively *quickly and painlessly*.

More power to you.
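
To get a feel for what a few of the `indexing` options do to a piece of text, here is a rough plain-Ruby approximation. It is not Picky's tokenizer: the `rough_tokens` helper is hypothetical, the order of the steps is an assumption, and only four of the options are modeled, with regular expressions based on the configuration above.

```ruby
# Plain-Ruby approximation of four indexing options. This is NOT Picky's
# tokenizer; the step order below is an assumption made for illustration.
REMOVES_CHARACTERS                 = /[^a-z0-9\s:"&.|]/i
STOPWORDS                          = /\b(and|the|or|on|of|in)\b/i
SPLITS_TEXT_ON                     = /[\s\/\-_:"&]/
REMOVES_CHARACTERS_AFTER_SPLITTING = /[.]/

# Hypothetical helper: cleans the text, drops stopwords, splits it into
# words, and strips leftover characters from each word.
def rough_tokens(text)
  cleaned = text.gsub(REMOVES_CHARACTERS, '')
  cleaned = cleaned.gsub(STOPWORDS, '')
  cleaned.split(SPLITS_TEXT_ON)
         .map { |word| word.gsub(REMOVES_CHARACTERS_AFTER_SPLITTING, '') }
         .reject(&:empty?)
end

tokens = rough_tokens('The Lord of the Rings: Vol. 1!')
puts tokens.inspect # => ["Lord", "Rings", "Vol", "1"]
```

The case of the words is preserved here, in line with `case_sensitive: true` in the configuration. Experimenting with a sketch like this can help narrow down which option is responsible when a text is not found as expected.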

### Data Sources{#indexes-sources}

Data sources define where the data for an index comes from.

You define them on an *index*:

    Index.new :books do
      source Book.all # Loads the data instantly.
    end

    Index.new :books do
      source { Book.all } # Loads on indexing. Preferred.
    end

Or even a *single category*:

    Index.new :books do
      category :title,
               source: lambda { Book.all }
    end

At the moment there are two possibilities: [Objects responding to #each](#indexes-sources-each) and [Picky classic style sources](#indexes-sources-classic).

#### Responding to #each{#indexes-sources-each}

Picky supports any data source as long as it supports `#each`.

See [under Flexible Sources](http://florianhanke.com/blog/2011/04/14/picky-two-point-two-point-oh.html) how you can use this.

In short, the model:

    class Monkey
      attr_reader :id, :name, :color
      def initialize id, name, color
        @id, @name, @color = id, name, color
      end
    end

The data:

    monkeys = [
      Monkey.new(1, 'pete', 'red'),
      Monkey.new(2, 'joey', 'green'),
      Monkey.new(3, 'hans', 'blue')
    ]

Setting the array as a source:

    Index::Memory.new :monkeys do
      source { monkeys }
      category :name
      category :couleur, :from => :color # The couleur category will take its data from the #color method.
    end

#### Delayed{#indexes-sources-delayed}

If you define the source directly in the index block, the expression is evaluated instantly, when the index is defined:

    Index::Memory.new :books do
      source Book.order('title ASC')
    end

This works with ActiveRecord and similar ORMs, since `Book.order` returns a proxy object that only runs its query when the server is indexing.

In contrast, this would get the records instantly, since `#all` is a kicker method, that is, one that triggers the query right away:

    Index::Memory.new :books do
      source Book.all # Not the best idea.
    end

In this case, you can give the `source` method a block:

    Index::Memory.new :books do
      source { Book.all }
    end

This block will be executed as soon as the indexing is running, but not earlier.
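
The difference between the two forms comes down to when Ruby evaluates the expression. The following plain-Ruby sketch illustrates this with a hypothetical `TinyIndex` class. It only mimics the idea of a `source` that accepts either a value or a block and is not Picky's implementation.

```ruby
# Hypothetical stand-in for an index, only to show eager vs. delayed sources.
class TinyIndex
  def initialize(&config)
    instance_eval(&config)
  end

  # Accepts either an already evaluated object or a block to call later.
  def source(data = nil, &block)
    @source = block || data
  end

  # Resolves the source at indexing time; a block is only called here.
  def index
    data = @source.respond_to?(:call) ? @source.call : @source
    data.to_a
  end
end

loads = []
load_books = lambda do
  loads << :loaded # Records every time the data is actually fetched.
  ['Moby Dick', 'Dracula']
end

eager = TinyIndex.new { source load_books.call } # Fetches right away.
loads_after_eager = loads.size

lazy = TinyIndex.new { source { load_books.call } } # Fetches nothing yet.
loads_after_lazy = loads.size

lazy.index # Only now is the block executed.
loads_after_indexing = loads.size

puts "Fetches: #{loads_after_eager}, #{loads_after_lazy}, #{loads_after_indexing}"
```

With the block form, the data is fetched each time indexing runs, so a reindex picks up the current records rather than a snapshot taken when the index was defined.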

#### Classic Style{#indexes-sources-classic}

The classic style uses Picky's own `Picky::Sources` to load the data into the index.

    Index.new :books do
      source Sources::CSV.new(:title, :author, file: 'app/library.csv')
    end

Use this one if you want to use a simple CSV file.

However, you could also use the built-in Ruby `CSV` class and use it as an `#each` source (see above).
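
As a sketch of that second approach, Ruby's `CSV` library can turn rows into plain objects collected in an array, which responds to `#each`. The `Book` struct, the column names, the inline CSV text, and the `books_from_csv` helper below are assumptions made for illustration. The resulting array could then presumably be passed to `source { ... }` in the same way as the `monkeys` array above.

```ruby
require 'csv'

# Hypothetical record type, mirroring the Monkey example: an id plus one
# reader per category.
Book = Struct.new(:id, :title, :author)

# Inline CSV text stands in for a file such as app/library.csv.
csv_text = <<~CSV
  id,title,author
  1,Moby Dick,Herman Melville
  2,Dracula,Bram Stoker
CSV

# Turns every CSV row into a Book; the returned array responds to #each.
def books_from_csv(text)
  CSV.parse(text, headers: true).map do |row|
    Book.new(row['id'].to_i, row['title'], row['author'])
  end
end

books = books_from_csv(csv_text)
books.each { |book| puts "#{book.id}: #{book.title} by #{book.author}" }
```

Reading from a real file would replace the inline text with the file's contents, for example via `File.read`.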

    Index.new :books do
      source Sources::DB.new('SELECT id, title, author, isbn13 as isbn FROM books', file: 'app/db.yml')
    end

Use this one if you want to use a database source with very custom SQL statements. If not, we suggest you use an ORM as an `#each` source (see above).

### Indexing / Tokenizing{#indexes-indexing}

See [Tokenizing](#tokenizing) for tokenizer options.