<h1>Version Compatibility</h1>

<table>
<tr><th>CouchDB</th><th>couchdb-lucene</th></tr>
<tr><td>0.9.1, 0.10</td><td>0.4</td></tr>
<tr><td>0.11</td><td>0.4-maint (0.4 with patch for trunk compatibility)</td></tr>
<tr><td>0.10.x, 0.11.x, 1.0.x, 1.1.x</td><td>0.5.x, 0.6.x, 0.7.x</td></tr>
</table>

<h1>Breaking Changes</h1>

<ul>
<li>couchdb-lucene 0.5.x and higher run as a standalone daemon (0.4 was run directly by couchdb).
<li>URLs now require the full design document id (where you would say "foo", you must now say "_design/foo").
</ul>

<h1>Issue Tracking</h1>

Issues are tracked on <a href="http://github.com/rnewson/couchdb-lucene/issues">GitHub</a>.

<h1>Minimum System Requirements</h1>

Java 1.5 (or above) is required; the <strike>Sun</strike> Oracle version is recommended as couchdb-lucene is regularly tested against it.

<h1>Build and run couchdb-lucene</h1>

<ol>
<li>Install Maven 2.
<li>Check out the repository.
<li>Type 'mvn'.
<li>cd target
<li>unzip couchdb-lucene-&lt;version&gt;.zip
<li>cd couchdb-lucene-&lt;version&gt;
<li>./bin/run
</ol>

The zip file contains all the couchdb-lucene code, dependencies, startup scripts and configuration files you need, so unzip it wherever you wish to install couchdb-lucene.

<h1>Configure CouchDB</h1>

The following settings are needed in CouchDB's local.ini file in order for it to communicate with couchdb-lucene:

<pre>
[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.

[external]
fti=/path/to/python /path/to/couchdb-lucene/tools/couchdb-external-hook.py

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, &lt;&lt;"fti"&gt;&gt;}
</pre>

<h2>Hook options</h2>

<table>
<tr><th>Option</th><th>Meaning</th><th>Default Value</th></tr>
<tr><td>--remote-host</td><td>The hostname of the couchdb-lucene server</td><td>localhost</td></tr>
<tr><td>--remote-port</td><td>The port of the couchdb-lucene server</td><td>5985</td></tr>
<tr><td>--local-key</td><td>The key for the local couchdb instance as known to the couchdb-lucene server</td><td>local</td></tr>
</table>

<h1>Configure couchdb-lucene</h1>

couchdb-lucene runs in a single, standalone JVM. As such, you can locate your couchdb-lucene server on a different machine from couchdb or keep it on the same machine; it's your call.

<h1>Start couchdb-lucene</h1>

To start couchdb-lucene, run:
<pre>
bin/run
</pre>

To stop couchdb-lucene, simply kill the Java process.

<h1>Indexing Strategy</h1>

<h2>Document Indexing</h2>

You must supply an index function in order to enable couchdb-lucene as, by default, nothing will be indexed. To suppress a document from the index, return null. It's more typical to return a single Document object which contains everything you'd like to query and retrieve. You may also return an array of Document objects if you wish.

You may add any number of index views in any number of design documents. All searches will be constrained to documents emitted by the index functions.

Here's a complete example of a design document with couchdb-lucene features:

<pre>
{
    "_id":"_design/foo",
    "fulltext": {
        "by_subject": {
            "index":"function(doc) { var ret=new Document(); ret.add(doc.subject); return ret }"
        },
        "by_content": {
            "index":"function(doc) { var ret=new Document(); ret.add(doc.content); return ret }"
        }
    }
}
</pre>

Here are some example URLs for the given design document:

<pre>
http://localhost:5984/database/_fti/_design/foo/by_subject?q=hello
http://localhost:5984/database/_fti/_design/foo/by_content?q=hello
</pre>

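The URL pattern above can be assembled programmatically. Here is a minimal sketch; <code>ftiUrl</code> is a hypothetical helper (not part of couchdb-lucene), and the host, database and view names are simply the ones from the example. It percent-encodes the query so that characters such as ':' and spaces survive the trip.

```javascript
// Hypothetical helper (not part of couchdb-lucene): builds a search URL of
// the form /<db>/_fti/_design/<design doc>/<index view>?q=<query>.
function ftiUrl(base, db, designDoc, view, query) {
  return base + "/" + encodeURIComponent(db) +
    "/_fti/_design/" + encodeURIComponent(designDoc) +
    "/" + encodeURIComponent(view) +
    "?q=" + encodeURIComponent(query);
}

// Same URL as the first example above.
console.log(ftiUrl("http://localhost:5984", "database", "foo", "by_subject", "hello"));

// A query with a field name and a space gets percent-encoded.
console.log(ftiUrl("http://localhost:5984", "database", "foo", "by_content", "subject:hello world"));
```
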
A fulltext object contains multiple index view declarations. An index view consists of:

<dl>
<dt>analyzer</dt><dd>(optional) The analyzer to use</dd>
<dt>defaults</dt><dd>(optional) The defaults for numerous indexing options can be overridden here. A full list of options follows.</dd>
<dt>index</dt><dd>The indexing function itself, documented below.</dd>
</dl>

<h3>The Defaults Object</h3>

The following indexing options can be defaulted:

<table>
  <tr>
    <th>name</th>
    <th>description</th>
    <th>available options</th>
    <th>default</th>
  </tr>
  <tr>
    <th>field</th>
    <td>the field name to index under</td>
    <td>user-defined</td>
    <td>default</td>
  </tr>
  <tr>
    <th>type</th>
    <td>the type of the field</td>
    <td>date, double, float, int, long, string</td>
    <td>string</td>
  </tr>
  <tr>
    <th>store</th>
    <td>whether the data is stored. The value will be returned in the search result.</td>
    <td>yes, no</td>
    <td>no</td>
  </tr>
  <tr>
    <th>index</th>
    <td>whether (and how) the data is indexed</td>
    <td>analyzed, analyzed_no_norms, no, not_analyzed, not_analyzed_no_norms</td>
    <td>analyzed</td>
  </tr>
  <tr>
    <th>termvector</th>
    <td>whether and how a field should have term vectors</td>
    <td>no, with_offsets, with_positions, with_positions_offsets, yes</td>
    <td>no</td>
  </tr>
  <tr>
    <th>boost</th>
    <td>Sets the boost factor for hits on this field. This value will be multiplied into the score of all hits on this field of this document.</td>
    <td>floating-point value</td>
    <td>1.0</td>
  </tr>
</table>

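To make the table concrete, here is a sketch of an index view that sets a "defaults" object. It assumes the defaults object takes the same option names and values as the table above; the <code>body</code> property and the view name are made up for illustration. The index function is written as a real function and converted to a string, since the design document stores it as text.

```javascript
// The index function as it would run inside couchdb-lucene. It is never
// called here; it is only converted to a string for the design document.
var indexFn = function (doc) {
  var ret = new Document();
  ret.add(doc.subject);                                 // uses the view's defaults
  ret.add(doc.body, { "field": "body", "store": "no" }); // per-call override
  return ret;
};

// Illustrative design document: "defaults" changes the field name and the
// store option for every add() call that does not override them.
var designDoc = {
  "_id": "_design/foo",
  "fulltext": {
    "by_subject": {
      "defaults": { "field": "subject", "store": "yes" },
      "index": indexFn.toString()
    }
  }
};

console.log(JSON.stringify(designDoc, null, 2));
```
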
<h3>The Analyzer Option</h3>

Lucene has numerous ways of converting free-form text into tokens; the classes that do this are called Analyzers. By default, the StandardAnalyzer is used, which lower-cases all text and drops common English words ("the", "and", and so on), among other things. This processing might not always suit you, so you can choose from several others by setting the "analyzer" field to one of the following values:

<ul>
<li>brazilian</li>
<li>chinese</li>
<li>cjk</li>
<li>czech</li>
<li>dutch</li>
<li>english</li>
<li>french</li>
<li>german</li>
<li>keyword</li>
<li>perfield</li>
<li>porter</li>
<li>russian</li>
<li>simple</li>
<li>snowball</li>
<li>standard</li>
<li>thai</li>
<li>whitespace</li>
</ul>

<h4>The Snowball Analyzer</h4>

This analyzer requires an extra argument to specify the language (see <a href="http://lucene.apache.org/java/3_0_3/api/contrib-snowball/org/apache/lucene/analysis/snowball/SnowballAnalyzer.html">here</a> for details):

<pre>
"analyzer":"snowball:English"
</pre>

Note: the argument is case-sensitive and is passed directly to the <code>SnowballAnalyzer</code>'s constructor.

<h4>The Per-field Analyzer</h4>

The "perfield" option lets you use a different analyzer for different fields and is configured as follows:

<pre>
"analyzer":"perfield:{field_name:\"analyzer_name\"}"
</pre>

Unless overridden, any field name not specified will be handled by the standard analyzer. To change the default, use the special default field name:

<pre>
"analyzer":"perfield:{default:\"keyword\"}"
</pre>

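Writing the escaped quotes by hand is error-prone, so the perfield string can be generated instead. This is a sketch only: <code>perfieldAnalyzer</code> is a hypothetical helper, and joining several field entries with commas is an assumption (the examples above show a single entry).

```javascript
// Hypothetical helper: renders {field: analyzer} pairs in the perfield syntax
// shown above. Comma-separating multiple entries is an assumption.
function perfieldAnalyzer(mapping) {
  var parts = Object.keys(mapping).map(function (field) {
    return field + ':"' + mapping[field] + '"';
  });
  return "perfield:{" + parts.join(",") + "}";
}

var view = {
  "analyzer": perfieldAnalyzer({ "default": "keyword", "subject": "standard" })
};

// JSON.stringify adds the backslash escapes seen in the examples above.
console.log(JSON.stringify(view));
```
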
<h3>The Document class</h3>

You may construct a new Document instance with:

<pre>
var doc = new Document();
</pre>

Data may be added to this document with the add method, which takes an optional second object argument that can override any of the above default values.

<pre>
// Add with all the defaults.
doc.add("value");

// Add a numeric field.
doc.add(35, {"type":"int"});

// Add a date field (the value must be a Date object).
doc.add(new Date("1972/1/6 16:05:00"), {"type":"date"});
doc.add(new Date("January 6, 1972 16:05:00"), {"type":"date"});

// Add a subject field.
doc.add("this is the subject line.", {"field":"subject"});

// Add but ensure it's stored.
doc.add("value", {"store":"yes"});

// Add but don't analyze.
doc.add("don't analyze me", {"index":"not_analyzed"});

// Extract text from the named attachment and index it to a named field
doc.attachment("attachment field", "attachment name");

// log an event (trace, debug, info, warn and error are available)
if (doc.foo) {
  log.info("doc has foo property!");
}
</pre>

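If you want to try an index function before installing it in a design document, one option is to run it against a stand-in for the Document class. The stub below is hypothetical: it only records calls to add and attachment and does not reproduce couchdb-lucene's real behaviour (no defaults, typing or Tika extraction).

```javascript
// Hypothetical stand-in for the Document class that couchdb-lucene supplies
// to index functions. It only records what was added.
function Document() {
  this.fields = [];
  this.attachments = [];
}
Document.prototype.add = function (value, options) {
  this.fields.push({ value: value, options: options || {} });
};
Document.prototype.attachment = function (field, name) {
  this.attachments.push({ field: field, name: name });
};

// Index function under test, in the same style as the calls above.
function index(doc) {
  if (!doc.subject) return null; // suppress documents without a subject
  var ret = new Document();
  ret.add(doc.subject, { "field": "subject", "store": "yes" });
  for (var name in doc._attachments) {
    ret.attachment("default", name);
  }
  return ret;
}

var result = index({ subject: "hello", _attachments: { "a.txt": {} } });
console.log(result.fields.length, result.attachments.length); // prints "1 1"
console.log(index({})); // prints "null"
```
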
<h3>Example Transforms</h3>

<h4>Index Everything</h4>

<pre>
function(doc) {
  var ret = new Document();

  function idx(obj) {
    for (var key in obj) {
      switch (typeof obj[key]) {
      case 'object':
        idx(obj[key]);
        break;
      case 'function':
        break;
      default:
        ret.add(obj[key]);
        break;
      }
    }
  };

  idx(doc);

  if (doc._attachments) {
    for (var i in doc._attachments) {
      ret.attachment("default", i);
    }
  }

  return ret;
}
</pre>

<h4>Index Nothing</h4>

<pre>
function(doc) {
  return null;
}
</pre>

<h4>Index Select Fields</h4>

<pre>
function(doc) {
  var result = new Document();
  result.add(doc.subject, {"field":"subject", "store":"yes"});
  result.add(doc.content, {"field":"subject"});
  result.add(new Date(), {"field":"indexed_at"});
  return result;
}
</pre>

<h4>Index Attachments</h4>

<pre>
function(doc) {
  var result = new Document();
  for(var a in doc._attachments) {
    result.attachment("default", a);
  }
  return result;
}
</pre>

<h4>A More Complex Example</h4>

<pre>
function(doc) {
  var mk = function(name, value, group) {
    var ret = new Document();
    ret.add(value, {"field": group, "store":"yes"});
    ret.add(group, {"field":"group", "store":"yes"});
    return ret;
  };
  if(doc.type != "reference") return null;
  var ret = new Array();
  for(var g in doc.groups) {
    ret.push(mk("library", doc.groups[g].library, g));
    ret.push(mk("method", doc.groups[g].method, g));
    ret.push(mk("target", doc.groups[g].target, g));
  }
  return ret;
}
</pre>

<h2>Attachment Indexing</h2>

Couchdb-lucene uses <a href="http://lucene.apache.org/tika/">Apache Tika</a> to index attachments of the following types, assuming the correct content_type is set in couchdb:

<h3>Supported Formats</h3>

<ul>
<li>Excel spreadsheets (application/vnd.ms-excel)
<li>HTML (text/html)
<li>Images (image/*)
<li>Java class files
<li>Java jar archives
<li>MP3 (audio/mp3)
<li>OpenDocument (application/vnd.oasis.opendocument.*)
<li>Outlook (application/vnd.ms-outlook)
<li>PDF (application/pdf)
<li>Plain text (text/plain)
<li>Powerpoint presentations (application/vnd.ms-powerpoint)
<li>RTF (application/rtf)
<li>Visio (application/vnd.visio)
<li>Word documents (application/msword)
<li>XML (application/xml)
</ul>

<h1>Searching with couchdb-lucene</h1>

You can perform all types of queries using Lucene's default <a href="http://lucene.apache.org/java/2_4_0/queryparsersyntax.html">query syntax</a>.

<h2>Numeric range queries</h2>

In addition to normal text-based range searches (using the "field:[lower TO upper]" syntax), couchdb-lucene also supports numeric range searches for the following types: int, long, float, double and date. The type is specified after the field name, as follows:

<table>
<tr><th>type</th><th>example</th></tr>
<tr><td>int</td><td>field&lt;int&gt;:[0 TO 100]</td></tr>
<tr><td>long</td><td>field&lt;long&gt;:[0 TO 100]</td></tr>
<tr><td>float</td><td>field&lt;float&gt;:[0.0 TO 100.0]</td></tr>
<tr><td>double</td><td>field&lt;double&gt;:[0.0 TO 100.0]</td></tr>
<tr><td>date</td><td>field&lt;date&gt;:[from TO to] where from and to match any of these patterns: <code>"yyyy-MM-dd'T'HH:mm:ssZ"</code>, <code>"yyyy-MM-dd'T'HH:mm:ss"</code>, <code>"yyyy-MM-ddZ"</code>, <code>"yyyy-MM-dd"</code>, <code>"yyyy-MM-dd'T'HH:mm:ss.SSSZ"</code>, <code>"yyyy-MM-dd'T'HH:mm:ss.SSS"</code>. So, to search for articles published in July 2010, you would issue the following query: <code>published_at&lt;date&gt;:["2010-07-01T00:00:00"+TO+"2010-07-31T23:59:59"]</code></td></tr>
</table>

Here is an example numeric range query for spatial searching:

<pre>
?q=pizza AND lat&lt;double&gt;:[51.4707 TO 51.5224] AND long&lt;double&gt;:[-0.6622 TO -0.5775]
</pre>

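Typed range clauses are easy to get wrong by hand, so here is a sketch that builds them. <code>typedRange</code> and <code>dateBound</code> are hypothetical helpers; formatting dates in UTC is an assumption, since the text above does not say how timestamps without a zone are interpreted.

```javascript
// Hypothetical helper: formats a typed range clause such as
// lat<double>:[51.4707 TO 51.5224].
function typedRange(field, type, lower, upper) {
  return field + "<" + type + ">:[" + lower + " TO " + upper + "]";
}

// Formats a Date as yyyy-MM-dd'T'HH:mm:ss in UTC (an assumption), wrapped in
// double quotes as in the published_at example above.
function dateBound(date) {
  return '"' + date.toISOString().slice(0, 19) + '"';
}

var spatial = [
  "pizza",
  typedRange("lat", "double", 51.4707, 51.5224),
  typedRange("long", "double", -0.6622, -0.5775)
].join(" AND ");

var july = typedRange(
  "published_at", "date",
  dateBound(new Date(Date.UTC(2010, 6, 1, 0, 0, 0))),
  dateBound(new Date(Date.UTC(2010, 6, 31, 23, 59, 59)))
);

// Percent-encode the query before placing it in a URL.
console.log("?q=" + encodeURIComponent(spatial));
console.log("?q=" + encodeURIComponent(july));
```
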
<h2>Numeric term queries</h2>

Fields indexed with numeric types can still be queried as normal terms; couchdb-lucene just needs to know the type. For example, ?q=age&lt;long&gt;:12 will find all documents where the field called 'age' has a value of 12 (when the field was indexed with "type":"long").

<h2>Search parameters</h2>

The following parameters can be passed for more sophisticated searches:

<dl>
<dt>analyzer</dt><dd>Override the default analyzer used to parse the q parameter</dd>
<dt>callback</dt><dd>Specify a JSONP callback wrapper. The full JSON result will be prefixed with this value and wrapped in parentheses.</dd>
<dt>debug</dt><dd>Setting this to true disables response caching (the query is executed every time) and indents the JSON response for readability.</dd>
<dt>default_operator</dt><dd>Change the default operator for boolean queries. Defaults to "OR"; the other permitted value is "AND".</dd>
<dt>force_json</dt><dd>Usually couchdb-lucene determines the Content-Type of its response based on the presence of the Accept header. If Accept contains "application/json", you get "application/json" in the response, otherwise you get "text/plain;charset=utf8". Some tools, like JSONView for Firefox, do not send the Accept header but do render "application/json" responses if received. Setting force_json=true forces all responses to "application/json" regardless of the Accept header.</dd>
<dt>include_docs</dt><dd>whether to include the source docs</dd>
<dt>limit</dt><dd>the maximum number of results to return</dd>
<dt>q</dt><dd>the query to run (e.g., subject:hello). If the query does not name a field, the default field is searched. Multiple queries can be supplied, separated by commas; the resulting JSON will be an array of responses.</dd>
<dt>skip</dt><dd>the number of results to skip</dd>
<dt>sort</dt><dd>the comma-separated fields to sort on. Prefix with / for ascending order and \ for descending order (ascending is the default if not specified). Type-specific sorting is also available by appending the type between angle brackets (e.g., sort=amount&lt;float&gt;). Supported types are 'float', 'double', 'int', 'long' and 'date'.</dd>
<dt>stale=ok</dt><dd>If you set the <i>stale</i> option to <i>ok</i>, couchdb-lucene will not block if the index is not up to date and will immediately return results. Searches may therefore be faster, as Lucene caches important data (especially for sorting). A query without stale=ok will block and use the latest data committed to the index. Unlike CouchDB's stale=ok option for views, couchdb-lucene will trigger an index update unless one is already running.</dd>
</dl>

<i>All parameters except 'q' are optional.</i>

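Several of these parameters contain characters that need encoding (':', '\', angle brackets). Here is a sketch that builds the query string with the standard <code>URLSearchParams</code> class; <code>searchQuery</code> is a hypothetical helper and the field names are made up.

```javascript
// Hypothetical helper: turns an options object into an encoded query string.
function searchQuery(options) {
  var params = new URLSearchParams();
  Object.keys(options).forEach(function (name) {
    params.append(name, String(options[name]));
  });
  return "?" + params.toString();
}

var qs = searchQuery({
  q: "subject:hello",
  limit: 10,
  skip: 20,
  include_docs: true,
  sort: "\\amount<float>", // a single backslash prefix: descending order
  stale: "ok"
});

console.log(qs);
```

Note that <code>URLSearchParams</code> encodes spaces as '+', matching the "+TO+" form used in the date range example.
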
<h2>Special Fields</h2>

<dl>
<dt>_id</dt><dd>The _id of the document.</dd>
</dl>

<h2>Dublin Core</h2>

All Dublin Core attributes are indexed and stored if detected in the attachment. Descriptions of the fields come from the Tika javadocs.

<dl>
<dt>_dc.contributor</dt><dd>An entity responsible for making contributions to the content of the resource.</dd>
<dt>_dc.coverage</dt><dd>The extent or scope of the content of the resource.</dd>
<dt>_dc.creator</dt><dd>An entity primarily responsible for making the content of the resource.</dd>
<dt>_dc.date</dt><dd>A date associated with an event in the life cycle of the resource.</dd>
<dt>_dc.description</dt><dd>An account of the content of the resource.</dd>
<dt>_dc.format</dt><dd>Typically, Format may include the media-type or dimensions of the resource.</dd>
<dt>_dc.identifier</dt><dd>Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system.</dd>
<dt>_dc.language</dt><dd>A language of the intellectual content of the resource.</dd>
<dt>_dc.modified</dt><dd>Date on which the resource was changed.</dd>
<dt>_dc.publisher</dt><dd>An entity responsible for making the resource available.</dd>
<dt>_dc.relation</dt><dd>A reference to a related resource.</dd>
<dt>_dc.rights</dt><dd>Information about rights held in and over the resource.</dd>
<dt>_dc.source</dt><dd>A reference to a resource from which the present resource is derived.</dd>
<dt>_dc.subject</dt><dd>The topic of the content of the resource.</dd>
<dt>_dc.title</dt><dd>A name given to the resource.</dd>
<dt>_dc.type</dt><dd>The nature or genre of the content of the resource.</dd>
</dl>

<h2>Examples</h2>

<pre>
http://localhost:5984/dbname/_fti/_design/foo/view_name?q=field_name:value
http://localhost:5984/dbname/_fti/_design/foo/view_name?q=field_name:value&sort=other_field
http://localhost:5984/dbname/_fti/_design/foo/view_name?debug=true&sort=billing_size&lt;long&gt;&q=body:document AND customer:[A TO C]
</pre>

<h2>Search Results Format</h2>

The search result contains a number of fields at the top level, in addition to your search results.

<dl>
<dt>etag</dt><dd>An opaque token that reflects the current version of the index. This value is also returned in an ETag header to facilitate HTTP caching.</dd>
<dt>fetch_duration</dt><dd>The number of milliseconds spent retrieving the documents.</dd>
<dt>limit</dt><dd>The maximum number of results that can appear.</dd>
<dt>q</dt><dd>The query that was executed.</dd>
<dt>rows</dt><dd>The search results array, described below.</dd>
<dt>search_duration</dt><dd>The number of milliseconds spent performing the search.</dd>
<dt>skip</dt><dd>The number of initial matches that were skipped.</dd>
<dt>total_rows</dt><dd>The total number of matches for this query.</dd>
</dl>

<h2>The search results array</h2>

The search results array consists of zero or more objects with the following fields:

<dl>
<dt>doc</dt><dd>The original document from couch, if requested with include_docs=true</dd>
<dt>fields</dt><dd>All the fields that were stored with this match</dd>
<dt>id</dt><dd>The unique identifier for this match.</dd>
<dt>score</dt><dd>The normalized score (0.0-1.0, inclusive) for this match</dd>
</dl>

fd16315 @rnewson update README.md
authored
475 Here's an example of a JSON response without sorting;
b207965 @rnewson improve README readability.
authored
476
118d28e @rnewson JSON example output.
authored
477 <pre>
478 {
c6356fd @rnewson update README.md and TODO to reflect progress.
authored
479 "q": "+content:enron",
fd16315 @rnewson update README.md
authored
480 "skip": 0,
481 "limit": 2,
482 "total_rows": 176852,
483 "search_duration": 518,
484 "fetch_duration": 4,
485 "rows": [
486 {
0fcf578 @rnewson update docs.
authored
487 "id": "hain-m-all_documents-257.",
fd16315 @rnewson update README.md
authored
488 "score": 1.601625680923462
489 },
490 {
0fcf578 @rnewson update docs.
authored
491 "id": "hain-m-notes_inbox-257.",
fd16315 @rnewson update README.md
authored
492 "score": 1.601625680923462
493 }
118d28e @rnewson JSON example output.
authored
494 ]
495 }
496 </pre>

And the same with sorting:

<pre>
{
  "q": "+content:enron",
  "skip": 0,
  "limit": 3,
  "total_rows": 176852,
  "search_duration": 660,
  "fetch_duration": 4,
  "sort_order": [
    {
      "field": "source",
      "reverse": false,
      "type": "string"
    },
    {
      "reverse": false,
      "type": "doc"
    }
  ],
  "rows": [
    {
      "id": "shankman-j-inbox-105.",
      "score": 0.6131107211112976,
      "sort_order": [
        "enron",
        6
      ]
    },
    {
      "id": "shankman-j-inbox-8.",
      "score": 0.7492915391921997,
      "sort_order": [
        "enron",
        7
      ]
    },
    {
      "id": "shankman-j-inbox-30.",
      "score": 0.507369875907898,
      "sort_order": [
        "enron",
        8
      ]
    }
  ]
}
</pre>
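
The top-level skip, limit and total_rows fields are enough to page through a large result set. The following is a minimal Python sketch, assuming the response body has already been fetched as text and that the returned value is passed back as the skip value of the next request. The helper name `next_skip` is hypothetical and not part of couchdb-lucene.

```python
import json

def next_skip(response_text):
    """Return the skip value for the next page, or None when nothing is left."""
    result = json.loads(response_text)
    rows = result["rows"]
    if not rows:
        # An empty page means there is nothing further to fetch.
        return None
    # Matches consumed so far: those skipped plus those returned in this page.
    seen = result["skip"] + len(rows)
    return seen if seen < result["total_rows"] else None

# Trimmed version of the unsorted example response shown earlier.
page = (
    '{"q": "+content:enron", "skip": 0, "limit": 2, "total_rows": 176852,'
    ' "rows": [{"id": "hain-m-all_documents-257."}, {"id": "hain-m-notes_inbox-257."}]}'
)
print(next_skip(page))
```

Counting the rows actually returned, rather than trusting limit, keeps the sketch correct when the last page is shorter than the requested limit.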

<h3>Content-Type of response</h3>

The Content-Type of the response is negotiated via the Accept request header, as in CouchDB itself. If the Accept header includes "application/json", that is also the Content-Type of the response. Otherwise, "text/plain;charset=utf-8" is used.
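
The rule described here can be sketched in a few lines of Python. This only illustrates the documented behaviour and is not couchdb-lucene's actual implementation; the function name `response_content_type` is hypothetical, and treating "includes" as a plain substring test is an assumption.

```python
def response_content_type(accept_header):
    """Pick the response Content-Type from an Accept header value (may be None)."""
    # Assumption: "includes" means a simple substring match on the header value.
    if accept_header and "application/json" in accept_header:
        return "application/json"
    return "text/plain;charset=utf-8"

print(response_content_type("application/json, text/plain"))
print(response_content_type("text/html"))
```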

<h1>Fetching information about the index</h1>

Calling couchdb-lucene without arguments returns a JSON object with information about the index.

<pre>
http://127.0.0.1:5984/&lt;db>/_fti/_design/foo/&lt;index>
</pre>

returns:

<pre>
{"current":true,"disk_size":110674,"doc_count":397,"doc_del_count":0,
"fields":["default","number"],"last_modified":"1263066382000",
"optimized":true,"ref_count":2}
</pre>
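
As a rough illustration of consuming this object, the Python sketch below parses the example response and converts last_modified to a timestamp. It assumes that last_modified is a string holding milliseconds since the Unix epoch, which the example value suggests but the text does not state; the helper name `last_modified_utc` is hypothetical.

```python
import json
from datetime import datetime, timezone

# The example info response from above, joined onto one line.
info_text = (
    '{"current":true,"disk_size":110674,"doc_count":397,"doc_del_count":0,'
    '"fields":["default","number"],"last_modified":"1263066382000",'
    '"optimized":true,"ref_count":2}'
)

def last_modified_utc(info):
    # Assumption: last_modified is milliseconds since the Unix epoch, as a string.
    return datetime.fromtimestamp(int(info["last_modified"]) / 1000, tz=timezone.utc)

info = json.loads(info_text)
print(info["doc_count"], info["optimized"])
print(last_modified_utc(info).isoformat())
```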

<h1>Index Maintenance</h1>

For optimal query speed you can optimize your indexes. This causes the index to be rewritten into a single segment.

<pre>
curl -X POST http://localhost:5984/&lt;db>/_fti/_design/foo/&lt;index>/_optimize
</pre>

If you only want to expunge pending deletes, call:

<pre>
curl -X POST http://localhost:5984/&lt;db>/_fti/_design/foo/&lt;index>/_expunge
</pre>

If you recreate databases or frequently change your fulltext functions, you will probably have old indexes lying around on disk. To remove all of them, call:

<pre>
curl -X POST http://localhost:5984/&lt;db>/_fti/_cleanup
</pre>
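
If you prefer to script these calls, the three URLs can be built in one place. The Python sketch below only constructs the POST requests and does not send them; `maintenance_request`, the database name `mydb` and the index name `by_subject` are hypothetical names chosen for illustration.

```python
from urllib.request import Request

def maintenance_request(base_url, db, action, ddoc=None, index=None):
    """Build (without sending) the POST request for a maintenance call."""
    if action == "_cleanup":
        # Cleanup applies to the whole database, not to a single index.
        path = f"{db}/_fti/_cleanup"
    elif action in ("_optimize", "_expunge"):
        path = f"{db}/_fti/_design/{ddoc}/{index}/{action}"
    else:
        raise ValueError(f"unknown action: {action}")
    return Request(f"{base_url.rstrip('/')}/{path}", method="POST")

req = maintenance_request("http://localhost:5984", "mydb", "_optimize",
                          ddoc="foo", index="by_subject")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request to a running server.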

<h1>Authentication</h1>

By default couchdb-lucene does not attempt to authenticate to CouchDB. If you have set CouchDB's require_valid_user to true, you will need to modify couchdb-lucene.ini. Change the url setting to include a valid username and password. For example, the default setting is:

<pre>
[local]
url=http://localhost:5984/
</pre>

Change it to:

<pre>
[local]
url=http://foo:bar@localhost:5984/
</pre>

and couchdb-lucene will authenticate to CouchDB.
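
Usernames or passwords that contain characters such as "@" or ":" cannot be pasted into a URL unchanged. The Python sketch below builds the url value with the credentials percent-encoded; the helper name `with_credentials` is hypothetical, and whether couchdb-lucene decodes percent-encoded credentials is an assumption that this document does not confirm.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def with_credentials(url, username, password):
    """Return url with username:password inserted before the host."""
    parts = urlsplit(url)
    # Percent-encode characters that would otherwise be read as URL delimiters.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    host = parts.netloc.rsplit("@", 1)[-1]  # drop any credentials already present
    return urlunsplit((parts.scheme, f"{userinfo}@{host}",
                       parts.path, parts.query, parts.fragment))

print(with_credentials("http://localhost:5984/", "foo", "bar"))
```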