[![Build Status](https://secure.travis-ci.org/rnewson/couchdb-lucene.png)](http://travis-ci.org/rnewson/couchdb-lucene)

<h1>Version Compatibility</h1>

CouchDB-Lucene works with all versions of CouchDB from 0.10 upwards.

<h1>Issue Tracking</h1>

Issue tracking at <a href="http://github.com/rnewson/couchdb-lucene/issues">github</a>.

<h1>Minimum System Requirements</h1>

Java 1.7 (or above) is required; Oracle Java 7 or OpenJDK 7 are recommended (55 or higher)

<h1>Build and run couchdb-lucene</h1>

If you are on OS X, you might find it easiest to run:

<pre>
brew install couchdb-lucene
</pre>

To build from source:

<ol>
<li>Install Maven (2 or 3).
<li>checkout repository
<li>type 'mvn'
<li>cd target
<li>unzip couchdb-lucene-&lt;version&gt;.zip
<li>cd couchdb-lucene-&lt;version&gt;
<li>./bin/run
</ol>

The zip file contains all the couchdb-lucene code, dependencies, startup scripts and configuration files you need, so unzip it wherever you wish to install couchdb-lucene.

If you want to run couchdb-lucene on a servlet container like Tomcat, you can build the war file using Maven:

<pre>
mvn war:war
</pre>

<h1>Configure CouchDB</h1>

The following settings are needed in CouchDB's local.ini file in order for it to communicate with couchdb-lucene:

<h2>Python hook script (for CouchDB versions prior to 1.1)</h2>
<pre>
[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.

[external]
fti=/path/to/python /path/to/couchdb-lucene/tools/couchdb-external-hook.py

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, &lt;&lt;"fti"&gt;&gt;}
</pre>

<h3>Hook options</h3>
You can pass options to the Python script like so:
<pre>
[external]
fti=/path/to/python "/path/to/couchdb-lucene/tools/couchdb-external-hook.py --option-name value"
</pre>

<table>
<tr><th>Option</th><th>Meaning</th><th>Default Value</th></tr>
<tr><td>--remote-host</td><td>The hostname of the couchdb-lucene server</td><td>localhost</td></tr>
<tr><td>--remote-port</td><td>The port of the couchdb-lucene server</td><td>5985</td></tr>
<tr><td>--local-key</td><td>The key for the local couchdb instance as known to the couchdb-lucene server</td><td>local</td></tr>
</table>

<h2>Proxy handler (for CouchDB versions from 1.1 onward)</h2>
<pre>
[httpd_global_handlers]
_fti = {couch_httpd_proxy, handle_proxy_req, &lt;&lt;"http://127.0.0.1:5985"&gt;&gt;}
</pre>

<h1>Configure couchdb-lucene</h1>

couchdb-lucene runs in a single, standalone JVM. As such, you can run your couchdb-lucene server on a different machine from CouchDB or keep both on the same machine; it's your call.

<h1>Start couchdb-lucene</h1>

To start couchdb-lucene, run:
<pre>
bin/run
</pre>

To stop couchdb-lucene, simply kill the Java process.

<h1>Indexing Strategy</h1>

<h2>Document Indexing</h2>

You must supply an index function in order to enable couchdb-lucene as, by default, nothing will be indexed. To suppress a document from the index, return null. It's more typical to return a single Document object which contains everything you'd like to query and retrieve. You may also return an array of Document objects if you wish.

You may add any number of index views in any number of design documents. All searches will be constrained to documents emitted by the index functions.

Here's a complete example of a design document with couchdb-lucene features:

<pre>
{
  "_id":"_design/foo",
  "fulltext": {
    "by_subject": {
      "index":"function(doc) { var ret=new Document(); ret.add(doc.subject); return ret }"
    },
    "by_content": {
      "index":"function(doc) { var ret=new Document(); ret.add(doc.content); return ret }"
    }
  }
}
</pre>

Here are some example URLs for the given design document:

<h2>Using the Python hook script</h2>
<pre>
http://localhost:5984/database/_fti/_design/foo/by_subject?q=hello
http://localhost:5984/database/_fti/_design/foo/by_content?q=hello
</pre>

<h2>Using the proxy handler</h2>
<pre>
http://localhost:5984/_fti/local/database/_design/foo/by_subject?q=hello
http://localhost:5984/_fti/local/database/_design/foo/by_content?q=hello
</pre>
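
The two URL layouts differ only in where <code>_fti</code> sits in the path. The sketch below builds either form from its parts; <code>searchUrl</code> and its option names are hypothetical helpers invented for illustration, not part of couchdb-lucene, and the <code>local</code> key is assumed to match the default <code>--local-key</code> value.

```javascript
// Hypothetical helper (not part of couchdb-lucene): builds a search URL
// for either the Python hook layout or the proxy handler layout.
function searchUrl({ base, db, ddoc, view, q, style = "proxy", key = "local" }) {
  const path = style === "hook"
    ? `/${db}/_fti/_design/${ddoc}/${view}`          // Python hook script layout
    : `/_fti/${key}/${db}/_design/${ddoc}/${view}`;  // proxy handler layout
  // Percent-encode the query so spaces, colons and brackets survive transport.
  return `${base}${path}?q=${encodeURIComponent(q)}`;
}

const common = { base: "http://localhost:5984", db: "database", ddoc: "foo", view: "by_subject", q: "hello" };
console.log(searchUrl({ ...common, style: "hook" }));
console.log(searchUrl(common));
```

Keeping the layout choice in one place means client code does not need to change when a CouchDB upgrade moves you from the hook script to the proxy handler.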

A fulltext object contains multiple index view declarations. An index view consists of:

<dl>
<dt>analyzer</dt><dd>(optional) The analyzer to use</dd>
<dt>defaults</dt><dd>(optional) The default for numerous indexing options can be overridden here. A full list of options follows.</dd>
<dt>index</dt><dd>The indexing function itself, documented below.</dd>
</dl>

<h3>The Defaults Object</h3>

The following indexing options can be defaulted:

<table>
  <tr>
    <th>name</th>
    <th>description</th>
    <th>available options</th>
    <th>default</th>
  </tr>
  <tr>
    <th>field</th>
    <td>the field name to index under</td>
    <td>user-defined</td>
    <td>default</td>
  </tr>
  <tr>
    <th>type</th>
    <td>the type of the field</td>
    <td>date, double, float, int, long, string</td>
    <td>string</td>
  </tr>
  <tr>
    <th>store</th>
    <td>whether the data is stored. The value will be returned in the search result.</td>
    <td>yes, no</td>
    <td>no</td>
  </tr>
  <tr>
    <th>index</th>
    <td>whether (and how) the data is indexed</td>
    <td>analyzed, analyzed_no_norms, no, not_analyzed, not_analyzed_no_norms</td>
    <td>analyzed</td>
  </tr>
  <tr>
    <th>termvector</th>
    <td>whether and how a field should have term vectors</td>
    <td>no, with_offsets, with_positions, with_positions_offsets, yes</td>
    <td>no</td>
  </tr>
  <tr>
    <th>boost</th>
    <td>Sets the boost factor for hits on this field. This value will be multiplied into the score of all hits on this field of this document.</td>
    <td>floating-point value</td>
    <td>1.0</td>
  </tr>
</table>
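
As a rough illustration of where a defaults object sits, the sketch below assembles a design document in JavaScript and serializes it. It assumes <code>defaults</code> is a sibling of <code>index</code> inside the index view, as the list above describes; the variable names and the <code>doc.summary</code> property are made up for the example.

```javascript
// Index function kept as real code so it can be linted, then serialized.
// Document is supplied by couchdb-lucene at indexing time; it is never called here.
const bySubject = function (doc) {
  var ret = new Document();
  ret.add(doc.subject);                   // picks up "store":"yes" from the defaults below
  ret.add(doc.summary, { store: "no" });  // per-call override; doc.summary is hypothetical
  return ret;
};

const designDoc = {
  _id: "_design/foo",
  fulltext: {
    by_subject: {
      defaults: { store: "yes" },   // overrides the built-in default of "no"
      index: bySubject.toString()   // design documents carry the function as a string
    }
  }
};

console.log(JSON.stringify(designDoc, null, 2));
```

Writing the function as code and calling <code>toString()</code> avoids hand-escaping quotes inside a JSON string.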

<h3>The Analyzer Option</h3>

Lucene has numerous ways of converting free-form text into tokens; the classes that do this are called Analyzers. By default, the StandardAnalyzer is used, which lower-cases all text and drops common English words ("the", "and", and so on), among other things. This processing might not always suit you, so you can choose from several others by setting the "analyzer" field to one of the following values:

<ul>
<li>brazilian</li>
<li>chinese</li>
<li>cjk</li>
<li>czech</li>
<li>dutch</li>
<li>english</li>
<li>french</li>
<li>german</li>
<li>keyword</li>
<li>perfield</li>
<li>porter</li>
<li>russian</li>
<li>simple</li>
<li>snowball</li>
<li>standard</li>
<li>thai</li>
<li>whitespace</li>
<li>ngram</li>
</ul>

<h4>The Snowball Analyzer</h4>

This analyzer requires an extra argument to specify the language (see <a href="http://lucene.apache.org/java/3_0_3/api/contrib-snowball/org/apache/lucene/analysis/snowball/SnowballAnalyzer.html">here</a> for details):

<pre>
"analyzer":"snowball:English"
</pre>

Note: the argument is case-sensitive and is passed directly to the <code>SnowballAnalyzer</code>'s constructor.

<h4>The Per-field Analyzer</h4>

The "perfield" option lets you use a different analyzer for different fields and is configured as follows:

<pre>
"analyzer":"perfield:{field_name:\"analyzer_name\"}"
</pre>

Unless overridden, any field name not specified will be handled by the standard analyzer. To change the default, use the special default field name:

<pre>
"analyzer":"perfield:{default:\"keyword\"}"
</pre>

<h4>The Ngram Analyzer</h4>

The "ngram" analyzer lets you break down the output of any other analyzer into ngrams (with an ngram size of 2, "foo" becomes "fo" and "oo").

<pre>
"analyzer":"ngram:{analyzer:\"simple\",min:2,max:3}"
</pre>

If not specified, the delegated analyzer is "standard" and the min and max ngram sizes are 1 and 2 respectively.
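
To make the min and max settings concrete, here is a small sketch of character ngram generation. It is an illustration only: the <code>ngrams</code> function is hypothetical, it is not the code couchdb-lucene or Lucene runs, and the order in which Lucene emits ngrams may differ.

```javascript
// Illustrative only: every substring of `token` whose length lies between
// min and max (inclusive), shortest first. Not Lucene's implementation.
function ngrams(token, min, max) {
  const out = [];
  for (let n = min; n <= max; n++) {
    for (let i = 0; i + n <= token.length; i++) {
      out.push(token.slice(i, i + n));
    }
  }
  return out;
}

console.log(ngrams("foo", 2, 2)); // size 2 only
console.log(ngrams("foo", 2, 3)); // sizes 2 and 3, as in the min:2,max:3 example
```

A larger max produces longer (and more) terms per token, so index size grows with the range you choose.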

<h3>The Document class</h3>

You may construct a new Document instance with:

<pre>
var doc = new Document();
</pre>

Data may be added to this document with the add method, which takes an optional second object argument that can override any of the above default values.

<pre>
// Add with all the defaults.
doc.add("value");

// Add a numeric field.
doc.add(35, {"type":"int"});

// Add a date field (the value must be a Date object).
doc.add(new Date("1972/1/6 16:05:00"), {"type":"date"});
doc.add(new Date("January 6, 1972 16:05:00"), {"type":"date"});

// Add a subject field.
doc.add("this is the subject line.", {"field":"subject"});

// Add but ensure it's stored.
doc.add("value", {"store":"yes"});

// Add but don't analyze.
doc.add("don't analyze me", {"index":"not_analyzed"});

// Extract text from the named attachment and index it to a named field
doc.attachment("attachment field", "attachment name");

// log an event (trace, debug, info, warn and error are available)
if (doc.foo) {
  log.info("doc has foo property!");
}
</pre>
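
Because <code>Document</code> and <code>log</code> only exist inside couchdb-lucene, one way to try an index function before deploying it is to run it against simple stand-ins. The sketch below is such a test double, written under that assumption: the class only records calls and is not the real implementation, and <code>indexFn</code> is an example function, not one shipped with the project.

```javascript
// Hypothetical test double for couchdb-lucene's Document class.
// It records what was added; it does not index anything.
class Document {
  constructor() {
    this.fields = [];
    this.attachments = [];
  }
  add(value, options = {}) {
    this.fields.push({ value, options });
  }
  attachment(fieldName, attachmentName) {
    this.attachments.push({ fieldName, attachmentName });
  }
}

// Stand-in for the log object (trace, debug, info, warn and error).
const log = { trace() {}, debug() {}, info() {}, warn() {}, error() {} };

// Example index function, written as it would appear in a design document.
const indexFn = function (doc) {
  if (!doc.subject) return null; // returning null suppresses the document
  var ret = new Document();
  ret.add(doc.subject, { field: "subject", store: "yes" });
  for (var name in doc._attachments) {
    ret.attachment("default", name);
  }
  log.info("indexed " + doc._id);
  return ret;
};

const indexed = indexFn({ _id: "a", subject: "hello", _attachments: { "notes.txt": {} } });
const skipped = indexFn({ _id: "b" });
console.log(JSON.stringify(indexed), skipped);
```

This catches syntax errors and wrong property names early, which is otherwise hard to do for a function stored as a string in a design document.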

<h3>Example Index Functions</h3>

<h4>Index Everything</h4>

<pre>
function(doc) {
  var ret = new Document();

  function idx(obj) {
    for (var key in obj) {
      switch (typeof obj[key]) {
      case 'object':
        idx(obj[key]);
        break;
      case 'function':
        break;
      default:
        ret.add(obj[key]);
        break;
      }
    }
  }

  idx(doc);

  if (doc._attachments) {
    for (var i in doc._attachments) {
      ret.attachment("default", i);
    }
  }

  return ret;
}
</pre>

<h4>Index Nothing</h4>

<pre>
function(doc) {
  return null;
}
</pre>

<h4>Index Select Fields</h4>

<pre>
function(doc) {
  var result = new Document();
  result.add(doc.subject, {"field":"subject", "store":"yes"});
  result.add(doc.content, {"field":"subject"});
  result.add(new Date(), {"field":"indexed_at"});
  return result;
}
</pre>

<h4>Index Attachments</h4>

<pre>
function(doc) {
  var result = new Document();
  for(var a in doc._attachments) {
    result.attachment("default", a);
  }
  return result;
}
</pre>

<h4>A More Complex Example</h4>

<pre>
function(doc) {
  var mk = function(name, value, group) {
    var ret = new Document();
    ret.add(value, {"field": group, "store":"yes"});
    ret.add(group, {"field":"group", "store":"yes"});
    return ret;
  };
  if(doc.type != "reference") return null;
  var ret = new Array();
  for(var g in doc.groups) {
    ret.push(mk("library", doc.groups[g].library, g));
    ret.push(mk("method", doc.groups[g].method, g));
    ret.push(mk("target", doc.groups[g].target, g));
  }
  return ret;
}
</pre>

<h2>Attachment Indexing</h2>

Couchdb-lucene uses <a href="http://lucene.apache.org/tika/">Apache Tika</a> to index attachments of the following types, assuming the correct content_type is set in couchdb:

<h3>Supported Formats</h3>

<ul>
<li>Excel spreadsheets (application/vnd.ms-excel)
<li>HTML (text/html)
<li>Images (image/*)
<li>Java class files
<li>Java jar archives
<li>MP3 (audio/mp3)
<li>OpenDocument (application/vnd.oasis.opendocument.*)
<li>Outlook (application/vnd.ms-outlook)
<li>PDF (application/pdf)
<li>Plain text (text/plain)
<li>Powerpoint presentations (application/vnd.ms-powerpoint)
<li>RTF (application/rtf)
<li>Visio (application/vnd.visio)
<li>Word documents (application/msword)
<li>XML (application/xml)
</ul>

<h1>Searching with couchdb-lucene</h1>

You can perform all types of queries using Lucene's default <a href="http://lucene.apache.org/java/3_6_2/queryparsersyntax.html">query syntax</a>.

<h2>Numeric range queries</h2>

In addition to normal text-based range searches (using the "field:[lower TO upper]" syntax), couchdb-lucene also supports numeric range searches for the following types: int, long, float, double and date. The type is specified after the field name, as follows:

<table>
<tr><th>type</th><th>example</th></tr>
<tr><td>int</td><td>field&lt;int&gt;:[0 TO 100]</td></tr>
<tr><td>long</td><td>field&lt;long&gt;:[0 TO 100]</td></tr>
<tr><td>float</td><td>field&lt;float&gt;:[0.0 TO 100.0]</td></tr>
<tr><td>double</td><td>field&lt;double&gt;:[0.0 TO 100.0]</td></tr>
<tr><td>date</td><td>field&lt;date&gt;:[from TO to] where from and to match any of these patterns: <code>"yyyy-MM-dd'T'HH:mm:ssZ"</code>, <code>"yyyy-MM-dd'T'HH:mm:ss"</code>, <code>"yyyy-MM-ddZ"</code>, <code>"yyyy-MM-dd"</code>, <code>"yyyy-MM-dd'T'HH:mm:ss.SSSZ"</code>, <code>"yyyy-MM-dd'T'HH:mm:ss.SSS"</code>. So, in order to search for articles published in July 2010, you would issue the following query: <code>published_at&lt;date&gt;:["2010-07-01T00:00:00"+TO+"2010-07-31T23:59:59"]</code></td></tr>
</table>

An example numeric range query for spatial searching:

<pre>
?q=pizza AND lat&lt;double&gt;:[51.4707 TO 51.5224] AND long&lt;double&gt;:[-0.6622 TO -0.5775]
</pre>
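
Typed range clauses contain characters (spaces, angle brackets, square brackets) that need percent-encoding when sent in a URL. The sketch below composes the spatial query above and encodes it; <code>rangeClause</code> is a hypothetical helper, not something couchdb-lucene provides.

```javascript
// Hypothetical helper: formats a typed range clause, e.g. lat<double>:[1 TO 2].
function rangeClause(field, type, lower, upper) {
  return `${field}<${type}>:[${lower} TO ${upper}]`;
}

// Compose the bounding-box query from the example above.
const q = [
  "pizza",
  rangeClause("lat", "double", 51.4707, 51.5224),
  rangeClause("long", "double", -0.6622, -0.5775)
].join(" AND ");

// Encode once, when the query is placed into the URL.
const queryString = "?q=" + encodeURIComponent(q);

console.log(q);
console.log(queryString);
```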

<h2>Numeric term queries</h2>

Fields indexed with numeric types can still be queried as normal terms; couchdb-lucene just needs to know the type. For example, ?q=age&lt;long&gt;:12 will find all documents where the field called 'age' has a value of 12 (when the field was indexed as "type":"long").

<h2>Search methods</h2>

You may use HTTP GET or POST. For POST, use application/x-www-form-urlencoded format.

<h2>Search parameters</h2>

The following parameters can be passed for more sophisticated searches:

<dl>
<dt>analyzer</dt><dd>Override the default analyzer used to parse the q parameter</dd>
<dt>callback</dt><dd>Specify a JSONP callback wrapper. The full JSON result will be prefixed with this value and wrapped in parentheses.</dd>
<dt>debug</dt><dd>Setting this to true disables response caching (the query is executed every time) and indents the JSON response for readability.</dd>
<dt>default_operator</dt><dd>Change the default operator for boolean queries. Defaults to "OR", other permitted value is "AND".</dd>
<dt>force_json</dt><dd>Usually couchdb-lucene determines the Content-Type of its response based on the presence of the Accept header. If Accept contains "application/json", you get "application/json" in the response, otherwise you get "text/plain;charset=utf8". Some tools, like JSONView for Firefox, do not send the Accept header but do render "application/json" responses if received. Setting force_json=true forces all responses to "application/json" regardless of the Accept header.</dd>
<dt>include_docs</dt><dd>whether to include the source docs</dd>
<dt>include_fields</dt><dd>By default, <i>all</i> stored fields are returned with results. Use a comma-separated list of field names with this parameter to refine the response</dd>
<dt>highlights</dt><dd>Number of highlights to include with results. Default is <i>0</i>. This uses the <i>fast-vector-highlighter</i> plugin.</dd>
<dt>highlight_length</dt><dd>Number of characters to include in a highlight row. Default and minimum is <i>18</i>.</dd>
<dt>limit</dt><dd>the maximum number of results to return. Default is <i>25</i>.</dd>
<dt>q</dt><dd>the query to run (e.g., subject:hello). If no field is specified, the default field is searched. Multiple queries can be supplied, separated by commas; the resulting JSON will be an array of responses.</dd>
<dt>skip</dt><dd>the number of results to skip</dd>
<dt>sort</dt><dd>the comma-separated fields to sort on. Prefix with / for ascending order and \ for descending order (ascending is the default if not specified). Type-specific sorting is also available by appending the type between angle brackets (e.g., sort=amount&lt;float&gt;). Supported types are 'float', 'double', 'int', 'long' and 'date'.</dd>
<dt>stale=ok</dt><dd>If you set the <i>stale</i> option to <i>ok</i>, couchdb-lucene will not block if the index is not up to date and it will immediately return results. Therefore searches may be faster as Lucene caches important data (especially for sorting). A query without stale=ok will block and use the latest data committed to the index. Unlike CouchDB's stale=ok option for views, couchdb-lucene will trigger an index update unless one is already running.</dd>
</dl>

<i>All parameters except 'q' are optional.</i>
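
A sketch of combining several of these parameters, using the standard <code>URLSearchParams</code> class so that the same encoded string serves as a GET query string or a POST body. The host, database and view names are taken from the earlier examples; the parameter values, the <code>amount</code> field and the <code>postRequest</code> object shape are illustrative assumptions.

```javascript
// Build the parameter set once; URLSearchParams handles the
// application/x-www-form-urlencoded encoding for us.
const params = new URLSearchParams({
  q: "subject:hello",
  limit: "10",
  skip: "20",
  sort: "\\amount<float>", // backslash prefix = descending, <float> = typed sort
  stale: "ok",
  debug: "true"
});
const encoded = params.toString();

// GET: append to the search URL (proxy handler layout shown).
const getUrl = "http://localhost:5984/_fti/local/database/_design/foo/by_subject?" + encoded;

// POST: the same string becomes the request body.
const postRequest = {
  method: "POST",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: encoded
};

console.log(getUrl);
console.log(postRequest.body);
```

POST is the safer choice when a query grows long enough to risk URL length limits in proxies or clients.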

<h2>Special Fields</h2>

<dl>
<dt>_id</dt><dd>The _id of the document.</dd>
</dl>

<h2>Dublin Core</h2>

All Dublin Core attributes are indexed and stored if detected in the attachment. Descriptions of the fields come from the Tika javadocs.

<dl>
<dt>_dc.contributor</dt><dd>An entity responsible for making contributions to the content of the resource.</dd>
<dt>_dc.coverage</dt><dd>The extent or scope of the content of the resource.</dd>
<dt>_dc.creator</dt><dd>An entity primarily responsible for making the content of the resource.</dd>
<dt>_dc.date</dt><dd>A date associated with an event in the life cycle of the resource.</dd>
<dt>_dc.description</dt><dd>An account of the content of the resource.</dd>
<dt>_dc.format</dt><dd>Typically, Format may include the media-type or dimensions of the resource.</dd>
<dt>_dc.identifier</dt><dd>Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system.</dd>
<dt>_dc.language</dt><dd>A language of the intellectual content of the resource.</dd>
<dt>_dc.modified</dt><dd>Date on which the resource was changed.</dd>
<dt>_dc.publisher</dt><dd>An entity responsible for making the resource available.</dd>
<dt>_dc.relation</dt><dd>A reference to a related resource.</dd>
<dt>_dc.rights</dt><dd>Information about rights held in and over the resource.</dd>
<dt>_dc.source</dt><dd>A reference to a resource from which the present resource is derived.</dd>
<dt>_dc.subject</dt><dd>The topic of the content of the resource.</dd>
<dt>_dc.title</dt><dd>A name given to the resource.</dd>
<dt>_dc.type</dt><dd>The nature or genre of the content of the resource.</dd>
</dl>
479
<h2>Examples</h2>

<h2>Using the Python hook script</h2>
<pre>
http://localhost:5984/dbname/_fti/_design/foo/view_name?q=field_name:value
http://localhost:5984/dbname/_fti/_design/foo/view_name?q=field_name:value&sort=other_field
http://localhost:5984/dbname/_fti/_design/foo/view_name?debug=true&sort=billing_size&lt;long&gt;&q=body:document AND customer:[A TO C]
</pre>

<h2>Using the proxy handler</h2>
<pre>
http://localhost:5984/_fti/local/dbname/_design/foo/view_name?q=field_name:value
http://localhost:5984/_fti/local/dbname/_design/foo/view_name?q=field_name:value&sort=other_field
http://localhost:5984/_fti/local/dbname/_design/foo/view_name?debug=true&sort=billing_size&lt;long&gt;&q=body:document AND customer:[A TO C]
</pre>

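The example URLs above are written unencoded for readability; spaces, brackets and angle brackets generally need percent-encoding before most HTTP clients will send them. The sketch below shows one way to build either URL style in Python. The function `search_url` and its `proxy_key` argument are hypothetical names chosen for this example; `proxy_key` stands for the `local` path segment seen in the proxy handler URLs.

```python
from urllib.parse import quote, urlencode


def search_url(base, db, ddoc, index, params, proxy_key=None):
    # Hypothetical helper that mirrors the two URL layouts shown above.
    if proxy_key is None:
        # Python hook script layout: /dbname/_fti/_design/foo/view_name
        path = "/%s/_fti/_design/%s/%s" % (db, ddoc, index)
    else:
        # Proxy handler layout: /_fti/local/dbname/_design/foo/view_name
        path = "/_fti/%s/%s/_design/%s/%s" % (proxy_key, db, ddoc, index)
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode(params, quote_via=quote)
    return base.rstrip("/") + quote(path) + "?" + query


url = search_url(
    "http://localhost:5984", "dbname", "foo", "view_name",
    {"q": "body:document AND customer:[A TO C]"},
    proxy_key="local",
)
```

Here `url` ends in `?q=body%3Adocument%20AND%20customer%3A%5BA%20TO%20C%5D`.
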
<h2>Search Results Format</h2>

The search result contains a number of fields at the top level, in addition to your search results.

<dl>
<dt>etag</dt><dd>An opaque token that reflects the current version of the index. This value is also returned in an ETag header to facilitate HTTP caching.</dd>
<dt>fetch_duration</dt><dd>The number of milliseconds spent retrieving the documents.</dd>
<dt>limit</dt><dd>The maximum number of results that can appear.</dd>
<dt>q</dt><dd>The query that was executed.</dd>
<dt>rows</dt><dd>The search results array, described below.</dd>
<dt>search_duration</dt><dd>The number of milliseconds spent performing the search.</dd>
<dt>skip</dt><dd>The number of initial matches that were skipped.</dd>
<dt>total_rows</dt><dd>The total number of matches for this query.</dd>
</dl>

<h2>The search results array</h2>

The search results array consists of zero or more objects with the following fields:

<dl>
<dt>doc</dt><dd>The original document from couch, if requested with include_docs=true.</dd>
<dt>fields</dt><dd>All the fields that were stored with this match.</dd>
<dt>id</dt><dd>The unique identifier for this match.</dd>
<dt>score</dt><dd>The normalized score (0.0-1.0, inclusive) for this match.</dd>
</dl>

Here's an example of a JSON response without sorting:

<pre>
{
  "q": "+content:enron",
  "skip": 0,
  "limit": 2,
  "total_rows": 176852,
  "search_duration": 518,
  "fetch_duration": 4,
  "rows": [
    {
      "id": "hain-m-all_documents-257.",
      "score": 1.601625680923462
    },
    {
      "id": "hain-m-notes_inbox-257.",
      "score": 1.601625680923462
    }
  ]
}
</pre>

And the same with sorting:

<pre>
{
  "q": "+content:enron",
  "skip": 0,
  "limit": 3,
  "total_rows": 176852,
  "search_duration": 660,
  "fetch_duration": 4,
  "sort_order": [
    {
      "field": "source",
      "reverse": false,
      "type": "string"
    },
    {
      "reverse": false,
      "type": "doc"
    }
  ],
  "rows": [
    {
      "id": "shankman-j-inbox-105.",
      "score": 0.6131107211112976,
      "sort_order": [
        "enron",
        6
      ]
    },
    {
      "id": "shankman-j-inbox-8.",
      "score": 0.7492915391921997,
      "sort_order": [
        "enron",
        7
      ]
    },
    {
      "id": "shankman-j-inbox-30.",
      "score": 0.507369875907898,
      "sort_order": [
        "enron",
        8
      ]
    }
  ]
}
</pre>

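A client might walk such a response roughly as follows. This is a minimal Python sketch, not part of couchdb-lucene: the function name `summarise` is invented here, and it assumes (based only on the example above) that each row's `sort_order` values line up positionally with the top-level `sort_order` entries.

```python
import json


def summarise(response_text):
    # Hypothetical helper: returns (total_rows, list of per-row summaries).
    body = json.loads(response_text)
    # Label each sort key by its field name, falling back to its type
    # (the second key in the sorted example has a type but no field).
    labels = [spec.get("field", spec["type"])
              for spec in body.get("sort_order", [])]
    rows = []
    for row in body["rows"]:
        # Assumption: per-row values are in the same order as the labels.
        keys = dict(zip(labels, row.get("sort_order", [])))
        rows.append({"id": row["id"], "score": row["score"], "sort": keys})
    return body["total_rows"], rows
```

For an unsorted response the `sort` entry of each row is simply an empty dictionary, so the same code handles both examples.
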
<h3>Content-Type of response</h3>

The Content-Type of the response is negotiated via the Accept request header, as in CouchDB itself. If the Accept header includes "application/json" then that is also the Content-Type of the response. If not, "text/plain;charset=utf-8" is used.

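The rule above can be restated as a few lines of Python. This is only a sketch of the behaviour as described in this section, with a made-up function name; it is not the server's actual implementation and ignores details such as quality values in the Accept header.

```python
def response_content_type(accept_header):
    # Hypothetical restatement of the documented rule: JSON is returned
    # only when the client's Accept header mentions application/json.
    if accept_header and "application/json" in accept_header:
        return "application/json"
    return "text/plain;charset=utf-8"
```

In practice this means a client that wants a JSON Content-Type should send `Accept: application/json` explicitly; a browser's default Accept header would typically fall through to the plain-text case.
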
<h1>Fetching information about the index</h1>

Calling couchdb-lucene without arguments returns a JSON object with information about the index.

<pre>
http://127.0.0.1:5984/&lt;db&gt;/_fti/_design/foo/&lt;index&gt;
</pre>

returns:

<pre>
{"current":true,"disk_size":110674,"doc_count":397,"doc_del_count":0,
"fields":["default","number"],"last_modified":"1263066382000",
"optimized":true,"ref_count":2}
</pre>

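Note that `last_modified` arrives as a string. The Python sketch below converts it to a timestamp on the assumption that the value is milliseconds since the Unix epoch; that interpretation is inferred from the sample value and is not stated in this README. The function name is invented for the example.

```python
import json
from datetime import datetime, timezone


def index_last_modified(info_text):
    # Hypothetical helper: parse the index info object shown above.
    info = json.loads(info_text)
    # Assumption: the string holds milliseconds since 1970-01-01 UTC.
    millis = int(info["last_modified"])
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```

Under that assumption, the sample value above corresponds to a date in January 2010.
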
<h1>Index Maintenance</h1>

For optimal query speed you can optimize your indexes. This causes the index to be rewritten into a single segment.

<pre>
curl -X POST http://localhost:5984/&lt;db&gt;/_fti/_design/foo/&lt;index&gt;/_optimize
</pre>

If you just want to expunge pending deletes, then call:

<pre>
curl -X POST http://localhost:5984/&lt;db&gt;/_fti/_design/foo/&lt;index&gt;/_expunge
</pre>

If you recreate databases or frequently change your fulltext functions, you will probably have old indexes lying around on disk. To remove all of them, call:

<pre>
curl -X POST http://localhost:5984/&lt;db&gt;/_fti/_cleanup
</pre>

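If you would rather script these calls than use curl, something like the following Python sketch could be used. The function `maintenance_request` is a hypothetical helper; it only builds the POST request for the three endpoints shown above and leaves sending it to the caller.

```python
from urllib.request import Request


def maintenance_request(base, db, action, ddoc=None, index=None):
    # Hypothetical helper covering the three maintenance calls above.
    if action == "_cleanup":
        # Cleanup applies to the whole database.
        path = "/%s/_fti/_cleanup" % db
    elif action in ("_optimize", "_expunge"):
        # Optimize and expunge target a single index in a design document.
        if ddoc is None or index is None:
            raise ValueError("%s needs a design document and an index" % action)
        path = "/%s/_fti/_design/%s/%s/%s" % (db, ddoc, index, action)
    else:
        raise ValueError("unknown action: %s" % action)
    # An empty body with an explicit method mirrors `curl -X POST`.
    return Request(base.rstrip("/") + path, data=b"", method="POST")
```

The returned object can be passed to `urllib.request.urlopen` to perform the call, for example `urlopen(maintenance_request("http://localhost:5984", "mydb", "_cleanup"))`, where `mydb` is a placeholder database name.
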
<h1>Authentication</h1>

By default couchdb-lucene does not attempt to authenticate to CouchDB. If you have set CouchDB's require_valid_user to true, you will need to modify couchdb-lucene.ini. Change the url setting to include a valid username and password. For example, the default setting is:

<pre>
[local]
url=http://localhost:5984/
</pre>

Change it to:

<pre>
[local]
url=http://foo:bar@localhost:5984/
</pre>

and couchdb-lucene will authenticate to CouchDB.

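A quick way to check whether a given couchdb-lucene.ini already carries credentials is sketched below in Python. This is a standalone check using Python's own ini parser, which may differ in details from the parser couchdb-lucene uses; the function name is invented for the example.

```python
from configparser import ConfigParser
from urllib.parse import urlsplit


def has_credentials(ini_text, section="local"):
    # Hypothetical pre-flight check, not part of couchdb-lucene.
    # interpolation=None keeps "%" characters in the URL literal.
    parser = ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    parts = urlsplit(parser[section]["url"])
    return parts.username is not None and parts.password is not None
```

With the two settings shown above, the default returns `False` and the modified one returns `True`.
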
<h1>Other Tricks</h1>

A few 'expert' options can be set in the couchdb-lucene.ini file:

Leading wildcards are prohibited by default as they perform very poorly most of the time. You can enable them as follows:

<pre>
[lucene]
allowLeadingWildcard=true
</pre>

Lucene automatically converts terms to lower case in wildcard situations. You can disable this with:

<pre>
[lucene]
lowercaseExpandedTerms=false
</pre>

CouchDB-Lucene will keep your indexes up to date automatically, but this consumes resources (network sockets). You can ask CouchDB-Lucene to stop updating an index after a timeout with:

<pre>
[lucene]
changes_timeout = 60000
</pre>