<h1>Build couchdb-lucene</h1>

<ol>
<li>Install Maven 2.
<li>Check out the repository.
<li>Run 'mvn'.
<li>Configure CouchDB (see below).
</ol>

<h1>Configure CouchDB</h1>

<pre>
[external]
fti= /usr/bin/java -jar /path/to/couchdb-lucene*-jar-with-dependencies.jar

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
</pre>

<h1>Indexing Strategy</h1>

<h2>Document Indexing</h2>

Currently, all fields of all documents are indexed; JavaScript control over indexing is coming soon.

<h2>Attachment Indexing</h2>

couchdb-lucene uses <a href="http://lucene.apache.org/tika/">Apache Tika</a> to index attachments of the following types, provided the correct content_type is set in CouchDB:

<h3>Supported Formats</h3>

<ul>
<li>Excel spreadsheets (application/vnd.ms-excel)
<li>Word documents (application/msword)
<li>PowerPoint presentations (application/vnd.ms-powerpoint)
<li>Visio (application/vnd.visio)
<li>Outlook (application/vnd.ms-outlook)
<li>XML (application/xml)
<li>HTML (text/html)
<li>Images (image/*)
<li>Java class files
<li>Java jar archives
<li>MP3 (audio/mp3)
<li>OpenDocument (application/vnd.oasis.opendocument.*)
<li>Plain text (text/plain)
<li>PDF (application/pdf)
<li>RTF (application/rtf)
</ul>

<h1>Searching with couchdb-lucene</h1>

You can perform all types of queries using Lucene's default <a href="http://lucene.apache.org/java/2_4_0/queryparsersyntax.html">query syntax</a>. The following parameters can be passed for more sophisticated searches:

<dl>
<dt>q<dd>the query to run (e.g., subject:hello).
<dt>sort<dd>the comma-separated fields to sort on.
<dt>asc<dd>sort ascending (true) or descending (false); applies only when sorting on a single field.
<dt>limit<dd>the maximum number of results to return.
<dt>skip<dd>the number of results to skip.
<dt>include_docs<dd>whether to include the source documents.
<dt>debug<dd>if false, a normal application/json response with results is returned; if true, a pretty-printed HTML blob is returned instead.
</dl>

<i>All parameters except 'q' are optional.</i>

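The parameters above can be combined into a request URL programmatically. The following is a minimal sketch, not part of couchdb-lucene: the helper name `build_fti_url` and the field name `date` are hypothetical, and it assumes the `_fti` handler is mounted as shown in the CouchDB configuration section. Values are percent-encoded so that characters such as `:` and spaces survive the trip through the URL.

```python
from urllib.parse import quote, urlencode

# Optional parameters listed in this README; 'q' is the only required one.
OPTIONAL_PARAMS = ("sort", "asc", "limit", "skip", "include_docs", "debug")


def build_fti_url(base_url, db, q, **params):
    """Return a _fti search URL for database `db` (hypothetical helper)."""
    query = {"q": q}
    for name in OPTIONAL_PARAMS:
        value = params.get(name)
        if value is None:
            continue  # leave out anything the caller did not supply
        if isinstance(value, bool):
            # Send booleans as the lowercase strings true/false.
            value = "true" if value else "false"
        query[name] = value
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base_url}/{db}/_fti?{urlencode(query, quote_via=quote)}"


print(build_fti_url("http://localhost:5984", "dbname", "subject:hello",
                    sort="date", limit=10, skip=20))
```

Paging through a large result set is then a matter of keeping `limit` fixed and increasing `skip` by that amount for each page.
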
<h2>Special Fields</h2>

<dl>
<dt>_id<dd>The _id of the document.
<dt>_rev<dd>The _rev of the document.
<dt>_db<dd>The source database of the document.
<dt>_body<dd>Any text extracted from any attachment (name may change).
<dt>_author<dd>The author of any attachment (name may change).
<dt>_title<dd>The title of any attachment (name may change).
</dl>

<h2>Examples</h2>

<pre>
http://localhost:5984/dbname/_fti?q=field_name:value
http://localhost:5984/dbname/_fti?q=field_name:value&sort=other_field
http://localhost:5984/dbname/_fti?debug=true&sort=billing_size&q=body:document AND customer:[A TO C]
http://localhost:5984/dbname/_fti?debug=true&sort=billing_size&q=body:document AND customer:[100 TO 400]
</pre>

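Characters such as `:`, `[` and `+` have special meaning in Lucene's query syntax, so a value that contains them literally needs a backslash before each one. The sketch below is an illustration only: `escape_term` and `field_query` are hypothetical helpers, not part of couchdb-lucene, and the set of escaped characters is taken from the Lucene query syntax documentation linked above.

```python
import re

# Characters with special meaning in Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|])')


def escape_term(text):
    """Backslash-escape Lucene special characters in a literal value."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def field_query(field, value):
    """Build a 'field:value' clause with the value escaped (hypothetical helper)."""
    return f"{field}:{escape_term(value)}"


print(field_query("title", "C++"))
```

The resulting clause still has to be percent-encoded before it is placed in the `q` parameter of a URL.
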
<h2>Search Results Format</h2>

The response is a JSON object containing total_rows and a rows array; each row holds the _id, _rev and score of a match. When sort= is supplied, the response also describes the sort_order, and each row carries its sort_order values.

<pre>
{
  "total_rows":49999,
  "rows":
  [
    {"_id":"9","_rev":"2779848574","score":1.712123155593872},
    {"_id":"8","_rev":"670155834","score":1.712123155593872}
  ]
}
</pre>

<pre>
{
  "total_rows":49999,
  "sort_order":
  [
    {"field":"customer","reverse":false,"type":"string"},
    {"reverse":false,"type":"doc"}
  ],
  "rows":
  [
    {"_id":"75000","_rev":"372496647","score":1.712123155593872,"sort_order":["00000000000000",50802]},
    {"_id":"170036","_rev":"3628205594","score":1.712123155593872,"sort_order":["00000000000000",51716]}
  ]
}
</pre>

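A client can read these responses with any JSON parser. The sketch below reuses the sorted sample response shown above; the function name `summarize` is hypothetical, and it assumes that `sort_order` is simply absent when no sort= parameter was given, as in the first sample.

```python
import json

# The sorted sample response from this README, used as test input.
SAMPLE = """
{
  "total_rows": 49999,
  "sort_order": [
    {"field": "customer", "reverse": false, "type": "string"},
    {"reverse": false, "type": "doc"}
  ],
  "rows": [
    {"_id": "75000", "_rev": "372496647", "score": 1.712123155593872,
     "sort_order": ["00000000000000", 50802]},
    {"_id": "170036", "_rev": "3628205594", "score": 1.712123155593872,
     "sort_order": ["00000000000000", 51716]}
  ]
}
"""


def summarize(response_text):
    """Return (total_rows, sort field names, matching _ids) from a response."""
    data = json.loads(response_text)
    # Entries without a "field" key (such as the doc-order tiebreaker)
    # are labelled by their type instead.
    sort_fields = [entry.get("field", entry["type"])
                   for entry in data.get("sort_order", [])]
    ids = [row["_id"] for row in data["rows"]]
    return data["total_rows"], sort_fields, ids


print(summarize(SAMPLE))
```
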
<h1>Working With The Source</h1>

To develop "live", run "mvn dependency:unpack-dependencies" and change the external line to something like this:

<pre>
fti=/usr/bin/java -cp /path/to/couchdb-lucene/target/classes:\
/path/to/couchdb-lucene/target/dependency org.apache.couchdb.lucene.Main
</pre>

You will need to restart CouchDB whenever you change the couchdb-lucene source code, but the restart is very fast.

<h1>Configuration</h1>

couchdb-lucene respects several system properties:

<dl>
<dt>couchdb.url<dd>the URL used to contact CouchDB (the default is "http://localhost:5984").
<dt>couchdb.lucene.dir<dd>the path to the Lucene indexes (the default is a directory called 'lucene' relative to CouchDB's current working directory).
</dl>

You can override these properties like this:

<pre>
fti=/usr/bin/java -Dcouchdb.lucene.dir=/tmp \
-cp /home/rnewson/Source/couchdb-lucene/target/classes:\
/home/rnewson/Source/couchdb-lucene/target/dependency \
org.apache.couchdb.lucene.Main
</pre>