<h1>News</h1>

I've merged the changes from the beta branch, which brings many improvements. Notably:

<ol>
<li>Indexing is now a separate process from searching and is triggered by update notifications.
<li>Rhino integration has landed, so user customization of indexing is now possible.
</ol>

You are advised to delete indexes created prior to this update.

<h1>Build couchdb-lucene</h1>

<ol>
<li>Install Maven 2.
<li>Check out the repository.
<li>Type 'mvn'.
<li>Configure CouchDB (see below).
</ol>

<h1>Configure CouchDB</h1>

<pre>
[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.

[external]
fti=/usr/bin/java -jar /path/to/couchdb-lucene*-jar-with-dependencies.jar -search

[update_notification]
indexer=/usr/bin/java -jar /path/to/couchdb-lucene*-jar-with-dependencies.jar -index

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
</pre>

<h1>Indexing Strategy</h1>

<h2>Document Indexing</h2>

By default, all attributes are indexed. You can customize this process by adding a design document at _design/lucene. The design document must supply an attribute called "transform" whose value is a function that takes a document and returns a document. For example:

<pre>
{
  "transform":"function(doc) { return doc; }"
}
</pre>

The function is evaluated by <a href="http://www.mozilla.org/rhino/">Rhino</a>. You may add, modify, and remove any attributes. Additionally, returning null excludes the document from indexing entirely.

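As a minimal sketch of a less trivial transform, the function below skips some documents and strips one attribute from the rest. The attribute names <code>type</code> and <code>password</code> and the value <code>"draft"</code> are hypothetical, chosen only for illustration. Storing the multi-line function source as the "transform" string is also an assumption; the example above only shows a one-line function.

```javascript
// Sketch of a transform function. The attribute names used here
// ("type", "password") are hypothetical, not part of couchdb-lucene.
var transform = function (doc) {
  // Returning null excludes the document from indexing entirely.
  if (doc.type === "draft") {
    return null;
  }
  // Remove an attribute that should not be searchable.
  delete doc.password;
  return doc;
};

// The design document carries the function as a string value.
var designDoc = {
  _id: "_design/lucene",
  transform: transform.toString()
};

console.log(JSON.stringify(designDoc, null, 2));
```

Testing the function locally against a few sample documents before saving the design document is cheap, since it is plain JavaScript with no dependencies.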
<h2>Attachment Indexing</h2>

couchdb-lucene uses <a href="http://lucene.apache.org/tika/">Apache Tika</a> to index attachments of the following types, provided the correct content_type is set in CouchDB:

<h3>Supported Formats</h3>

<ul>
<li>Excel spreadsheets (application/vnd.ms-excel)
<li>Word documents (application/msword)
<li>PowerPoint presentations (application/vnd.ms-powerpoint)
<li>Visio (application/vnd.visio)
<li>Outlook (application/vnd.ms-outlook)
<li>XML (application/xml)
<li>HTML (text/html)
<li>Images (image/*)
<li>Java class files
<li>Java jar archives
<li>MP3 (audio/mp3)
<li>OpenDocument (application/vnd.oasis.opendocument.*)
<li>Plain text (text/plain)
<li>PDF (application/pdf)
<li>RTF (application/rtf)
</ul>

<h1>Searching with couchdb-lucene</h1>

You can perform all types of queries using Lucene's default <a href="http://lucene.apache.org/java/2_4_0/queryparsersyntax.html">query syntax</a>. The following parameters can be passed for more sophisticated searches:

<dl>
<dt>q<dd>the query to run (e.g., subject:hello)
<dt>sort<dd>the comma-separated fields to sort on
<dt>asc<dd>sort ascending (true) or descending (false); applies only when sorting on a single field
<dt>limit<dd>the maximum number of results to return
<dt>skip<dd>the number of results to skip
<dt>include_docs<dd>whether to include the source docs
<dt>debug<dd>if false, a normal application/json response with results is returned; if true, a pretty-printed HTML blob is returned instead
</dl>

<i>All parameters except 'q' are optional.</i>

<h2>Special Fields</h2>

<dl>
<dt>_id<dd>The _id of the document.
<dt>_rev<dd>The _rev of the document.
<dt>_db<dd>The source database of the document.
<dt>_body<dd>Any text extracted from any attachment (name may change).
<dt>_author<dd>The author of any attachment (name may change).
<dt>_title<dd>The title of any attachment (name may change).
</dl>

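Values such as document ids often contain characters that Lucene's query syntax treats as operators (for example <code>-</code> or <code>:</code>). The sketch below shows one way to escape a value before putting it in a field clause. The helper names <code>escapeTerm</code> and <code>clause</code> are hypothetical and not part of couchdb-lucene; the escaped character set follows the Lucene query parser syntax page.

```javascript
// Backslash-escape characters that Lucene's query syntax treats as special.
// escapeTerm and clause are illustrative helpers, not a couchdb-lucene API.
function escapeTerm(term) {
  return String(term).replace(/[+\-!(){}\[\]^"~*?:\\|&]/g, "\\$&");
}

// Build a "field:value" clause, quoting the value when it contains whitespace.
function clause(field, value) {
  var escaped = escapeTerm(value);
  return field + ":" + (/\s/.test(String(value)) ? '"' + escaped + '"' : escaped);
}

// Restrict a search to one database and to text extracted from attachments.
var query = clause("_db", "enron") + " AND " + clause("_body", "quarterly report");
console.log(query);
```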
<h2>Examples</h2>

<pre>
http://localhost:5984/dbname/_fti?q=field_name:value
http://localhost:5984/dbname/_fti?q=field_name:value&sort=other_field
http://localhost:5984/dbname/_fti?debug=true&sort=billing_size&q=body:document AND customer:[A TO C]
</pre>

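Queries that contain spaces or brackets, like the last example, generally need to be percent-encoded when sent by a program rather than typed into a browser. The following is a minimal sketch of building such a URL; the function name <code>searchUrl</code> is hypothetical, and the host and database name are the placeholders used in the examples.

```javascript
// Build a _fti search URL with percent-encoded parameters.
// searchUrl is an illustrative helper, not part of couchdb-lucene.
function searchUrl(base, db, params) {
  var pairs = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(String(params[key]));
  });
  return base + "/" + encodeURIComponent(db) + "/_fti?" + pairs.join("&");
}

var url = searchUrl("http://localhost:5984", "dbname", {
  q: "body:document AND customer:[A TO C]",
  sort: "billing_size",
  debug: true
});
console.log(url);
```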
<h2>Search Results Format</h2>

Here's an example of a JSON response without sorting:

<pre>
{
  "q": "+_db:enron +content:enron",
  "skip": 0,
  "limit": 2,
  "total_rows": 176852,
  "search_duration": 518,
  "fetch_duration": 4,
  "rows": [
    {
      "_id": "hain-m-all_documents-257.",
      "_rev": "3750319208",
      "score": 1.601625680923462
    },
    {
      "_id": "hain-m-notes_inbox-257.",
      "_rev": "2603032545",
      "score": 1.601625680923462
    }
  ]
}
</pre>

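The skip, limit, and total_rows values in a response are enough to page through a result set. As a rough sketch, assuming a response shaped like the one above, the hypothetical helper <code>nextSkip</code> returns the skip value to request for the following page, or null when there is nothing left.

```javascript
// Compute the "skip" parameter for the next page of results.
// nextSkip is an illustrative helper, not part of couchdb-lucene.
function nextSkip(response) {
  var seen = response.skip + response.rows.length;
  // Stop when the page was empty or every matching row has been seen.
  if (response.rows.length === 0 || seen >= response.total_rows) {
    return null;
  }
  return seen;
}

// Trimmed copy of the response above.
var firstPage = {
  skip: 0,
  limit: 2,
  total_rows: 176852,
  rows: [
    { _id: "hain-m-all_documents-257.", _rev: "3750319208", score: 1.601625680923462 },
    { _id: "hain-m-notes_inbox-257.", _rev: "2603032545", score: 1.601625680923462 }
  ]
};

console.log(nextSkip(firstPage));
```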
And the same with sorting:

<pre>
{
  "q": "+_db:enron +content:enron",
  "skip": 0,
  "limit": 3,
  "total_rows": 176852,
  "search_duration": 660,
  "fetch_duration": 4,
  "sort_order": [
    {
      "field": "source",
      "reverse": false,
      "type": "string"
    },
    {
      "reverse": false,
      "type": "doc"
    }
  ],
  "rows": [
    {
      "_id": "shankman-j-inbox-105.",
      "_rev": "4289412378",
      "score": 0.6131107211112976,
      "sort_order": [
        "enron",
        6
      ]
    },
    {
      "_id": "shankman-j-inbox-8.",
      "_rev": "1417542355",
      "score": 0.7492915391921997,
      "sort_order": [
        "enron",
        7
      ]
    },
    {
      "_id": "shankman-j-inbox-30.",
      "_rev": "951793815",
      "score": 0.507369875907898,
      "sort_order": [
        "enron",
        8
      ]
    }
  ]
}
</pre>

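In the sorted response, each row's sort_order array appears to line up position by position with the top-level sort_order list. Assuming that reading of the sample is correct, the sketch below labels each row's sort values with the corresponding field name, falling back to the entry's type when no field is given. The helper name <code>labelSortValues</code> is hypothetical.

```javascript
// Pair each row's sort values with the top-level sort_order entries.
// Assumes the two arrays are positional, as the sample response suggests.
function labelSortValues(response) {
  var labels = (response.sort_order || []).map(function (entry) {
    return entry.field || entry.type;
  });
  return response.rows.map(function (row) {
    var values = {};
    (row.sort_order || []).forEach(function (value, i) {
      values[labels[i]] = value;
    });
    return { id: row._id, score: row.score, sortValues: values };
  });
}

// Trimmed copy of the sorted response above.
var sorted = {
  sort_order: [
    { field: "source", reverse: false, type: "string" },
    { reverse: false, type: "doc" }
  ],
  rows: [
    { _id: "shankman-j-inbox-105.", _rev: "4289412378", score: 0.6131107211112976, sort_order: ["enron", 6] },
    { _id: "shankman-j-inbox-8.", _rev: "1417542355", score: 0.7492915391921997, sort_order: ["enron", 7] }
  ]
};

console.log(JSON.stringify(labelSortValues(sorted), null, 2));
```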
<h1>Working With The Source</h1>

To develop "live", type "mvn dependency:unpack-dependencies" and change the external line to something like this:

<pre>
fti=/usr/bin/java -cp /path/to/couchdb-lucene/target/classes:\
/path/to/couchdb-lucene/target/dependency org.apache.couchdb.lucene.Main
</pre>

You will need to restart CouchDB whenever you change the couchdb-lucene source code, but restarting is very fast.

<h1>Configuration</h1>

couchdb-lucene respects several system properties:

<dl>
<dt>couchdb.url<dd>the URL used to contact CouchDB (the default is "http://localhost:5984")
<dt>couchdb.lucene.dir<dd>the path to the Lucene indexes (the default is a directory called 'lucene' relative to CouchDB's current working directory)
</dl>

You can override these properties like this:

<pre>
fti=/usr/bin/java -Dcouchdb.lucene.dir=/tmp \
-cp /home/rnewson/Source/couchdb-lucene/target/classes:\
/home/rnewson/Source/couchdb-lucene/target/dependency \
org.apache.couchdb.lucene.Main
</pre>