<h1>News</h1>

I've merged the changes from the beta branch, which brings many improvements. Notably:

<ol>
<li>Indexing is a separate process to searching and is triggered by update notifications.
<li>Rhino integration has landed, so user customization of indexing is now possible.
</ol>

You are advised to delete indexes created prior to this update.

<h1>Build couchdb-lucene</h1>

<ol>
<li>Install Maven 2.
<li>Check out the repository.
<li>Type 'mvn'.
<li>Configure CouchDB (see below).
</ol>

<h1>Configure CouchDB</h1>

<pre>
[couchdb]
os_process_timeout=60000 ; increase the timeout from 5 seconds.

[external]
fti=/usr/bin/java -jar /path/to/couchdb-lucene*-jar-with-dependencies.jar -search

[update_notification]
indexer=/usr/bin/java -jar /path/to/couchdb-lucene*-jar-with-dependencies.jar -index

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
</pre>

<h1>Indexing Strategy</h1>

<h2>Document Indexing</h2>

By default all attributes are indexed. You can customize this process by adding a design document at _design/lucene. You must supply an attribute called "transform" whose value is a function that takes a document and returns a document. For example:

<pre>
{
  "transform":"function(doc) { return doc; }"
}
</pre>

The function is evaluated by <a href="http://www.mozilla.org/rhino/">Rhino</a>. You may add, modify and remove any attributes. Additionally, returning null will exclude the document from indexing entirely.
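
As a rough illustration of the customization described above, a transform might skip some documents, drop a field, and add a derived one. The attribute names (`type`, `password`, `first_name`, `last_name`, `full_name`) and the value `"draft"` are hypothetical and not part of couchdb-lucene. The function is named here only so it can be run on its own; in the design document it would be an anonymous `function(doc) { ... }` stored as a string. The syntax is deliberately conservative (`var`, no arrow functions) on the assumption that Rhino evaluates it.

```javascript
// A sketch of a transform function. The attribute names used here are
// made-up examples, not fields that couchdb-lucene treats specially.
function transform(doc) {
  // Returning null excludes the document from indexing entirely.
  if (doc.type === "draft") {
    return null;
  }
  // Remove an attribute that should not be searchable.
  delete doc.password;
  // Add a derived attribute so it can be queried as its own field.
  if (doc.first_name && doc.last_name) {
    doc.full_name = doc.first_name + " " + doc.last_name;
  }
  return doc;
}
```

To try it out, the function can be called directly with sample documents before its source is pasted into the "transform" attribute.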

<h2>Attachment Indexing</h2>

couchdb-lucene uses <a href="http://lucene.apache.org/tika/">Apache Tika</a> to index attachments of the following types, assuming the correct content_type is set in CouchDB:

<h3>Supported Formats</h3>

<ul>
<li>Excel spreadsheets (application/vnd.ms-excel)
<li>Word documents (application/msword)
<li>PowerPoint presentations (application/vnd.ms-powerpoint)
<li>Visio (application/vnd.visio)
<li>Outlook (application/vnd.ms-outlook)
<li>XML (application/xml)
<li>HTML (text/html)
<li>Images (image/*)
<li>Java class files
<li>Java jar archives
<li>MP3 (audio/mp3)
<li>OpenDocument (application/vnd.oasis.opendocument.*)
<li>Plain text (text/plain)
<li>PDF (application/pdf)
<li>RTF (application/rtf)
</ul>

<h1>Searching with couchdb-lucene</h1>

You can perform all types of queries using Lucene's default <a href="http://lucene.apache.org/java/2_4_0/queryparsersyntax.html">query syntax</a>. The following parameters can be passed for more sophisticated searches:

<dl>
<dt>q<dd>the query to run (e.g., subject:hello)
<dt>sort<dd>the comma-separated fields to sort on.
<dt>asc<dd>sort ascending (true) or descending (false); applies only when sorting on a single field.
<dt>limit<dd>the maximum number of results to return
<dt>skip<dd>the number of results to skip
<dt>include_docs<dd>whether to include the source docs
<dt>debug<dd>if false, a normal application/json response with results is returned. If true, a pretty-printed HTML blob is returned instead.
</dl>

<i>All parameters except 'q' are optional.</i>

<h2>Special Fields</h2>

<dl>
<dt>_id<dd>The _id of the document.
<dt>_rev<dd>The _rev of the document.
<dt>_db<dd>The source database of the document.
<dt>_body<dd>Any text extracted from any attachment (name may change).
<dt>_author<dd>The author of any attachment (name may change).
<dt>_title<dd>The title of any attachment (name may change).
</dl>

<h2>Examples</h2>

<pre>
http://localhost:5984/dbname/_fti?q=field_name:value
http://localhost:5984/dbname/_fti?q=field_name:value&sort=other_field
http://localhost:5984/dbname/_fti?debug=true&sort=billing_size&q=body:document AND customer:[A TO C]
</pre>
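
The last example contains spaces and brackets, which many HTTP clients will not escape for you. The following is a minimal sketch of building such a URL with percent-encoding; `buildSearchUrl` is a hypothetical helper rather than part of couchdb-lucene, and the host and database name are the placeholders used in the examples.

```javascript
// Build a search URL for the _fti handler. The helper name is made up;
// the parameter names (q, sort, debug, ...) are the ones listed earlier.
function buildSearchUrl(base, db, params) {
  var parts = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      // encodeURIComponent escapes spaces, colons and brackets in the query.
      parts.push(encodeURIComponent(name) + "=" +
                 encodeURIComponent(String(params[name])));
    }
  }
  return base + "/" + encodeURIComponent(db) + "/_fti?" + parts.join("&");
}

var url = buildSearchUrl("http://localhost:5984", "dbname", {
  q: "body:document AND customer:[A TO C]",
  sort: "billing_size",
  debug: true
});
console.log(url);
```

Encoding each value separately keeps characters such as `&` inside a query from being mistaken for parameter separators.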

<h2>Search Results Format</h2>

Here's an example of a JSON response without sorting:

<pre>
{
  "q": "+_db:enron +content:enron",
  "skip": 0,
  "limit": 2,
  "total_rows": 176852,
  "search_duration": 518,
  "fetch_duration": 4,
  "rows": [
    {
      "_id": "hain-m-all_documents-257.",
      "_rev": "3750319208",
      "score": 1.601625680923462
    },
    {
      "_id": "hain-m-notes_inbox-257.",
      "_rev": "2603032545",
      "score": 1.601625680923462
    }
  ]
}
</pre>

And the same with sorting:

<pre>
{
  "q": "+_db:enron +content:enron",
  "skip": 0,
  "limit": 3,
  "total_rows": 176852,
  "search_duration": 660,
  "fetch_duration": 4,
  "sort_order": [
    {
      "field": "source",
      "reverse": false,
      "type": "string"
    },
    {
      "reverse": false,
      "type": "doc"
    }
  ],
  "rows": [
    {
      "_id": "shankman-j-inbox-105.",
      "_rev": "4289412378",
      "score": 0.6131107211112976,
      "sort_order": [
        "enron",
        6
      ]
    },
    {
      "_id": "shankman-j-inbox-8.",
      "_rev": "1417542355",
      "score": 0.7492915391921997,
      "sort_order": [
        "enron",
        7
      ]
    },
    {
      "_id": "shankman-j-inbox-30.",
      "_rev": "951793815",
      "score": 0.507369875907898,
      "sort_order": [
        "enron",
        8
      ]
    }
  ]
}
</pre>
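
A client will typically want the document ids from a response and the `skip` value for the next page. The sketch below assumes the response shape shown in the examples above; `summarize` is a hypothetical helper, and the paging arithmetic is an assumption about how `skip` and `total_rows` relate, not documented behaviour.

```javascript
// Summarize a search response shaped like the examples in this section.
function summarize(response) {
  var ids = [];
  for (var i = 0; i < response.rows.length; i++) {
    ids.push(response.rows[i]._id);
  }
  // Assumed paging rule: the next page starts after the rows already
  // skipped plus the rows returned in this response.
  var nextSkip = response.skip + response.rows.length;
  return {
    ids: ids,
    nextSkip: nextSkip,
    hasMore: nextSkip < response.total_rows
  };
}

var sample = JSON.parse(
  '{"q":"+_db:enron +content:enron","skip":0,"limit":2,"total_rows":176852,' +
  '"rows":[' +
  '{"_id":"hain-m-all_documents-257.","_rev":"3750319208","score":1.601625680923462},' +
  '{"_id":"hain-m-notes_inbox-257.","_rev":"2603032545","score":1.601625680923462}' +
  ']}'
);
var summary = summarize(sample);
console.log(summary.ids.join(", "), summary.nextSkip, summary.hasMore);
```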

<h1>Working With The Source</h1>

To develop "live", type "mvn dependency:unpack-dependencies" and change the external line to something like this:

<pre>
fti=/usr/bin/java -cp /path/to/couchdb-lucene/target/classes:\
/path/to/couchdb-lucene/target/dependency org.apache.couchdb.lucene.Main
</pre>

You will need to restart CouchDB if you change the couchdb-lucene source code, but this is very fast.

<h1>Configuration</h1>

couchdb-lucene respects several system properties:

<dl>
<dt>couchdb.url<dd>the URL used to contact CouchDB (default is "http://localhost:5984")
<dt>couchdb.lucene.dir<dd>the path to the Lucene indexes (the default is to make a directory called 'lucene' relative to CouchDB's current working directory).
</dl>

You can override these properties like this:

<pre>
fti=/usr/bin/java -Dcouchdb.lucene.dir=/tmp \
-cp /home/rnewson/Source/couchdb-lucene/target/classes:\
/home/rnewson/Source/couchdb-lucene/target/dependency \
org.apache.couchdb.lucene.Main
</pre>