<h1>News</h1>

I've merged the changes from the beta branch, which brings many improvements. Notably:

<ol>
<li>Indexing is now a separate process from searching and is triggered by update notifications.
<li>Rhino integration has landed, so user customization of indexing is now possible.
</ol>

You are advised to delete indexes created prior to this update.

<h1>Build couchdb-lucene</h1>

<ol>
<li>Install Maven 2.
<li>Check out the repository.
<li>Type 'mvn'.
<li>Configure CouchDB (see below).
</ol>

<h1>Configure CouchDB</h1>

<pre>
[external]
fti=/usr/bin/java -jar /path/to/couchdb-lucene*-jar-with-dependencies.jar -search

[update_notification]
indexer=/usr/bin/java -jar /path/to/couchdb-lucene*-jar-with-dependencies.jar -index

[httpd_db_handlers]
_fti = {couch_httpd_external, handle_external_req, <<"fti">>}
</pre>

<h1>Indexing Strategy</h1>

<h2>Document Indexing</h2>

By default, all attributes are indexed. You can customize this process by adding a design document at _design/lucene. It must supply an attribute called "transform", a function that takes a document and returns a document. For example:

<pre>
{
  "transform":"function(doc) { return doc; }"
}
</pre>

The function is evaluated by <a href="http://www.mozilla.org/rhino/">Rhino</a>. You may add, modify and remove any attributes. Additionally, returning null will exclude the document from indexing entirely.

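As a slightly fuller illustration of the transform described above, here is a minimal sketch. The attribute names (<code>draft</code>, <code>password</code>, <code>subject</code>, <code>subject_lower</code>) are hypothetical, and storing the function source as a string simply mirrors the example above; it is not a statement of how couchdb-lucene requires the design document to be built.

```javascript
// Hypothetical transform: skip drafts, drop a sensitive attribute,
// and add a derived attribute. All attribute names are assumptions.
var transform = function (doc) {
  // Returning null excludes the document from indexing entirely.
  if (doc.draft === true) {
    return null;
  }
  // Remove an attribute that should not be searchable.
  delete doc.password;
  // Add a derived attribute.
  if (typeof doc.subject === "string") {
    doc.subject_lower = doc.subject.toLowerCase();
  }
  return doc;
};

// The design document carries the function source as a string,
// as in the "transform" example above.
var designDoc = {
  _id: "_design/lucene",
  transform: transform.toString()
};
```

The function is written in plain ES3-style JavaScript on the assumption that the embedded Rhino version may not support newer syntax.
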
<h2>Attachment Indexing</h2>

couchdb-lucene uses <a href="http://lucene.apache.org/tika/">Apache Tika</a> to index attachments of the following types, assuming the correct content_type is set in CouchDB:

<h3>Supported Formats</h3>

<ul>
<li>Excel spreadsheets (application/vnd.ms-excel)
<li>Word documents (application/msword)
<li>PowerPoint presentations (application/vnd.ms-powerpoint)
<li>Visio (application/vnd.visio)
<li>Outlook (application/vnd.ms-outlook)
<li>XML (application/xml)
<li>HTML (text/html)
<li>Images (image/*)
<li>Java class files
<li>Java jar archives
<li>MP3 (audio/mp3)
<li>OpenDocument (application/vnd.oasis.opendocument.*)
<li>Plain text (text/plain)
<li>PDF (application/pdf)
<li>RTF (application/rtf)
</ul>

<h1>Searching with couchdb-lucene</h1>

You can perform all types of queries using Lucene's default <a href="http://lucene.apache.org/java/2_4_0/queryparsersyntax.html">query syntax</a>. The following parameters can be passed for more sophisticated searches:

<dl>
<dt>q<dd>the query to run (e.g., subject:hello)
<dt>sort<dd>the comma-separated fields to sort on
<dt>asc<dd>sort ascending (true) or descending (false); only applies when sorting on a single field
<dt>limit<dd>the maximum number of results to return
<dt>skip<dd>the number of results to skip
<dt>include_docs<dd>whether to include the source docs
<dt>debug<dd>if false, a normal application/json response with results is returned; if true, a pretty-printed HTML blob is returned instead
</dl>

<i>All parameters except 'q' are optional.</i>

<h2>Special Fields</h2>

<dl>
<dt>_id<dd>The _id of the document.
<dt>_rev<dd>The _rev of the document.
<dt>_db<dd>The source database of the document.
<dt>_body<dd>Any text extracted from any attachment (name may change).
<dt>_author<dd>The author of any attachment (name may change).
<dt>_title<dd>The title of any attachment (name may change).
</dl>

<h2>Examples</h2>

<pre>
http://localhost:5984/dbname/_fti?q=field_name:value
http://localhost:5984/dbname/_fti?q=field_name:value&sort=other_field
http://localhost:5984/dbname/_fti?debug=true&sort=billing_size&q=body:document AND customer:[A TO C]
</pre>

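The last example URL contains spaces and square brackets, which many HTTP clients will not send as written (browsers usually encode them for you). A minimal sketch of building an encoded search URL follows; <code>buildSearchUrl</code> and its argument names are hypothetical helpers, not part of couchdb-lucene.

```javascript
// Build a search URL for the _fti handler, percent-encoding each parameter.
// buildSearchUrl is a hypothetical helper written for illustration only.
function buildSearchUrl(base, db, params) {
  var parts = [];
  for (var name in params) {
    if (Object.prototype.hasOwnProperty.call(params, name)) {
      parts.push(
        encodeURIComponent(name) + "=" + encodeURIComponent(String(params[name]))
      );
    }
  }
  return base + "/" + encodeURIComponent(db) + "/_fti?" + parts.join("&");
}

// Same query as the third example above, with the parameters encoded.
var url = buildSearchUrl("http://localhost:5984", "dbname", {
  q: "body:document AND customer:[A TO C]",
  sort: "billing_size",
  debug: true
});
```
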
<h2>Search Results Format</h2>

Here's an example of a JSON response without sorting:

<pre>
{
  "q": "+_db:enron +content:enron",
  "skip": 0,
  "limit": 2,
  "total_rows": 176852,
  "search_duration": 518,
  "fetch_duration": 4,
  "rows": [
    {
      "_id": "hain-m-all_documents-257.",
      "_rev": "3750319208",
      "score": 1.601625680923462
    },
    {
      "_id": "hain-m-notes_inbox-257.",
      "_rev": "2603032545",
      "score": 1.601625680923462
    }
  ]
}
</pre>

And the same with sorting:

<pre>
{
  "q": "+_db:enron +content:enron",
  "skip": 0,
  "limit": 3,
  "total_rows": 176852,
  "search_duration": 660,
  "fetch_duration": 4,
  "sort_order": [
    {
      "field": "source",
      "reverse": false,
      "type": "string"
    },
    {
      "reverse": false,
      "type": "doc"
    }
  ],
  "rows": [
    {
      "_id": "shankman-j-inbox-105.",
      "_rev": "4289412378",
      "score": 0.6131107211112976,
      "sort_order": [
        "enron",
        6
      ]
    },
    {
      "_id": "shankman-j-inbox-8.",
      "_rev": "1417542355",
      "score": 0.7492915391921997,
      "sort_order": [
        "enron",
        7
      ]
    },
    {
      "_id": "shankman-j-inbox-30.",
      "_rev": "951793815",
      "score": 0.507369875907898,
      "sort_order": [
        "enron",
        8
      ]
    }
  ]
}
</pre>

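A client might consume responses like those above roughly as follows. This is only a sketch: <code>summarize</code> is a hypothetical helper, and the paging logic assumes that <code>skip</code> counts rows from the start of the full result set, as the parameter description suggests.

```javascript
// Collect document ids from a parsed search response and work out the
// "skip" value for the next page. Field names (skip, total_rows, rows, _id)
// come from the example responses; summarize() itself is hypothetical.
function summarize(response) {
  var ids = [];
  for (var i = 0; i < response.rows.length; i++) {
    ids.push(response.rows[i]._id);
  }
  var nextSkip = response.skip + response.rows.length;
  return {
    ids: ids,
    // null when there are no further results to page through.
    nextSkip: nextSkip < response.total_rows ? nextSkip : null
  };
}

// Abbreviated copy of the unsorted example response.
var example = {
  "q": "+_db:enron +content:enron",
  "skip": 0,
  "limit": 2,
  "total_rows": 176852,
  "rows": [
    { "_id": "hain-m-all_documents-257.", "_rev": "3750319208", "score": 1.601625680923462 },
    { "_id": "hain-m-notes_inbox-257.", "_rev": "2603032545", "score": 1.601625680923462 }
  ]
};
var summary = summarize(example);
```
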
<h1>Working With The Source</h1>

To develop "live", type "mvn dependency:unpack-dependencies" and change the external line to something like this:

<pre>
fti=/usr/bin/java -cp /path/to/couchdb-lucene/target/classes:\
/path/to/couchdb-lucene/target/dependency org.apache.couchdb.lucene.Main
</pre>

You will need to restart CouchDB if you change the couchdb-lucene source code, but this is very fast.

<h1>Configuration</h1>

couchdb-lucene respects several system properties:

<dl>
<dt>couchdb.url<dd>the URL used to contact CouchDB (default is "http://localhost:5984")
<dt>couchdb.lucene.dir<dd>the path to the Lucene indexes (the default is a directory called 'lucene' relative to CouchDB's current working directory)
</dl>

You can override these properties like this:

<pre>
fti=/usr/bin/java -Dcouchdb.lucene.dir=/tmp \
-cp /home/rnewson/Source/couchdb-lucene/target/classes:\
/home/rnewson/Source/couchdb-lucene/target/dependency \
org.apache.couchdb.lucene.Main
</pre>