Lucene 3.6.2, POST queries, full OOXML schemas, increase granularity of Date queries #188

Open
vjt wants to merge 6 commits

3 participants

@vjt
vjt commented

Hi Robert,

as discussed on the couchdb-user mailing list, this pull request contains a batch of changes that we found useful in our production environment.

Details are in the commit messages and in the NEWS file.

This includes @patricklodder's commit from #170 and also closes #112.

Thanks!

~Marcello

vjt and others added some commits
@vjt vjt Whitespace db09883
@vjt vjt Update Lucene to version 3.6.2 3a19177
@vjt vjt Reduce precisionStep to 1 for Date values
This increases the granularity of Range queries over Date values
and allows exact sorting on Date fields.
11fd46a
@vjt vjt Add the full OOXML Schemas archive
This allows parsing Office documents that use exotic features.
567ac58
@patricklodder patricklodder allow POSTing queries d237dd2
@rnewson
Owner

I'm taking the commits that can go in unmodified one at a time. The precisionStep change will need to be configurable, and the POST variant (which is a great idea) might be done more cleanly (not sure yet).

@vjt

Thank you! As you may have inferred, I am not a Java developer, so I basically cargo-culted the POST code and just tested it from the outside, issuing big HTTP requests.

As for precisionStep, a configuration option is definitely the best way forward. I tried to implement it but failed; again, no Java developer here :-).

Thanks again, and looking forward to the developments.

~Marcello
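
For readers who want to reproduce the kind of outside-in test described above, here is a minimal sketch of building a large POST query. It assumes the raw request body is used as the query string, which is how the `doPostInternal` hunk in this pull request reads it. The host, port, database, design document, and index name are placeholders, not values taken from this thread.

```python
from urllib.request import Request

# Hypothetical endpoint: host, port, database, design document and index
# name are placeholders chosen for illustration only.
BASE_URL = "http://localhost:5985/local/mydb/_design/search/by_title"


def build_post_query(query: str, url: str = BASE_URL) -> Request:
    """Build (but do not send) a POST request whose body is the raw query."""
    return Request(
        url,
        data=query.encode("utf-8"),
        method="POST",
        # Declaring the charset lets the server decode the body as UTF-8.
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


# A query much longer than would comfortably fit in a URL.
long_query = " OR ".join(f"id:{n}" for n in range(5000))
request = build_post_query(long_query)
print(request.get_method(), len(request.data), "bytes")
```

Passing `request` to `urllib.request.urlopen` would send it. Other parameters such as `analyzer` or `default_operator` are still read from the URL in the diff, so they would presumably stay in the query string of `url`.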

@patricklodder

@rnewson I'm happy to donate some time to clean up the POST code; it is indeed a bit ugly. If you have any specific feedback on what is unacceptable right now, please let me know, so I can focus on that.

@rnewson
Owner

@patricklodder I don't have any specific objections right now, only that it looked a little odd to me. I'll take some time over the holidays to look at it. It's definitely useful to have a POST variant.

@vjt
vjt commented

Hi @rnewson, I've extracted licensing information for all the included dependencies in a new text file. I don't think there's a better way for now, as Maven doesn't provide structured licensing information for all the artifacts, so I had to find the home pages of each dependency and find out which license they are released under.

@rnewson
Owner

Thanks, but I'm not sure it's necessary (and it's not related to this pull request, I think).

@vjt
vjt commented

OK. Yes, it is definitely OT here - sorry for the noise.

@rnewson
Owner

No problem. :) I hope to finish extracting the work here into master next week; the simple bits are already done.

@rnewson rnewson referenced this pull request from a commit
@rnewson Support POST for queries
closes #87, #112, #170, #188.
7a2fa62
Commits on Dec 8, 2013
  1. @vjt

    Whitespace

    vjt authored
  2. @vjt

    Update Lucene to version 3.6.2

    vjt authored
  3. @vjt

    Reduce precisionStep to 1 for Date values

    vjt authored
    This increases the granularity of Range queries over Date values
    and allows exact sorting on Date fields.
  4. @vjt

    Add the full OOXML Schemas archive

    vjt authored
    This allows parsing Office documents that use exotic features.
  5. @patricklodder @vjt

    allow POSTing queries

    patricklodder authored vjt committed
Commits on Jan 3, 2014
  1. @vjt
46 LICENSE.dependencies
@@ -0,0 +1,46 @@
+CouchDB-Lucene includes the following open source Java components, licensed under:
+
+- Apache 2.0 License:
+
+ * httpclient 4.3.1 http://hc.apache.org/
+ * httpcore 4.3.1 http://hc.apache.org/
+ * jetty 6.1.20 http://www.eclipse.org/jetty/licenses.php
+ * commons-io 1.4 http://commons.apache.org/proper/commons-io/
+ * commons-configuration 1.6 http://commons.apache.org/proper/commons-configuration/
+ * commons-codec 1.5 http://commons.apache.org/proper/commons-codec/
+ * lucene 3.6.2 https://lucene.apache.org/
+ * tika 1.2 http://tika.apache.org/license.html
+ * ooxml-schemas 1.1 http://poi.apache.org/legal.html
+ * jinterface 1.5.3.2 https://github.com/erlang/otp
+ * log4j 1.2.14 http://logging.apache.org/log4j/1.2/license.html
+
+
+- Eclipse Public License:
+
+ * junit 4.5 https://github.com/junit-team/junit/blob/master/LICENSE.txt
+
+
+- MIT License:
+
+ * slf4j-log4j12 1.5.6 http://slf4j.org/license.html
+
+
+- BSD License:
+
+ * hamcrest 1.1 https://code.google.com/p/hamcrest/
+
+
+- MPL 2.0 License:
+
+ * rhino 1.7R4 https://developer.mozilla.org/en/docs/Rhino/License
+
+
+- Creative Commons Attribution 2.5 License:
+
+ * jcip-annotations 1.0 http://jcip.net.s3-website-us-east-1.amazonaws.com/annotations/doc/index.html
+
+
+- Provided with no support or warranty:
+
+ * org.json 200902122 http://www.json.org/license.html
+
12 NEWS
@@ -1,4 +1,16 @@
+Version 0.10.0 (UNRELEASED)
+---------------------------
+
+* Upgrade to Lucene 3.6.2
+* Reduce the precisionStep of date fields from 8 to 1, for more
+ granularity in Range queries and exact sorting by
+ timestamp
+* Use the full OOXML Schemas from Apache POI, to make Tika
+ able to parse Office documents that use exotic features
+* Allow sending queries via POST
+
Version 0.9.0
+-------------
* Upgrade to Rhino 1.7R4
* Upgrade to Lucene 3.6.1
6 couchdb-external-hook.py
@@ -78,7 +78,11 @@ def respond(res, req, key):
else:
method = req["verb"]
- res.request(method, path, headers=req_headers)
+ if method == "POST":
+ res.request(method, path, req.get('body').encode('utf-8'), headers=req_headers)
+ else:
+ res.request(method, path, headers=req_headers)
+
resp = res.getresponse()
resp_headers = {}
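
The hunk above calls `.encode('utf-8')` on `req.get('body')` for every POST, which would fail if the body were absent. The following is a sketch of one possible guard, not code from this pull request. `request_body` is a hypothetical helper name, and the idea that a missing body shows up as `None` or as the string `"undefined"` is an assumption about the request object CouchDB hands to the hook.

```python
def request_body(req):
    """Return the bytes to forward upstream, or None when no body applies.

    `req` is the request dictionary that the hook receives from CouchDB.
    """
    if req.get("verb") != "POST":
        # Non-POST requests are forwarded without a body, as before.
        return None
    body = req.get("body")
    # Assumption: an absent body arrives as None or the string "undefined".
    if body is None or body == "undefined":
        return b""
    return body.encode("utf-8")
```

Because the `request` method of an `http.client` connection accepts `body=None`, a helper like this would let the hook use a single `res.request(method, path, request_body(req), headers=req_headers)` call instead of the `if`/`else` pair.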
7 pom.xml
@@ -99,6 +99,11 @@
<version>${tika-version}</version>
</dependency>
<dependency>
+ <groupId>org.apache.poi</groupId>
+ <artifactId>ooxml-schemas</artifactId>
+ <version>1.1</version>
+ </dependency>
+ <dependency>
<groupId>org.erlang.otp</groupId>
<artifactId>jinterface</artifactId>
<version>1.5.3.2</version>
@@ -127,7 +132,7 @@
</dependency>
</dependencies>
<properties>
- <lucene-version>3.6.1</lucene-version>
+ <lucene-version>3.6.2</lucene-version>
<tika-version>1.2</tika-version>
<jetty-version>6.1.20</jetty-version>
<http-version>4.0.1</http-version>
13 src/main/java/com/github/rnewson/couchdb/lucene/DatabaseIndexer.java
@@ -501,6 +501,11 @@ public void run() {
public void search(final HttpServletRequest req,
final HttpServletResponse resp) throws IOException, JSONException {
+ search(req, resp, req.getParameter("q"));
+ }
+
+ public void search(final HttpServletRequest req,
+ final HttpServletResponse resp, final String query) throws IOException, JSONException {
final IndexState state = getState(req, resp);
if (state == null)
return;
@@ -513,7 +518,7 @@ public void search(final HttpServletRequest req,
resp.setStatus(304);
return;
}
- for (final String queryString : getQueryStrings(req)) {
+ for (final String queryString : getQueryStrings(query)) {
final Analyzer analyzer = state.analyzer(req.getParameter("analyzer"));
final Operator operator = "and".equalsIgnoreCase(req.getParameter("default_operator"))
? Operator.AND : Operator.OR;
@@ -720,6 +725,10 @@ public void search(final HttpServletRequest req,
return Utils.splitOnCommas(req.getParameter("q"));
}
+ private String[] getQueryStrings(final String query) {
+ return Utils.splitOnCommas(query);
+ }
+
private void close() {
this.closed = true;
@@ -737,7 +746,7 @@ private void close() {
}
latch.countDown();
}
-
+
public boolean isClosed() {
return closed;
}
35 src/main/java/com/github/rnewson/couchdb/lucene/LuceneServlet.java
@@ -16,6 +16,9 @@
package com.github.rnewson.couchdb.lucene;
+import java.io.BufferedReader;
+import java.io.StringWriter;
+
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
@@ -191,7 +194,7 @@ private void doGetInternal(final HttpServletRequest req, final HttpServletRespon
ServletUtils.sendJsonError(req, resp, 500, "error_creating_index");
return;
}
-
+
if (req.getParameter("q") == null) {
indexer.info(req, resp);
} else {
@@ -215,7 +218,7 @@ protected void doPost(final HttpServletRequest req,
}
private void doPostInternal(final HttpServletRequest req, final HttpServletResponse resp)
- throws IOException, JSONException {
+ throws ServletException, IOException, JSONException {
switch (StringUtils.countMatches(req.getRequestURI(), "/")) {
case 3:
if (req.getPathInfo().endsWith("/_cleanup")) {
@@ -223,6 +226,34 @@ private void doPostInternal(final HttpServletRequest req, final HttpServletRespo
return;
}
break;
+ case 5:
+ final DatabaseIndexer indexr = getIndexer(req);
+ if (indexr == null) {
+ ServletUtils.sendJsonError(req, resp, 500, "error_creating_index");
+ return;
+ }
+
+ BufferedReader reader = req.getReader();
+ StringWriter writer = new StringWriter();
+
+ char[] buffer = new char[1024];
+ try {
+ int n;
+ while ((n = reader.read(buffer)) != -1) {
+ writer.write(buffer, 0, n);
+ }
+ } catch(Exception ex) {
+ log("Could not read input", ex);
+ ServletUtils.sendJsonError(req, resp, 500, "could not read input");
+ return;
+ } finally {
+ reader.close();
+ }
+ String query = writer.toString();
+
+ indexr.search(req, resp, query);
+ return;
+
case 6:
final DatabaseIndexer indexer = getIndexer(req);
indexer.admin(req, resp);
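
The dispatch in `doPostInternal` depends only on how many `/` characters the request URI contains. Below is a small sketch of that rule as it appears in the hunk above, written in Python for illustration. The example URIs and the `route_post` name are hypothetical, and whatever the servlet does after the `switch` falls through is not shown in the diff, so it is reported here simply as "unhandled".

```python
def route_post(request_uri: str) -> str:
    """Mirror the slash-count dispatch visible in the doPostInternal hunk."""
    slashes = request_uri.count("/")
    if slashes == 3 and request_uri.endswith("/_cleanup"):
        return "cleanup"
    if slashes == 5:
        # New in this pull request: the POST body is used as the query.
        return "search"
    if slashes == 6:
        return "admin"
    return "unhandled"


# Hypothetical URIs, for illustration only.
for uri in (
    "/local/mydb/_cleanup",
    "/local/mydb/_design/search/by_title",
    "/local/mydb/_design/search/by_title/_optimize",
    "/local/mydb/_design/search/by_title/",
):
    print(uri, "->", route_post(uri))
```

The last URI shows one consequence of counting separators: a trailing slash moves a search request into the six-slash branch.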
36 src/main/java/com/github/rnewson/couchdb/lucene/couchdb/FieldType.java
@@ -34,7 +34,7 @@
public enum FieldType {
- DATE(8, SortField.LONG) {
+ DATE(1, SortField.LONG) {
@Override
public NumericField toField(final String name, final Object value, final ViewSettings settings) throws ParseException {
@@ -71,9 +71,9 @@ public Query toTermQuery(final String name, final String text) {
}
private double toDouble(final Object obj) {
- if (obj instanceof Number) {
- return ((Number)obj).doubleValue();
- }
+ if (obj instanceof Number) {
+ return ((Number)obj).doubleValue();
+ }
return Double.parseDouble(obj.toString());
}
@@ -81,7 +81,7 @@ private double toDouble(final Object obj) {
FLOAT(4, SortField.FLOAT) {
@Override
public NumericField toField(final String name, final Object value, final ViewSettings settings) {
- return field(name, 4, settings).setFloatValue(toFloat(value));
+ return field(name, precisionStep, settings).setFloatValue(toFloat(value));
}
@Override
@@ -95,16 +95,16 @@ public Query toTermQuery(final String name, final String text) {
}
private float toFloat(final Object obj) {
- if (obj instanceof Number) {
- return ((Number)obj).floatValue();
- }
+ if (obj instanceof Number) {
+ return ((Number)obj).floatValue();
+ }
return Float.parseFloat(obj.toString());
}
},
INT(4, SortField.INT) {
@Override
public NumericField toField(final String name, final Object value, final ViewSettings settings) {
- return field(name, 4, settings).setIntValue(toInt(value));
+ return field(name, precisionStep, settings).setIntValue(toInt(value));
}
@Override
@@ -118,9 +118,9 @@ public Query toTermQuery(final String name, final String text) {
}
private int toInt(final Object obj) {
- if (obj instanceof Number) {
- return ((Number)obj).intValue();
- }
+ if (obj instanceof Number) {
+ return ((Number)obj).intValue();
+ }
return Integer.parseInt(obj.toString());
}
@@ -137,9 +137,9 @@ public Query toRangeQuery(final String name, final String lower, final String up
}
private long toLong(final Object obj) {
- if (obj instanceof Number) {
- return ((Number)obj).longValue();
- }
+ if (obj instanceof Number) {
+ return ((Number)obj).longValue();
+ }
return Long.parseLong(obj.toString());
}
@@ -206,9 +206,9 @@ public final int toSortField() {
}
public static long toDate(final Object obj) throws ParseException {
- if (obj instanceof Date) {
- return ((Date)obj).getTime();
- }
+ if (obj instanceof Date) {
+ return ((Date)obj).getTime();
+ }
try {
return DateUtils.parseDate(obj.toString().toUpperCase(), DATE_PATTERNS).getTime();
} catch (final java.text.ParseException e) {
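
One reason the maintainer may want precisionStep to be configurable is its effect on index size. As I understand Lucene's trie encoding of numeric fields, each value is indexed once per shift of 0, precisionStep, 2 × precisionStep, and so on up to the bit width of the type, with shift 0 being the full-precision term. The sketch below only does that arithmetic. The function names are made up for this example and nothing here calls Lucene.

```python
def trie_shifts(value_bits: int, precision_step: int) -> list:
    """Shifts at which one numeric value would be indexed."""
    if precision_step < 1:
        raise ValueError("precision_step must be at least 1")
    return list(range(0, value_bits, precision_step))


def terms_per_value(value_bits: int, precision_step: int) -> int:
    """Number of indexed terms per value: ceil(value_bits / precision_step)."""
    return len(trie_shifts(value_bits, precision_step))


# DATE values are stored as 64-bit longs in the FieldType enum above.
print(terms_per_value(64, 8))  # previous DATE setting -> 8
print(terms_per_value(64, 1))  # setting in this pull request -> 64
print(terms_per_value(32, 4))  # INT setting shown in the diff -> 8
```

Under this reading, moving DATE from a step of 8 to a step of 1 indexes eight times as many terms per date value, which is a cost that some deployments may prefer to tune.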
3  src/main/java/com/github/rnewson/couchdb/lucene/couchdb/ViewSettings.java
@@ -82,8 +82,7 @@ public FieldType getFieldType() {
return type;
}
- public TermVector getTermVector()
- {
+ public TermVector getTermVector() {
return termvector;
}