This repository has been archived by the owner on May 4, 2023. It is now read-only.

Problem in return value from long query #97

Closed
ldriscoll opened this issue Dec 7, 2010 · 16 comments

Comments

@ldriscoll

I've created a query that searches for fields that have a -1 in there.
q=myNumericField:-1
The search returns the correct values, but jcouchdb baulks because the response is:
+TermQuery(myNumericField: \u0000���������,boost=1.0)

  • the \u0000 is the null character and svenson doesn't like it.

However, if I search with a range query, the plan comes back in readable form:
q=myNumericField:[-2 TO -1]
+NumericRangeQuery(-2 TO -1 AS Long,boost=1.0)

The bug basically is that the first query should return
+TermQuery(myNumericField: -1,boost=1.0)

Thanks in advance
Luke Driscoll

@rnewson
Owner

rnewson commented Dec 7, 2010

can you try q=myNumericField:-1 ?

@ldriscoll
Author

yup, with that I get a could not parse, however if I do
q=myNumericField:-1
I get:
TermQuery(myNumericField:1,boost=1.0)
(note the missing minus sign). This is the same if I put the -1 in quotes.

@rnewson
Owner

rnewson commented Dec 7, 2010

hi,

you need to escape the - as it's a reserved character. try;

q=myNumericField<int>:-1

@ldriscoll
Author

Interestingly, GitHub comments hide anything written in angle brackets!
However, with myNumericField<int>:-1 I don't get results (as it's indexed as a long), but the plan comes back as:

"TermQuery(myNumericField:`\u0007����,boost=1.0)"

whereas with a long
"TermQuery(dischargeDateTime: \u0000���������,boost=1.0)"

Either way there is no -1 in the response, and the unicode characters are going to make jcouchdb/svenson baulk.

@rnewson
Owner

rnewson commented Dec 7, 2010

so try myNumericField<long>:-1 :)

@rnewson
Owner

rnewson commented Dec 7, 2010

JSON is defined as UTF-8 so that sounds like a serious bug in svenson.

@rnewson
Owner

rnewson commented Dec 7, 2010

I think this is not a couchdb-lucene bug but a svenson or jcouchdb one, so I'm closing. Please reopen if not.

@ldriscoll
Author

I think the problem is that couchdb-lucene is returning \u0000 instead of the string "-1". svenson trips because the JSON contains a Unicode character that's not printable. I agree that svenson is doing something a little 'unique', but my thought is that the TermQuery details should be returned in the same format that they were given, rather than converting an int to a char.

@rnewson
Owner

rnewson commented Dec 7, 2010

Lucene only indexes strings. So \u0000 is really what's searched for. The <long> syntax is used to trigger the int -> char conversion (which is why you get no hit if you omit that).

Even if this issue were fixed somehow, it still means svenson can't handle Unicode characters even if they are legitimately present in search results, right?
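
For anyone following along, the conversion described above can be sketched in plain Java. This is a minimal re-implementation under stated assumptions, not Lucene's own code: it assumes Lucene 3.x encodes a full-precision long as one marker char (0x20 plus the shift) followed by ten 7-bit chars of the sign-flipped value. The class and method names are hypothetical.

```java
final class PrefixCodedLongSketch {
    // Assumption: same value as Lucene 3.x NumericUtils.SHIFT_START_LONG.
    static final char SHIFT_START_LONG = 0x20;

    /** Encodes a long at full precision (shift 0) into the indexed term text. */
    static String encode(long value) {
        final int shift = 0;
        int nChars = (63 - shift) / 7 + 1;            // ten 7-bit chars for 64 bits
        char[] buf = new char[nChars + 1];
        buf[0] = (char) (SHIFT_START_LONG + shift);   // marker char, a space at shift 0
        // Flip the sign bit so that negative values sort before positive ones.
        long sortableBits = value ^ 0x8000000000000000L;
        while (nChars >= 1) {
            buf[nChars--] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buf);
    }

    /** Renders every char as a four-digit hex escape so control chars are visible. */
    static String visible(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(visible(encode(-1L)));
    }
}
```

For -1 this yields a space (0x20), then U+0000, then nine U+007F characters, which matches the plan string in the report. The same scheme with a 0x60 marker and 32 bits should give the backtick, U+0007 and four U+007F characters seen for the <int> query.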

@rnewson
Owner

rnewson commented Dec 7, 2010

Can you show the full error from svenson? It seems to parse \u0000 fine for me.

@ldriscoll
Author

Of course, here you go.

 
org.svenson.JSONParseException: Illegal control character 0x7f
    at org.svenson.tokenize.JSONTokenizer.parseString(JSONTokenizer.java:405)
    at org.svenson.tokenize.JSONTokenizer.next(JSONTokenizer.java:187)
    at org.svenson.JSONParser.parseObjectInto(JSONParser.java:554)
    at org.svenson.JSONParser.parse(JSONParser.java:396)
    at org.svenson.JSONParser.parse(JSONParser.java:372)
    at org.jcouchdb.db.Response.getContentAsBean(Response.java:158)
    at org.jcouchdb.db.Database.queryViewInternal(Database.java:862)
    at org.jcouchdb.db.Database.query(Database.java:777)
    at com.shareableink.serenada.couch.CouchSearchAccess.searchCouchInternal(CouchSearchAccess.java:221)
    at com.shareableink.serenada.couch.CouchSearchAccess.searchCouchRecursive(CouchSearchAccess.java:127)
    at com.shareableink.serenada.couch.CouchSearchAccess.searchCouchRecursive(CouchSearchAccess.java:159)
    at com.shareableink.serenada.couch.CouchSearchAccess.searchCouch(CouchSearchAccess.java:89)
    at com.shareableink.serenada.couch.TestCouchSearchAccess.testForDateRange(TestCouchSearchAccess.java:178)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
    at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:74)
    at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:82)
    at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:72)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:240)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
    at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
    at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
    at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
    at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:180)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:65)

@rnewson
Owner

rnewson commented Dec 7, 2010

So this unit test passes:

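import static org.hamcrest.CoreMatchers.notNullValue;
import static org.junit.Assert.assertThat;
import org.junit.Test;
import org.svenson.JSONParser;

@Test
public void parseJson() throws Exception {
    final Object obj = JSONParser.defaultJSONParser().parse("{\"foo\":\"\\u0000\"}");
    assertThat(obj, notNullValue());
}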

Perhaps jcouchdb is mangling the string?

@ldriscoll
Author

I think I've found the problem.
I'm going to paste the output from couchdb-lucene below, but I don't think it's going to help.

The key thing is that after the \u0000, still inside the quotes, there are nine raw 0x7F (DEL) characters. The \u0000 was a red herring.

{
  "q": "dischargeDateTime: \u0000���������",
  "plan": "TermQuery(myNumericField: \u0000���������,boost=1.0)",
  "etag": "11ed4144659f7d08",
  "skip": 0,
  "limit": 25,
  "total_rows": 1,
  "search_duration": 0,
  "fetch_duration": 0,
  "rows": [  {
    "id": "encounter_A74333",
    "score": 4.02852201461792,
    "fields": {"patient_id": "patient_A74327"}
  }]
}
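
As a stopgap on the client side, the raw DEL characters could be escaped before the response reaches a strict parser. The sketch below is only an illustration with hypothetical names, and it assumes you can get at the response text before jcouchdb parses it, which the thread does not confirm. As far as I know, the JSON grammar only requires escaping for U+0000 to U+001F, the quote and the backslash, so raw U+007F is technically legal and svenson is being stricter than the spec here.

```java
final class DelEscapeSketch {
    /**
     * Replaces every raw U+007F with its six-character JSON escape.
     * A raw DEL is only valid inside a JSON string, so a global
     * replacement does not change the meaning of a well-formed document.
     */
    static String escapeRawDel(String json) {
        StringBuilder out = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c == 0x7f) {
                out.append("\\u007f");   // backslash, 'u', then four hex digits
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "{\"plan\":\"x" + (char) 0x7f + "\"}";
        System.out.println(escapeRawDel(raw));
    }
}
```

The escaped form decodes to the same string value, so nothing is lost; it only keeps the control character out of the raw byte stream.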

@rnewson
Owner

rnewson commented Dec 8, 2010

Thanks, I can't immediately think of where those characters could be coming from but I'll try to track it down tomorrow. Thanks!

@ldriscoll
Author

No worries.

@ldriscoll
Author

Bob,
I've done some digging, and it's Lucene that's behaving strangely: org.apache.lucene.search.Query.toString() is returning the bad characters.

Luke
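
Since Query.toString() prints the indexed term as-is, one way to get a readable plan would be to decode the term back into a number before displaying it. The following is a minimal sketch with hypothetical names, assuming the Lucene 3.x layout of a marker char (0x20 plus the shift) followed by 7-bit chars of the sign-flipped value. Lucene ships its own NumericUtils.prefixCodedToLong, which is presumably what one would call in practice.

```java
final class PrefixCodedLongDecodeSketch {
    // Assumption: same value as Lucene 3.x NumericUtils.SHIFT_START_LONG.
    static final char SHIFT_START_LONG = 0x20;

    /** Decodes a prefix-coded term back into the long it was built from. */
    static long decode(String term) {
        int shift = term.charAt(0) - SHIFT_START_LONG;
        if (shift < 0 || shift > 63) {
            throw new NumberFormatException("not a prefix-coded long: bad shift " + shift);
        }
        long sortableBits = 0L;
        for (int i = 1; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c > 0x7f) {
                throw new NumberFormatException("not a prefix-coded long: char at " + i);
            }
            sortableBits = (sortableBits << 7) | c;   // append the next 7 bits
        }
        // Undo the shift, then flip the sign bit back.
        return (sortableBits << shift) ^ 0x8000000000000000L;
    }

    public static void main(String[] args) {
        StringBuilder term = new StringBuilder();
        term.append((char) 0x20).append((char) 0);
        for (int i = 0; i < 9; i++) {
            term.append((char) 0x7f);
        }
        System.out.println("myNumericField:" + decode(term.toString()));
    }
}
```

Fed the term from the report (space, U+0000, nine U+007F), this gives back -1, which is the value the plan was expected to show.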

This issue was closed.