UTF-8 field name #180

Closed
matcho opened this Issue Sep 17, 2013 · 2 comments

@matcho
matcho commented Sep 17, 2013

Hello,

I'm trying to index data with c-l (couchdb-lucene) in a field whose name contains non-ASCII characters, in this example "chócacao".

I can find the data indexed if I don't specify the field to search.
But if I do specify it:

curl 'http://localhost:5984/_fti/local/searchtest/_design/datamanager/everyfield?q=chócacao:foobar'

c-l responds:

{"reason":"Bad query syntax: Cannot parse 'chócacao:foobar': Field 'chócacao' not recognized.","code":400}

There is no error at indexing time, and a log confirms that the field is parsed.
Of course, if the indexed data (rather than the field name) contains non-ASCII characters, it works.

Maybe it's just the query parser?

Thanks,

Mat'

@rnewson
Owner
rnewson commented Sep 17, 2013

It's this:

private static Pattern PATTERN = Pattern.compile("^(\\w[\\ \\w_.-]*)(<([\\w]+)>)?$");

throw new ParseException("Field '" + string + "' not recognized.");
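
To illustrate why the field is rejected, here is a minimal sketch. In Java, `\w` matches only `[a-zA-Z_0-9]` unless Unicode character classes are enabled, so "ó" falls outside the pattern quoted above. The `FieldNamePatternDemo` class and the `UNICODE_AWARE` pattern are hypothetical and show just one possible way to loosen the regex; they are not taken from the actual fix.

```java
import java.util.regex.Pattern;

public class FieldNamePatternDemo {
    // The ASCII-only pattern quoted above: \w is [a-zA-Z_0-9] by default in Java.
    static final Pattern ASCII_ONLY =
        Pattern.compile("^(\\w[\\ \\w_.-]*)(<([\\w]+)>)?$");

    // Hypothetical looser pattern (an assumption, not the committed fix):
    // also accepts any Unicode letter (\p{L}) or number (\p{N}) in the field name.
    static final Pattern UNICODE_AWARE =
        Pattern.compile("^([\\w\\p{L}\\p{N}][\\ \\w\\p{L}\\p{N}_.-]*)(<([\\w]+)>)?$");

    // True when the whole field specification matches the given pattern.
    static boolean recognized(Pattern pattern, String field) {
        return pattern.matcher(field).matches();
    }

    public static void main(String[] args) {
        // \u00f3 is the precomposed "ó"; the escape avoids source-encoding issues.
        String field = "ch\u00f3cacao";
        System.out.println(recognized(ASCII_ONLY, field));                 // false
        System.out.println(recognized(UNICODE_AWARE, field));              // true
        System.out.println(recognized(UNICODE_AWARE, field + "<string>")); // true
    }
}
```

The `\p{L}` and `\p{N}` classes are used here instead of the `Pattern.UNICODE_CHARACTER_CLASS` flag because that flag only exists from Java 7 onward.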

@rnewson rnewson pushed a commit that closed this issue Sep 17, 2013
Robert Newson Loosen field name regex (closes #180) b2d7653
@rnewson rnewson closed this in b2d7653 Sep 17, 2013
@matcho
matcho commented Sep 20, 2013

Hi Robert,

Thanks a lot for this fix!
I will try that right now.

Mat

@vjt vjt added a commit to ifad/couchdb-lucene that referenced this issue Feb 5, 2014
@vjt vjt Merge remote-tracking branch 'upstream/master'
* upstream/master:
  Upgrade to Jetty 8.1.14 (requires JDK 6)
  Upgrade slf4j
  Add the full OOXML Schemas archive
  Upgrade Lucene to 3.6.2
  Loosen field name regex (closes #180)
4cbeef8