UTF-8 field name #180

matcho opened this Issue Sep 17, 2013 · 2 comments



matcho commented Sep 17, 2013


I'm trying to index data with c-l (couchdb-lucene) in a field whose name contains non-ASCII characters, in this example "chócacao".

I can find the data indexed if I don't specify the field to search.
But if I do specify it:

curl 'http://localhost:5984/_fti/local/searchtest/_design/datamanager/everyfield?q=chócacao:foobar'

c-l responds:

{"reason":"Bad query syntax: Cannot parse 'chócacao:foobar': Field 'chócacao' not recognized.","code":400}

There is no error at indexing time, and a log confirms that the field is parsed.
Of course, if the indexed data (rather than the field name) contains non-ASCII characters, it works.

Maybe it's just the query parser?



rnewson commented Sep 17, 2013

It's this:

private static Pattern PATTERN = Pattern.compile("^(\\w[\\ \\w_.-]*)(<([\\w]+)>)?$");

throw new ParseException("Field '" + string + "' not recognized.");
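
For context, Java's `\w` matches only `[a-zA-Z_0-9]` by default, so a field name containing "ó" cannot match the pattern and falls through to the `ParseException`. The sketch below reproduces that with the quoted pattern (backslashes doubled, as a Java string literal requires) and compares it against a loosened variant. The class name `FieldNameDemo`, the helper `accepts`, and the `UNICODE_AWARE` pattern are all hypothetical illustrations; this is not necessarily what the actual fix changed.

```java
import java.util.regex.Pattern;

public class FieldNameDemo {
    // The pattern quoted in this thread, escaped for a Java string literal.
    // By default \w is ASCII-only: [a-zA-Z_0-9].
    static final Pattern ASCII_ONLY =
        Pattern.compile("^(\\w[\\ \\w_.-]*)(<([\\w]+)>)?$");

    // Hypothetical loosened variant (an assumption, not the project's fix):
    // \p{L} and \p{N} accept letters and digits from any script.
    static final Pattern UNICODE_AWARE =
        Pattern.compile("^([\\p{L}\\p{N}_][\\ \\p{L}\\p{N}_.-]*)(<([\\w]+)>)?$");

    // Returns true if the whole field name matches the given pattern.
    static boolean accepts(Pattern pattern, String fieldName) {
        return pattern.matcher(fieldName).matches();
    }

    public static void main(String[] args) {
        // \u00f3 is the precomposed "ó", written as an escape so the
        // result does not depend on the source file encoding.
        String field = "ch\u00f3cacao";

        System.out.println(accepts(ASCII_ONLY, "chocacao"));  // true
        System.out.println(accepts(ASCII_ONLY, field));       // false
        System.out.println(accepts(UNICODE_AWARE, field));    // true
    }
}
```

On Java 7 and later, compiling the original pattern with the `Pattern.UNICODE_CHARACTER_CLASS` flag is another way to make `\w` Unicode-aware; the explicit `\p{L}` form is used here because it does not depend on that flag.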

rnewson pushed a commit that closed this issue Sep 17, 2013
Robert Newson: Loosen field name regex (closes #180) b2d7653
rnewson closed this in b2d7653 Sep 17, 2013

matcho commented Sep 20, 2013

Hi Robert,

Thanks a lot for this fix!
I will try that right now.


vjt added a commit to ifad/couchdb-lucene that referenced this issue Feb 5, 2014
Merge remote-tracking branch 'upstream/master'
* upstream/master:
  Upgrade to Jetty 8.1.14 (requires JDK 6)
  Upgrade slf4j
  Add the full OOXML Schemas archive
  Upgrade Lucene to 3.6.2
  Loosen field name regex (closes #180)