Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locale parsing issues #13229

Closed
javanna opened this issue Aug 31, 2015 · 6 comments
Closed

Locale parsing issues #13229

javanna opened this issue Aug 31, 2015 · 6 comments

Comments

@javanna
Copy link
Member

javanna commented Aug 31, 2015

In the query-refactoring branch we noticed weird behaviours related to parsing locales. The two queries affected are query_string and simple_query_string, as they allow users to set the locale used where applicable, depending on the type of lucene query that gets parsed.

Both queries in master parse the locale through our own LocaleUtils#parse method and print it out usingLocale#toString when needed. It turns out that with some locales, our newly added query tests fail given that printing out and reparsing the same locale leads to a different locale. The problem looks very similar to what is explained here: https://issues.apache.org/jira/browse/LUCENE-4021.

I see that we also use LocaleUtils#parse in DateFieldMapper. I think that we should discuss how to address this issue and streamline the solution. I think the best solution would be to parse through Locale#forLanguageTag and print out using Locale#toLanguageTag, but this breaks backwards compatibility it seems. If I make this change SimpleDateMappingTests#testParseLocal fails for instance. Is also our own LocaleUtils still useful?

@clintongormley
Copy link

@javanna how (and how often) does it break bwc?

@javanna
Copy link
Member Author

javanna commented Sep 1, 2015

how (and how often) does it break bwc?

Some locales that used to get properly parsed might not get parsed correctly anymore. I am not sure how widely these options are used, especially in queries. Maybe removing this option as stated in #9978 would be the best solution.

@rmuir
Copy link
Contributor

rmuir commented Sep 1, 2015

Yes, using the user's already configured analysis chain rather than applying additional configuration (lowercase_expanded_terms, locale, etc) that is still limited to only case is really a better way to go IMO.

If we solve that issue then if they have turkishlowercase, it gets used, if they use asciifolding, it gets used, and so on, and people stop getting confused about why wildcards "don't work" which I bet happens constantly.

@javanna
Copy link
Member Author

javanna commented Sep 4, 2015

seems like the way forward would be to remove these locale parameter from query_string and simple_query_string, where we encountered issues. The remaining question is what we should do with date fields in mappings, which do allow to specify a locale too. Moving to parsing using forLanguageTag seems to break things, but leaving as-is is buggy. @rmuir what would you suggest?

javanna added a commit to javanna/elasticsearch that referenced this issue Oct 5, 2015
…ry_string

The configured analysis chain should be used instead.

Relates to elastic#13229
@javanna
Copy link
Member Author

javanna commented Oct 5, 2015

while looking into removing locale from query_string and simple_query_string as previously discussed, I bumped into org.apache.lucene.queryparser.classic.QueryParserBase#setLocale in lucene. I am then wondering if the removal is the way forward or something else should happen in lucene first. Seems like if we remove we might remove stuff that is still relevant and useful...

@javanna
Copy link
Member Author

javanna commented Oct 5, 2015

Closing in favour of #9978. Once we have made it possible to use the analysis chain, by taking out components that are not multi term aware, we can go ahead and remove lowercase_expanded_terms as well as locale.

@javanna javanna closed this as completed Oct 5, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants