Reject out of range numeric values at index time #25534

Closed
colings86 opened this issue Jul 4, 2017 · 6 comments

@colings86
Member

commented Jul 4, 2017

Currently, if you index a document containing a numeric value that is out of range for the field's numeric type, we accept the document and index an infinite value. This can lead to issues like #23003.

Instead, we should reject values that are out of range for the selected numeric type (we already do this for scaled_float but not for float, double and half_float, and we should check the integer and long types). For completeness, we should also reject explicit infinite and NaN values in case a script produces them. JSON has no concept of NaN or infinite values, so they could only be inserted explicitly by scripts, never by standard index requests.
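
To illustrate the behaviour described above, here is a minimal plain-Java sketch (not Elasticsearch code; the class and method names are hypothetical). It shows that parsing or narrowing an out-of-range number does not fail but silently yields an infinite value, which is how such a value could end up indexed:

```java
public class OverflowDemo {
    // Hypothetical helper: parse a decimal string as a float.
    // Out-of-range input does not throw; it becomes +/-Infinity.
    static float parseAsFloat(String raw) {
        return Float.parseFloat(raw);
    }

    // Hypothetical helper: same idea for double.
    static double parseAsDouble(String raw) {
        return Double.parseDouble(raw);
    }

    public static void main(String[] args) {
        // 1e40 exceeds Float.MAX_VALUE (about 3.4e38).
        System.out.println(Float.isInfinite(parseAsFloat("1e40")));
        // 1e400 exceeds Double.MAX_VALUE (about 1.8e308).
        System.out.println(Double.isInfinite(parseAsDouble("1e400")));
        // Narrowing a large double to float also overflows silently.
        System.out.println(Float.isInfinite((float) 1e40));
    }
}
```

All three checks print `true`, so a guard has to be added explicitly if such documents are to be rejected.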

An open question is whether ignore_malformed should ignore out of range values?

@jpountz

Contributor

commented Jul 4, 2017

+1

> An open question is whether ignore_malformed should ignore out of range values?

To me, this option means "index it if you can, but please never fail; I'd rather have my data not searchable", so I believe we should simply ignore out-of-range values when this option is set to true. Separately, maybe we should reopen the discussion about removing it, since it makes it too easy to silently end up with unsearchable data, which is barely better than data loss.

@colings86 colings86 added help wanted and removed discuss labels Jul 7, 2017

@fred84

Contributor

commented Jul 10, 2017

@colings86 May I try to solve this issue?

@colings86

Member Author

commented Jul 10, 2017

@fred84 That would be great if you are able to work on this. Please do let me know if you have any questions, and feel free to add me as a reviewer when you have a PR ready.

@fred84

Contributor

commented Jul 12, 2017

@colings86 I think it would be suitable to duplicate the logic that subclasses of Lucene's Field use for handling out-of-bounds values in half float, float and double. Example: fred84/elasticsearch@2d0e5c1. Am I going in the right direction?

@jpountz

Contributor

commented Jul 12, 2017

I think we should check that Double.isFinite is true for doubles, Float.isFinite is true for floats and that Float.isFinite(HalfFloatPoint.shortBitsToHalfFloat(HalfFloatPoint.halfFloatToShortBits(f))) is true for half floats (converting to a short and back to a float simulates the rounding that happens at index time).
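
As a rough sketch of the double and float checks suggested above, using only the JDK: the class name, method names, exception type and messages are all hypothetical and are not taken from the Elasticsearch code base. The half_float check is left out because it needs Lucene's HalfFloatPoint for the round trip described above.

```java
public class FiniteChecks {
    // Reject NaN and +/-Infinity for a double field.
    static double checkedDouble(double value) {
        if (!Double.isFinite(value)) {
            throw new IllegalArgumentException(
                "[double] supports only finite values, but got [" + value + "]");
        }
        return value;
    }

    // Narrow to float first, then check: a double that is finite can
    // still overflow to Infinity once it is stored as a float.
    static float checkedFloat(double value) {
        float narrowed = (float) value;
        if (!Float.isFinite(narrowed)) {
            throw new IllegalArgumentException(
                "[float] supports only finite values, but got [" + value + "]");
        }
        return narrowed;
    }

    public static void main(String[] args) {
        System.out.println(checkedFloat(1.5));
        try {
            checkedFloat(1e40);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking after the narrowing conversion is the important design point here: the value is validated in the form in which it would actually be stored, which is the same reason the half float check converts to a short and back.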

@cbuescher

Member

commented Jan 16, 2018

I think this was addressed by #25826 and can be closed.
@colings86 please reopen if you think there is anything left to do here.

@cbuescher cbuescher closed this Jan 16, 2018
