Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent NaN float values from being stored #4587

Merged
merged 6 commits into from
Oct 28, 2015
Merged

Prevent NaN float values from being stored #4587

merged 6 commits into from
Oct 28, 2015

Conversation

jwilder
Copy link
Contributor

@jwilder jwilder commented Oct 27, 2015

This PR has a few changes to how points are accepted and stored.

  • Float NaN values were not supported, but were able to be written through the line protocol, collectd, and other inputs. These values caused problem during query time as well as with the the tsm1 engine since they were not intended to be stored. This change returns parse error for NaN float values as well as prevents NewPoint from returning a valid point if given a NaN float value. The tsm1 float encoding will now also return an error if a NaN value is attempted to be encoded.
  • The change to NewPoint returning an error propagated up to the client packages which will require upgrades to users that are not controlling the version of the client that they are using. They will need to handle the error and determine the best thing to do with points that can't be stored.
  • Since many clients already assume NaN can be stored, the inputs have been updated to drop and log (when appropriate) points with invalid field values. Graphite, UDP, Collectd, and OpenTSDB inputs already dropped points so this change makes the behavior consistent across all inputs.
  • HTTP line protocol handler will now write all successfully parsed points instead of failing the entire batch if one failed. When some points fail to parse and the remaining are written successfully, the endpoint returns a partial write error with the failed lines in the response. This makes the behavior more consistent with the other inputs.
  • Since NaN values cannot be parsed correctly now, TSM WAL entries will fail to parse and the WAL will abort startup. TSM data will need to be removed, but we'll log the point data that failed to parse to help with troubleshooting.

Fixes #4521 #4513

@beckettsean
Copy link
Contributor

Thanks, this is a great fix, @jwilder.

Regarding "HTTP line protocol handler will now write all successfully parsed points instead of failing the entire batch if one failed. When some points fail to parse and the remaining are written successfully, the endpoint returns a partial write error with the failed lines in the response." this sounds like a global change to the system, yes? We need to update the docs to reflect this, I believe.

@jwilder
Copy link
Contributor Author

jwilder commented Oct 27, 2015

@beckettsean For the HTTP line protocol handler, previously you would get an error and not points written. With this change you would still get an error, but part of you batch would be written. We already return this error in some cluster failure scenarios (e.g. replication of 3 but only 1 succeeded would return a partial write error depending on the write consistency level).

Float values are not supported in the existing engine and the tsm1
engines.  This changes NewPoint to return an error if a field value
contains a NaN field.  It also allows us to validate fields to prevent
other unsupported types from sneaking in through other input plugins.
Mainly for debugging as since this should not happen going forward.  Since
there may be points with NaN already stored in the WAL, this is helpful for
troubleshooting panics.
This changes the HTTP line protocol handler to behave similar to the other
handler in that they will write as many points as possible.  Previously, we
would fail the entire batch if one point failed.  This can happen more frequently
now with NaN being more explicitly unsupported.  Now it will write as many points
that parse successfully and return a "partial write" error to the client with the
lines that failed to parse.
name, tags, map[string]interface{}{"value": value}, timestamp,
))
)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider changing the return value of this function to support sending back an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I didn't. This func is only called in the tests.

@pauldix
Copy link
Member

pauldix commented Oct 27, 2015

@jwilder does the update break the client? Possible to just do it on the v2 client?

}

func (s *FloatEncoder) Finish() {
if !s.finished {
// write an end-of-stream record
s.finished = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Did it matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The Finish call writes a NaN as sentinel value using Push. I need to mark the stream as finished so I could allow the NaN value to be written via Push. See https://github.com/influxdb/influxdb/pull/4587/files#diff-4fcee93b376ee05d583de8af541239b8R71

@otoolep
Copy link
Contributor

otoolep commented Oct 27, 2015

All generally makes sense, some minor feedback and questions. +1 from me.

@jwilder
Copy link
Contributor Author

jwilder commented Oct 28, 2015

@pauldix Should not affect the current client. The only change is https://github.com/influxdb/influxdb/pull/4587/files#diff-12d7ec39292af18c588a073414f3e853R477 which would return a commented out point with an error message as the comment. That will parse on the server and be ignored. I didn't want to change the signature or panic since either of those would break existing clients.

@jwilder
Copy link
Contributor Author

jwilder commented Oct 28, 2015

The v2 client will have a compile breakage when updating as they will need to handle the error returned by NewPoint now.

@pauldix
Copy link
Member

pauldix commented Oct 28, 2015

@jwilder sounds good, +1

jwilder added a commit that referenced this pull request Oct 28, 2015
Prevent NaN float values from being stored
@jwilder jwilder merged commit 7508a2a into master Oct 28, 2015
@jwilder jwilder deleted the jw-nan branch October 28, 2015 15:28
@Issif
Copy link

Issif commented Oct 28, 2015

Wonderful !!

I will try this tomorrow with next nighlty.

Thanks a lot guys !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants