Consider using seconds as the default precision #6041

Open
gunnaraasen opened this Issue Mar 17, 2016 · 21 comments

@gunnaraasen
Member

gunnaraasen commented Mar 17, 2016

InfluxDB currently defaults to nanosecond precision for writes and queries. Most other tools and languages use second precision. Given that

  • a common beginner mistake is using timestamps with second precision (resulting in a bunch of 1970 points that trip people up), and
  • significant compression gains can be realized when using seconds as the precision with the TSM engine,

Should we consider changing the default precision for timestamps to seconds?

For reference, the default precision is set here:
https://github.com/influxdata/influxdb/blob/master/models/points.go#L1241

@sparrc

Contributor

sparrc commented Mar 17, 2016

I'm a big fan of this. Currently our approach has been to default to nanoseconds on the database but recommend that people use seconds. I think it makes sense to default to the precision we recommend people use (which is probably also the most commonly needed timestamp precision).

Also, +1 on most other tools using seconds by default; this would be a boon to new users.

@joelegasse

Contributor

joelegasse commented Mar 17, 2016

We could add some auto-guessing code for when no precision has been set anywhere. We're talking about three orders of magnitude between each precision setting (s, ms, us, ns). I think JavaScript defaults to milliseconds since epoch. If a ms timestamp gets interpreted as seconds, it'll be over 46,000 years in the future, which can't even be represented with an int64 of "nanoseconds since epoch", which only has a span of ±292 years. Going the other direction (interpreting seconds as milliseconds) still lands about 46 years in the past.

We could simply pick the precision that is closest to the current time with an order of magnitude check.

We would of course still highly recommend setting a precision, but this change would remove some of the shock factor for new users.

@gunnaraasen

Member

gunnaraasen commented Mar 17, 2016

@joelegasse An order of magnitude check sounds like a straightforward way to handle timestamps that lack a user-supplied precision. We could even generate a log message if a batch contains a timestamp close to 1970 even at second precision.

@sparrc

Contributor

sparrc commented Mar 17, 2016

didn't think of that, sounds like a good idea 👍

@pauldix

Member

pauldix commented Mar 17, 2016

I'd much prefer the order of magnitude check than changing the default.

This is really a problem with the client libraries. Client libraries should make it clear which precision you're using; then they wouldn't have this problem.

@joelegasse

Contributor

joelegasse commented Mar 17, 2016

I think the order of magnitude check would mean there is no longer a "default" precision, right? If we go that route, I think we should also write a warning to the logs for each batch of points that doesn't specify a precision. Something that will tell them, "We're guessing what you meant, but you've probably done something bad, and you should feel bad..." 😛

@gunnaraasen

Member

gunnaraasen commented Mar 17, 2016

I'd be worried about the logs getting spammed with that message if it was printed on every write without a precision. We already generate a ton of logs.

@pauldix

Member

pauldix commented Mar 17, 2016

@joelegasse, @gunnaraasen yeah, logging that on every write would be way too loud

@beckettsean

Contributor

beckettsean commented Mar 19, 2016

Some of the awesome is lost if users don't experience nanosecond support initially, I think. It's very powerful to SHOW that the database is so modern and powerful that it handles nanoseconds with aplomb. If we default to seconds, that's a power-user feature that almost never gets noticed except by the people explicitly looking for it. Not really a strong argument, I know, but I do think it's important to consider the perceptual impact of this change.

Seconds precision is also the default for devops tools, but what are the defaults in the IoT world? What do data historians typically use? In APM, milliseconds is the default. The default we pick expresses an opinion about the primary use case. Why not leave it at nanoseconds, which is allegiant to none and forward-looking?

@joelegasse

Contributor

joelegasse commented Mar 19, 2016

@beckettsean The order-of-magnitude check would replace the concept of a "default precision", and would instead pick the scale that would have the timestamp closest to the current time. Points without a timestamp would still be tagged with the nanosecond-precision time of when they were received by the server.

This check would mitigate some of the confusion/frustration that comes from just assuming an unlabeled timestamp is in "nanoseconds since epoch". It certainly would not remove support for nanoseconds, but it would mean that users aren't wondering why their data was "lost" when it's really just stored a couple of minutes into January 1970.

@joelegasse joelegasse added the RFC label Mar 19, 2016

@beckettsean

Contributor

beckettsean commented Mar 23, 2016

@joelegasse I like the order of magnitude check, provided that query responses continue to provide nanosecond unless otherwise explicitly restricted.

@steverweber

steverweber commented Apr 3, 2016

It would be nice if the line protocol supported a timestamp with a unit [h,m,s,ms,us,ns]:
https://docs.influxdata.com/influxdb/v0.11/write_protocols/write_syntax/#line-protocol
The default should likely stay as ns.

```
disk_free,tag=t  value=1  timestamp[s,ms,us,ns]
# if a blank timestamp is given, the precision could still be included
disk_free,tag=t  value=1  [s,ms,us,ns]
```

I never knew I could get significant compression gains by using seconds as the precision; I wish that was included in the help page for the line protocol.
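The per-point unit suffix proposed above could reduce to a simple conversion on the server side. A minimal sketch (hypothetical helper, not InfluxDB code) that scales a unit-suffixed timestamp to the internal nanosecond representation:

```go
package main

import (
	"fmt"
	"time"
)

// toNanos converts a timestamp with an explicit unit suffix (as in the
// hypothetical per-point syntax above) to nanoseconds since the epoch.
func toNanos(ts int64, unit string) (int64, error) {
	switch unit {
	case "h":
		return ts * int64(time.Hour), nil
	case "m":
		return ts * int64(time.Minute), nil
	case "s":
		return ts * int64(time.Second), nil
	case "ms":
		return ts * int64(time.Millisecond), nil
	case "us":
		return ts * int64(time.Microsecond), nil
	case "ns":
		return ts, nil
	}
	return 0, fmt.Errorf("unknown precision %q", unit)
}

func main() {
	ns, _ := toNanos(1458172800, "s")
	fmt.Println(ns) // 1458172800000000000
}
```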

@gunnaraasen

Member

gunnaraasen commented Apr 4, 2016

@steverweber Specifying the unit on the timestamp will likely be a feature of the next iteration of the line protocol. See the discussion at #6037 for more details. I've added a comment about allowing precision per point without timestamps [s,ms,us,ns] since it hadn't been suggested before.

I've also opened influxdata/docs.influxdata.com#372 to get the improved compression benefits documented in more places.

Is there a reason you'd prefer the default to remain ns when no precision is provided, versus the order-of-magnitude check suggested above?

@steverweber

steverweber commented Apr 4, 2016

@gunnaraasen Thanks for managing the suggestions.

Is there a reason you'd prefer the default to remain ns when no precision is provided, versus the order-of-magnitude check suggested above?

As a beginner I assumed the timestamp was in seconds, and failed. If/when #6037 is resolved, I assume incorrect timestamp usage with the line protocol will be largely reduced.

Changing the default from ns seems to require some fun code changes to maintain compatibility. Is the added code complexity worth the gains? I don't know.

@steverweber

steverweber commented Apr 4, 2016

Also... what happens when someone really does want to use strange timestamps in the distant past? And there's the fun of reading documentation with a paragraph describing this time-check nuance.

@gunnaraasen

Member

gunnaraasen commented Apr 5, 2016

@steverweber thanks for the feedback!

As a beginner I assumed the timestamp was in seconds, and failed.

This is an initial pitfall which would be greatly improved by auto-setting a precision based on the order of magnitude of the timestamp. The order of magnitude check will only occur when no precision parameter is set.

The change would add some documentation and code complexity. However, we frequently have issues opened by new users who write seconds-precision timestamps without specifying precision and are confused when their data shows up at Jan 1, 1970. Doing the right thing in the majority of cases feels like it trumps sticking with an overly precise default that actively causes confusion among new users.

In terms of maintainability, only clients that don't already set a precision and write points within specific time ranges (1969-1971 and >2400) will need to be updated, and it'll probably be a one-line code change for most clients.

For the timeline, I think we'd like to get the new version of the line protocol into the 0.13 or 0.14 release and both line protocol versions would be supported for a couple releases to allow a smooth transition.

@steverweber

steverweber commented Apr 5, 2016

I was playing devil's advocate. Looks like a good migration strategy. I like how the influxdata devs are not afraid to nip these things in the bud /early/.

@toddboom toddboom added this to the 0.13.0 milestone Apr 5, 2016

@toddboom

Contributor

toddboom commented Apr 11, 2016

We talked about this as a group last week and decided to go ahead and roll forward with this for v0.13.0. As a reminder, this is only applicable when a precision isn't specified. In other words, the specified precision will always be used, but in the absence of that, we'll try to intelligently guess the precision based on timestamp magnitude.

@shaunwarman

shaunwarman commented Jan 13, 2017

If a precision is not set during the write, will it truncate a timestamp to 10 digits? I'm seeing our node.js client send ms, but we lose 3 digits in InfluxDB. (We aren't specifying precision.)

Yet a query like select * from request where time > now() - 30m shows no results, whereas the padded select * from request where time > now() - 2455 weeks (47 years ~ 1970 😭) starts to show results.

@sparrc

Contributor

sparrc commented Jan 13, 2017

@shaunwarman if you aren't specifying precision then InfluxDB thinks you are sending nanoseconds, which is why your metrics are close to the epoch.

@shaunwarman

shaunwarman commented Jan 13, 2017

Thanks @sparrc, added the precision flag!
