Please reconsider 'guessing' whether a timestamp is in milliseconds or seconds #7940

ariebovenberg · 2023-10-27T12:28:33Z

Initial Checks

I confirm that I'm using Pydantic V2

Description

The following surprising behavior recently tripped me up. When parsing datetimes from unix timestamps, Pydantic
determines automagically whether you mean seconds or milliseconds:

(from the docs)

int or float; assumed as Unix time, i.e. seconds (if >= -2e10 and <= 2e10) or
milliseconds (if < -2e10or > 2e10) since 1 January 1970

This seems sensible at first...but if you're working with milliseconds, Pydantic will decide that most timestamps in 1970 and 1969 are just too small, and they will be interpreted as seconds instead — leading to wildly different times.

While most Pydantic users won't encounter this, it is not a stretch to imagine some Pydantic users are handling data from the 1970s and their validation library, ironically, is silently changing their timestamps.

If you'd rather not change this, please consider adding a big red warning to the docs on the implications of this behavior on timestamps in 1970 and 1969.

Example Code

from datetime import datetime
from pydantic import BaseModel

class Foo(BaseModel):
    a: datetime

# some timestamps in ms
recent = datetime(2023, 10, 4, 12, 22).timestamp() * 1000
apollo17_launch = datetime(1972, 12, 7, 5, 33).timestamp() * 1000
apollo13_launch = datetime(1970, 4, 11, 19, 13).timestamp() * 1000

print(Foo(a=recent))           # 2023-10-04 (correct)
print(Foo(a=apollo17_launch))  # 1972-12-07 (correct)
print(Foo(a=apollo13_launch))  # 2245-11-14 (INCORRECT)

Python, Pydantic & OS Version

pydantic version: 2.4.2
        pydantic-core version: 2.10.1
          pydantic-core build: profile=release pgo=false
                 install path: /Users/arie/.pyenv/versions/3.12.0/envs/sandbox312/lib/python3.12/site-packages/pydantic
               python version: 3.12.0 (main, Oct  3 2023, 08:33:34) [Clang 15.0.0 (clang-1500.0.40.1)]
                     platform: macOS-14.0-arm64-arm-64bit
             related packages: typing_extensions-4.8.0

The text was updated successfully, but these errors were encountered:

samuelcolvin · 2023-10-27T16:01:33Z

This has been around since the very earliest days of pydantic, it has saved me personally hours in combined time not having to thing about whether something is in seconds or milliseconds.

I won't be changing the default.

But I would consider a config switch or similar to stop using the watershed.

ariebovenberg · 2023-10-27T17:19:03Z

Thanks for the quick reply.

I will certainly agree the behavior is more convenient 😄, and probably saves many developers a lot of time.

Although I am curious: is there another validation library or language that also does it this way? Part of the reason for my bemusement is that I've never encountered this type of behavior before 😅.

Regarding a potential config switch: how about an option called timestamp_unit which can be set to either "auto" (the current behavior and ongoing default), "s", or "ms".

samuelcolvin · 2023-10-28T18:52:52Z

Solution sounds good to me, "infer" or "guess" might be better than "auto".

samuelcolvin · 2023-10-28T19:03:43Z

I'm just realized you're the he same person who asked the question on HN, sorry for my dumb initial response. 🙏

By the way, to fix this I think you'll need to update speedate, then pydantic-core.

ariebovenberg · 2023-10-28T19:21:06Z

I'm just realized you're the he same person who asked the question on HN, sorry for my dumb initial response. 🙏

Thanks for mentioning. Combined with your twitter post today regarding issue submission quality, I was concerned I hadn't brought it up correctly.
In any case...the HN post wasn't the lively discussion I had hoped it to be 🦗.

Solution sounds good to me, "infer" or "guess" might be better than "auto".

"infer" has a nice, neutral sound to it.

I won't be changing the default.

I do feel compelled to say that I disagree with keeping the default behavior like this. If you've already made up your mind, I won't argue — it's your project after all, and there might be some context I'm missing.
But would you be open to putting this question to the community/contributors in some way?

Part of me really wants to write a blog post to convince the world 🔥 😉 , but it's a lot more constructive to work with you all on this 🤝

By the way, to fix this I think you'll need to update speedate, then pydantic-core.

Happy to submit the PRs — if you're OK with a slow pace. I'll need to brush up on Rust, and the codebase in general.

samuelcolvin · 2023-10-28T20:00:11Z

No you definitely weren't the trigger for my twitter idea.

I'm not pretending this is a democracy, and I'm not resorting to the tyranny of the masses / hegemony of the upvote.

More or the point, we definitely can't change the behaviour anyway until V3.

Obviously happy to accept the config switch asap.

The only thing I'd consider in V3 is to move away from a single watershed, to a ranges which are accepted, with errors outside those ranges. That would mean it doesn't "suddenly" change.

E.g. timestamps are only accepted 1980-2100 or whatever in "infer" mode.

ariebovenberg · 2023-10-28T20:21:12Z

E.g. timestamps are only accepted 1980-2100 or whatever in "infer" mode.

Indeed, this is the way. It would remove any ambiguous cases, in favor of an explicit boundary 🙌 . Actually, I think you can even move the upper bound way up 🤔. The only requirement is that there is no overlap between the numerical ranges...

Sadly we can't move it all the way to datetime.max, but at least to 3000...

maximum timestamp (year 9999) in seconds is: 253_402_210_800
minimum timestamp (year 1980) in seconds is 315_529_200
maximum timestamp (year 9999) in milliseconds is: 253_402_210_800_000
minimum timestamp (year 1980) in milliseconds is 315_529_200_000

or alternatively, put the lower bound at 1985 and you don't need an upper bound — maybe cleaner?

ariebovenberg · 2023-10-28T20:36:07Z

I'm not pretending this is a democracy, and I'm not resorting to the tyranny of the masses / hegemony of the upvote.

Agree that decisions shouldn't be taken this way. Although, discussion could yield interesting perspectives from people that use Pydantic daily.

The only thing I'd consider in V3 is to move away from a single watershed, to a ranges which are accepted, with errors outside those ranges. That would mean it doesn't "suddenly" change.

Makes total sense to save this for a breaking release. Also logical to make the transition itself as small as possible.

Assuming that defaults are trivial to change, the big decision can be deferred if you like. What's the ballpark date for V3? This will also help me plan my work on the PRs.

samuelcolvin · 2023-10-28T20:50:21Z

I guess V3 should be expected in the middle of next year.

But as I said, we can add the config flag asap.

sydney-runkle · 2023-10-29T01:36:02Z

@ariebovenberg,

I'd be happy to review your PRs for the config flag in both pydantic and pydantic-core. Thanks for your work on this 😄

PrettyWood · 2023-10-29T18:39:41Z

I'm also interested in making sure we can ensure the unit used for timestamps. Our usecase is to validate data pipelines created by clients and running with polars.

@ariebovenberg I would be more than happy to work on speedate myself if you don't have time. You tell me!
@samuelcolvin Would you also consider adding extra units to have the full range 'ns', 'us', 'ms' and 's'?

samuelcolvin · 2023-10-29T20:49:25Z

Yes more units should be fine on e we have agreed ranges.

ariebovenberg · 2023-10-30T06:55:39Z

@PrettyWood Thanks for the offer, but I'd like to take a stab at it myself first. I'll let you know if it doesn't work out.

My 2 cents on more time units: Makes sense to support it, but note that this would make the "infer" behavior very broad. I'd myself prefer a timestamp of 1.000.000 times the usual size would raise an exception, instead of being silently interpreted as nanoseconds 🤔. An alternative would be "infer_ms_or_s" but that also feels a bit wonky 🙃

ariebovenberg · 2023-11-08T21:20:48Z

A naive implementation of this in speedate does cause lots of breaking changes (i.e. a new parameter to parse_str, parse_bytes, etc). Let me know what you think in the PR 🤔

ariebovenberg · 2023-11-27T14:02:52Z

@samuelcolvin could you have a look at the speedate PR? Some open questions are blocking progress at the moment.

ariebovenberg · 2023-12-26T12:05:41Z

@PrettyWood you're welcome to take over this issue, if you like. I won't have time for it in the near future. Feel free to build on my earlier "work in progress" PRs, or to start over 👍

PrettyWood · 2023-12-26T12:49:18Z

Ok I'll take some time to work on it soon

ariebovenberg added bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels Oct 27, 2023

sydney-runkle added change Suggested alteration to pydantic, not a new feature nor a bug and removed bug V2 Bug related to Pydantic V2 pending Awaiting a response / confirmation labels Oct 27, 2023

sydney-runkle added the awaiting author response awaiting response from issue opener label Oct 27, 2023

sydney-runkle added feature request and removed awaiting author response awaiting response from issue opener change Suggested alteration to pydantic, not a new feature nor a bug labels Oct 28, 2023

sydney-runkle assigned samuelcolvin Oct 30, 2023

This was referenced Nov 3, 2023

[WIP] Implement timestamp_unit setting pydantic/pydantic-core#1052

Draft

Support explicitly parsing timestamps as seconds/milliseconds pydantic/speedate#54

Open

sydney-runkle mentioned this issue Feb 15, 2024

Unable to parse Unix Timestamp #8828

Closed

1 task

davidhewitt mentioned this issue Sep 23, 2024

Add a ser_json_datetime config option for configuring how datetimes are serialized. #10454

Open

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Please reconsider 'guessing' whether a timestamp is in milliseconds or seconds #7940

Please reconsider 'guessing' whether a timestamp is in milliseconds or seconds #7940

ariebovenberg commented Oct 27, 2023 •

edited

Loading

samuelcolvin commented Oct 27, 2023

ariebovenberg commented Oct 27, 2023

samuelcolvin commented Oct 28, 2023

samuelcolvin commented Oct 28, 2023

ariebovenberg commented Oct 28, 2023 •

edited

Loading

samuelcolvin commented Oct 28, 2023

ariebovenberg commented Oct 28, 2023

ariebovenberg commented Oct 28, 2023

samuelcolvin commented Oct 28, 2023

sydney-runkle commented Oct 29, 2023

PrettyWood commented Oct 29, 2023

samuelcolvin commented Oct 29, 2023

ariebovenberg commented Oct 30, 2023 •

edited

Loading

ariebovenberg commented Nov 8, 2023

ariebovenberg commented Nov 27, 2023

ariebovenberg commented Dec 26, 2023

PrettyWood commented Dec 26, 2023

Please reconsider 'guessing' whether a timestamp is in milliseconds or seconds #7940

Please reconsider 'guessing' whether a timestamp is in milliseconds or seconds #7940

Comments

ariebovenberg commented Oct 27, 2023 • edited Loading

Initial Checks

Description

Example Code

Python, Pydantic & OS Version

samuelcolvin commented Oct 27, 2023

ariebovenberg commented Oct 27, 2023

samuelcolvin commented Oct 28, 2023

samuelcolvin commented Oct 28, 2023

ariebovenberg commented Oct 28, 2023 • edited Loading

samuelcolvin commented Oct 28, 2023

ariebovenberg commented Oct 28, 2023

ariebovenberg commented Oct 28, 2023

samuelcolvin commented Oct 28, 2023

sydney-runkle commented Oct 29, 2023

PrettyWood commented Oct 29, 2023

samuelcolvin commented Oct 29, 2023

ariebovenberg commented Oct 30, 2023 • edited Loading

ariebovenberg commented Nov 8, 2023

ariebovenberg commented Nov 27, 2023

ariebovenberg commented Dec 26, 2023

PrettyWood commented Dec 26, 2023

ariebovenberg commented Oct 27, 2023 •

edited

Loading

ariebovenberg commented Oct 28, 2023 •

edited

Loading

ariebovenberg commented Oct 30, 2023 •

edited

Loading