-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: time: make UnmarshalText, UnmarshalJSON enforce strict RFC 3339 semantics #57912
Comments
Hoisting my question from http://go.dev/cl/462285, I wondered how the Would that counter increment every time an RFC3339 parse is performed, with the additional strictness checks being skipped completely if Or would the additional strictness checks be performed, but with failures ignored if |
In general a counter for #56986 needs to increment only when the behavior changes. CL 462285 would need revision to get the increments into exactly the right places. |
If anyone knows or can easily chase down the sources of any almost-RFC3339 times, that would be useful information. |
Note that what actually ended up being implemented for #54580 (in https://go.dev/cl/444277) was for |
Thank you, retitled. |
git svn was modified to accept a single digit hour, as some subversion servers seems to be producing them. |
Here is some data. On my laptop I have a copy of the latest tagged and untagged version of every module stored on the proxy (I throw away all the non-source files and zip them and they fit). badrfc.txt contains all the matches for any of the three invalid RFC3339 times, including in test files. Sometimes a mention in a test file is testing that something is invalid, of course. hour1.txt contains all the matches for times with single-digit hours. The counts of unique lines matched in non-test files is:
comma.txt contains all the matches for times with decimal commas. The non-test line counts are:
Does Elasticsearch maybe localize its RFC3339 strings? In the test files, there are 123 instances of the comment
Does utmpdump maybe localize its RFC3339 strings? 2460.txt contains all the matches for times with bad zones +hh:mm where hh >= 24 or mm >= 60. There are no non-test matches. Disallowing +24:00 and +00:60 seem OK. The outputs mentioning things like elasticsearch and utmpdump give me pause. I really didn't expect any comma times. The quantity of output with shortened hours also gives me pause. This is clearly a common thing to find on the internet. Is there a strong security reason to reject any of these, like there was for #30999? |
Elaborating on the comparison to #30999, we disallowed non-zero octets with leading zeros because Go read those as decimal while BSD-derived systems read them as octal: Go read 18.032.4.011 as 18.32.4.11 while most C libraries read it as 18.26.4.9. That's a major inconsistency that could lead to any number of interesting security problems. Rejecting them outright eliminates the inconsistency: if Go and C both parse successfully, they agree on the result. Are there any known inconsistencies in the handling of these three kinds of RFC3339 times?
Any change we make should have a GODEBUG, of course, but if people have old data with these times, it will mean that they have to set the GODEBUG indefinitely when parsing that data. And if libraries connect to servers or read from programs that generate these times (such as apparently some versions of Subversion, possibly Elasticsearch, and utmpdump), those libraries have to either stop doing direct JSON unmarshal into time.Time, force callers to set the godebug, set the GODEBUG themselves by environment manipulation, or rewrite the entire JSON input before calling unmarshal. These are all sufficiently painful as to caution against making the change at all, without some overriding benefit like the security argument in #30999. With the current data, it sounds like we should permanently remove the 1-digit hour and decimal comma checks. Maybe we could keep the +24:00 and +00:60 behind a GODEBUG, or maybe it's not worth the bother. |
I went looking to see which C programs I could coerce into printing decimal commas in RFC3339 times using locales. Utmpdump doesn't need any coercion at all - it appears to have the decimal comma hard-coded.
|
Utmpdump uses ISO_TIMESTAMP_COMMA_GT: https://github.com/util-linux/util-linux/blob/master/login-utils/utmpdump.c#L113 which adds the comma here: https://github.com/util-linux/util-linux/blob/master/lib/timeutils.c#L454 |
I agree that setting
|
Good points, thanks. More data about how common these are (or alternately why they arise) would still help. |
👋 Hello from AWS SDK for Go team. The issue and rollback CR both mention AWS SDK test(s) breaking.
We just want to clarify was the breakage only in the minio project or were there also issues in the AWS SDK for Go (v1 or v2) projects? It would not surprise me if there were AWS services that return non compliant timestamps with e.g. single digit hours. However, v1 and v2 AWS SDK for Go projects don't rely on |
@aajtodd so far what we know broke is in this issue - #57897 (comment) - we don't have evidence that AWS SDK is broken. yes like you i feel the same ("It would not surprise me if there were AWS services that return non compliant timestamps with e.g. single digit hours"), but that's something we need to chase internally. |
I wonder how much of the compatibility issues arise from the difference between ISO 8601 and RFC 3339. The former is a complicated collection of grammars for valid timestamps, while RFC 3339 is a specific subset of ISO 8601 with the primary intent of avoiding the very compatibility issues we're discussing now. The more systems that claim to use RFC 3339, but not really are making the problem worse as it dilutes the exact meaning of RFC 3339. However, that doesn't explain everything, such as why single-digit hours are so common. For example, ISO 8601 still seems to require a two-digit hour fields as it says "hour is represented by two digits from [00] to [24]." That said, ISO 8601 is behind a paywall and I doubt many people actually read it. Some second hand sources that claim to describe ISO 8601 incorrectly mention that hour fields can have one or two digits.
According to the man page for
The ability to use comma was recently added in Go 1.17 for #43823. Most likely, the impact of that change to
This could also be solved with #21990, where we could somehow support parsing timestamps according to ISO 8601 (or some looser format) instead of RFC 3339.
AWS itself is internally inconsistent about whether it uses RFC 3339 or ISO 8601 as both appear in their documentation: I'm not claiming AWS SDK is broken, just that there's much evidence that ISO 8601 is still in heavy use and the |
Removes timestamp with single digit hour code to avoid confusion. Also fixes lint issues. References: * golang/go#57912 (comment) * GOSDK-2932
Removes timestamp with single digit hour code to avoid confusion. Also fixes lint issues. References: * golang/go#57912 (comment) * GOSDK-2932
I was originally in favor of having the functions which are specifically documented as accepting RFC 3339 strings strictly implement that specification, while retaining However, once I saw the impact of this on existing code I started to lean more towards changing the documentation of With that said, I do maintain an application which itself exposes RFC 3339 parsing to its users, originally via Go's implementation, and that application is not required to consume such a wide range of input and so I prefer to strictly require RFC 3339 to minimize the risk of someone accidentally relying on a behavior that isn't guaranteed nor documented. In response to the decision to revert the stricter parsing in stdlib I have for now just copied the stricter parser into my own package so that my application's interpretation of timestamps will be independent of the Go stdlib's. f the Overall it feels most pragmatic to loosen the documented contract for the existing functions and let those with stricter requirements (like me) solve that independently, and so I'm a soft 👎 for this proposal as currently written, but I personally wouldn't be negatively effected by the change and would slightly benefit from it. |
Removes timestamp with single digit hour code to avoid confusion. Also fixes lint issues. References: * golang/go#57912 (comment) * GOSDK-2932
Removes timestamp with single digit hour code to avoid confusion. Also fixes lint issues. References: * golang/go#57912 (comment) * GOSDK-2932
Regarding the comma (',') in seconds. The grammar in the appendix A of RFC 3339 describes ISO 8601:1998 but the Internet Date/Time Format is defined in section 5 of the RFC. The two grammars differ. Section 5 does not support the comma, but appendix A does. There are a lot of differences: e.g. Appendix A allows two-digit years, Section 5 requires four-digit years. The Internet Date/Time Format specifies a subset of date/time stamps defined by ISO8601. By the way ISO8601 (or at least ISO 8601:2004) says that the comma is the preferred sign. |
Change https://go.dev/cl/521501 mentions this issue: |
Work for #54580 made RFC3339 parsing adhere more strictly to the RFC. This was treated as a bug fix instead of a proposal, but we have evidence that it breaks at least some tests in the AWS SDK, so we've rolled it back for Go 1.20.
This new issue proposes to roll the change forward for Go 1.21, disabled by GODEBUG=timerfc3339strict=0. Thanks to #56986, this will mean that Go programs with 'go 1.20' or earlier in their go.mod files will get the lax parsing by default. Only modules that say 'go 1.21' or later will get the strict parser.
The text was updated successfully, but these errors were encountered: