New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: Parse too greedy when parsing some formats #12919

Open
ksshannon opened this Issue Oct 13, 2015 · 4 comments

Comments

Projects
None yet
4 participants
@ksshannon
Copy link
Contributor

ksshannon commented Oct 13, 2015

  1. What version of Go are you using (go version)?

    $ go version go version devel +2d823fd Tue Oct 13 01:04:42 2015 +0000 linux/amd64

  2. What operating system and processor architecture are you using?

    linux/amd64

  3. What did you do?

    Attempted to parse this time: "618 AM MDT TUE OCT 13 2015" with this format string: "304 PM MST Mon Jan 02 2006"

  4. What did you expect to see?

    No parsing error, and a time.Time representing that moment.

  5. What did you see instead?

    The parsing fails, and the error reported: parsing time "618 AM MDT TUE OCT 13 2015": hour out of range. The hour is interpreted as 61.

    The format works properly with a two digit hour. It seems getnum() in time/format.go takes up to two characters whenever the second character is valid which leads to the issue.

  6. Playground example with examples that fail and pass for the given format:

    http://play.golang.org/p/mZ-m7gJlvy

Apologies if my code is messy...

The format is used as a time/version stamp for the US National Weather Service in their forecast discussion products, for a real world example, http://www.wrh.noaa.gov/total_forecast/getprod.php?wfo=boi&pil=AFD&sid=BOI&version=1

Thank you.

@ianlancetaylor ianlancetaylor changed the title time.Parse() too greedy when parsing some formats time: Parse too greedy when parsing some formats Oct 13, 2015

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Oct 13, 2015

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

ianlancetaylor commented Oct 13, 2015

Basically, this means that we should change getnum to only take a single digit if taking two digits would cause the value to be out of range. Right now it always takes two digits if there are two digits, and then later checks whether they are out of range. I can't decide whether this change would be a good idea or not.

@ksshannon

This comment has been minimized.

Copy link
Contributor

ksshannon commented Oct 13, 2015

That feels like it would invalidly parse (or attempt to) a lot of values. I think that would cause more problems that solve. Without field width, or field separators, I don't think you can solve all cases for this format (without making assumptions that may make some invalid strings valid). Thank you.

@nerdatmath

This comment has been minimized.

Copy link
Contributor

nerdatmath commented Oct 13, 2015

This particular format is unambiguous, since "304 PM" indicates a 1-2 digit hour followed by exactly 2 digits for minutes, then whitespace, then an AM/PM indicator. The problem is, when you're looking at a starting digit of 1 for the hour, you don't know whether you're dealing with hour 1, 10, 11, or 12. Without the AM/PM indicator (24 hour clock), we would have the same issue with starting digits of 0 or 1.

One way to resolve this would be to look at 304 as a special case, in which we gather 3-4 digits and then take the last 2 as minutes and the first 1-2 as the hour. 30405 could be treated the same way (5-6 digits). But then there's also 102, 106, 12006, 10206, 1022006, and so on. There are 5 variable width numeric fields and 5 corresponding fixed width numeric fields, plus 2 numeric year representations, so 5 * (2^4-1) * 3 + 5 * 2 = 235 unambiguous representations that we don't handle currently. Even more if you allow the variable width field to be somewhere other than at the front.

Still, I can envision code that could handle all those situations gracefully. One option: when encountering a single digit format specifier, verify that the remaining adjacent numeric fields are fixed width, and look ahead to count the number of digits available. If there is an extra digit, parse 2, otherwise parse 1.

If you have adjacent variable width numeric fields in the format, such as "34", this should probably be an "ambiguous layout" error, at least when parsing.

@bitfurry

This comment has been minimized.

Copy link

bitfurry commented Oct 22, 2015

Instead of treating all ambiguities I suggest that getnum function is doing its job perfectly only problem I see is that we need to throw a different error when input layout for HH:MM is given as H:MM / HH:M.
As our program is failing in only these cases only. And also changing getnum will have multiple side effects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment