Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What is a valid date? #1119

Closed
icambron opened this issue Sep 18, 2013 · 10 comments

Comments

Projects
None yet
3 participants
@icambron
Copy link
Member

commented Sep 18, 2013

While working on a somewhat nicer version of #1098, I discovered we have a lot of isValid() issues. They all basically have the property that there are moments that work perfectly and we expect to work perfectly, but which are considered invalid. They fall into two categories:

Dates that trip our input-matches-output check. These include moment('300', 'DDD') and moment([2009, 13]). This is because we're comparing the input array to toArray() which breaks when we "wrap" units.

A big part of the issue is that the array comparison check in isValid() is useful in that it tells you that the date may be "valid" in a calendar sense, but that it didn't match your input, which comes up for DSTs. That's a different question than whether we failed to parse the input or not, and it's their conflation that makes things weird.

Inputs where the parser has failed but we have a good date anyway. For example, moment('X.SSS', '1234567890.123'), which is used in a test, produces the right value and even round trips with format(), but isn't valid. That's because X as a parser token eats the decimals and SSS can't match. But the time, including the decimals, worked just fine, and "X.SSS" is the right formatting token to print it.

This stuff doesn't use the code in isValid(); when the parser runs out text but still has tokens, it just sets the _isValid property to false, which short-circuits the check.

Here's my proposal, which I'll fold into #1098 if there's enough agreement:

  1. isValid() just means "do we have a real date"? No array comparison.
  2. A new function, matchesInput() tries to tell you if the input matches the output. Basically, it does the array comparison we're removing from isValid(), but we'll keep track if the input is weird, like "DDD" or months > 12, and skip the array check in those case. (What I'm unsure about is whether matchesInput should return true or false for those. I guess true, and then matchesInput() could be called isDSTShifted() or somesuch instead.)
  3. When the parser can't match some of the tokens but still produces a date, it short-circuits matchesInput() to false, but leaves isValid() true, unless strict is set, in which case it forces them both to false.
  4. We can do really ungentle things with moments where isValid returns false, like just have format() return a localized version of "Invalid date" (I discovered all this by implementing that, actually)

What do you guys think?

@ichernev

This comment has been minimized.

Copy link
Contributor

commented Sep 19, 2013

Let me try to reword the 'abnormal' cases:

  • if an input doesn't match our regexes and doesn't match Date's regexes, and Date says Invalid Date
  • if it matches a regex but there is clearly no such date, meaning overflow in any of the units (13 months, 65 minutes, 31 days in a month that only has 30, but NOT DST related (basically whether a UTC date with this spec exists)
  • if it matches a regex, is valid date, but because of the locale is shifted forward or backward because of DST. If there is a gap (clock jumps from 2:00 AM to 3:00 AM) then everything 2:xx AM is subtracted one hour, and if there is overlap, I think Date picks the second one, there is no jump, but the user might have had something else in mind
  • it partially matches a regex, yielding some Date value that is not Invalid Date, but is somehow out of sync with the format. I can see the following subcategories
    • non-tokens didn't match verbatim DD/MM/YYYY matches 01 02 2000
    • tokens left out DD YYYY matches 01
    • strign left out DD YYYY matches 01 2010 yoo-hoo
    • conflicting tokens -- unix epoch seconds and other tokens that do not match

So now the question is what are users interested in, what are they used to, and how can we make it better :)

I think the first two cases are definitely bad, meaning this date should not be used under any reasonable circumstances.

The dst shifting is mostly problematic for tests, I don't think users hit that often, or care about it too much. I guess we can add a reader for that isOnDST which might say whether the date was shifted (gap case, duration of shift might be accessible), or ambiguous (overlap case, duration to other case might be accessible) or neither.

About the fourth case ... if people care they should use strict. We may export parse flags as to which of the 4 sub-cases failed (there might be other), but I'd say that's a bit excessive. If I'm a developer and I decide to use non-strict, and then I get back a date, should I care how accurate was it, and wouldn't it be easier to either use strict, or display the date to a user for verification (presuming the input came from a user).

We can add a semi-strict mode that only matches if there is a mismatch in non-tokens, but the whole string should match exactly and it should make sense (no conflicting tokens).

@icambron

This comment has been minimized.

Copy link
Member Author

commented Sep 19, 2013

To clarify, my goal here is to eliminate false negatives for validity. The user should basically be able to say:

if (!m.isValid()) {
   shitTheBed();
}

That doesn't work right now because way too many perfectly good moments returns false for isValid. See #1075 for another recently fixed example.

Getting to the specifics in terms of your categories, your case 2 actually contains two different cases:

a. Dates that are passed in as overflowing, like passing 13 for the month argument. Our tests currently assume these are valid dates, which is why I listed it, but I'm happy to continue calling them invalid, per your comments. We should just remove them from the tests.

b. Tokens whose implementation uses overflow internally, e.g. moment('300', 'DDD') (i.e. 300th day of the year). These return false for isValid(), because our validity checker doesn't know the difference between "we parsed an overflowing date" and "we used Date's overflow capabilities as an implementation mechanism". It's just a bug.

On case case 3, I largely agree with you that it's not really that important to signal the user that there's a shift. In fact, I'm happy with almost any solution in terms of signaling (or not signaling) to the user that there's a shift, except the current one, which is to return false for isValid (it's actually the same check that makes 2a and 2b return false).

On case case 4, strict doesn't help us here; we're being too strict, as well as inconsistent. The specific subcategory that's problematic is the second one, tokens left out. Just like with case 3, I agree with you that we don't need to signal the user (like you said, they can just use strict if they care about that). But the problem is that we currently do signal them, regardless of strictness. We call the whole thing invalid. We should only consider left-out tokens invalid if we're parsing as strict. I don't think semi-strictness is necessary.

To put this all another way, we currently use isValid to signal both A) we can't find a UTC time that corresponds to that input and B) we found a UTC time that corresponds to that input but we don't think you'll like it. I want to restrict it to A and call everything else something else (or just let them through and not bother informing anyone of anything).

@icambron

This comment has been minimized.

Copy link
Member Author

commented Sep 24, 2013

Ok, after a bit offline discussion and some more thought, here's my clarified proposal:

  1. Add a parsingFlags() which returns an object with details like {dateOverflow: false, unusedTokens: false, isDSTShifted: false} that indicate any weirdness in the parsing.
  2. Change isValid() only return false if we can't find a date or the date has overflowed unexpectedly, or if strict is true and any of the other parsing flags are true

If the parser is permissive, isValid should be too.

@icambron

This comment has been minimized.

Copy link
Member Author

commented Sep 26, 2013

For a bit more evidence about why the current state of affairs results in all kinds of confusion, see #1133, where the undertolerant isValid() combines with the overtolerant parser to result in behavior that's hard to even explain.

@ichernev

This comment has been minimized.

Copy link
Contributor

commented Sep 26, 2013

The clarified proposal looks good. My patch

  • add leftInput that holds the string that was not parsed (or empty string if everything was parsed)
  • dateOverflow should give the id of the overflowing token (what invalidAt currently gives)
  • unusedTokens should return a string (or at least array) of tokens that were not used
  • rename isDSTShifted to dstShifted
  • so unexpected overflow is like 61 minutes, 15 month, right? And expected is when you use day-or-year format token. I agree with that

We can start by putting together a beefy test case. Any volunteers :)

@icambron

This comment has been minimized.

Copy link
Member Author

commented Sep 26, 2013

I'm implementing this now. Good thoughts on the flag values; I'll incorporate that.

@icambron

This comment has been minimized.

Copy link
Member Author

commented Sep 26, 2013

@ichernev You can see work in progress in #1098. It works and passes all the existing tests, but it needs a bunch of new tests of its own. But I could use some feedback in the meantime.

Edit: also @Xotic750

@Xotic750

This comment has been minimized.

Copy link
Contributor

commented Sep 27, 2013

All looks pretty reasonable to me, just from reading the posts. Haven't had any time to play and notice any inconsistencies at present, but if I do I will let you know.

@icambron

This comment has been minimized.

Copy link
Member Author

commented Sep 27, 2013

Moved the work to #1140, since #1098's actually been merged.

@icambron

This comment has been minimized.

Copy link
Member Author

commented Oct 4, 2013

Closed by #1140.

@icambron icambron closed this Oct 4, 2013

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.