'H' format string broken in strict parsing #1401

tp · 2014-01-07T16:34:40Z

I hit an unexpected error when using 'H' to parse 24h-time values without a leading zero:

> var moment = require('moment');
undefined
> moment.version
'2.5.0'
> moment().format('H')
'17'
> var parsed = moment('17', 'H');
undefined
> parsed.isValid()
true
> var parsed = moment('17', 'H', true);
undefined
> parsed.isValid()
false

I would expect H to be usable here, so that the code can also work with times before noon.

The text was updated successfully, but these errors were encountered:

ichernev · 2014-01-09T16:46:18Z

You're right. We made strict parsing a bit too strict in the last version.

icambron · 2014-01-09T18:04:12Z

Hmm, did we? If H eats 2 tokens whenever possible, that will lead to broken parses:

moment('3141113', 'HDDMMYY', true).format(); //=> '2013-11-14T03:00:00-05:00'
moment('3141113', 'HHDDMMYY', true).format(); //=> 'Invalid date'

That's a big part of why we made strict act the way it does. (Of course, the real solution would be to use a global regex so we get backtracking, but that's a separate discussion)

More generally, why even have "H" and "HH" if they're the same?

ichernev · 2014-01-09T18:16:17Z

Well, we have H and HH for format. I don't think we need them for parsing. For parsing they can mean the same, I'm not sure if other libraries are doing something similar. The same applies for years -- if you use YYYY then you can not parse a year that has different number of digits, which means everybody using YYYY to parse is wrong :) According to the cldr token docs, the number of repetitions (in YYYY -- 4), means pad to that much, meaning bigger years will always be left intact. I guess if you print something with YYYY then you'd expect to parse it back with YYYY (same goes for H).

icambron · 2014-01-09T18:58:26Z

Meaning, you're proposing that in strict mode, H always parses two characters?

I'm OK with that, but it won't allow you to roundtrip H from the formatter back to the parser. That's never possible to guarantee, because the whole point of H as a formatting token is that it outputs a variable number of characters.

I think maybe the best thing to do would be to eliminate H as a parsing token altogether, but we probably can't because that would break too much for non-strict users.

ichernev · 2014-01-14T06:29:35Z

@icambron H is NOT intended to produce a specific number of characters. Its intended to NOT pad to a specific number of characters. In the same way that Y doesn't mean a single digit year, it means the year, verbatim. (YY is a special case that always displays 2).

I see two types of issues here. The year family of tokens (/Y+/, /g+/, /G+/). The problem here is that you don't know how many digits they would generate. There is a lower bound (YYYY would produce at least 4 digits), but not an upper one. I can see these use cases:

regular years (meaning +/- 100 years from now) displayed with 2 digits (YY)
regular years displayed with 4 digits (YYYY)
irregular years (negative, too small, too big) displayed with any number of digits (rest of tokens)

So I suggest

YYYY to parse exactly 4 digits
YY to parse exactly 2 digits
Y to parse anything /[-+]?\d+/

This behavior is kind of different than the formatting behavior, but I think it has the least amount of surprise for most users. Because most users use regular years, and YY and YYYY make sense for them.

The second issue is with all other tokens (h, H, d, D, M, s, w, W..., DDD). These can be mostly one or two digits, and only padding is changed with the different formatting tokens. I think if you want rock solid parsing, you'd use HH (or equivalent) for both formatting and parsing. H would mean -- this comes from a user, and it should match any number of digits (until a non-digit is found). In general you don't pad units going to the user, and user won't pad units when entering. But user also won't put date and month numbers glued together, so greedy scanning works in this case.

So I propose:

non padded version (h, H, D, DDD, ...) mean eager /\d+/ tokenization
padded versions (hh, HH, DD, DDDD) mean strict /\d\d/ tokenization

tp · 2014-01-14T08:46:11Z

@ichernev: I think if you want rock solid parsing, you'd use HH (or equivalent) for both formatting and parsing. H would mean -- this comes from a user, and it should match any number of digits (until a non-digit is found).

But one can neither use H for all 24 hours of the day (see above), nor HH if single-digit hours don't have a leading 0:

var parsed = moment('1', 'HH', true);
parsed.isValid() => false

So what would you suggest?

ichernev · 2014-01-14T16:49:35Z

/\d+/ means any number of digits.

icambron · 2014-01-14T17:44:03Z

@ichernev I think that's what I said too. I just wanted to clarify that you won't be able to guarantee round tripping. It sounds like you're OK with that and only intend H (as a parsing token) to be for user-generated strings. I'm just imagining this ticket:

moment(moment().hours(1).format('HDD'), 'HDD', true); //WHY YOU NO WORK?

If you're OK with answering that with "dude, use HH", then I'm good with it too.

ichernev · 2014-01-14T17:47:43Z

Well, you're shooting yourself in the foot with that code IMHO. So yeah :)

Btw the initial implementation of strict would have worked here -- because it first builds the entire regex, and the second token would be \d\d, so the first \d+ greedy would not match 2 digits -- just one ;-)

icambron · 2014-01-14T17:50:48Z

Right, the advantage of a global regex is that we get backtracking and we don't have to deal with greed as much.

tp mentioned this issue Jan 10, 2014

D/DD format string broken in strict parsing #1412

Closed

ichernev mentioned this issue Jan 14, 2014

Improve non-padded strict tokens #1418

Merged

icambron closed this as completed in #1418 Jan 14, 2014

theurere mentioned this issue Jul 1, 2014

Update moment.js to 2.7.0 strukturag/spreed-webrtc#58

Merged

kkirsche mentioned this issue Feb 10, 2015

Update Moment.js to from 2.0 -> 2.9 rubyloco/rubyloco.com#14

Closed

icambron mentioned this issue Nov 6, 2016

Issue with only passing in time... #3527

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

'H' format string broken in strict parsing #1401

'H' format string broken in strict parsing #1401

tp commented Jan 7, 2014

ichernev commented Jan 9, 2014

icambron commented Jan 9, 2014

ichernev commented Jan 9, 2014

icambron commented Jan 9, 2014

ichernev commented Jan 14, 2014

tp commented Jan 14, 2014

ichernev commented Jan 14, 2014

icambron commented Jan 14, 2014

ichernev commented Jan 14, 2014

icambron commented Jan 14, 2014

'H' format string broken in strict parsing #1401

'H' format string broken in strict parsing #1401

Comments

tp commented Jan 7, 2014

ichernev commented Jan 9, 2014

icambron commented Jan 9, 2014

ichernev commented Jan 9, 2014

icambron commented Jan 9, 2014

ichernev commented Jan 14, 2014

tp commented Jan 14, 2014

ichernev commented Jan 14, 2014

icambron commented Jan 14, 2014

ichernev commented Jan 14, 2014

icambron commented Jan 14, 2014