Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can js-joda parse "UTC strings" out of the box? #549

Closed
cubuspl42 opened this issue Oct 5, 2021 · 20 comments
Closed

Can js-joda parse "UTC strings" out of the box? #549

cubuspl42 opened this issue Oct 5, 2021 · 20 comments
Labels

Comments

@cubuspl42
Copy link

cubuspl42 commented Oct 5, 2021

It appears that there is an old date/time representation format, sometimes referred to as "UTC string". It was originally specified in RFC 7321 (HTTP/1.1 Semantics and Content). String in this format are returned by Date.prototype.toUTCString.

ECMAScript standard does not require Date.parse to support this format.

The function first attempts to parse the String according to the format described in Date Time String Format (20.3.1.15), including expanded years.

(Date Time String Format (20.3.1.15) means ISO 8601-like, e.g. 1995-02-05T00:00)

If the String does not conform to that format the function may fall back to any implementation-specific heuristics or implementation-specific date formats.

It seems that, in practice, some popular JavaScript implementations decide to support "UTC string" as one of implementation-specific date formats. It kind of makes sense, as it mirrors toUTCString. I haven't found any documentation to support this, though. It can be verified empirically. Unfortunately, depending on such implementation-specific behaviour is not portable or guaranteed to be forward-compatible, especially when the implementation-specific behaviour is undocumented.

What is more, MDN recommends to use third-party libraries to parse implementation-specific date formats like "UTC string":

It is not recommended to use Date.parse as until ES5, parsing of strings was entirely implementation dependent. There are still many differences in how different hosts parse date strings, therefore date strings should be manually parsed (a library can help if many different formats are to be accommodated).

Strings in "UTC string" format are returned at least by Node.js Firebase Admin Auth, but probably many more.

Is there a simple way to parse "UTC strings" using js-joda? I've searched the API and the source code, without success.

@pithu
Copy link
Member

pithu commented Oct 5, 2021

Not out of the box, but you can define a parser for it. see code below.
The advantage of defining patterns is, that parsing is very fast.

let df = DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss z").withLocale(Locale.ENGLISH);
let z = ZonedDateTime.parse('Tue, 05 Oct 2021 17:08:24 GMT', df)
z.format(df); //  'Tue, 05 Oct 2021 17:08:24 GMT'

Does this answer your question ?

@cubuspl42
Copy link
Author

@pithu It kind of does... But at the same time, it shows why it would be beneficial for the standard format like "UTC String" / RFC 7321 to be included in the library itself.

Your hand-crafted solution has a bug, exactly like the one I've crafted myself...

@pithu
Copy link
Member

pithu commented Oct 5, 2021

@pithu It kind of does... But at the same time, it shows why it would be beneficial for the standard format like "UTC String" / RFC 7321 to be included in the library itself.

There many, many time format patterns, with often mis-leading names and unclear spec. We have no plans to support any special patterns, beside the ones defined in threeten.

Your hand-crafted solution has a bug, exactly like the one I've crafted myself...

Whats the bug ?

@cubuspl42
Copy link
Author

cubuspl42 commented Oct 5, 2021

There many, many time format patterns, with often mis-leading names and unclear spec. We have no plans to support any special patterns, beside the ones defined in threeten.

I won't disagree with the fact that there are many date/time string patterns in the wild! It's true. But at the same time, js-joda is a JavaScript library, not a Java one. It would still, in my opinion, be great to include the counterpart of the toUTCString (I've put all the arguments in the original message), even though it wouldn't be probably that useful in the Java ecosystem.

I'm personally a great fan of js-joda, as it solves the problem in the way that feels right for me. I'm sad that it's not as popular as Moment / Luxon. I believe that leaning more towards JS, less towards Java, could help a bit! Without sacrificing the fundamental ideas that Java's Joda got right, of course.

In the end, the decision will be yours, of course.

Whats the bug ?

I was just editing the answer to include the code 😉

const instant = ZonedDateTime.parse('Tue, 05 Oct 2021 17:08:24 GMT', formatter).toInstant();
instant.toString();
// Expected: "2021-10-05T17:08:24Z"
// Received: "2021-10-05T16:08:24Z"

It seems that "GMT" parsed as z gives "Europe/Jersey" timezone, as you've observed, which currently uses daylight saving time. For me 17:08:24 GMT should always be 17:08:24 [UTC] (and I'm pretty sure that was the idea in RFC 7321).

I guess the correct format is EEE, dd MMM yyyy HH:mm:ss 'GMT'. Poor guys who would implement and deploy the former solution into production when DST is off in "Europe/Jersey"!

@pithu
Copy link
Member

pithu commented Oct 5, 2021

const instant = ZonedDateTime.parse('Tue, 05 Oct 2021 17:08:24 GMT', formatter).toInstant();
instant.toString();
// Expected: "2021-10-05T17:08:24Z"
// Received: "2021-10-05T16:08:24Z"

Ok, looks like this is a bug, parsing GMT in a Locale context, we are open for a PR to fix this

@pithu pithu added the bug label Oct 5, 2021
@pithu
Copy link
Member

pithu commented Oct 6, 2021

I won't disagree with the fact that there are many date/time string patterns in the wild! It's true. But at the same time, js-joda is a JavaScript library, not a Java one. It would still, in my opinion, be great to include the counterpart of the toUTCString (I've put all the arguments in the original message), even though it wouldn't be probably that useful in the Java ecosystem.

Not sure if i got your point right, i agree js-joda is a javascript library, we used to have plans to make js-joda more "javascriptish". But since the temporal proposal gets closer, we see the future of js-joda more as a one to one port of the threeten and java.time API.

And yes, the js-joda documentation is more then improvable, so it's hard to find, but the recommended way to convert from native js to js-joda is using the convert and nativeJs helper functions. eg. const instant = Instant.from(nativeJs(new Date())) and back like that const nativeDate = convert(ZonedDateTime.from(nativeJs(new Date()))).toDate().

And actually i find the threeten pattern parsing quite powerful and extendable to any use-case (as long as you don't stumble over this terrible GMT convert bug, we just found). Yes, it could be much better documented, true!

But IMO much better than to provide a bunch of special implemented custom made parser functions that we would have to attach to each Temporal. How would the API look like ? parseRFC7231() implemented for Instant, ZonedDateTime and OffsetTime. And if we would add it to the core package, we could not use the DateTimeFormatter because it needs the locale and the timezone package for that case. Would mean a special, extra "handmade" implementation just for this format, that is quit outdated, at least as how i understand it.

@pithu
Copy link
Member

pithu commented Oct 6, 2021

I could narrow down the bug, its located here

for (const id of ZoneRulesProvider.getAvailableZoneIds()) {

The corresponding threeten code is here and it seems that there have been some fixes for that function over the time, that we did not port. https://github.com/ThreeTen/threetenbp/blob/973f2b7120d2c173b0181bde39ce416d1e8edfe0/src/main/java/org/threeten/bp/format/DateTimeFormatterBuilder.java#L3385

pithu added a commit that referenced this issue Oct 6, 2021
@cubuspl42
Copy link
Author

cubuspl42 commented Oct 6, 2021

But since the temporal proposal gets closer, we see the future of js-joda more as a one to one port of the threeten and java.time API.

Oh, I get it. I've heard about that proposal some time ago, but didn't know it's close to being released.

but the recommended way to convert from native js to js-joda is using the convert and nativeJs helper functions

Oh, I did't know that... But still, like I said earlier, ECMAScript doesn't guarantee that new Date / Date.parse support the discussed format, so the whole solution would still rely on undocumented implementation-specific behavior.

And actually i find the threeten pattern parsing quite powerful and extendable to any use-case

I agree! Still, meany great libraries and programming interfaces I know provide both powerful and extendable mechanisms and built-in ready-to-use solutions for popular problems or standard formats / concepts / schemes. I understand that there's always a cutoff line, though; nobody wants their libraries bloated.

In my specific case, even ignoring the GMT bug (I've just learned that this is actual bug, I thought it's just unexpected consequence of these mysterious tz databases, or something like that), it took me a while to build the proper pattern string. Actually, finding the right number of repetitions for these pattern letters made me a bit irritated, if I was to be honest. At the same time, I knew I just wanted to parse the opposite of ECMA-standard toUTCString without relying on undocumented stuff. With teammates with attitude like "Man, new Date works here, don't overcomplicate it".

But IMO much better than to provide a bunch of special implemented custom made parser functions that we would have to attach to each Temporal. How would the API look like ? parseRFC7231() implemented for Instant, ZonedDateTime and OffsetTime.

We're going into details now, from my perspective. I don't think such method would be needed for all these types, because RFC7231 date/time string seems to conceptually represent instants (the GMT suffix is obligatory).

IMF-fixdate = day-name "," SP date1 SP time-of-day SP GMT

So we've went down to one parseRFC7231, not three, assuming that parseRFC7231 method is the chosen path.

And if we would add it to the core package, we could not use the DateTimeFormatter because it needs the locale and the timezone package for that case

I don't think it needs timezone database, as the "GMT" suffix is obligatory, and I don't think it actually needs the en locale, as It uses exactly 19 English words (the whole grammar is about 40 lines long).

Would mean a special, extra "handmade" implementation just for this format, that is quit outdated, at least as how i understand it.

Yeah, probably. Well, the format is as outdated as the toUTCString standard method, which is about to stay there! But we've come to the essential point here, i.e. opinion whether this format is actually popular enough to be worthy of inclusion. I'm definitely biased, as I needed to parse it just yesterday.

In my personal opinion, 1. being the mirror of toUTCString, where a standard counterpart is lacking. 2. being a part of HTTP RFC and 3. being a part of a popular cloud provider JS framework, could be enough. But that also depends on the difficulty of implementing such custom format. I don't know if it requires 50, 100 or 1000 lines.

@pithu Thank you very much for all your responses, it was great to see that js-joda has such quick and good support from the authors. As for the RFC7231 string format, everything has been said. I'll leave the decision to your best judgement! If you decide that it would be acceptable to have such custom implementation, but the project has no resources to provide it (i.e. "waiting for PR"), please also note it.

From my perspective the issue is closed (as it was stated as a question, which was answered). I see that you use it to track the "GMT bug", so I'll leave eventual closing to you.

@pithu
Copy link
Member

pithu commented Oct 6, 2021

@cubuspl42 just for curiosity. What is your special use case for parsing an http-date RFC7231 string ?

@cubuspl42
Copy link
Author

It's not really special, I use (the mentioned before) Node.js Firebase Admin Auth library for automated management of Firebase users. Why they chose this format, god knows.

@pithu
Copy link
Member

pithu commented Oct 6, 2021

From my perspective the issue is closed (as it was stated as a question, which was answered). I see that you use it to track the "GMT bug", so I'll leave eventual closing to you.

Thanks @cubuspl42 for your input, i will leave the Issue open and use it for the bug.

pithu added a commit that referenced this issue Oct 7, 2021
@pithu pithu closed this as completed Oct 7, 2021
pithu added a commit that referenced this issue Oct 9, 2021
@pithu
Copy link
Member

pithu commented Oct 10, 2021

I stumbled over the RFC_1123_DATE_TIME implemented in threeten.

I managed to port it to @js-joda/locale as a built-in formatter. it will be available in the next locale release. So @js-joda/locale is still required, but it will work with any locale (EN and non EN locales)

@cubuspl42
Copy link
Author

I stumbled over the RFC_1123_DATE_TIME implemented in threeten.

I managed to port it to @js-joda/locale as a built-in formatter. it will be available in the next locale release. So @js-joda/locale is still required, but it will work with any locale (EN and non EN locales)

Woah, that's great news! 🥳

Thank you, I'll switch to the built-in formatter as soon as it's available in NPM!

@cubuspl42
Copy link
Author

Just out of curiosity, I've tried to understand this standards / RFC dependency hell...

So, if I got this correctly:

The ultimate specification for toUTCString lives in ECMA-262 (ECMAScript language specification), which doesn't refer to any RFC.

MDN interprets the algorithm specified in ECMA-262 as "based on" RFC 7231 (HTTP/1.1 Semantics and Content).

From RFC 7231, chapter 7.1.1.1. Date/Time Formats:

The preferred format is a fixed-length and single-zone subset of the date and time specification used by the Internet Message Format [RFC5322].

(full format specification follows, including grammar)

RFC 5322 (Internet Message Format) states, broadly, that...

This specification is an update to [RFC2822], which itself superseded [RFC0822], updating it to reflect current practice and incorporating incremental changes that were specified in other RFCs such as [RFC1123].

From RFC 5322, chapter 3.3. Date and Time Specification:

[...] This section specifies the syntax for a full date and time specification.

(full format specification follows, including grammar)

RFC 5322 recommends switching to offset-based time zones, but still supports obs-zone ("obsolete zones"), re-redefining the legacy zones like in RFC 822).

From RFC 1123 (Requirements for Internet Hosts -- Application and Support), chapter 5.2.14 RFC-822 Date and Time Specification: RFC-822 Section 5

The syntax for the date is hereby changed to:
date = 1*2DIGIT month 2*4DIGIT
[...]

(so this RFC just amends RFC 822 when it comes to date formats)

From RFC 822 (Standard for ARPA Internet Text Messages), chapter 5.1:

(full format specification, supporting time zones: UTC ("UT" back then?) / "GMT" and a few (American) time zones: "EST" / "EDT / "CST" / "CDT" / "MST" / "MDT" / "PST" / "PDT" and military zones, whatever they are/were)

Time zone may be indicated in several ways. "UT" is Universal Time (formerly called "Greenwich Mean Time"); "GMT" is permitted as a reference to Universal Time. The military standard uses a single character for each zone. "Z" is Universal Time. "A" indicates one hour earlier, and "M" indicates 12 hours earlier; "N" is one hour later, and "Y" is 12 hours later. The letter "J" is not used. The other remaining two forms are taken from ANSI standard X3.51-1975. One allows explicit indication of the amount of offset from UT; the other uses common 3-character strings for indicating time zones in North America.

So, it seems that the provided implementation supports:

  • A superset of RFC 7321 (as it supports a fixed-length single zone format)
  • A sane subset of RFC 5322, RFC 1123, RFC 822 (without the weird stuff, like military zones, I guess?)

@InExtremaRes
Copy link
Collaborator

@pithu Did you update the TS type definitions for the new "stock" formatter? The types should be augmented when the locale package is imported, I think.

@InExtremaRes
Copy link
Collaborator

Also I'm curious. I think is great to have this new parser/formatter built into the library but this could also be solved using nativeJs right? I mean something like this:

const i = Instant.from(nativeJs(new Date('Tue, 05 Oct 2021 17:08:24 GMT')));

This way you ensure compatibility with whatever algorithm Date is using. I think I had to use this format only once in my life and a code like that was good enough for me.

@pithu
Copy link
Member

pithu commented Oct 10, 2021

As mentioned above, its not recommended to use Date for parsing -> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/parse.

And as its part of the threeten API, i guess its fine to have it, even if it seems to be a kind of old fashion format.

@cubuspl42
Copy link
Author

cubuspl42 commented Oct 10, 2021

@InExtremaRes

This way you ensure compatibility with whatever algorithm Date is using.

That's exactly the problem, it's unspecified what algorithm Date is using, beside that it must support a single specific ISO format. All implementations are free to choose any set of extra formats, maybe including "UTC String" (RFC 7321 / 1123...), maybe not.

So it boils down to whether you code for a standard (if possible!), or just for a set of implementations that you want to support. It's a broad (and a bit philosophical) issue, but I'm personally on the "standard if possible" side.

@pithu
Copy link
Member

pithu commented Oct 10, 2021

@pithu Did you update the TS type definitions for the new "stock" formatter? The types should be augmented when the locale package is imported, I think.

Good point, i will add it

@pithu
Copy link
Member

pithu commented Oct 10, 2021

So, it seems that the provided implementation supports:

  • A superset of RFC 7321 (as it supports a fixed-length single zone format)
  • A sane subset of RFC 5322, RFC 1123, RFC 822 (without the weird stuff, like military zones, I guess?)

I did not went through this specs, but yes probably its very close to this.

It expects fixed-length format up to the timezone, but actually it would accept any timezone, that is defined in the @js-joda/timezone package, so not only things like GMT, UTC or EST, but even things like Europe/Berlin. To fix that, there is some more work to do in the locale print parser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants