Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Official support for parsing RFC2822 dates #2530

Closed
mitar opened this issue Aug 2, 2015 · 24 comments
Closed

Official support for parsing RFC2822 dates #2530

mitar opened this issue Aug 2, 2015 · 24 comments

Comments

@mitar
Copy link

mitar commented Aug 2, 2015

In #1407 it is claimed that only ISO dates are parsed because others can't be reliably parsed. But RFC2822 date is support by Date since JavaScript 1.0 in all browsers. In fact, RFC2822 date is the most supported format in JavaScript. ISO-8601 was added just in ECMAScript 5.

So, I understand why arbitrary strings might not tried to be parsed automatically, but why not allow parsing RFC2822 strings? They are as clearly defined as ISO strings?

In my opinion moment.js should support what Date supports across platforms, and add more. Not that is supports less?

@mattjohnsonpint
Copy link
Contributor

I begrudgingly agree. I dislike that particular format, but it is standardized and in common use - such as HTTP headers, cookies, etc.

@mattjohnsonpint
Copy link
Contributor

One thing to note, this format has English abbreviations for weekday names and month names. This means to support it properly, we need to use the English strings regardless of the current locale.

@mitar
Copy link
Author

mitar commented Aug 4, 2015

This means to support it properly, we need to use the English strings regardless of the current locale.

Yes. It is just a set of hard-coded strings.

@mattjohnsonpint
Copy link
Contributor

In addition to automatic recognition, we should also have a format specifier for explicit parsing and for display formatting. I suggest "R". This should be functionally equivalent to "ddd, DD MMM YYYY HH:mm:ss [GMT]" - but always in English, and always in UTC.

moment().format("R") // "Tue, 04 Aug 2015 14:38:41 GMT"
moment("Tue, 04 Aug 2015 14:38:41 GMT", "R").isValid() // true
moment("Tue, 04 Aug 2015 14:38:41 GMT", "R", true).isValid() // true
moment("Tue, 04 Aug 2015 14:38:41 GMT").isValid() // true (no deprecation warning)

@ichernev
Copy link
Contributor

Related: #1722

@ichernev ichernev added the todo label Sep 21, 2015
@ben-page
Copy link

Please, this would be very helpful.

@Kicu
Copy link

Kicu commented Dec 14, 2015

Hi, I have the exact same problem with parsing server 'Date' header.

Before this issue gets resolved in some way, are there any workarounds for moment that I could use right now?
I'll probably just do a moment(Date(xxx)) to get rid of the Warning from moment, but I'd much rather have an explicit string with format or any other indicator in the code that this is parsed into date.

Any input on this would be much appreciated ;)

@hairyhenderson
Copy link

@Kicu - the format string is ddd, D MMM YYYY H:mm:ss [GMT].

To parse a date header, you should be able to do:

moment(dateHeader, 'ddd, D MMM YYYY H:mm:ss Z')

And to output the current time into the correct format:

moment.utc().format('ddd, D MMM YYYY H:mm:ss [GMT]')

Hope that helps :)

@mattjohnsonpint
Copy link
Contributor

This should be a constant similar to moment.ISO_8601

@mattjohnsonpint
Copy link
Contributor

One other note. This format will need to recognize the time zone abbreviations that are spelled out in section 4.3 the RFC. While these are obsoleted by RFC 2822, they are still used by some implementations that use the earlier RFC 822 standard. We should recognize these for parsing, but not use them in formatting.

@TracyGJG
Copy link
Contributor

TracyGJG commented Oct 12, 2016

I am looking into a solution for this issue as my first foray in contributing to OSS projects. Here are my finding to date. I would like feedback before commencing, just to make sure I am on the right track.
Moment.pdf, Sorry for the PDF, there was just quite a bit to it and I wanted to retain the layout and style. I will use markdown in future.

@maggiepint
Copy link
Member

@TracyGJG at first glance, I want to mention that there is no reason to do much string cleaning. Instead, just delegate that responsibility to the existing parsing code. This is what the ISO8601 format stuff does.

@ichernev
Copy link
Contributor

@TracyGJG thank you for taking the time to write the specification for RFC 2822. About the replacing before parsing -- the part with extra whitespaces we can probably integrate with our strict parser (it will be a mode, where its still strict but doesn't complain about whitespace), but the part with the comments -- I don't see how this can easily be implemented outside of your proposal.

So given that we'll be replacing stuff we might as well replace the whitespaces and later on see what the community has to offer in terms of improvements.

About the timezone -- only the specified timezones are supported as numbers, and there is a catch-all almost [A-IK-Z] -- I don't completely follow that, but we can add a token that eats up a string-representation of a zone, and then either understand it (if its one of the given ones), or just ignore it.

@maggiepint yes, except the comments and the whitespace. If we decide we don't want to handle them then yes. But people will start complaining at some point. We can ship it half-finished as we did with ISO parsing for so long.

@ichernev
Copy link
Contributor

@TracyGJG I think you can add an rfc2822 parsing function (like we have an iso one) that does the replace and ends up relying on our formats for the rest of the parse, with the addition of timezone token (z and zz are currently only used for formatting, we can make them count of parsing). So basically you won't do any out-of-parser parsing, only string-replace to get rid of space and comments.

If the replace code is too ugly (which it shouldn't really) we might decide to drop it to save some space (and keep the not-so-rfc2822 code).

I hope this will keep it simple and understandable and short. If during implementation it starts to feel different than that we can discuss it again.

@mattjohnsonpint
Copy link
Contributor

Two changes I'd like to make to the request:

  1. Originally I said I wanted the output to always be UTC based. Actually, it should follow the mode of the moment object being formatted. So in UTC mode, it would use a GMT abbreviation. in any other mode, it should use the form with a fixed offset having no colon, +HHMM. It shouldn't change the time value before formatting. We don't need other variations of the format (such as without day of week, etc.)
  2. Let's not have the R token (or variants). We don't have a token for ISO8601, so we shouldn't introduce that convention here. It should simply have a moment.RFC_2822 dummy function similar to our moment.ISO_8601 function. (I'm also fine with having it be detected along with ISO and the aspnet format when parsing without a format specified, though it should be last on the list to test.)

This should reduce the footprint significantly.

@TracyGJG
Copy link
Contributor

Thank you all for you feedback. I will revise my thinking in view of your comments.
@ichernev, the [A-IK-Z] timezones are the military format you will fine hidden in the 2822 spec, of which I an particularly familiar.

@TracyGJG
Copy link
Contributor

@mj1856, I have just sat your PluralSight course on Date and Time Fundamentals and have a question regarding the punctuation of the RFC2822 format. Should we enforce having a comma after the weekday, if the weekday is specified, or should we be flexible. I have constructed a RegExp pattern which enforces the restriction; along with some test cases.
Following cleansing the input string I hope to have a normalised format that will split into upto 4 matches using the above RegExp. I can then feed this into a Date object, extract the toUTCString value and do a string comparison but RegExping the output and comparing the elements to ensure semantic validity, i.e. 13 October 2016 really is a Thursday.

@mattjohnsonpint
Copy link
Contributor

Hope you liked the course! 😄

WRT military times, you almost never see anything other than Z in the wild. But I suppose if we're supporting obs-zone then we should do so in entirety. It can't hurt.

On parsing RFC2822, we should allow with or without day of week, with or without comma. On formatting RFC2822 for output, we should include the day of week and the comma.

Also, don't forget that for this format, all weekday and month names are ENGLISH, regardless of the moment locale in effect.

One thing I'd like to point out before you get too far - you really should look through the existing parser code and try to understand what pieces can be reused, rather than writing this in isolation. You may find that you do not need as much new regex as you think.

Perhaps you would consider opening a pull request to work on as you go? Simply put in the comments "work in progress - do not merge yet". Then you can push to your fork/branch whenever you like and we can see how you're doing and offer suggestions along the way. At the end you can rebase to cleanup. It's a much better process than building up a large commit and waiting for feedback until everything is perfect.

Also, keep in mind we have a backlog of existing pull requests and other issues, and we are all volunteers around here. It is pretty normal to have days or weeks go by before someone can look things over.

@TracyGJG
Copy link
Contributor

TracyGJG commented Nov 7, 2016

Pull Request #3579 provided for RFC2822 Parsing implementation - Near-candidate solution

@TracyGJG
Copy link
Contributor

TracyGJG commented Nov 9, 2016

Iskren @ichernev any chance you could provide some advice in the issues I raised on PR #3579 please?

@TracyGJG
Copy link
Contributor

TracyGJG commented Mar 7, 2017

Candidate solution now ready for merge. Documentation for the website revised to include the new interface (parsing.)

@marwahaha
Copy link
Member

What's the status of this? @mitar @TracyGJG 2.18.0+ has parsing for moment.RFC_2822. Was there anything else we should build?

@TracyGJG
Copy link
Contributor

@marwahaha, I am not entirely sure what the question is. The implementation of RFC 2822 string parsing follows a similar approach to that of ISO 8601 so requires nothing more than Moment already has built in. I have submitted supporting documentation for the web site (http://momentjs.com/docs/#/parsing/string/) and AFAIK the capability is complete and fully functional, until someone discovers an edge case I had not considered.

@ichernev
Copy link
Contributor

He's just asking if this issue can be closed. I think it can :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants