Trailing whitespace in LISTs should not be fatal #8
On an account hosted at yahoo.co.jp one of the messages you get at signup has a BODYSTRUCTURE that includes trailing whitespace in the nested paren list rep for multipart messages, specifically, whitespace after the "body-fld-dsp" and preceding "body-fld-lang" per the grammar.

The whitespace violates RFC 3501 but it seems silly to fail to parse the BODYSTRUCTURE in this case.

The raw FETCH line is:

```
* 2 FETCH (BODYSTRUCTURE (("text" "plain" ("charset" "ISO-2022-JP") NIL NIL "7bit" 2515 64 NIL NIL NIL NIL)("text" "html" ("charset" "ISO-2022-JP") NIL NIL "7bit" 7723 151 NIL NIL NIL NIL) "alternative" ("boundary" "pzu8t3naw7b5therma1t") NIL ) FLAGS (\Seen) INTERNALDATE "30-Oct-2014 03:23:45 +0000" UID 2 BODY[HEADER.FIELDS (FROM TO CC BCC SUBJECT REPLY-TO MESSAGE-ID REFERENCES)] {312}
Message-Id: <5451af3d-000e6070-af32c615041a5c3dae408c7ef67e5444-18793@tx203.storage.kks.yahoo.co.jp>
From: box-master@mail.yahoo.co.jp
To: gaialibs@yahoo.co.jp
Subject: Yahoo! JAPAN =?ISO-2022-JP?B?SUQbJEIkNEVQTz8kTiQqNVJNTSRYIUobKEJZYWhvbyE=?= =?ISO-2022-JP?B?GyRCJVwlQyUvJTkzK0BfJEskRCQkJEYhSxsoQg==?=
)
```

The relevant excerpts of the MIME message are:

```
Message-Id: <5451af3d-000e6070-af32c615041a5c3dae408c7ef67e5444-18793@tx203.storage.kks.yahoo.co.jp>
From: box-master@mail.yahoo.co.jp
To: gaialibs@yahoo.co.jp
Subject: Yahoo! JAPAN =?ISO-2022-JP?B?SUQbJEIkNEVQTz8kTiQqNVJNTSRYIUobKEJZYWhvbyE=?= =?ISO-2022-JP?B?GyRCJVwlQyUvJTkzK0BfJEskRCQkJEYhSxsoQg==?=
Content-Type:multipart/alternative; boundary="pzu8t3naw7b5therma1t"

--pzu8t3naw7b5therma1t
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

--pzu8t3naw7b5therma1t
Content-Type: text/html; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

--pzu8t3naw7b5therma1t--
```

The code prior to this patch would throw an Error if trailing whitespace was detected following an ATOM in the context of a LIST or a SECTION. I've loosened this restriction by removing it entirely for both cases and added test coverage to match this intentional decision. Note that I don't have any rationale for letting the much more strict SECTION case have trailing white space other than it also seems harmless there as well.

There are some similar other restrictions in this place that I have not loosened.

For future analysis, the server greeting of yahoo.co.jp at this time is:

```
* OK [CAPABILITY IMAP4rev1 ID NAMESPACE UIDPLUS LITERAL+ CHILDREN XAPPLEPUSHSERVICE AUTH=PLAIN AUTH=LOGIN] IMAP4rev1 imapgate-0.7.68_11_1.61475
```
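To illustrate the kind of leniency described above, here is a minimal Python sketch of a parenthesized-list reader that skips whitespace before a closing `)` instead of raising. It is not the emailjs-imap-handler code; the function name `parse_list` is hypothetical, and the sketch ignores literals, quoted-string escapes and SECTIONs.

```python
def parse_list(text, pos=0):
    """Parse one parenthesized list that starts at text[pos] == '('.

    Returns (items, next_pos). Spaces directly before ')' are skipped
    rather than treated as a fatal error.
    """
    if text[pos] != "(":
        raise ValueError(f"expected '(' at position {pos}")
    pos += 1
    items = []
    while True:
        # Skip any run of spaces, including one right before ')'.
        while pos < len(text) and text[pos] == " ":
            pos += 1
        if pos >= len(text):
            raise ValueError("unterminated list")
        ch = text[pos]
        if ch == ")":
            return items, pos + 1
        if ch == "(":
            child, pos = parse_list(text, pos)
            items.append(child)
        elif ch == '"':
            end = text.index('"', pos + 1)  # no escape handling in this sketch
            items.append(text[pos + 1:end])
            pos = end + 1
        else:
            start = pos
            while pos < len(text) and text[pos] not in ' ()"':
                pos += 1
            atom = text[start:pos]
            items.append(None if atom == "NIL" else atom)


# Tail of the multipart BODYSTRUCTURE above, with the stray space before ')'.
items, _ = parse_list('("alternative" ("boundary" "pzu8t3naw7b5therma1t") NIL )')
print(items)  # ['alternative', ['boundary', 'pzu8t3naw7b5therma1t'], None]
```

A strict parser would reject the input at the space that follows `NIL`; the only difference here is that the whitespace-skipping loop also runs before the check for `)`.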
Thanks! I agree, the parser should not fail at this position. It behaved this way because the parser was initially extracted from Hoodiecrow, which is strict by design.
I tried to create a yahoo.co.jp account to add it to http://capa.kreata.ee/ but failed miserably at the captcha question. I don't even understand where one character ends and another one starts 👹 Chinese QQ at least had an English translation of the signup page.
Yeah, that was a hard one to get an account for. I used Chrome and its magic translation feature and picked the second tab there, which is an audio CAPTCHA, since magic translation does nothing for the visual CAPTCHA. It still took me something like 10 minutes using http://en.wikipedia.org/wiki/Japanese_numerals because they have a voice saying numbers over top of a voice saying other things. Things got faster when I realized that they seemed to be using only the "preferred reading".

Glad to see capa.kreata.ee is back up! It's very handy! I'll privately send you a copy of the credentials I created. Feel free to use them for testing purposes; I just don't want them to leak and then have to figure out how to get through the account unlock procedure! ;)

yahoo.co.jp is potentially interesting also for the root reason I was looking at it: they rewrite the From header, but only when there's a sufficiently large attachment involved (1.3 MB or so). Short summary at https://bugzilla.mozilla.org/show_bug.cgi?id=1084216#c24
Great, thanks! I added the Japanese Yahoo to capa.kreata.ee (and this time I used an init script to keep the service running – so far I ran it under

To trade bugs, here's one for Gmail: if you use parameter continuation to encode non-ASCII attachment filenames and put the value in quotes (you do not have to, as the urlencoded value is an atom, so quotes are optional) and ask Gmail for BODYSTRUCTURE, you get the filename decoded properly but the original charset is prepended to the value. For example, I have the following message with an attachment called “jõgeva.txt”:
When I send such a message to Gmail and fetch its BODYSTRUCTURE, I get the following response:
As you can see, the filename is listed as "=?ISO-8859-1?Q?utf-8''j=F5geva.txt?=" (note the unexpected utf-8'' before the filename), which decodes to “utf-8''jõgeva.txt”. If I remove the quotes around the filename and resend the message
I get the following response from Gmail IMAP:
Now the filename is listed as "=?ISO-8859-1?Q?j=F5geva.txt?=", which decodes to “jõgeva.txt”.
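The two decodings quoted above can be reproduced with Python's standard `email.header.decode_header`. This is only a sketch of the RFC 2047 decoding step on the client side; the helper name `decode_words` is made up for this example.

```python
from email.header import decode_header


def decode_words(value):
    """Decode RFC 2047 encoded-words in value into a plain string."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their declared charset.
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


# Value reported for the quoted filename parameter.
print(decode_words("=?ISO-8859-1?Q?utf-8''j=F5geva.txt?="))  # utf-8''jõgeva.txt
# Value reported for the unquoted filename parameter.
print(decode_words("=?ISO-8859-1?Q?j=F5geva.txt?="))         # jõgeva.txt
```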
It looks like quotes are forbidden in that case when using the RFC 2231 character set representation. This is definitely an important area for all our clients and I find it confusing but perversely interesting, so I just did some digging, taking notes as I went and inlining the grammar constructs etc., since jumping between the various RFCs can be quite a hassle. Specifically, according to the grammar at http://tools.ietf.org/html/rfc2231#section-7 you are constrained to use the "extended-other-values" term for encoding the value (without quotes) because:
and tspecials in http://tools.ietf.org/html/rfc2045#section-5.1 is as follows (and includes double-quote, notably):
...And for paranoia I checked and MIME words are not allowed even in the normal "value" case because the 2045 grammar says:
And http://tools.ietf.org/html/rfc2047#section-5 says:
and
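The grammar argument can be checked mechanically. The Python sketch below assumes the RFC 2231 section 7 rule that an extended value is built from `ext-octet` ("%" plus two hex digits) and `attribute-char` (any US-ASCII CHAR except SPACE, CTLs, `*`, `'`, `%` and the RFC 2045 `tspecials`, which include the double quote). The function names are hypothetical, lowercase hex digits are accepted as a leniency, and the example value `utf-8''j%C3%B5geva.txt` is an assumed encoding of the filename discussed above.

```python
import string

# tspecials from RFC 2045 section 5.1; note the double quote is in the set.
TSPECIALS = set('()<>@,;:\\"/[]?=')
HEX_DIGITS = set(string.hexdigits)


def is_attribute_char(ch):
    """True if ch is allowed unencoded in an RFC 2231 extended value."""
    code = ord(ch)
    if code <= 32 or code >= 127:  # CTLs, SPACE, DEL and non-ASCII
        return False
    return ch not in "*'%" and ch not in TSPECIALS


def is_extended_other_values(value):
    """Check value against *(ext-octet / attribute-char)."""
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "%":
            pair = value[i + 1:i + 3]
            if len(pair) != 2 or not all(c in HEX_DIGITS for c in pair):
                return False
            i += 3
        elif is_attribute_char(ch):
            i += 1
        else:
            return False
    return True


def is_extended_initial_value(value):
    """Check [charset] "'" [language] "'" extended-other-values.

    The charset and language parts are not validated in this sketch.
    """
    parts = value.split("'", 2)
    return len(parts) == 3 and is_extended_other_values(parts[2])


print(is_extended_initial_value("utf-8''j%C3%B5geva.txt"))    # True
print(is_extended_initial_value('"utf-8\'\'j%C3%B5geva.txt"'))  # False
```

The quoted form fails because `"` is a tspecial and therefore cannot appear in an extended value.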
Oh, and is that a fix that's coming or is already in mailbuild or something like that? I'd like to make sure I have a no-quotes version when I land the current set of email.js upgrades to our trunk.
Interesting, you're right. I always went with this note from section 3: "Note that quotes around parameter values are part of the value syntax; they are NOT part of the value itself. Furthermore, it is explicitly permitted to have a mixture of quoted and unquoted continuation fields." But now that I've checked the grammar, this only applies to non-extended values. So I guess Gmail isn't doing anything wrong here; the strange result comes from choices made when normalizing an invalid value (quotes are dropped and the resulting value is urldecoded as a whole, using utf-8 as the default charset). Anyhow, this has been fixed in emailjs for some time now.

I know that mime-words are not allowed in quoted strings, and I think that rule is the most important reason why RFC 2231 was created in the first place. Otherwise you could split a long parameter value into smaller mime-words, separate these with spaces and let regular line wrapping manage long header lines. As mime-words are not allowed inside quotes, and without quotes the result would be conflicting, something else was needed.
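The difference between the two outcomes can be sketched in Python. `lenient_normalize` below is only a guess at the normalization described in this comment (drop the quotes, percent-decode the whole value as UTF-8); it is not Gmail's actual code. `decode_rfc2231` shows the split on `'` that a conforming extended value calls for. Both names, and the example value, are assumptions made for illustration.

```python
from urllib.parse import unquote, unquote_to_bytes


def decode_rfc2231(value):
    """Decode charset'language'percent-encoded-text per RFC 2231."""
    charset, _language, encoded = value.split("'", 2)
    return unquote_to_bytes(encoded).decode(charset or "us-ascii")


def lenient_normalize(value):
    """Hypothetical fallback: strip quotes, percent-decode everything."""
    return unquote(value.strip('"'), encoding="utf-8")


raw = "utf-8''j%C3%B5geva.txt"
print(decode_rfc2231(raw))                 # jõgeva.txt
print(lenient_normalize('"' + raw + '"'))  # utf-8''jõgeva.txt
```

Under this reading, the charset prefix survives in the second case because the value is never split on the `'` delimiters.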
This is primarily a move to the upgrade startTLS functionality. This included a fix to the SMTP implementation where a server could avoid TLS-initiation by not implementing EHLO (emailjs/emailjs-smtp-client#20). This goal was bug 1060558.

An additional fix we had made locally but not upstreamed (but have now upstreamed) and that has improved test coverage on our end (see test_mine) is a problem relating to newlines. See emailjs/emailjs-imap-client#35.

Other (new) fixes folded in here:
- Namespace NIL delimiter: emailjs/emailjs-imap-client#36
- trailing whitespace in bodystructure list: emailjs/emailjs-imap-handler#8

Fixes we had made locally and upstreamed but not taken via the release process:
- NIL delimiters for LIST emailjs/emailjs-imap-client#27 and LSUB emailjs/emailjs-imap-client#29 (discovered while investigating bug 1091295 and bug 1084216)