Trailing whitespace in LISTs should not be fatal #8

asutherland · 2014-11-11T22:13:46Z

On an account hosted at yahoo.co.jp one of the messages you get at signup has a
BODYSTRUCTURE that includes trailing whitespace in the nested paren list rep
for multipart messages, specifically, whitespace after the "body-fld-dsp" and
preceding "body-fld-lang" per the grammar.

The whitespace violates RFC 3501 but it seems silly to fail to parse the
BODYSTRUCTURE in this case.

The raw FETCH line is:

* 2 FETCH (BODYSTRUCTURE (("text" "plain" ("charset" "ISO-2022-JP") NIL NIL "7bit" 2515 64 NIL NIL NIL NIL)("text" "html" ("charset" "ISO-2022-JP") NIL NIL "7bit" 7723 151 NIL NIL NIL NIL) "alternative" ("boundary" "pzu8t3naw7b5therma1t") NIL ) FLAGS (\Seen) INTERNALDATE "30-Oct-2014 03:23:45 +0000" UID 2 BODY[HEADER.FIELDS (FROM TO CC BCC SUBJECT REPLY-TO MESSAGE-ID REFERENCES)] {312}
Message-Id: <5451af3d-000e6070-af32c615041a5c3dae408c7ef67e5444-18793@tx203.storage.kks.yahoo.co.jp>
From: box-master@mail.yahoo.co.jp
To: gaialibs@yahoo.co.jp
Subject: Yahoo! JAPAN =?ISO-2022-JP?B?SUQbJEIkNEVQTz8kTiQqNVJNTSRYIUobKEJZYWhvbyE=?=
 =?ISO-2022-JP?B?GyRCJVwlQyUvJTkzK0BfJEskRCQkJEYhSxsoQg==?=

)

The relevant excerpts of the MIME message are:

Message-Id: <5451af3d-000e6070-af32c615041a5c3dae408c7ef67e5444-18793@tx203.storage.kks.yahoo.co.jp>
From: box-master@mail.yahoo.co.jp
To: gaialibs@yahoo.co.jp
Subject: Yahoo! JAPAN =?ISO-2022-JP?B?SUQbJEIkNEVQTz8kTiQqNVJNTSRYIUobKEJZYWhvbyE=?=
 =?ISO-2022-JP?B?GyRCJVwlQyUvJTkzK0BfJEskRCQkJEYhSxsoQg==?=
Content-Type:multipart/alternative; boundary="pzu8t3naw7b5therma1t"

--pzu8t3naw7b5therma1t
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

--pzu8t3naw7b5therma1t
Content-Type: text/html; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

--pzu8t3naw7b5therma1t--

The code prior to this patch would throw an Error if trailing whitespace was
detected following an ATOM in the context of a LIST or a SECTION. I've
loosened this restriction by removing it entirely for both cases and added test
coverage to match this intentional decision. Note that I don't have any
rationale for letting the much more strict SECTION case have trailing white
space other than it also seems harmless there as well.
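To make the change concrete, here is a minimal Python sketch of a parenthesized-list parser with a switch between the strict and the lenient behaviour. It is not the emailjs-imap-handler code: `parse_list`, its `strict` flag and the `TOKEN` pattern are hypothetical names made up for illustration, and strict mode here rejects any whitespace directly before a closing paren rather than only whitespace after an ATOM.

```python
import re

# One token: a paren, a quoted string, or an atom (anything that is not
# whitespace, a paren, or a double quote). Illustrative only.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse_list(text, strict=False):
    """Parse IMAP-style parenthesized lists into nested Python lists.

    With strict=True, whitespace directly before ")" raises ValueError;
    with strict=False it is silently skipped.
    """
    pos = 0
    stack = [[]]  # stack[0] collects the top-level items
    while pos < len(text):
        if text[pos].isspace():
            start = pos
            while pos < len(text) and text[pos].isspace():
                pos += 1
            if strict and pos < len(text) and text[pos] == ")":
                raise ValueError(f"unexpected whitespace at position {start}")
            continue
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        token = match.group()
        pos = match.end()
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif token.startswith('"'):
            # Strip the quotes and undo backslash escapes.
            stack[-1].append(re.sub(r"\\(.)", r"\1", token[1:-1]))
        elif token == "NIL":
            stack[-1].append(None)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


# The tail of the BODYSTRUCTURE above, including the stray space before ")".
sample = '("alternative" ("boundary" "pzu8t3naw7b5therma1t") NIL )'
print(parse_list(sample))
```

Calling `parse_list(sample, strict=True)` raises `ValueError` instead, which roughly corresponds to the pre-patch failure.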

There are some other similar restrictions in this area of the parser that I have
not loosened.

For future analysis, the server greeting of yahoo.co.jp at this time is:

* OK [CAPABILITY IMAP4rev1 ID NAMESPACE UIDPLUS LITERAL+ CHILDREN XAPPLEPUSHSERVICE AUTH=PLAIN AUTH=LOGIN] IMAP4rev1 imapgate-0.7.68_11_1.61475


andris9 · 2014-11-12T09:36:50Z

Thanks! I agree, the parser should not fail at this position. It behaved this way because the parser was initially extracted from Hoodiecrow, which is strict by design.

andris9 · 2014-11-12T12:52:50Z

I tried to create a yahoo.co.jp account to add it to http://capa.kreata.ee/ but failed miserably at the captcha question, I don't even understand where one character ends and another one starts 👹 Chinese QQ at least had an English translation of the signup page

asutherland · 2014-11-12T18:12:12Z

Yeah, that was a hard one to get an account for. I used Chrome and its magic translation feature and picked the second tab there which is an audio CAPTCHA since magic translation does nothing for the visual CAPTCHA there. It still took me something like 10 minutes using http://en.wikipedia.org/wiki/Japanese_numerals because they have a voice saying numbers over top a voice saying other things. Things got faster when I realized that they did seem to only be using the "preferred reading".

Glad to see capa.kreata.ee is back up! It's very handy!

I'll send you a copy of the credentials I created privately. Feel free to use them for testing purposes; I just don't want them to leak and then have to figure out how to get through the account unlock procedure! ;)

yahoo.co.jp is potentially interesting also for the root reason I was looking at it, they rewrite the From header, but only when there's a sufficiently large attachment involved (1.3Megs or so). Short summary at https://bugzilla.mozilla.org/show_bug.cgi?id=1084216#c24

andris9 · 2014-11-12T21:27:25Z

Great, thanks! I added the Japanese Yahoo to capa.kreata.ee (and this time I used an init script to keep the service running – so far I ran it under the screen command, so it stopped working every time I had to restart the server and forgot to start the service).

To trade bugs, here's one for Gmail: If you use parameter continuation to encode non-ascii attachment filenames and put the value in quotes (you do not have to as the urlencoded value is an atom, so quotes are optional) and ask for BODYSTRUCTURE from Gmail, you get the filename decoded properly but the original charset is prepended to the value.

For example I have the following message with an attachment called “jõgeva.txt”:

Content-Type: multipart/mixed; boundary="abc"

--abc
Content-Disposition: attachment; filename*0*="utf-8''j%C3%B5geva.txt"

tere tere
--abc--

When I send such a message to Gmail and fetch its BODYSTRUCTURE, I get the following response:

* 418 FETCH (BODYSTRUCTURE (("TEXT" "PLAIN" NIL NIL NIL "7BIT" 9 0 NIL ("ATTACHMENT" ("FILENAME" "=?ISO-8859-1?Q?utf-8''j=F5geva.txt?=")) NIL) "MIXED" ("BOUNDARY" "abc") NIL NIL))

As you can see, the filename is listed as "=?ISO-8859-1?Q?utf-8''j=F5geva.txt?=" (note the unexpected utf-8'' before the filename), which decodes to "utf-8''jõgeva.txt".

If I remove the quotes around the filename and resend the message:

Content-Disposition: attachment; filename*0*=utf-8''j%C3%B5geva.txt

I get the following response from Gmail IMAP:

* 419 FETCH (BODYSTRUCTURE (("TEXT" "PLAIN" NIL NIL NIL "7BIT" 9 0 NIL ("ATTACHMENT" ("FILENAME" "=?ISO-8859-1?Q?j=F5geva.txt?=")) NIL) "MIXED" ("BOUNDARY" "abc") NIL NIL))

Now the filename is listed as "=?ISO-8859-1?Q?j=F5geva.txt?=", which decodes to "jõgeva.txt".
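The two FILENAME values can be checked with Python's standard library. This is only a quick way to confirm what the encoded-words decode to; `decode_mime_words` is a helper name made up for this sketch.

```python
from email.header import decode_header


def decode_mime_words(value):
    """Decode any RFC 2047 encoded-words in value into one string."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded-words come back as bytes plus their declared charset.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


# Value Gmail returned for the quoted parameter (FETCH 418 above).
print(decode_mime_words("=?ISO-8859-1?Q?utf-8''j=F5geva.txt?="))
# Value Gmail returned for the unquoted parameter (FETCH 419 above).
print(decode_mime_words("=?ISO-8859-1?Q?j=F5geva.txt?="))
```

The first call keeps the stray `utf-8''` prefix in front of the filename, the second yields the plain filename.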

asutherland · 2014-11-13T02:12:15Z

It looks like quotes are forbidden in that case when using the RFC 2231 character set representation. This is definitely an important area for all our clients and I find it confusing but perversely interesting, so I just did some digging, taking notes as I went and inlining the grammar constructs/etc. since jumping between the various RFCs can be quite a hassle.

Specifically, according to the grammar at http://tools.ietf.org/html/rfc2231#section-7 you are constrained to use the "extended-other-values" term for encoding the value (without quotes) because:

parameter := regular-parameter / extended-parameter

regular-parameter := regular-parameter-name "=" value

extended-parameter := (extended-initial-name "="
                          extended-value) /  ;; probably typo, should be extended-initial-value
                         (extended-other-names "="
                          extended-other-values)

extended-initial-value := [charset] "'" [language] "'"
                             extended-other-values

extended-other-values := *(ext-octet / attribute-char)

ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")

attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
                     "*", "'", "%", or tspecials>

and tspecials in http://tools.ietf.org/html/rfc2045#section-5.1 is as follows (and includes double-quote, notably):

     tspecials :=  "(" / ")" / "<" / ">" / "@" /
                   "," / ";" / ":" / "\" / <">
                   "/" / "[" / "]" / "?" / "="
                   ; Must be in quoted-string,
                   ; to use within parameter values

...And for paranoia I checked and MIME words are not allowed even in the normal "value" case because the 2045 grammar says:

 value := token / quoted-string

token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
                 or tspecials>

And http://tools.ietf.org/html/rfc2047#section-5 says:

 + An 'encoded-word' MUST NOT appear within a 'quoted-string'.

and

  + An 'encoded-word' MUST NOT be used in parameter of a MIME
     Content-Type or Content-Disposition field, or in any structured
     field body except within a 'comment' or 'phrase'.
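As a rough illustration of the grammar quoted above, the following Python sketch checks a raw parameter value against "extended-other-values" and decodes an "extended-initial-value". The function and constant names are invented for this example, the charset itself is not validated, and this is not how any particular mail library implements RFC 2231.

```python
import re
from urllib.parse import unquote

# attribute-char: any US-ASCII CHAR except SPACE, CTLs, "*", "'", "%",
# or tspecials (tspecials as listed in RFC 2045 section 5.1).
TSPECIALS = '()<>@,;:\\"/[]?='
EXCLUDED = set(TSPECIALS) | set("*'%")
ATTRIBUTE_CHARS = "".join(
    chr(code) for code in range(0x21, 0x7F) if chr(code) not in EXCLUDED
)

# extended-other-values := *(ext-octet / attribute-char)
EXT_OTHER_VALUES = re.compile(
    r"(?:%[0-9A-F]{2}|[" + re.escape(ATTRIBUTE_CHARS) + r"])*"
)


def decode_extended_initial_value(raw):
    """Decode [charset] "'" [language] "'" extended-other-values.

    Returns (decoded_text, language_or_None); raises ValueError when the
    raw value does not fit the grammar, e.g. because it is quoted.
    """
    charset, first_sep, rest = raw.partition("'")
    language, second_sep, encoded = rest.partition("'")
    if not (first_sep and second_sep):
        raise ValueError("missing charset'language' prefix")
    if EXT_OTHER_VALUES.fullmatch(encoded) is None:
        raise ValueError("value contains characters outside the grammar")
    text = unquote(encoded, encoding=charset or "us-ascii", errors="strict")
    return text, (language or None)


print(decode_extended_initial_value("utf-8''j%C3%B5geva.txt"))
```

Passing the quoted form, `"utf-8''j%C3%B5geva.txt"` with the double quotes included, raises `ValueError` because the double quote is a tspecial and so cannot appear in "extended-other-values".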

asutherland · 2014-11-13T02:13:23Z

Oh, and is that a fix that's coming or is already in mailbuild or something like that? I'd like to make sure I have a no-quotes version when I land the current set of email.js upgrades to our trunk

andris9 · 2014-11-13T04:26:41Z

Interesting, you're right. I always went with this note from section 3: "Note that quotes around parameter values are part of the value syntax; they are NOT part of the value itself. Furthermore, it is explicitly permitted to have a mixture of quoted and unquoted continuation fields." but now that I have checked the grammar, this only applies to non-extended values. So I guess Gmail isn't doing anything wrong here, and the strange result comes from some choices made when normalizing an invalid value (quotes are dropped and the resulting value is urldecoded as a whole, using utf-8 as the default charset). Anyhow, this has been fixed in emailjs for some time now.
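The guessed normalization can be reproduced in a few lines of Python. This is purely a sketch of the hypothesis in the paragraph above, not Gmail's actual code; `normalize_invalid_value` is a made-up name.

```python
from urllib.parse import unquote


def normalize_invalid_value(raw):
    """Drop surrounding quotes, then percent-decode the whole value as UTF-8."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        raw = raw[1:-1]
    # The charset'language' prefix is NOT split off here, so it survives
    # as literal text in front of the decoded filename.
    return unquote(raw, encoding="utf-8")


print(normalize_invalid_value("\"utf-8''j%C3%B5geva.txt\""))
```

The result keeps the `utf-8''` prefix in front of the decoded name, which matches the filename seen in the first Gmail response.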

I know that MIME words are not allowed inside quoted strings, and I think that rule is the most important reason why RFC 2231 was created in the first place. Otherwise you could split a long parameter value into smaller MIME words, separate these with spaces and let regular line wrapping manage long header lines. As MIME words are not allowed inside quotes, and without quotes the result would be conflicting, something else was needed.

This is primarily a move to the upgrade startTLS functionality. This included a fix to the SMTP implementation where a server could avoid TLS-initiation by not implementing EHLO. (emailjs/emailjs-smtp-client#20). This goal was bug 1060558.

An additional fix we had made locally but not upstreamed (but have now upstreamed) and that has improved test coverage on our end (see test_mine) is a problem relating to newlines. See emailjs/emailjs-imap-client#35.

Other (new) fixes folded in here:
- Namespace NIL delimiter: emailjs/emailjs-imap-client#36
- trailing whitespace in bodystructure list: emailjs/emailjs-imap-handler#8

Fixes we had made locally and upstreamed but not taken via the release process:
- NIL delimiters for LIST emailjs/emailjs-imap-client#27 and LSUB emailjs/emailjs-imap-client#29 (discovered while investigating bug 1091295 and bug 1084216)

andris9 added a commit that referenced this pull request Nov 12, 2014

Merge pull request #8 from asutherland/accept-trailing-ws-in-lists

67e5dec

Trailing whitespace in LISTs should not be fatal

andris9 merged commit 67e5dec into emailjs:master Nov 12, 2014

asutherland deleted the accept-trailing-ws-in-lists branch November 12, 2014 18:04
