asutherland
Contributor

On an account hosted at yahoo.co.jp, one of the messages you get at signup has a
BODYSTRUCTURE that includes trailing whitespace in the nested parenthesized list
representation for multipart messages. Specifically, there is whitespace after
the "body-fld-dsp", where the grammar only allows an SP if a "body-fld-lang"
follows it.

The whitespace violates RFC 3501, but it seems silly to fail to parse the
BODYSTRUCTURE in this case.

The raw FETCH line is:
```
* 2 FETCH (BODYSTRUCTURE (("text" "plain" ("charset" "ISO-2022-JP") NIL NIL "7bit" 2515 64 NIL NIL NIL NIL)("text" "html" ("charset" "ISO-2022-JP") NIL NIL "7bit" 7723 151 NIL NIL NIL NIL) "alternative" ("boundary" "pzu8t3naw7b5therma1t") NIL ) FLAGS (\Seen) INTERNALDATE "30-Oct-2014 03:23:45 +0000" UID 2 BODY[HEADER.FIELDS (FROM TO CC BCC SUBJECT REPLY-TO MESSAGE-ID REFERENCES)] {312}
Message-Id: <5451af3d-000e6070-af32c615041a5c3dae408c7ef67e5444-18793@tx203.storage.kks.yahoo.co.jp>
From: box-master@mail.yahoo.co.jp
To: gaialibs@yahoo.co.jp
Subject: Yahoo! JAPAN =?ISO-2022-JP?B?SUQbJEIkNEVQTz8kTiQqNVJNTSRYIUobKEJZYWhvbyE=?=
 =?ISO-2022-JP?B?GyRCJVwlQyUvJTkzK0BfJEskRCQkJEYhSxsoQg==?=

)
```

The relevant excerpts of the MIME message are:
```
Message-Id: <5451af3d-000e6070-af32c615041a5c3dae408c7ef67e5444-18793@tx203.storage.kks.yahoo.co.jp>
From: box-master@mail.yahoo.co.jp
To: gaialibs@yahoo.co.jp
Subject: Yahoo! JAPAN =?ISO-2022-JP?B?SUQbJEIkNEVQTz8kTiQqNVJNTSRYIUobKEJZYWhvbyE=?=
 =?ISO-2022-JP?B?GyRCJVwlQyUvJTkzK0BfJEskRCQkJEYhSxsoQg==?=
Content-Type:multipart/alternative; boundary="pzu8t3naw7b5therma1t"

--pzu8t3naw7b5therma1t
Content-Type: text/plain; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

--pzu8t3naw7b5therma1t
Content-Type: text/html; charset=ISO-2022-JP
Content-Transfer-Encoding: 7bit

--pzu8t3naw7b5therma1t--
```

The code prior to this patch would throw an Error if trailing whitespace was
detected following an ATOM in the context of a LIST or a SECTION. I've
loosened this restriction by removing it entirely for both cases and added test
coverage to match this intentional decision. Note that I have no rationale for
letting the much stricter SECTION case have trailing whitespace, other than
that it seems harmless there too.
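To make the relaxed rule concrete, here is a minimal Python sketch of a parser for IMAP parenthesized lists that skips a space before ")" instead of raising. It is not the emailjs-imap-handler code: the function name `parse_imap_list` is hypothetical, and it only handles quoted strings, NIL, bare atoms, and nested lists (no literals).

```python
def parse_imap_list(text: str) -> list:
    """Parse one parenthesized IMAP list into nested Python lists.

    Hypothetical sketch: handles quoted strings, NIL, atoms and nested
    lists, but not literals ({n}). Runs of spaces, including one just
    before ")", are skipped instead of being treated as a syntax error.
    """
    pos = 0

    def skip_spaces() -> None:
        nonlocal pos
        while pos < len(text) and text[pos] == " ":
            pos += 1

    def parse_quoted() -> str:
        nonlocal pos
        pos += 1  # opening quote
        chars = []
        while text[pos] != '"':
            if text[pos] == "\\":
                pos += 1  # keep the escaped character as-is
            chars.append(text[pos])
            pos += 1
        pos += 1  # closing quote
        return "".join(chars)

    def parse_atom():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ' ()"':
            pos += 1
        atom = text[start:pos]
        return None if atom.upper() == "NIL" else atom

    def parse_list() -> list:
        nonlocal pos
        pos += 1  # opening "("
        items = []
        while True:
            # Lenient: RFC 3501 has no SP before ")", but yahoo.co.jp sends one.
            skip_spaces()
            if pos >= len(text):
                raise ValueError("unterminated list")
            ch = text[pos]
            if ch == ")":
                pos += 1
                return items
            if ch == "(":
                items.append(parse_list())
            elif ch == '"':
                items.append(parse_quoted())
            else:
                items.append(parse_atom())

    skip_spaces()
    if not text.startswith("(", pos):
        raise ValueError("expected '('")
    return parse_list()
```

Fed the BODYSTRUCTURE value from the FETCH line above, it returns a five-element list whose last element, the NIL "body-fld-dsp", comes back as None. Because it skips every run of spaces, this sketch is more lenient than the patch, which only dropped the trailing-whitespace check.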

There are other similar restrictions in this area that I have not loosened.

For future analysis, the server greeting of yahoo.co.jp at this time is:
```
* OK [CAPABILITY IMAP4rev1 ID NAMESPACE UIDPLUS LITERAL+ CHILDREN XAPPLEPUSHSERVICE AUTH=PLAIN AUTH=LOGIN] IMAP4rev1 imapgate-0.7.68_11_1.61475
```
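For comparing this greeting against other servers, a small Python sketch for pulling the capability names out of the `[CAPABILITY ...]` response code may help. The helper name is hypothetical, and only the untagged `* OK` greeting form is handled.

```python
import re

# Hypothetical helper: only the "* OK [CAPABILITY ...]" greeting form is handled.
_GREETING_CAPS = re.compile(r"^\* OK \[CAPABILITY ([^\]]*)\]", re.IGNORECASE)


def greeting_capabilities(greeting: str) -> set:
    """Return the upper-cased capability names from a greeting's
    [CAPABILITY ...] response code, or an empty set if it has none."""
    match = _GREETING_CAPS.match(greeting)
    if match is None:
        return set()
    # Upper-case the names so membership checks are case-insensitive.
    return {cap.upper() for cap in match.group(1).split()}
```

For the yahoo.co.jp greeting above this yields nine names, from `IMAP4REV1` through `AUTH=LOGIN`.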
andris9 added a commit that referenced this pull request Nov 12, 2014
Trailing whitespace in LISTs should not be fatal
andris9 merged commit 67e5dec into emailjs:master Nov 12, 2014
Member

andris9 commented Nov 12, 2014

Thanks! I agree, the parser should not fail at this position. It behaved this way because the parser was originally extracted from Hoodiecrow, which is strict by design.

Member

andris9 commented Nov 12, 2014

I tried to create a yahoo.co.jp account to add it to http://capa.kreata.ee/ but failed miserably at the CAPTCHA question; I can't even tell where one character ends and the next one starts 👹. Chinese QQ at least had an English translation of the signup page.

asutherland deleted the accept-trailing-ws-in-lists branch November 12, 2014 18:04
Contributor Author

Yeah, that was a hard one to get an account for. I used Chrome and its magic translation feature, and picked the second tab, which is an audio CAPTCHA, since magic translation does nothing for the visual CAPTCHA. It still took me something like 10 minutes using http://en.wikipedia.org/wiki/Japanese_numerals because they have a voice saying numbers over the top of a voice saying other things. Things got faster when I realized that they only seemed to use the "preferred reading".

Glad to see capa.kreata.ee is back up! It's very handy!

I'll privately send you a copy of the credentials I created. Feel free to use them for testing purposes; I just don't want them to leak and then have to figure out how to get through the account unlock procedure! ;)

yahoo.co.jp is also potentially interesting for the root reason I was looking at it: they rewrite the From header, but only when a sufficiently large attachment is involved (around 1.3 MB). Short summary at https://bugzilla.mozilla.org/show_bug.cgi?id=1084216#c24

Member

andris9 commented Nov 12, 2014

Great, thanks! I added the Japanese Yahoo to capa.kreata.ee (and this time I used an init script to keep the service running; until now I ran it under the screen command, so it stopped working every time I restarted the server and forgot to start the service).

To trade bugs, here's one for Gmail: if you use parameter continuation to encode a non-ASCII attachment filename and put the value in quotes (you don't have to, since the percent-encoded value is an atom, so the quotes are optional), and then ask Gmail for the BODYSTRUCTURE, you get the filename decoded properly but with the original charset prepended to the value.

For example I have the following message with an attachment called “jõgeva.txt”:

```
Content-Type: multipart/mixed; boundary="abc"

--abc
Content-Disposition: attachment; filename*0*="utf-8''j%C3%B5geva.txt"

tere tere
--abc--
```

When I send such a message to Gmail and fetch its BODYSTRUCTURE, I get the following response:

```
* 418 FETCH (BODYSTRUCTURE (("TEXT" "PLAIN" NIL NIL NIL "7BIT" 9 0 NIL ("ATTACHMENT" ("FILENAME" "=?ISO-8859-1?Q?utf-8''j=F5geva.txt?=")) NIL) "MIXED" ("BOUNDARY" "abc") NIL NIL))
```

As you can see, the filename is listed as "=?ISO-8859-1?Q?utf-8''j=F5geva.txt?=" (note the unexpected utf-8'' before the filename), which decodes to "utf-8''jõgeva.txt".

If I remove the quotes around the filename and resend the message:

```
Content-Disposition: attachment; filename*0*=utf-8''j%C3%B5geva.txt
```

I get the following response from Gmail IMAP:

```
* 419 FETCH (BODYSTRUCTURE (("TEXT" "PLAIN" NIL NIL NIL "7BIT" 9 0 NIL ("ATTACHMENT" ("FILENAME" "=?ISO-8859-1?Q?j=F5geva.txt?=")) NIL) "MIXED" ("BOUNDARY" "abc") NIL NIL))
```

Now the filename is listed as "=?ISO-8859-1?Q?j=F5geva.txt?=", which decodes to "jõgeva.txt".
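Decoding the two FILENAME values Gmail returned makes the difference concrete. A minimal sketch using Python's standard `email.header.decode_header`; the wrapper name `decode_encoded_words` is an assumption for illustration.

```python
from email.header import decode_header


def decode_encoded_words(value: str) -> str:
    """Decode RFC 2047 encoded-words (e.g. =?ISO-8859-1?Q?...?=) into text."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Undecorated text between encoded-words comes back with charset None.
            pieces.append(chunk.decode(charset or "ascii"))
        else:
            # No encoded-words at all: decode_header returns the str unchanged.
            pieces.append(chunk)
    return "".join(pieces)
```

Applied to the first response this yields `utf-8''jõgeva.txt`, and to the second `jõgeva.txt`.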

Contributor Author

It looks like quotes are forbidden in that case when using the RFC 2231 character set representation. This is definitely an important area for all our clients, and I find it confusing but perversely interesting, so I did some digging, taking notes as I went and inlining the grammar constructs, since jumping between the various RFCs can be quite a hassle:

Specifically, according to the grammar at http://tools.ietf.org/html/rfc2231#section-7 you are constrained to use the "extended-other-values" term for encoding the value (without quotes) because:

```
parameter := regular-parameter / extended-parameter

regular-parameter := regular-parameter-name "=" value

extended-parameter := (extended-initial-name "="
                       extended-value) /  ;; probably typo, should be extended-initial-value
                      (extended-other-names "="
                       extended-other-values)

extended-initial-value := [charset] "'" [language] "'"
                          extended-other-values

extended-other-values := *(ext-octet / attribute-char)

ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")

attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
                  "*", "'", "%", or tspecials>
```

and tspecials in http://tools.ietf.org/html/rfc2045#section-5.1 is as follows (and includes double-quote, notably):

```
tspecials :=  "(" / ")" / "<" / ">" / "@" /
              "," / ";" / ":" / "\" / <">
              "/" / "[" / "]" / "?" / "="
              ; Must be in quoted-string,
              ; to use within parameter values
```
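The grammar can be read directly as a validator. This Python sketch, with hypothetical function names, decodes an extended-initial-value (`charset'language'octets`) and raises on any character the grammar excludes, so the quoted form from the Gmail example is rejected because of its double quote while the unquoted form decodes to "jõgeva.txt". It deliberately follows the grammar literally (upper-case hex only), which real-world decoders usually relax.

```python
# RFC 2045 tspecials, copied from the grammar above.
TSPECIALS = set('()<>@,;:\\"/[]?=')
HEX_UPPER = "0123456789ABCDEF"  # ext-octet only allows upper-case A-F


def is_attribute_char(ch: str) -> bool:
    """attribute-char: US-ASCII CHAR except SPACE, CTLs, "*", "'", "%", tspecials."""
    return 32 < ord(ch) < 127 and ch not in TSPECIALS and ch not in "*'%"


def decode_extended_initial_value(raw: str) -> tuple:
    """Decode charset'language'extended-other-values, rejecting anything
    the RFC 2231 grammar does not allow (including double quotes)."""
    parts = raw.split("'", 2)
    if len(parts) != 3:
        raise ValueError("missing charset/language delimiters")
    charset, language, rest = parts
    octets = bytearray()
    i = 0
    while i < len(rest):
        ch = rest[i]
        if ch == "%":
            hex_pair = rest[i + 1:i + 3]
            if len(hex_pair) != 2 or any(c not in HEX_UPPER for c in hex_pair):
                raise ValueError(f"malformed ext-octet at offset {i}")
            octets.append(int(hex_pair, 16))
            i += 3
        elif is_attribute_char(ch):
            octets.append(ord(ch))
            i += 1
        else:
            raise ValueError(f"{ch!r} is not allowed in an extended value")
    # Assumption: fall back to US-ASCII when the optional charset is omitted.
    return charset, language, octets.decode(charset or "us-ascii")
```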

...And out of paranoia I checked: MIME encoded-words are not allowed even in the normal "value" case, because the RFC 2045 grammar says:

```
value := token / quoted-string

token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
            or tspecials>
```

And http://tools.ietf.org/html/rfc2047#section-5 says:

```
+ An 'encoded-word' MUST NOT appear within a 'quoted-string'.
```

and

```
+ An 'encoded-word' MUST NOT be used in parameter of a MIME
  Content-Type or Content-Disposition field, or in any structured
  field body except within a 'comment' or 'phrase'.
```
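Putting these rules together: an encoded-word always contains "=" and "?", which are tspecials, so it can never be a bare token, and RFC 2047 forbids it inside a quoted-string. A minimal Python check of the RFC 2045 token rule (the function name is made up for this sketch) shows both sides, including that the percent-encoded RFC 2231 form is a valid token and therefore needs no quotes.

```python
# RFC 2045 tspecials; note that "?" and "=" are both in the set.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value: str) -> bool:
    """token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>"""
    return len(value) > 0 and all(
        32 < ord(ch) < 127 and ch not in TSPECIALS for ch in value
    )
```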

Contributor Author

Oh, and is that fix still coming, or is it already in mailbuild or something similar? I'd like to make sure I have a no-quotes version when I land the current set of email.js upgrades on our trunk.

Member

andris9 commented Nov 13, 2014

Interesting, you're right. I always went with this note from section 3: "Note that quotes around parameter values are part of the value syntax; they are NOT part of the value itself. Furthermore, it is explicitly permitted to have a mixture of quoted and unquoted continuation fields." But now that I've checked the grammar, this only applies to non-extended values. So I guess Gmail isn't doing anything wrong here; the strange result comes from choices made when normalizing an invalid value (the quotes are dropped and the resulting value is URL-decoded as a whole, using UTF-8 as the default charset). Anyhow, this has been fixed in emailjs for some time now.
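The normalization described above can be modeled in a few lines, and the model reproduces the prefix Gmail returned: percent-decoding the quoted value as a whole leaves "utf-8''" in the result, and "utf-8''jõgeva.txt" is exactly what the ISO-8859-1 "Q" encoded-word in the first Gmail response decodes to. This is a sketch of the explanation with a hypothetical function name, not Gmail's implementation.

```python
from urllib.parse import unquote


def normalize_filename_param(raw: str) -> str:
    """Hypothetical model of the normalization described above, not Gmail's
    actual code. An unquoted value is parsed as charset'language'octets;
    a quoted one is treated as invalid, so the quotes are dropped and the
    whole string is percent-decoded as UTF-8."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return unquote(raw[1:-1], encoding="utf-8")
    charset, _language, encoded = raw.split("'", 2)  # ValueError if malformed
    return unquote(encoded, encoding=charset or "utf-8")
```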

I know that MIME words are not allowed in quoted strings, and I think that rule is the most important reason RFC 2231 was created in the first place. Otherwise you could split a long parameter value into smaller MIME words, separate them with spaces, and let regular line wrapping handle long header lines. Since MIME words are not allowed inside quotes, and without quotes the result would not be a valid token (SPACE, "?" and "=" are all excluded), something else was needed.
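For comparison, this is roughly what RFC 2231 provides instead: the value is percent-encoded and split into numbered `name*N*` sections that are unquoted, so neither quoting nor encoded-words are needed. A minimal Python sketch; the function name and the per-section length limit are illustrative assumptions, and a real encoder would also fold the resulting header lines.

```python
from urllib.parse import quote


def rfc2231_sections(name: str, value: str, charset: str = "utf-8",
                     max_len: int = 30) -> list:
    """Split a parameter value into RFC 2231 name*0*, name*1*, ... sections.

    Hypothetical helper: max_len limits only the encoded payload of each
    section, and %XX escapes are never split across sections.
    """
    # quote() with safe="" leaves only letters, digits and "_.-~" unescaped,
    # all of which are attribute-chars, and emits upper-case hex escapes.
    encoded = quote(value, safe="", encoding=charset)
    chunks, current, i = [], "", 0
    while i < len(encoded):
        unit = encoded[i:i + 3] if encoded[i] == "%" else encoded[i]
        if current and len(current) + len(unit) > max_len:
            chunks.append(current)
            current = ""
        current += unit
        i += len(unit)
    chunks.append(current)  # may be "" for an empty value
    # Only section 0 carries the charset'language' prefix (language left empty).
    return [
        f"{name}*{n}*={charset}''{chunk}" if n == 0 else f"{name}*{n}*={chunk}"
        for n, chunk in enumerate(chunks)
    ]
```

For example, `rfc2231_sections("filename", "jõgeva.txt", max_len=8)` gives `filename*0*=utf-8''j%C3%B5g` and `filename*1*=eva.txt`.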

asutherland added a commit to asutherland/gaia-email-libs-and-more that referenced this pull request Nov 20, 2014
This is primarily a move to the upgrade startTLS functionality.  This included
a fix to the SMTP implementation where a server could avoid TLS-initiation by
not implementing EHLO.  (emailjs/emailjs-smtp-client#20).
This goal was bug 1060558.

An additional fix we had made locally but not upstreamed (but have now
upstreamed) and that has improved test coverage on our end (see test_mine) is
a problem relating to newlines.  See
emailjs/emailjs-imap-client#35.

Other (new) fixes folded in here:
- Namespace NIL delimiter: emailjs/emailjs-imap-client#36
- trailing whitespace in bodystructure list:
  emailjs/emailjs-imap-handler#8

Fixes we had made locally and upstreamed but not taken via the release process:
- NIL delimiters for LIST emailjs/emailjs-imap-client#27 and
  LSUB emailjs/emailjs-imap-client#29 (discovered while
  investigating bug 1091295 and bug 1084216)
asutherland added a commit to asutherland/gaia-email-libs-and-more that referenced this pull request Nov 27, 2014