Correct line breaks in quoted-printable encoding #496

Merged 1 commit on Jan 28, 2013

jeremy (Collaborator) commented Jan 19, 2013

Quoted-printable transfer encoding is intended to be "line break neutral." It uses CRLF to encode line breaks, whether you prefer carriage returns, line feeds, or both.

Ruby's builtin #pack and #unpack support quoted-printable encoding, but expect to see line feeds (\n) only. So it's our responsibility to convert to LF before encoding and to CRLF afterward, and to convert CRLF to LF after decoding. Otherwise we get an incorrectly hex-encoded CR as well as a doubled-up CR:

Ruby treats \n as its line break, so it hex-encodes \r:

> ["\r\n"].pack('M')
=> "=0D\n"

And Mail::Encodings::QuotedPrintable.encode does an additional to_crlf, which converts the \n to \r\n, resulting in an extra line break:

> ["\r\n"].pack('M').to_crlf
=> "=0D\r\n"
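To see where the extra line break comes from, here is a minimal sketch that decodes the output of that pre-fix encoding with plain #unpack. The names naive_encode, naive_decode, and to_crlf are hypothetical; to_crlf only approximates the Mail gem's String#to_crlf by rewriting every CR, LF, or CRLF as CRLF.

```ruby
# Hypothetical stand-in for Mail's String#to_crlf (an approximation, not Mail's code):
# rewrite every CR, LF, or CRLF as CRLF.
def to_crlf(str)
  str.gsub(/\r\n|\r|\n/, "\r\n")
end

# Encoding as described above: pack first, convert to CRLF afterward.
def naive_encode(str)
  to_crlf([str].pack('M'))
end

# Decoding with no line ending conversion at all.
def naive_decode(str)
  str.unpack('M').first
end

encoded = naive_encode("\r\n")
p encoded               # prints "=0D\r\n"
p naive_decode(encoded) # prints "\r\r\n": the hex-encoded CR plus a real CRLF
```

The decoder turns =0D back into a CR and passes the literal CRLF through, so the single input line break comes back with a doubled-up CR.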

We should to_lf before using Ruby's quoted-printable packing. That converts the input text to Ruby's line ending, which #pack leaves as a literal \n and to_crlf then turns into proper quoted-printable CRLFs:

> ["\r\n".to_lf].pack('M').to_crlf
=> "\r\n"

Ruby also unpacks CRLF as CRLF, which is strange considering it only respects LF for packing! For consistency, one would expect Ruby to unpack CRLF as LF. In any case, we can to_lf the unpacked result for consistent behavior (and working round trips):

> "\r\n".unpack('M').first
=> "\r\n"
> "\r\n".unpack('M').first.to_lf
=> "\n"
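Putting both conversions together, a minimal sketch of the corrected encoder and decoder might look like the following. qp_encode and qp_decode are hypothetical names, and to_lf/to_crlf are approximations of the Mail gem's String extensions rather than its exact code.

```ruby
# Hypothetical stand-ins for Mail's String#to_lf and String#to_crlf.
def to_lf(str)
  str.gsub(/\r\n|\r|\n/, "\n")
end

def to_crlf(str)
  str.gsub(/\r\n|\r|\n/, "\r\n")
end

# Normalize to LF so #pack sees Ruby's line break, then emit CRLF on the wire.
def qp_encode(str)
  to_crlf([to_lf(str)].pack('M'))
end

# #unpack passes CRLF through unchanged, so normalize back to LF afterward.
def qp_decode(str)
  to_lf(str.unpack('M').first)
end

encoded = qp_encode("line one\r\nline two\r\n")
p encoded            # prints "line one\r\nline two\r\n" (no =0D escapes)
p qp_decode(encoded) # prints "line one\nline two\n"
```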

carsonreinke (Contributor) commented Jan 25, 2013

Should "\r\n" == decode(encode("\r\n"))?

This just makes me want to use a different encoding.

jeremy (Collaborator) commented Jan 26, 2013

@carsonreinke Quoted-printable encodes all line breaks, whether \r, \n, or \r\n, as \r\n. That's meant to be line-break neutral, so you can use carriage returns on one platform and newlines on another without any special translation. The sender encodes \r as \r\n and the receiver decodes \r\n as \n, for example.

Hence, there is no guarantee that decode(encode(line_break)) == line_break unless the line break is your platform's preference. In Ruby, that's almost always \n. So decode(encode(any_line_break)) == platform_line_break.

Quoted-printable is weird indeed.
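The neutrality jeremy describes can be checked with a small sketch: every kind of line break encodes to the same CRLF and decodes to Ruby's \n, so decode(encode("\r\n")) is "\n", not "\r\n". The lambdas below are hypothetical approximations of Mail's encode/decode after this change, not its actual implementation.

```ruby
# Hypothetical stand-ins: normalize any line break to LF or to CRLF.
TO_LF   = ->(s) { s.gsub(/\r\n|\r|\n/, "\n") }
TO_CRLF = ->(s) { s.gsub(/\r\n|\r|\n/, "\r\n") }

# Approximations of the post-#496 encode and decode.
encode = ->(s) { TO_CRLF.([TO_LF.(s)].pack('M')) }
decode = ->(s) { TO_LF.(s.unpack('M').first) }

{ "CR" => "\r", "LF" => "\n", "CRLF" => "\r\n" }.each do |name, line_break|
  wire = encode.(line_break)
  back = decode.(wire)
  puts "#{name}: encoded #{wire.inspect}, decoded #{back.inspect}"
end
# CR: encoded "\r\n", decoded "\n"
# LF: encoded "\r\n", decoded "\n"
# CRLF: encoded "\r\n", decoded "\n"
```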

jeremy added a commit that referenced this pull request Jan 28, 2013

Merge pull request #496 from jeremy/quoted-printable-line-break-neutrality

Correct line breaks in quoted-printable encoding

jeremy merged commit ed88098 into mikel:master Jan 28, 2013

jeremy added a commit that referenced this pull request May 22, 2017

Eliminate attachment corruption caused by line ending conversions
* Omit initial CRLF linefeed conversion, since CRLFs are the required newline
  separators. We shouldn't need to convert bare CR or LF. Update our
  fixture emails to use CRLF throughout. Closes #609. Fixes #408.

* Drop quoted-printable CRLF conversion. This was introduced to
  harmonize with Ruby's \n-based line endings. But this breaks Q-P
  encoding with binary data. It's not *meant* for binary data, but we
  don't yet take adequate measures to use base64 for these cases.
  Reverts #496. Fixes #1010.

Closes #1113
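The binary-data breakage the 2017 commit mentions can be reproduced with a short sketch: a lone CR byte inside binary content looks like a line break to the conversion helpers, so it gets rewritten. to_lf and to_crlf here are hypothetical approximations of Mail's String extensions, and converting_round_trip/plain_round_trip are illustrative names, not Mail APIs.

```ruby
# Hypothetical stand-ins for Mail's String#to_lf / String#to_crlf.
def to_lf(str)
  str.gsub(/\r\n|\r|\n/, "\n")
end

def to_crlf(str)
  str.gsub(/\r\n|\r|\n/, "\r\n")
end

# Round trip with the #496-style conversions applied on both sides.
def converting_round_trip(data)
  encoded = to_crlf([to_lf(data)].pack('M'))
  to_lf(encoded.unpack('M').first)
end

# Round trip with no conversions: #pack hex-encodes the CR byte as =0D.
def plain_round_trip(data)
  [data].pack('M').unpack('M').first
end

binary = "\x00\r\x01".b         # a lone CR byte inside binary data
p converting_round_trip(binary) # prints "\x00\n\x01": the CR byte became LF
p plain_round_trip(binary)      # prints "\x00\r\x01": bytes preserved
```

Dropping the conversions keeps every byte intact because #pack escapes CR as =0D, at the cost of the line-break neutrality #496 was after; as the commit notes, base64 remains the better fit for binary data.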