Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mime/quotedprintable: Unable to parse UTF-8 character in body #22597

Closed
xplodwild opened this issue Nov 6, 2017 · 7 comments
Closed

mime/quotedprintable: Unable to parse UTF-8 character in body #22597

xplodwild opened this issue Nov 6, 2017 · 7 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@xplodwild
Copy link

What version of Go are you using (go version)?

1.9.2

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

linux-amd64, happens as well on windows-amd64

What did you do?

I am parsing a Slack email using mime/multipart. It contains a special character "forward quote": (not the regular simple quote)

The following program reproduces the issue:
https://play.golang.org/p/jwZKGQ8fQS

What did you expect to see?

The message should be parsed properly and I should be able to get the part displayed in the log/output.

What did you see instead?

Failed to read all part: quotedprintable: invalid unescaped byte 0xe2 in body

@opennota
Copy link

opennota commented Nov 7, 2017

Your forward quote is a multi-byte character and it must be qp-encoded. It is not. I think mime/quotedprintable is right here.

@xplodwild
Copy link
Author

xplodwild commented Nov 7, 2017

Thanks. It is indeed what one could expect from mime/quotedprintable, however this leaves me with either bypassing entirely qp encoding, or skipping the message altogether. Considering that email displays fine in Gmail, Outlook, Thunderbird and all the popular mai clients, there is no reason I should not be able to read it as well using golang's stdlib.

The only workaround I see for now is removing Content-Transfer-Encoding: quoted-printable from the mail headers whenever the read fails, in order to bypass quotedprintable when reading the mail part body from mime/multipart's output. This is obviously not optimal. The other solution would be to fork mime/quotedprintable to add a "relaxed" flag of some sorts, allowing unencoded characters to be let through.

But IMO this flag should be in the stdlib, especially considering it doesn't seem blocking for any other parser/mail client/mail service.

@ianlancetaylor
Copy link
Contributor

In the discussion of illegal substrings of quoted-printable data, RFC 2045 says:

Control characters other than TAB, or CR and LF as parts of CRLF pairs, must not appear. The same is true for octets with decimal values greater than 126. If found in incoming quoted-printable data by a decoder, a robust implementation might exclude them from the decoded data and warn the user that illegal characters were discovered.

So I suppose the question here is whether the mime/quotedprintable package should permit invalid bytes or not. Right now we return an error. One possibility would be:

  • upon encountering an invalid byte, if there are any bytes already read, unread the invalid byte and return the bytes read so far
  • if no bytes have been read so far, return the invalid byte with a new exported error
  • allow future reads to continue as usual

That would maintain the current behavior for most callers, while permitting callers who want to permit invalid bytes to simply ignore the new exported error and carry on as though it did not occur.

@ianlancetaylor ianlancetaylor added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Nov 7, 2017
@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Nov 7, 2017
@rsc
Copy link
Contributor

rsc commented Nov 13, 2017

Per discussion with @bradfitz and @ianlancetaylor, let's just accept the >=0x80 bytes (for Go 1.11). CL welcome for Go 1.11.

@rsc rsc modified the milestones: Go1.11, Go1.10 Nov 13, 2017
@rsc rsc added the NeedsFix The path to resolution is known, but the work has not been done. label Nov 13, 2017
@gopherbot gopherbot removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Nov 13, 2017
@gopherbot
Copy link

Change https://golang.org/cl/121095 mentions this issue: mime/quotedprintable: accept UTF-8 encoded bytes

@RalphCorderoy
Copy link

From the CL:

RFC 2045 doesn't permit non-ASCII bytes, but some systems send them
anyhow, and it seems to do little harm to permit them.

The harm is callers that could previously rely on stdlib to check the data was correct now can't.
There is a continuing pattern of stdlib slackening RFC compliance over time, promoting divergence, and increasing the pressure on other systems to also ignore errors: 'Gmail/PHP/Go doesn't complain'.

Callers that care have to now check for themselves before they pass the quoted-printable input downstream to non-stdlib that will barf.

@ianlancetaylor
Copy link
Contributor

@RalphCorderoy It's certainly not ideal, but I think our priority has to be supporting the less sophisticated user. Since other systems are apparently doing the wrong thing, we need to support that. The different approach I outlined a few comments above seemed too complex. It's very simple for the sophisticated user to detect this problematic case by scanning the bytes returned from the Read method.

@golang golang locked and limited conversation to collaborators Jun 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants