
Webmail does not seem to decode UTF-8 attachment names #82

Closed
mattfbacon opened this issue Oct 12, 2023 · 1 comment

@mattfbacon
Contributor

mattfbacon commented Oct 12, 2023

I see this in the mox webmail:

[Screenshot: garbled attachment titles that have not been decoded]

One of them is =?utf-8?B?4oCcU25vd+KAnSBieSBEYWxlIEJhaWxleS5wZGY=?=, an RFC 2047 encoded-word holding base64-encoded ("B") UTF-8. The other is =?utf-8?Q?=E2=80=9CThe_Letters_They_Left_Behind=E2=80=9D_--_Scott_Edelman?==?utf-8?Q?=2Epdf?=, two adjacent encoded-words in the quoted-printable-like "Q" encoding, also UTF-8.
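To check what these strings contain, Go's standard library can decode RFC 2047 encoded-words with mime.WordDecoder. A minimal sketch; the helper name decodeWords is illustrative and not taken from mox. Both values should decode to the same names that the filename* parameters in the raw headers below carry.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeWords (hypothetical helper) decodes RFC 2047 encoded-words such as
// "=?utf-8?B?...?=" or "=?utf-8?Q?...?=" in s. Text outside encoded-words is
// kept, and whitespace between two adjacent encoded-words is dropped.
func decodeWords(s string) (string, error) {
	// Without a CharsetReader, WordDecoder handles utf-8, iso-8859-1 and us-ascii.
	dec := new(mime.WordDecoder)
	return dec.DecodeHeader(s)
}

func main() {
	for _, s := range []string{
		// "B" encoding: base64 of the UTF-8 bytes.
		"=?utf-8?B?4oCcU25vd+KAnSBieSBEYWxlIEJhaWxleS5wZGY=?=",
		// "Q" encoding, two encoded-words as in the folded header after
		// unfolding; "_" stands for a space and "=XX" for a byte.
		"=?utf-8?Q?=E2=80=9CThe_Letters_They_Left_Behind=E2=80=9D_--_Scott_Edelman?= =?utf-8?Q?=2Epdf?=",
	} {
		name, err := decodeWords(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name)
	}
}
```

The first decodes to “Snow” by Dale Bailey.pdf and the second to “The Letters They Left Behind” -- Scott Edelman.pdf.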

The relevant sections of the raw email are:

Content-Disposition: inline;
	filename*=utf-8''%E2%80%9CThe%20Letters%20They%20Left%20Behind%E2%80%9D%20%2D%2D%20Scott%20Edelman.pdf
Content-Type: application/pdf;
	x-unix-mode=0644;
	name="=?utf-8?Q?=E2=80=9CThe_Letters_They_Left_Behind=E2=80=9D_--_Scott_Edelman?=
 =?utf-8?Q?=2Epdf?="
Content-Transfer-Encoding: base64
Content-Disposition: inline;
	filename*=utf-8''%E2%80%9CSnow%E2%80%9D%20by%20Dale%20Bailey.pdf
Content-Type: application/pdf;
	x-unix-mode=0644;
	name="=?utf-8?B?4oCcU25vd+KAnSBieSBEYWxlIEJhaWxleS5wZGY=?="
Content-Transfer-Encoding: base64
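For comparison, Go's mime.ParseMediaType (the parser named in the commit message below) already decodes the RFC 2231 filename* parameters in these headers, but returns the encoded-word inside the quoted name parameter untouched, since to the parser it is just the contents of a quoted string. A sketch, assuming each header has been unfolded onto one line; the helper param is hypothetical.

```go
package main

import (
	"fmt"
	"mime"
)

// param (hypothetical helper) returns one parameter of a Content-Type or
// Content-Disposition value as decoded by mime.ParseMediaType.
func param(header, name string) (string, error) {
	_, params, err := mime.ParseMediaType(header)
	if err != nil {
		return "", err
	}
	return params[name], nil
}

func main() {
	// Header values from the issue, unfolded onto one line each.
	headers := []struct{ value, param string }{
		// RFC 2231 form: decoded by ParseMediaType.
		{"inline; filename*=utf-8''%E2%80%9CSnow%E2%80%9D%20by%20Dale%20Bailey.pdf", "filename"},
		{"inline; filename*=utf-8''%E2%80%9CThe%20Letters%20They%20Left%20Behind%E2%80%9D%20%2D%2D%20Scott%20Edelman.pdf", "filename"},
		// RFC 2047 encoded-word in a quoted string: returned as-is.
		{`application/pdf; x-unix-mode=0644; name="=?utf-8?B?4oCcU25vd+KAnSBieSBEYWxlIEJhaWxleS5wZGY=?="`, "name"},
	}
	for _, h := range headers {
		v, err := param(h.value, h.param)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s=%q\n", h.param, v)
	}
}
```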
mjl- added a commit that referenced this issue Oct 14, 2023
according to the rfcs (2231 and 2047), non-ascii filenames in content-type
and content-disposition headers should be encoded like this:

	Content-Type: text/plain; name*=utf-8''hi%E2%98%BA.txt
	Content-Disposition: attachment; filename*=utf-8''hi%E2%98%BA.txt

and that is what the Go standard library mime.ParseMediaType and
mime.FormatMediaType parse and generate.

this is what thunderbird sends:

	Content-Type: text/plain; charset=UTF-8; name="=?UTF-8?B?aGnimLoudHh0?="
	Content-Disposition: attachment; filename*=UTF-8''%68%69%E2%98%BA%2E%74%78%74

(thunderbird will also correctly split long filenames over multiple parameters,
named "filename*0*", "filename*1*", etc.)
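Thunderbird's fully percent-encoded value, the minimal form, and a value split over RFC 2231 continuations all decode to the same name with mime.ParseMediaType. A sketch; the continuation header is constructed for illustration rather than captured from Thunderbird, and the helper filename is hypothetical.

```go
package main

import (
	"fmt"
	"mime"
)

// filename (hypothetical helper) returns the decoded "filename" parameter of a
// Content-Disposition value. ParseMediaType handles a single filename*=...
// parameter as well as continuations (filename*0*, filename*1*, ...).
func filename(disposition string) (string, error) {
	_, params, err := mime.ParseMediaType(disposition)
	if err != nil {
		return "", err
	}
	return params["filename"], nil
}

func main() {
	for _, cd := range []string{
		// Thunderbird's style: every byte percent-encoded, charset in upper case.
		"attachment; filename*=UTF-8''%68%69%E2%98%BA%2E%74%78%74",
		// Minimal RFC 2231 form, as generated by mime.FormatMediaType.
		"attachment; filename*=utf-8''hi%E2%98%BA.txt",
		// Continuations: only the first segment names the charset.
		"attachment; filename*0*=utf-8''hi%E2%98%BA; filename*1*=%2Etxt",
	} {
		name, err := filename(cd)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name) // hi☺.txt for each input
	}
}
```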

this is what gmail sends:

	Content-Type: text/plain; charset="US-ASCII"; name="=?UTF-8?B?aGnimLoudHh0?="
	Content-Disposition: attachment; filename="=?UTF-8?B?aGnimLoudHh0?="

i cannot find where q/b encoded-words in the "name" and "filename" parameters are
allowed. until then, we try to decode them unless in pedantic mode.
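The lenient parsing described here could look roughly like the sketch below. It is an illustration under assumptions, not mox's implementation: the helper attachmentName, its pedantic parameter, and the choice to prefer Content-Disposition over Content-Type are all hypothetical.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// attachmentName is a hypothetical helper, not mox's actual code. It takes
// the Content-Disposition "filename" first, then the Content-Type "name".
// ParseMediaType already decodes RFC 2231 values (filename*=utf-8''...).
// Unless pedantic is set, values that look like RFC 2047 encoded-words are
// decoded too (assumption: any value containing "=?" is tried).
func attachmentName(contentType, disposition string, pedantic bool) string {
	candidates := []struct{ header, param string }{
		{disposition, "filename"},
		{contentType, "name"},
	}
	for _, c := range candidates {
		if c.header == "" {
			continue
		}
		_, params, err := mime.ParseMediaType(c.header)
		if err != nil {
			continue
		}
		name := params[c.param]
		if name == "" {
			continue
		}
		if !pedantic && strings.Contains(name, "=?") {
			dec := new(mime.WordDecoder)
			// On a decoding error (e.g. unknown charset), keep the raw value.
			if decoded, err := dec.DecodeHeader(name); err == nil {
				name = decoded
			}
		}
		return name
	}
	return ""
}

func main() {
	// Gmail's headers from the commit message.
	ct := `text/plain; charset="US-ASCII"; name="=?UTF-8?B?aGnimLoudHh0?="`
	cd := `attachment; filename="=?UTF-8?B?aGnimLoudHh0?="`
	fmt.Println(attachmentName(ct, cd, false)) // hi☺.txt
	fmt.Println(attachmentName(ct, cd, true))  // =?UTF-8?B?aGnimLoudHh0?=
}
```

With pedantic set, the encoded-word is left as-is, which matches a strict reading of the RFCs.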

we didn't generate correctly encoded filenames yet; this commit also fixes that.

for issue #82 by mattfbacon, thanks for reporting!
@mattfbacon
Contributor Author

Looks like this was resolved by that commit. Closing.
