Bad content-types #3

jlindley · 2009-11-02T22:46:51Z

What is the proper way to handle bad content types? I've read about as many RFCs today as I can stand, and I missed the content-type fallback recommendations if they're in there.

Of the 400 MB or so of email I've attempted to parse today (largely the Enron mail corpus), the vast majority of parse errors were on the content-type header.

Mostly things like

Content-Type: text

(Instead of text/plain)

Or like:

Content-Type: multipart/mixed boundary="----=_NextPart_000_000F_01C17754.8C3CAF30"

(Missing the ';' delimiter before the value hash)

I committed a fix to my fork[1] that sets content-type to 'text/plain' on parser errors, but that doesn't feel quite right. Should it just ignore that field in the header altogether?

Thanks-

Jim

[1] http://github.com/jlindley/mail/commit/2fd51a8d757bbec2a7ef553b6bc52486b45539ab
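On the fallback question: RFC 2045, section 5.2, gives `text/plain; charset=us-ascii` as the default when no Content-Type is present and recommends assuming the same default for a syntactically invalid one. A minimal sketch of a repair-then-fallback step for the two malformed headers quoted above could look like the following. `repair_content_type` and `FALLBACK_CONTENT_TYPE` are hypothetical names, not part of the mail library, and this is not the code from the linked commit.

```ruby
# Hypothetical helper, not part of the mail library.
# Tries light repairs on a raw Content-Type value and falls back to
# text/plain when no type/subtype pair can be recovered.
FALLBACK_CONTENT_TYPE = 'text/plain'

def repair_content_type(raw)
  value = raw.to_s.strip

  # "multipart/mixed boundary=..." -> "multipart/mixed; boundary=..."
  # Insert the missing ';' between the media type and the first parameter.
  value = value.sub(%r{\A([\w.+-]+/[\w.+-]+)\s+(?=[\w-]+=)}, '\1; ')

  # A bare "text" has no subtype; use the RFC 2045 default.
  return FALLBACK_CONTENT_TYPE if value.casecmp?('text')

  # Anything that still lacks a type/subtype pair is treated as unparseable.
  return FALLBACK_CONTENT_TYPE unless value.match?(%r{\A[\w.+-]+/[\w.+-]+(\s*;.*)?\z}m)

  value
end
```

Well-formed values pass through unchanged, so a step like this would only alter headers that would otherwise fail to parse.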


jlindley · 2009-11-02T23:19:03Z

Actually now I've run across another unparseable field in some emails, namely:

Content-Location: file://spr1inf1/scripts/password_reset_email/password_reset_html/reset.gif

(It's invalid because it's unquoted and contains a colon, which a 'token' under RFC 2045 may not.) What should the general philosophy be for handling unparseable fields? At least with content-type (in the original post) there's a standard to fall back on (text/plain), but not so obviously with locations. Should the library discard crap, or try to guess where the quotes should go?

Jim

mikel · 2009-11-03T02:33:29Z

Ok... so the overriding philosophy is "don't lose any information"; the other one is "don't nuke user info with something generated".

In this case, we could implement a "quote_if_needed" method on the content-location in the initialization method of field/content_location.rb

Maybe put the method in lib/utilities.rb (you can look at the existing ActionMailer for how to implement this), then call it on the passed-in value of content-location, as content-location is only ever going to be a single value.

But if we have that in utilities, we could then possibly parse any other param fields in content-disposition or content-type, again, before treetop gets to it.
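A rough sketch of what such a helper might look like, assuming it should leave already-quoted values and plain tokens alone and only wrap values containing characters that RFC 2045 disallows in a token. Only the name `quote_if_needed` comes from the comment above; the body and the `TSPECIALS` constant are assumptions, not the code that ended up in the library.

```ruby
# Hypothetical sketch of a quote_if_needed helper; the body is an assumption.
# RFC 2045 "tspecials" (plus whitespace) may not appear in an unquoted token.
TSPECIALS = /[()<>@,;:\\"\/\[\]?=\s]/

def quote_if_needed(value)
  str = value.to_s

  # Already a quoted-string: leave it untouched so no user data is altered.
  return str if str =~ /\A".*"\z/m

  # A plain token needs no quotes.
  return str unless str =~ TSPECIALS

  # Escape backslashes and double quotes, then wrap in double quotes.
  escaped = str.gsub(/[\\"]/) { |c| "\\#{c}" }
  "\"#{escaped}\""
end
```

With the Content-Location value from the earlier comment, the `:` and `/` characters trigger quoting, while a bare `reset.gif` would be returned as-is. Because the original characters are only wrapped and escaped, nothing from the header is discarded.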

Mikel

mikel · 2009-11-05T04:51:03Z

2acb70a: Closes Issue #1 - Handling badly formatted content-type fields

scottsch mentioned this issue Feb 11, 2012

Problems creating multipart messages #351

Open

This issue was closed.
