Bad content-types #3

Closed
jlindley opened this issue Nov 2, 2009 · 3 comments

@jlindley
Contributor

jlindley commented Nov 2, 2009

What is the proper way to handle bad content types? I've read about as many RFCs today as I can stand, and if they contain any content-type fallback recommendations, I missed them.

Of the 400 MB or so of email I've attempted to parse today (largely the Enron mail corpus), the vast majority of parse errors were on the content-type header.

Mostly things like

Content-Type: text

(Instead of text/plain)

Or like:

Content-Type: multipart/mixed boundary="----=_NextPart_000_000F_01C17754.8C3CAF30"

(Missing the ';' delimiter before the boundary parameter)

I committed a fix to my fork[1] that sets content-type to 'text/plain' on parser errors, but that doesn't feel quite right. Should it just ignore that field in the header altogether?

Thanks-

Jim

[1] http://github.com/jlindley/mail/commit/2fd51a8d757bbec2a7ef553b6bc52486b45539ab
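For illustration, here is a minimal sketch of repairing the two malformed shapes quoted above before a strict parser sees them, rather than replacing the whole field on a parse error. The method name `sanitize_content_type` and the default-subtype table are assumptions made up for this example; they are not taken from the fork or from the mail library.

```ruby
# Hypothetical helper, not part of the mail library's API.
# Repairs two malformed Content-Type shapes before strict parsing:
#   "text"                        (missing subtype)
#   "multipart/mixed boundary=…"  (missing ';' before the parameters)
def sanitize_content_type(value)
  # Assumption: only "text" has an obvious default subtype.
  default_subtypes = { "text" => "plain" }

  value = value.to_s.strip
  # Split the media type from whatever follows it (";" or bare whitespace).
  media_type, rest = value.split(/\s*;\s*|\s+/, 2)
  media_type = media_type.to_s.downcase

  unless media_type.include?("/")
    subtype = default_subtypes[media_type]
    # Last-resort fallback when the type cannot be repaired at all.
    return "text/plain" if subtype.nil?
    media_type = "#{media_type}/#{subtype}"
  end

  # Re-attach any parameters with the ';' delimiter in place.
  rest.nil? || rest.empty? ? media_type : "#{media_type}; #{rest}"
end

puts sanitize_content_type("text")
puts sanitize_content_type('multipart/mixed boundary="----=_NextPart_000_000F_01C17754.8C3CAF30"')
```

Well-formed values pass through with their parameters intact, so the `text/plain` fallback only applies when the media type itself is unrecoverable.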

@jlindley
Contributor Author

jlindley commented Nov 2, 2009

Actually now I've run across another unparseable field in some emails, namely:

Content-Location: file://spr1inf1/scripts/password_reset_email/password_reset_html/reset.gif

(It's invalid because it's not in quotes and contains a colon, which isn't allowed in a 'token' under RFC 2045.) What should the general philosophy be for handling unparseable fields? At least with content-type (in the original post) there's a standard to fall back on (text/plain), but not so obviously with locations. Should the library discard crap, or try to guess where the quotes should go?

- Jim

@mikel
Owner

mikel commented Nov 3, 2009

OK... so the overriding philosophy is "don't lose any information"; the other one is "don't nuke user info with something generated".

In this case, we could implement a "quote_if_needed" method on the content-location in the initialization method of field/content_location.rb

Maybe put the method in lib/utilities.rb (you can look at the existing ActionMailer to see how to implement this), then call it on the passed-in value of content-location, since content-location is only ever going to be a single value.

But if we have that in utilities, we could then possibly parse any other param fields in content-disposition or content-type, again, before treetop gets to it.

Mikel
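A rough sketch of what such a `quote_if_needed` helper might look like is below. Only the method name comes from the comment above; the body, the token pattern, and the escaping rules are assumptions for illustration, not the code that ended up in lib/utilities.rb or in ActionMailer.

```ruby
# Hypothetical sketch of the "quote_if_needed" idea; the body is an assumption.
def quote_if_needed(value)
  # RFC 2045 token: one or more characters, excluding SPACE, control
  # characters, and the tspecials ( ) < > @ , ; : \ " / [ ] ? =
  token = /\A[^\x00-\x20\x7F()<>@,;:\\"\/\[\]?=]+\z/
  # A value that is already wrapped in double quotes with valid escapes.
  quoted_string = /\A"(?:[^"\\]|\\.)*"\z/

  value = value.to_s.strip
  return value if value.match?(token)          # safe to leave unquoted
  return value if value.match?(quoted_string)  # already quoted, keep as-is

  # Escape backslashes and double quotes, then wrap the value in quotes,
  # so nothing from the original value is dropped.
  escaped = value.gsub(/[\\"]/) { |c| "\\#{c}" }
  "\"#{escaped}\""
end

puts quote_if_needed("file://spr1inf1/scripts/password_reset_email/password_reset_html/reset.gif")
```

Quoting instead of discarding keeps the original value intact, which fits the "don't lose any information" rule; values that already parse are returned unchanged.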

@mikel
Owner

mikel commented Nov 5, 2009

2acb70a: Closes Issue #1 - Handling badly formatted content-type fields

This issue was closed.