Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MimeMessage Load / WriteTo byte compatibility fail cases #499

Closed
6 of 8 tasks
nitinag opened this issue Aug 20, 2019 · 16 comments
Closed
6 of 8 tasks

MimeMessage Load / WriteTo byte compatibility fail cases #499

nitinag opened this issue Aug 20, 2019 · 16 comments
Labels
bug Something isn't working compatibility Compatibility with existing software

Comments

@nitinag
Copy link
Sponsor

nitinag commented Aug 20, 2019

Since MimeMessage "attempts to maintain byte-for-byte compatibility with the original stream", I'm including test cases below where that fails.


The issues below are:

  • 1a. MimeMessage does not parse / track raw headers that don't have a colon (:), but does with invalid field names with spaces via RawField. Maybe it could do the same even if no colon? While these are bad headers, the unparsed mime part headers will cause DKIM to fail as well.
  • 1b. MimeMessage.WriteTo does not add CRLF if there is an epilogue when formatOptions.EnsureNewLine = true. It does add the new line when there is no epilogue.
var sourceStream = new MemoryStream(Encoding.UTF8.GetBytes(@"From: Example Test <test@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
   boundary=""simple boundary""
Example: test
Test
Test Test
Test:
Test: 
Test: Test
Test Example:

This is the preamble.

--simple boundary
Content-TypeS: text/plain

This is a test.

--simple boundary
Content-Type: text/plain;
Content-Disposition: attachment;
Content-Transfer-Encoding: test;
Content-Transfer-Encoding: binary;
Test Test Test: Test Test
Te$t($)*$= Test Test: Abc def
test test = test
test test :: test
filename=""test.txt""

Another test.

--simple boundary--


This is the epilogue."));
var mimeMessage = MimeMessage.Load(sourceStream);
var destinationStream = new MemoryStream();
var formatOptions = FormatOptions.Default.Clone();
formatOptions.EnsureNewLine = true;
mimeMessage.WriteTo(formatOptions, destinationStream);
if (sourceStream.Length + 2 != destinationStream.Length) //+2 expected due to formatOptions.EnsureNewLine = true on destinationStream
	throw new Exception("Expected length does not match.");

The issues below are:

  • MimeMessage produces extra CRLF when only headers are present (main headers or mime part headers). formatOptions.EnsureNewLine=false does not disable this behavior with Headers. Both cases below show up with a extra CRLF on WriteTo regardless of the original.
  • 2a.
var sourceStream = new MemoryStream(Encoding.UTF8.GetBytes(@"From: <test@example.com>
To: <test@example.com>"));
var mimeMessage = MimeMessage.Load(sourceStream);
var destinationStream = new MemoryStream();
var formatOptions = FormatOptions.Default.Clone();
formatOptions.EnsureNewLine = false;
mimeMessage.WriteTo(formatOptions, destinationStream);
if (sourceStream.Length != destinationStream.Length)
	throw new Exception("Expected length does not match.");
  • 2b.
var sourceStream = new MemoryStream(Encoding.UTF8.GetBytes(@"From: <test@example.com>
To: <test@example.com>
"));
var mimeMessage = MimeMessage.Load(sourceStream);
var destinationStream = new MemoryStream();
var formatOptions = FormatOptions.Default.Clone();
formatOptions.EnsureNewLine = false;
mimeMessage.WriteTo(formatOptions, destinationStream);
if (sourceStream.Length != destinationStream.Length)
	throw new Exception("Expected length does not match.");

The issues below are:

  • 3. Not ending multipart boundary and having text after causes an extra CRLF to be added
var sourceStream = new MemoryStream(Encoding.UTF8.GetBytes(@"From: Example Test <test@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
   boundary=""simple boundary""

This is the preamble.

--simple boundary

This is a test."));
var mimeMessage = MimeMessage.Load(sourceStream);
var destinationStream = new MemoryStream();
var formatOptions = FormatOptions.Default.Clone();
formatOptions.EnsureNewLine = false;
mimeMessage.WriteTo(formatOptions, destinationStream);
if (sourceStream.Length != destinationStream.Length)
	throw new Exception("Expected length does not match.");

The issues below are:

  • 4. Not ending multipart boundary and no CRLF after boundary on same line, causes boundary itself to disappear
var sourceStream = new MemoryStream(Encoding.UTF8.GetBytes(@"From: Example Test <test@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
   boundary=""simple boundary""

This is the preamble.

--simple boundary"));
var mimeMessage = MimeMessage.Load(sourceStream);
var destinationStream = new MemoryStream();
var formatOptions = FormatOptions.Default.Clone();
formatOptions.EnsureNewLine = false;
mimeMessage.WriteTo(formatOptions, destinationStream);
if (sourceStream.Length != destinationStream.Length)
	throw new Exception("Expected length does not match.");

The issues below are:

  • 5. Ending multipart boundary on same line causes extra CRLF
var sourceStream = new MemoryStream(Encoding.UTF8.GetBytes(@"From: Example Test <test@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
   boundary=""simple boundary""

This is the preamble.

--simple boundary--"));
var mimeMessage = MimeMessage.Load(sourceStream);
var destinationStream = new MemoryStream();
var formatOptions = FormatOptions.Default.Clone();
formatOptions.EnsureNewLine = false;
mimeMessage.WriteTo(formatOptions, destinationStream);
if (sourceStream.Length != destinationStream.Length)
	throw new Exception("Expected length does not match.");

The issues below are:

  • 6. Correctly places text into Multipart preamble, but oddly after a LINQPad Dump (wasn't able to find which property is causing this), it's adding a ";" and new boundary to the headers on WriteTo when none was present on input.
var sourceStream = new MemoryStream(Encoding.UTF8.GetBytes(@"From: Example Test <test@example.com>
Content-Type: multipart/mixed

This is the preamble.
."));
var mimeMessage = MimeMessage.Load(sourceStream);
mimeMessage.Dump(); //LINQPad access of properties
var destinationStream = new MemoryStream();
var formatOptions = FormatOptions.Default.Clone();
formatOptions.EnsureNewLine = false;
mimeMessage.WriteTo(formatOptions, destinationStream);
if (sourceStream.Length != destinationStream.Length)
	throw new Exception("Expected length does not match.");

Message after LINQPad property Dump:

From: Example Test <test@example.com>
Content-Type: multipart/mixed; boundary="=-/2bdSQCMK1FMz+QLxIb31w=="

This is the preamble.
.
jstedfast added a commit that referenced this issue Aug 21, 2019
…en EnsureNewLine is true

Partial fix for issue #499
jstedfast added a commit that referenced this issue Aug 21, 2019
@jstedfast
Copy link
Owner

Are these all representative of real-world test cases?

I'm just not sure the remaining issues are worth all of the time & effort it will take to "fix" them considering they are all invalid MIME.

@nitinag
Copy link
Sponsor Author

nitinag commented Aug 21, 2019

Byte-for-byte compatibility with the original stream is important for us in our use cases, even when the there may be invalid MIME present. I tested for two things: byte-for-byte with the original stream with FormatOptions.EnsureNewLine = false and consistent newline addition when formatOptions.EnsureNewLine = true on headers/body.

Two situations led me to look for these test cases:

  • We plan on using ImapClient.GetMessage for migration of messages from other imap servers to our own (custom built) imap server. It would be important for users that the messages we import are byte-for-byte the same one.
  • We're using MimeKit and SmtpClient during Smtp MTA relay / queue delivery, we want to be sure the messages we relay are byte-for-byte what's in the original stream to avoid breaking DKIM and so we don't end up with hard to debug potential issues with messages due to them being potentially slightly different on send.

This isn't a very high priority since as you said the cases are invalid MIME and would have limited impact, but they likely have "some" impact in certain cases and it'd be great if this was eventually "fixed" since it feels like MimeKit is already 98% byte-for-byte compatible and these would help get it closer to 100%.

@jstedfast
Copy link
Owner

FWIW, MailKit's ImapClient can also fetch raw Streams. Take a look at the ImapFolder.GetStream() methods (also has Async variants).

You may find that useful. I'm not sure I can ever guarantee 100% byte-for-byte compatibility in all cases when reserializing a MimeMessage. I'm just not sure it's even feasible without making MimeMessage an immutable object and adding a backing Stream that contains the raw data.

I could also potentially add something that would allow you to do smtpClient.Send (rawStream, sender, recipients, ...);

since it feels like MimeKit is already 98% byte-for-byte compatible and these would help get it closer to 100%.

You know what they say about the last 20%... it's even more true about the last 2%, unfortunately.

I'll poke at some more of these issues to see if I can find a solution to them, but I'm not sure I'll be able to get them all.

@nitinag
Copy link
Sponsor Author

nitinag commented Aug 21, 2019

That's great! We'd always prefer to retrieve / store / send the raw stream of data to avoid any potential unknown parser issues.

Taking a quick look at the code for ImapFolder.GetStreamAsync() seems like it should give us that for getting raw messages from ImapClient, though I still have to play around and test it, and the new smtpClient.SendAsync (rawStream, sender, recipients, ...); overload you suggested would give us that for SmtpClient, solving the underlying motivation for both cases. In certain cases where we need to change the headers (like adding them on incoming mx), we can choose to load only the headers in MimeKit, modify them, and merge them with the original raw data stream ourselves so even in that case, the raw body format is preserved.

That said, I think fixing the no colon header issue may help ensure we don't unintentionally remove those portions of the headers (though invalid) when we export it back out since that's a case that's already been identified. Being able to rely on consistent newline behavior with the Header WriteToAsync may also be useful for the same reason when we merge back, but we'll likely double check and ensure that on our side anyway. Maybe we'll even just stick to adding only our expected headers to the very top of the existing headers thus leaving the existing headers format fully intact as well.

jstedfast added a commit that referenced this issue Aug 21, 2019
@jstedfast
Copy link
Owner

Fixed the issue about headers that don't contain a ':' (or are otherwise invalid).

@nitinag
Copy link
Sponsor Author

nitinag commented Aug 22, 2019

Nice! Pulled the latest version from the MyGet feed and that test passes now.

@jstedfast
Copy link
Owner

I've updated your original comment to add markdown checkboxes so that I can check off the issues that I have fixed.

@jstedfast
Copy link
Owner

I've got a patch for your 3rd issue, but it breaks a load of other tests :-\

@jstedfast
Copy link
Owner

Since messages technically always need to end in CRLF anyway, I'm thinking we can probably dismiss items 2a, 3 and 5 in the list above.

@nitinag
Copy link
Sponsor Author

nitinag commented Aug 24, 2019

Our particular use cases are best solved via the ImapFolder.GetStream() and smtpClient.SendAsync (rawStream, sender, recipients, ...); you identified.

Given those solutions, currently only the header fixes will have an impact on us, which have already been made.

However, I think it might be useful on a future TODO list to consider having the library produce consistent and known behavior when it comes to newlines. For example, you could decide to ensure the FormatOptions.EnsureNewLine flag is always honored for headers/body with the default value set to true. Or, and I think this would be my vote, you could simply remove/obsolete that flag altogether and say when saving, the library produces correct CRLF behavior and always ends in NewLine (except for Binary which is the only special case that I'm currently aware of where NewLine should not touched/added, but may be present on some rare binary content). Then, those working on existing raw message use cases will have the rawStream versions in all components to use, for those just parsing, adding an extra new line if not present shouldn't matter, and for those creating/modifying, they'll always end up with correct line ending behavior afterward.

@jstedfast
Copy link
Owner

I was going to implement the Send (Stream, ...) API but stopped because I need to figure out a few issues with that idea:

  1. Can I rely on the stream being in canonical CRLF format? If not, I could write it through a CRLF filter, but that won't work if the content contains binary MIME.
  2. Can I rely on the stream ending with a newline?

As far as this bug report, I guess I can close this now.

@jstedfast jstedfast added bug Something isn't working compatibility Compatibility with existing software labels Aug 24, 2019
@nitinag
Copy link
Sponsor Author

nitinag commented Aug 24, 2019

  1. Can I rely on the stream being in canonical CRLF format? If not, I could write it through a CRLF filter, but that won't work if the content contains binary MIME.
  2. Can I rely on the stream ending with a newline?

I would expect the stream to be sent as is and the developer to have addressed any CRLF or other issues when the message was created via MimeMessage or otherwise. If one needs SmtpClient to modify it, they can and should use the MimeMessage overloads, which handle the CRLF issues in addition to other things such as line length issues via MimeMessage.Prepare, etc.

@nitinag
Copy link
Sponsor Author

nitinag commented Aug 29, 2019

Hi, any additional thoughts on smtpClient.SendAsync (rawStream, sender, recipients, ...);?

@jstedfast
Copy link
Owner

Oh, hey, sorry. I ran into some other issues.

There are a bunch of virtual protected API's that pass a MimeMessage parameter.

In order to implement a Send (Stream, ...) method, all of these hooks would have to be duplicated :-\

@nitinag
Copy link
Sponsor Author

nitinag commented Aug 30, 2019

I see what you mean. What would you suggest?

We'll definitely want to implement this for our use case since:

  • We want to ensure byte-for-byte compatibility for smtp mta relay including for mx forwards.
  • Loading the MimeMessage (even with the persistent flag) for large messages (e.g., 100MB) is wasteful since we don't need to use the parsed object during send and even more so since we use remote streams (it would help us avoid a second reading of the entire stream).

Does it make sense to include as part of the library or would it make more sense for us to look at maintaining our own copy of SmtpClient supporting rawStream (doesn't look like sub-classing would work given the changes required)?

@jstedfast
Copy link
Owner

Maintaining your own SmtpClient may be the best way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working compatibility Compatibility with existing software
Projects
None yet
Development

No branches or pull requests

2 participants