There is one more unexpected space(' ') appear in the attachment name after parsing #809
Comments
Yea, quoted strings are not technically supposed to be folded. I'll try to look into this to see if adding a work-around is possible without breaking other things. |
thanks very much! |
Realistically, the original filename probably didn't have a tab or a space. The tab was just used when folding the parameter value, because each continuation line of a folded header MUST begin with a whitespace character, and Lotus probably inserted the tab there to conform to that part of the spec. I've modified the parameter list parser to not convert tabs to spaces when it unquotes parameter values. That is probably the best we can do, because it's impossible to know whether the sending client inserted whitespace purely for folding purposes or folded at an existing space (which is what a client should do when it folds a header). For example, consider this improved folding of the filename:
Unfolding that would produce: ...which is most likely the correct original filename. You might be able to assume that tabs are not realistically part of a filename and so replacing tabs with string.Empty would be the best way to reproduce the original filename, but that logic isn't guaranteed. |
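To illustrate the heuristic described above, here is a minimal sketch in plain C#. It does not use MimeKit's parser; the `FoldingSketch` class, its method names, and the filename in the usage note are hypothetical and chosen only for illustration. It assumes the sender folded in the middle of a word and inserted a tab to satisfy the folding rule.

```csharp
using System;
using System.Text.RegularExpressions;

static class FoldingSketch
{
    // Unfold a header value: remove each line break that is immediately
    // followed by a space or tab, keeping that whitespace character.
    public static string Unfold (string value)
    {
        return Regex.Replace (value, @"\r?\n(?=[ \t])", string.Empty);
    }

    // Heuristic only: assume tabs are never a legitimate part of a filename
    // and drop them. This is a guess about the sender's intent, not a rule.
    public static string StripTabs (string value)
    {
        return value.Replace ("\t", string.Empty);
    }
}
```

With a made-up value such as `"quarterly-rep\r\n\tort-2022.pdf"`, `Unfold` leaves the tab in place (`"quarterly-rep\tort-2022.pdf"`) and `StripTabs` then yields `"quarterly-report-2022.pdf"`. If the sender had instead folded at a real space, stripping would not be needed, which is why this cannot be applied blindly.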
Unfortunately your screenshot doesn't give me much context. What am I looking at? Are those header values? Or is that the content of a TextPart? Or? |
Are you using a MimeKit.Text.HtmlToHtml text converter? |
Yes, I am using the "HtmlPreviewVisitor" of your sample source. |
Thanks. I'll look into this bug which I think I probably introduced a few days ago in the converter. |
Noted, thanks! |
In the original message, what are the |
Here's what I think the problem is:
Why is that a problem? Well, it means that the output of the conversion is no longer guaranteed to be ASCII, which means that the charset declared in the HTML's `<meta>` tag may no longer match the actual output. Depending on how you pass the output along to the IE11 browser window, this can make a big difference.

In my MessageReader sample, the code gets the converted string and sets it on the browser control via the DocumentText property, which means that the browser control knows that the text it is getting is already a string (or, if it's a native control, it'd be converted to UTF-8 or UTF-16) and so no charset conversion would be needed. If the browser control/window loads the converted HTML via a Stream (e.g. by reading it from a saved output file), then it will end up using the charset declared in the `<meta>` tag.

I'm thinking that the best way to fix this is to continue saving the content to a file using UTF-8 but change the code in the HtmlTagCallback to modify the `<meta>` tags so that they declare utf-8:

```csharp
void HtmlTagCallback (HtmlTagContext ctx, HtmlWriter htmlWriter)
{
    if (ctx.TagId == HtmlTagId.Meta && !ctx.IsEndTag) {
        bool isContentType = false;

        ctx.WriteTag (htmlWriter, false);

        // replace charsets with "utf-8" since our output will be in utf-8 (and not whatever the original charset was)
        foreach (var attribute in ctx.Attributes) {
            if (attribute.Id == HtmlAttributeId.Charset) {
                htmlWriter.WriteAttributeName (attribute.Name);
                htmlWriter.WriteAttributeValue ("utf-8");
            } else if (isContentType && attribute.Id == HtmlAttributeId.Content) {
                htmlWriter.WriteAttributeName (attribute.Name);
                htmlWriter.WriteAttributeValue ("text/html; charset=utf-8");
            } else {
                if (attribute.Id == HtmlAttributeId.HttpEquiv && attribute.Value != null
                    && attribute.Value.Equals ("Content-Type", StringComparison.OrdinalIgnoreCase))
                    isContentType = true;

                htmlWriter.WriteAttribute (attribute);
            }
        }
    } else if (ctx.TagId == HtmlTagId.Image && !ctx.IsEndTag && stack.Count > 0) {
        ctx.WriteTag (htmlWriter, false);

        // replace the src attribute with a "data:" URL
        foreach (var attribute in ctx.Attributes) {
            if (attribute.Id == HtmlAttributeId.Src) {
                if (!TryGetImage (attribute.Value, out var image)) {
                    htmlWriter.WriteAttribute (attribute);
                    continue;
                }

                var dataUri = GetDataUri (image);

                htmlWriter.WriteAttributeName (attribute.Name);
                htmlWriter.WriteAttributeValue (dataUri);
            } else {
                htmlWriter.WriteAttribute (attribute);
            }
        }
    } else if (ctx.TagId == HtmlTagId.Body && !ctx.IsEndTag) {
        ctx.WriteTag (htmlWriter, false);

        // add and/or replace oncontextmenu="return false;"
        foreach (var attribute in ctx.Attributes) {
            if (attribute.Name.Equals ("oncontextmenu", StringComparison.OrdinalIgnoreCase))
                continue;

            htmlWriter.WriteAttribute (attribute);
        }

        htmlWriter.WriteAttribute ("oncontextmenu", "return false;");
    } else {
        // pass the tag through to the output
        ctx.WriteTag (htmlWriter, true);
    }
}
```
MimeKit v3.4.0 has been released with this fix. |
Describe the bug
I am using Lotus Notes to extract an .eml file and found that the "Content-Disposition" header is wrapped onto two lines, with a TAB ('\t') added to the second line, when the attachment name is long.
Please see the sample below:
![image](https://user-images.githubusercontent.com/614477/175932959-9a2efdb4-1890-4710-8212-573d6d908b01.png)
After parsing the message with MimeKit, the TAB ('\t') becomes a space (' ').
I am not sure whether this is because Lotus Notes does not follow the standard.
Thanks in advance!