The transformer for RFC822 messages, EMLTransformer.java, has a severe bug that hurts performance for anyone who stores a lot of emails.
Transforming a multipart email always returns the entire raw email, including the base64-encoded text of its attachments.
For indexing, this means the base64 text of every attachment is indexed as plain text. A client of mine with 100,000+ emails could enter almost any character combination and get a hit, and the index grew to more than 300 GB.
Previews of EML files can run to 300+ pages in the PdfJS viewer, because the attachments' base64 text is displayed.
How to reproduce
Create an email with an HTML body and at least one attachment.
Create a folder with a rule that transforms content to plain text.
Transfer the email to Alfresco as an EML file and drop it into the folder above.
Expected: only the message text shows up.
Actual: the text shows up together with the MIME encoding headers, and the attachment is visible as base64.
Note: a long-standing issue is that the HTML part of the email is included in the plain-text transformation, so you will probably also see HTML in the output.
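To make the first step repeatable, a hand-written EML file can stand in for a real email. The sketch below assumes a typical structure, a `multipart/mixed` message wrapping a `multipart/alternative` text/HTML body plus one base64 attachment; the class name `SampleEml`, the addresses, boundaries and attachment bytes are all hypothetical sample values.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Sketch: builds a minimal test email with an HTML body and one base64
// attachment. Addresses, boundaries and attachment bytes are made-up samples.
final class SampleEml {

    static String build() {
        String mixed = "MIXED-BOUNDARY"; // outer boundary: body + attachment
        String alt = "ALT-BOUNDARY";     // inner boundary: text and HTML versions
        byte[] attachment = "%PDF-1.4 sample attachment bytes".getBytes(StandardCharsets.US_ASCII);
        String base64 = Base64.getMimeEncoder().encodeToString(attachment);

        // MIME lines are separated by CRLF; the trailing "" ends the file with CRLF
        return String.join("\r\n",
                "From: sender@example.com",
                "To: recipient@example.com",
                "Subject: EML transformer test",
                "MIME-Version: 1.0",
                "Content-Type: multipart/mixed; boundary=\"" + mixed + "\"",
                "",
                "--" + mixed,
                "Content-Type: multipart/alternative; boundary=\"" + alt + "\"",
                "",
                "--" + alt,
                "Content-Type: text/plain; charset=UTF-8",
                "",
                "Hello, this is the body text.",
                "--" + alt,
                "Content-Type: text/html; charset=UTF-8",
                "",
                "<html><body><p>Hello, this is the body text.</p></body></html>",
                "--" + alt + "--",
                "",
                "--" + mixed,
                "Content-Type: application/pdf; name=\"sample.pdf\"",
                "Content-Transfer-Encoding: base64",
                "Content-Disposition: attachment; filename=\"sample.pdf\"",
                "",
                base64,
                "--" + mixed + "--",
                "");
    }

    public static void main(String[] args) throws Exception {
        // Writes sample.eml to the current working directory
        Files.write(Path.of("sample.eml"), build().getBytes(StandardCharsets.US_ASCII));
    }
}
```

Running `main` writes `sample.eml`, which can then be dropped into the folder with the plain-text rule. A larger attachment makes the base64 text in the output easier to spot.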
What is the cause?
In EMLTransformer.java, lines 85-90 set the MIME type of the message to text/plain. This overwrites the message's actual multipart type, so getContent() always returns a String and never an instance of Multipart.
Removing those lines fixes the problem. They may have been needed with javax.mail 1.4.x, but they do not seem to be needed with 1.5.x.
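The effect can be reproduced outside Alfresco. The sketch below is a simplified stand-in for a MIME reader, not javax.mail's parser and not the EMLTransformer code: `textContent` is a hypothetical helper that splits a single, non-nested multipart body on its boundary and keeps only inline text/plain parts. While the header still says `multipart/mixed; boundary=...`, only the body text comes back; once the header is overwritten with `text/plain`, the boundary is gone and the whole body, base64 attachment included, is returned as one string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for a MIME reader (not javax.mail, not EMLTransformer):
// handles one non-nested multipart level, enough to show why overwriting the
// Content-Type header with text/plain exposes the attachment's base64 text.
final class ContentTypeDemo {

    private static final Pattern BOUNDARY =
            Pattern.compile("boundary=\"?([^\";]+)\"?", Pattern.CASE_INSENSITIVE);

    /** Returns the readable text of a body, interpreted according to its Content-Type value. */
    static String textContent(String contentType, String body) {
        Matcher m = BOUNDARY.matcher(contentType);
        if (!contentType.toLowerCase(Locale.ROOT).startsWith("multipart/") || !m.find()) {
            // No multipart type or boundary: the whole body is one string,
            // much like getContent() returning a String for a text/plain message
            return body;
        }
        String delimiter = "--" + m.group(1);
        List<String> texts = new ArrayList<>();
        for (String part : body.split(Pattern.quote(delimiter))) {
            int headerEnd = part.indexOf("\r\n\r\n");
            if (headerEnd < 0) {
                continue; // empty preamble or the closing "--" marker
            }
            String headers = part.substring(0, headerEnd).toLowerCase(Locale.ROOT);
            String content = part.substring(headerEnd + 4).trim();
            // Keep inline text/plain parts only; HTML and attachments are skipped
            if (headers.contains("content-type: text/plain")
                    && !headers.contains("content-disposition: attachment")) {
                texts.add(content);
            }
        }
        return String.join("\n", texts);
    }

    /** Sample body: one text part and one base64 attachment ("JVBERi0xLjQK" encodes "%PDF-1.4\n"). */
    static String sampleBody() {
        return String.join("\r\n",
                "--B1",
                "Content-Type: text/plain; charset=UTF-8",
                "",
                "Hello, this is the body text.",
                "--B1",
                "Content-Type: application/pdf",
                "Content-Transfer-Encoding: base64",
                "Content-Disposition: attachment; filename=\"a.pdf\"",
                "",
                "JVBERi0xLjQK",
                "--B1--",
                "");
    }

    public static void main(String[] args) {
        String body = sampleBody();
        // Original header: only the body text is returned
        System.out.println(textContent("multipart/mixed; boundary=\"B1\"", body));
        // Header overwritten with text/plain: everything, base64 included, comes back
        System.out.println(textContent("text/plain", body));
    }
}
```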
I will also look at making sure that a plain-text transformation does not include the HTML part of the message, and at creating a transformer that picks out the HTML part and uses it when available.
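The planned part selection could look roughly like the sketch below. With javax.mail the walk would use `Multipart.getCount()` and `getBodyPart(int)`, `Part.isMimeType(String)` and `Part.getDisposition()`; to keep the sketch runnable on the JDK alone, those are replaced by a hypothetical in-memory `MailPart` tree. `plainText` keeps only inline text/plain parts, while `htmlOrPlain` returns the first inline HTML part and falls back to plain text. Both names are illustrative, not existing Alfresco transformers.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the planned body selection over a hypothetical MailPart tree.
// MailPart is illustrative only; it is not a javax.mail or Alfresco type.
final class BodySelector {

    static final class MailPart {
        final String mimeType;          // e.g. "text/plain" or "multipart/alternative"
        final boolean attachment;       // true when Content-Disposition is "attachment"
        final String text;              // decoded text of a leaf part, null otherwise
        final List<MailPart> children;  // sub-parts of a multipart, null for leaves

        private MailPart(String mimeType, boolean attachment, String text, List<MailPart> children) {
            this.mimeType = mimeType;
            this.attachment = attachment;
            this.text = text;
            this.children = children;
        }

        static MailPart leaf(String mimeType, String text) {
            return new MailPart(mimeType, false, text, null);
        }

        static MailPart attachmentPart(String mimeType) {
            return new MailPart(mimeType, true, null, null);
        }

        static MailPart multipart(String mimeType, MailPart... parts) {
            return new MailPart(mimeType, false, null, List.of(parts));
        }
    }

    /** Depth-first walk collecting the text of inline leaf parts of the given type. */
    private static void collect(MailPart part, String mimeType, List<String> out) {
        if (part.attachment) {
            return; // attachments are never transformed or indexed
        }
        if (part.children != null) {
            for (MailPart child : part.children) {
                collect(child, mimeType, out);
            }
        } else if (part.mimeType.equalsIgnoreCase(mimeType)) {
            out.add(part.text);
        }
    }

    /** Plain-text transformation: text/plain parts only, so no HTML and no base64. */
    static String plainText(MailPart root) {
        List<String> out = new ArrayList<>();
        collect(root, "text/plain", out);
        return String.join("\n", out);
    }

    /** HTML-preferring transformation: first inline text/html part, else plain text. */
    static String htmlOrPlain(MailPart root) {
        List<String> html = new ArrayList<>();
        collect(root, "text/html", html);
        return html.isEmpty() ? plainText(root) : html.get(0);
    }

    /** A typical email: text and HTML alternatives plus one attachment. */
    static MailPart sampleEmail() {
        return MailPart.multipart("multipart/mixed",
                MailPart.multipart("multipart/alternative",
                        MailPart.leaf("text/plain", "Hello, this is the body text."),
                        MailPart.leaf("text/html", "<p>Hello, this is the body text.</p>")),
                MailPart.attachmentPart("application/pdf"));
    }

    public static void main(String[] args) {
        MailPart email = sampleEmail();
        System.out.println(plainText(email));   // Hello, this is the body text.
        System.out.println(htmlOrPlain(email)); // <p>Hello, this is the body text.</p>
    }
}
```

Checking the attachment flag before the MIME type also keeps attached `.txt` files out of the transformed text, which a check on the MIME type alone would let through.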
https://issues.alfresco.com/jira/browse/ALF-21259