EMLTransformer ignoring multipart emails #22

Closed
loftux opened this issue Feb 23, 2015 · 1 comment

loftux commented Feb 23, 2015

The transformer for RFC 822 messages, EMLTransformer.java, has a severe bug that hurts performance for anyone who stores a lot of emails.
Transforming a multipart email always returns the entire email, including the base64 text of its attachments.

For indexing, this means the base64 text of every attachment gets indexed as plain text. A client of mine with 100,000+ emails could enter pretty much any character combination and get a hit. The index grew to 300+ GB.
Previews of EML files can run to 300+ pages in the PdfJS viewer, since the attachment base64 text is displayed.
How to reproduce

  • Create an email with an HTML body and at least one attachment.
  • Create a folder with a rule that transforms content to plain text.
  • Transfer the email to Alfresco as an EML file and drop it into the folder above.
    Expected: Only the message text shows up.
    Actual: The text and the encoding keys are present, and the attachment is visible as base64.

Note: A long-outstanding issue is that the HTML part of an email is included when transforming to plain text, so you will probably see HTML as part of the transformation output.
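
For the first reproduction step, a test message can also be generated rather than exported from a mail client. The following is a minimal sketch using only the JDK; the class name `SampleEml`, the boundary string, the addresses, and the attachment file name are all made up for illustration, and it has not been checked against Alfresco itself.

```java
import java.util.Base64;

// Hypothetical helper, not part of Alfresco or JavaMail.
class SampleEml {
    // Arbitrary boundary chosen for this sketch.
    static final String BOUNDARY = "sample-boundary-42";

    // Builds a minimal multipart/mixed message with one HTML body part
    // and one base64-encoded attachment, using CRLF line endings.
    static String build(String html, byte[] attachment) {
        // The MIME encoder wraps the base64 output at 76 characters per line.
        String encoded = Base64.getMimeEncoder().encodeToString(attachment);
        return String.join("\r\n",
            "Date: Mon, 23 Feb 2015 10:00:00 +0000",
            "From: sender@example.com",
            "To: recipient@example.com",
            "Subject: Multipart test",
            "MIME-Version: 1.0",
            "Content-Type: multipart/mixed; boundary=\"" + BOUNDARY + "\"",
            "",
            "--" + BOUNDARY,
            "Content-Type: text/html; charset=UTF-8",
            "",
            html,
            "--" + BOUNDARY,
            "Content-Type: application/octet-stream; name=\"data.bin\"",
            "Content-Transfer-Encoding: base64",
            "Content-Disposition: attachment; filename=\"data.bin\"",
            "",
            encoded,
            "--" + BOUNDARY + "--",
            "");
    }
}
```

Writing the returned string to a file with an .eml extension should give a message whose attachment base64 text is easy to spot in the transformed output if the bug is present.
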
What is the cause?

In EMLTransformer.java, rows 85-90, the mimetype is set to text/plain on the message. This destroys the message's actual multipart type, so when getContent is called the result is always a String and never an instanceof Multipart.
Just remove that and it works. It may have been needed with javax.mail 1.4.x, but it seems it is no longer needed with 1.5.x.
I will also look at making sure that a plain text transformation does not include the HTML part of the message, and at creating a transformer that can pick out the HTML part and use it if available.
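
To illustrate why keeping the multipart structure matters, here is a simplified model of the traversal. It does not use JavaMail: `MimePartSketch` and its methods are hypothetical stand-ins for a parsed message, and the real transformer would instead inspect what getContent returns. The sketch only shows the idea of walking the parts and keeping text/plain bodies while skipping attachments and the HTML alternative.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a parsed MIME part; not a JavaMail type.
class MimePartSketch {
    final String contentType;
    final boolean attachment;
    final String text;                    // body text for leaf parts, null for containers
    final List<MimePartSketch> children;  // empty for leaf parts

    private MimePartSketch(String contentType, boolean attachment,
                           String text, List<MimePartSketch> children) {
        this.contentType = contentType;
        this.attachment = attachment;
        this.text = text;
        this.children = children;
    }

    static MimePartSketch leaf(String contentType, boolean attachment, String text) {
        return new MimePartSketch(contentType, attachment, text,
                Collections.<MimePartSketch>emptyList());
    }

    static MimePartSketch multipart(String subtype, MimePartSketch... children) {
        return new MimePartSketch("multipart/" + subtype, false, null,
                Arrays.asList(children));
    }

    // Collects only the text/plain bodies that are not attachments.
    static String extractPlainText(MimePartSketch part) {
        StringBuilder out = new StringBuilder();
        collect(part, out);
        return out.toString();
    }

    private static void collect(MimePartSketch part, StringBuilder out) {
        if (part.contentType.startsWith("multipart/")) {
            // A container: descend into each child part.
            for (MimePartSketch child : part.children) {
                collect(child, out);
            }
        } else if (!part.attachment && part.contentType.startsWith("text/plain")) {
            out.append(part.text).append('\n');
        }
        // Attachments, text/html and all other types are skipped.
    }
}
```

If the whole message is instead treated as a single text/plain string, there are no parts to walk, and everything, including the attachment base64 text, ends up in the output.
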

https://issues.alfresco.com/jira/browse/ALF-21259


loftux commented Feb 24, 2015

If there is a winmail.dat attachment, it is currently just ignored. There is a Java library to read those, see http://www.freeutils.net/source/jtnef/
Other third-party tools: http://www.oracle.com/technetwork/java/javamail/third-party-136965.html
This comment is for reference, should there be a need in the future to extend the transformation support.

@loftux loftux closed this as completed in ee612e8 Mar 26, 2015
loftux added a commit that referenced this issue Apr 25, 2018
…ccounts not reenabled when using authentication chaining (#22)

- Logins are now protected based on a combined key from authentication service system id and user name, this allows us to fix the case when a valid login was denied for subsequent authentication service if the prior authentication services in the chain failed.
- Cache is now set to be local, as the new implementation doesn't require it to be fully distributed