Zimbra HTML parsing #75

Closed
defkev opened this issue Feb 4, 2016 · 3 comments

@defkev
Contributor

defkev commented Feb 4, 2016

Run the message below through http://talon.mailgun.net/

In the HTML part, only the reply headers are stripped; the quoted text itself is left in place.
Extraction from the text/plain part works fine.

Date: Thu, 4 Feb 2016 16:56:47 +0100 (CET)
From: admin@example.com
To: test@example.com
Message-ID: <1981017729.36.1454601407387.JavaMail.zimbra@example.com>
In-Reply-To: <514673316.27.1454601393179.JavaMail.zimbra@example.com>
References: <514673316.27.1454601393179.JavaMail.zimbra@example.com>
Subject: Re: Lorem Ipsum
MIME-Version: 1.0
Content-Type: multipart/alternative; 
    boundary="----=_Part_35_1109890054.1454601407386"
X-Originating-IP: [1.1.1.1]
X-Mailer: Zimbra 8.6.0_GA_1153 (ZimbraWebClient - FF44 (Win)/8.6.0_GA_1153)
Thread-Topic: Lorem Ipsum
Thread-Index: ddFMd6wnxYPGpAbdA2oNKj8MgU0bH6/lWgJ/

------=_Part_35_1109890054.1454601407386
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 


From: admin@example.com 
To: "admin" <root@example.com> 
Sent: Thursday, February 4, 2016 4:56:33 PM 
Subject: Lorem Ipsum 

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 


------=_Part_35_1109890054.1454601407386
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<html><body><div style=3D"font-family: arial, helvetica, sans-serif; font-s=
ize: 12pt; color: #000000"><div>Lorem ipsum dolor sit amet, consectetur adi=
piscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna al=
iqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris ni=
si ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehender=
it in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteu=
r sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt =
mollit anim id est laborum.</div><div><br></div><hr id=3D"zwchr" data-marke=
r=3D"__DIVIDER__"><div data-marker=3D"__HEADERS__"><b>From: </b>admin@mymon=
eyex.com<br><b>To: </b>"admin" &lt;root@example.com&gt;<br><b>Sent: </b>T=
hursday, February 4, 2016 4:56:33 PM<br><b>Subject: </b>Lorem Ipsum<br></di=
v><br><div data-marker=3D"__QUOTED_TEXT__"><div style=3D"font-family: arial=
, helvetica, sans-serif; font-size: 12pt; color: #000000"><div>Lorem ipsum =
dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididu=
nt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud =
exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis =
aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu =
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt=
 in culpa qui officia deserunt mollit anim id est laborum.</div></div><br><=
/div></div></body></html>
------=_Part_35_1109890054.1454601407386--
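
For anyone reproducing this, here is a minimal sketch (not talon's actual implementation) of pulling the HTML part out of a message like the one above with Python's standard `email` package and locating the Zimbra divider. `RAW` is a shortened stand-in for the sample, and `ZIMBRA_DIVIDER`, `html_part` and `cut_at_zimbra_divider` are hypothetical names. It assumes the Zimbra web client always emits `<hr id="zwchr">` right before the quoted headers, as it does in the sample.

```python
import email
import re
from email import policy

# Shortened stand-in for the message above: same structure, less filler text.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b1"

--b1
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Reply text

From: admin@example.com
Subject: Lorem Ipsum

Quoted text
--b1
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<html><body><div><div>Reply text</div><div><br></div><hr id=3D"zwchr" data-=
marker=3D"__DIVIDER__"><div data-marker=3D"__HEADERS__"><b>From: </b>admin@=
example.com<br></div><br><div data-marker=3D"__QUOTED_TEXT__"><div>Quoted t=
ext</div></div><br></div></body></html>
--b1--
"""

# Assumption: the quote always starts at this <hr>, as in the sample message.
ZIMBRA_DIVIDER = re.compile(r'<hr\b[^>]*\bid="zwchr"[^>]*>', re.IGNORECASE)


def html_part(raw: str) -> str:
    """Return the decoded text/html body of a raw message, or '' if absent."""
    msg = email.message_from_string(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("html",))
    # get_content() undoes the quoted-printable encoding and soft line breaks.
    return body.get_content() if body is not None else ""


def cut_at_zimbra_divider(html: str) -> str:
    """Naively drop everything from the Zimbra divider onward."""
    match = ZIMBRA_DIVIDER.search(html)
    return html[: match.start()] if match else html


if __name__ == "__main__":
    print(cut_at_zimbra_divider(html_part(RAW)))
```

The plain string cut leaves the outer `<div>`, `<body>` and `<html>` tags unclosed, so a real fix would more likely remove the divider and the siblings that follow it from a parsed tree.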
@obukhov-sergey
Member

Thanks @defkev, I'll check it out. Note that talon.mailgun.net is running an outdated version. Could you check it against the latest one?

@defkev
Contributor Author

defkev commented Feb 5, 2016

I initially installed it from pip, then cloned the repository; the results are the same with both.

I have already made what I believe are the necessary changes and just need to test them.
Would you be willing to accept a merge request for this?

@obukhov-sergey
Member

@defkev sure, and sorry for the delay in responding.
