Using imapfw to analyze attachments #11

Rafiot · 2016-05-19T15:55:12Z

My goal is to bundle an IMAP proxy usable by endusers to sanitize email attachments transparently.

I am the maintainer of PyCIRClean and I'm going to write a Python script that takes an email as input, looks up the attachment(s) with a script similar to this one, reattaches the payload and returns a "sanitized" email.

My question is the following: is there an easy way to hook a module to imapfw? I didn't find it in the documentation, sorry if I missed it.
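For illustration only, the sanitizing step described above (not the imapfw hook itself) could be sketched with the standard `email` package. The `clean_payload` callback, the `X-Sanitized` header and the fallback filename are assumptions made for this example; they are not part of PyCIRCLean or imapfw.

```python
from email import message_from_bytes, policy
from typing import Callable

# Hypothetical marker header, chosen for this sketch only.
SANITIZED_HEADER = "X-Sanitized"


def sanitize_email(raw: bytes,
                   clean_payload: Callable[[str, bytes], bytes]) -> bytes:
    """Return `raw` with every attachment passed through `clean_payload`.

    `clean_payload(filename, data)` is a placeholder for the real
    sanitizer; it returns the bytes to attach instead of `data`.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    changed = False
    for part in msg.walk():
        # Containers are skipped; walk() still visits their children.
        if part.is_multipart() or not part.is_attachment():
            continue
        filename = part.get_filename() or "attachment.bin"
        original = part.get_payload(decode=True) or b""
        cleaned = clean_payload(filename, original)
        if cleaned != original:
            # set_content() clears the old payload and Content-* headers.
            # The content type is reset to application/octet-stream here,
            # which is a simplification.
            part.set_content(cleaned, maintype="application",
                             subtype="octet-stream",
                             disposition="attachment", filename=filename)
            changed = True
    if not changed:
        return raw  # untouched messages are returned byte for byte
    msg[SANITIZED_HEADER] = "yes"
    return msg.as_bytes()
```

Returning the original bytes when nothing changed lets the caller skip the delete and re-upload step for clean messages.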

nicolas33 · 2016-05-20T09:29:24Z

imapfw is not ready to download emails.

I'd be happy to know more about your use case to get a bigger picture of what you expect. What do you mean by "hook a module"? What module(s) would you hook?

Rafiot · 2016-05-20T10:06:01Z

My idea is the following:

The users install and configure imapfw to connect to their IMAP server
The users configure their email client (Thunderbird, Outlook, ...) to connect to imapfw
imapfw acts as a transparent proxy for all messages except for the ones with attachments. In that case, it gives the whole source of the message to my script, which checks the sanity of the attachment and returns a message (after having changed the attachment if needed)

The other thing I'm not sure about is how an IMAP server would handle a message modified by the client. Do you have an idea?

nicolas33 · 2016-05-20T10:22:31Z

Thanks.

> The other thing I'm not sure about is how an IMAP server would handle a message modified by the client. Do you have an idea?

Emails are identified by their UID. Changing the email invalidates the UID. So, the changed email must be removed from the server and re-uploaded as a new email.
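A minimal sketch of that remove-and-re-upload sequence, using the standard `imaplib` module rather than imapfw. The function name `replace_message` and the `sanitize` callback are hypothetical; `conn` is assumed to be an authenticated `imaplib.IMAP4` (or `IMAP4_SSL`) object.

```python
from typing import Callable


def replace_message(conn, mailbox: str, uid: str,
                    sanitize: Callable[[bytes], bytes]) -> bool:
    """Replace message `uid` in `mailbox` by its sanitized version.

    Returns True when the message was re-uploaded, False when the
    sanitizer left it unchanged.
    """
    conn.select(mailbox)
    # BODY.PEEK[] fetches the full source without setting \Seen.
    typ, data = conn.uid("FETCH", uid, "(BODY.PEEK[])")
    if typ != "OK" or not data or not isinstance(data[0], tuple):
        raise RuntimeError(f"cannot fetch UID {uid} from {mailbox}")
    original = data[0][1]
    cleaned = sanitize(original)
    if cleaned == original:
        return False
    # Upload first, so a failure cannot lose the message. The server
    # assigns a new UID to the uploaded copy. Flags and internal date
    # are not preserved in this sketch.
    typ, _ = conn.append(mailbox, None, None, cleaned)
    if typ != "OK":
        raise RuntimeError("APPEND failed, original left in place")
    conn.uid("STORE", uid, "+FLAGS", "(\\Deleted)")
    # Note: EXPUNGE removes every message flagged \Deleted in the mailbox.
    conn.expunge()
    return True
```

Because the upload happens before the delete, the worst case after an error is a duplicate rather than a lost email.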

nicolas33 · 2016-05-20T11:17:46Z

Forgot to say: our (WIP) documentation is available on the website.

Rafiot · 2016-05-20T12:47:09Z

Thanks, I'll look at the doc and the code.

Do you think having such a module is doable with imapfw in the future? I'm comfortable working on a project under heavy development but I simply want to make sure it is possible before I start allocating time to it.

nicolas33 · 2016-05-20T13:01:05Z

I think it is possible in two ways:

If you need the "real" IMAP server to take changes into account, the proxy has to propagate the changes (download, make the change, delete the old email on the server, create the new one on the server).
if there's no need for the IMAP server to be aware of the changes, the proxy could be an IMAP server like Dovecot or internal (must be implemented). In this case, imapfw would sync both IMAP servers (and possibly make changes on the emails). Since Dovecot works on Maildir, imapfw could also sync the "real" IMAP server to the Dovecot database (Maildir).
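As a rough sketch of the second option, messages already downloaded from the "real" server could be sanitized and written to a local Maildir with the standard `mailbox` module; an IMAP server such as Dovecot could then serve that directory. The function name `sync_to_maildir` and the `sanitize` callback are assumptions for this example, and the download step itself is left out.

```python
import mailbox
from typing import Callable, Iterable, List


def sync_to_maildir(path: str, messages: Iterable[bytes],
                    sanitize: Callable[[bytes], bytes]) -> List[str]:
    """Store a sanitized copy of each raw message in the Maildir at `path`.

    Returns the Maildir keys of the stored messages, in input order.
    """
    # create=True builds the cur/new/tmp layout if `path` does not exist.
    box = mailbox.Maildir(path, create=True)
    keys: List[str] = []
    try:
        for raw in messages:
            keys.append(box.add(sanitize(raw)))
        box.flush()
    finally:
        box.close()
    return keys
```

A real sync would also have to track which remote UID each local key corresponds to; that mapping is not shown here.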

nicolas33 · 2016-05-20T13:05:45Z

There is another possible way: write a real proxy. This would be the hard path but very interesting, BTW.
In this case, imapfw should create a socket and allow triggers on IMAP requests.

Rafiot · 2016-05-20T16:18:35Z

Solution 1 seems to be the best one in my case (I don't really understand the difference with solution 3).

The goal is to be able to propose such a solution for occasional webmail users too, so the changes need to be somehow synchronized back to the "real" IMAP server.

nicolas33 · 2016-05-20T17:01:55Z

There are different levels at which a proxy could act:

high-level (solution 1): the proxy exposes/tunes emails "on demand" when they are first downloaded and propagates changes back to "real". This could mean long response times for the client since it has to wait for a new UID (re-upload) on updates. In this case, discovering new emails is triggered by the client. The proxy implements almost a full IMAP server but the emails aren't stored locally. The database is the "real" server. In this case, the provided UIDs could be different from "real".
high-level (solution 2): imapfw syncs emails to a local maildir and then exposes them via IMAP, which requires local disk space to store the emails. Discovering and updating emails is done by the proxy without a connected client.
low-level (solution 3): the proxy works on IMAP requests and commands. The proxy doesn't interpret the IMAP requests from the client. They are blindly relayed to the server. The exposed UIDs are the "real" UIDs. However, downloaded emails can be updated and propagated back to "real" before they are exposed to the client. This would mean long delays for plenty of IMAP commands because any discovered UID requires each email to be checked first in case it needs changes. This likely requires a local database of known (already checked) UIDs and regexps on IMAP commands from the client.
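To give an idea of the "regexp on IMAP commands" part of solution 3, here is a small sketch that splits a client command line into tag, command name and arguments, and flags the commands a proxy might want to intercept. The names `Command`, `parse_command` and `INTERCEPTED` are made up for this example, the set of intercepted commands is only a guess, and IMAP literals (`{n}` continuations) are not handled.

```python
import re
from typing import NamedTuple, Optional

# tag SP ["UID" SP] command [SP arguments]
COMMAND_RE = re.compile(
    r"^(?P<tag>\S+) (?P<uid>UID )?(?P<name>[A-Za-z]+)(?: (?P<args>.*))?$",
    re.IGNORECASE,
)

# Commands that return message data or numbers (illustrative choice).
INTERCEPTED = {"FETCH", "SEARCH"}


class Command(NamedTuple):
    tag: str
    name: str
    by_uid: bool
    args: str


def parse_command(line: bytes) -> Optional[Command]:
    """Parse one client line; return None if it is not a simple command."""
    text = line.decode("ascii", errors="replace").rstrip("\r\n")
    match = COMMAND_RE.match(text)
    if match is None:
        return None
    return Command(
        tag=match.group("tag"),
        name=match.group("name").upper(),
        by_uid=match.group("uid") is not None,
        args=match.group("args") or "",
    )


def must_intercept(cmd: Command) -> bool:
    """True when the proxy should check messages before relaying."""
    return cmd.name in INTERCEPTED
```

Everything that is not intercepted could be relayed untouched, which matches the "blindly relayed" behaviour described for this solution.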

There could be other ways of working for your proxy. For example, a monitor could regularly request "real" for new emails so they are checked and updated if needed. The proxy would only expose pre-validated UIDs.

nicolas33 · 2016-05-20T17:07:13Z

BTW, I wonder whether the update on emails is best done on the server at delivery time (MDA). I think most IMAP servers allow this kind of thing. ,-)

Rafiot · 2016-05-20T22:13:38Z

Definitely, a Postfix script will also happen, and it is the cleanest way, but the goal is to support users with no specific knowledge, infrastructure or support team at hand (webmail users working in small organisations who receive all kinds of ransomware).

Just to make sure I got it right:

(every time I say mail client, I mean Thunderbird/Outlook/...)

Solution 3

The mail client uses imapfw as an actual proxy and connects to it to get the emails, imapfw is the only one connecting to the IMAP server.
Every email passing through is sent to the sanitizing module. If it has an attachment, it is sanitized (optionally: the original email is sent to quarantine), the sanitized email is tagged as sanitized, pushed back to the server and passed to the email client.

Solution 1

imapfw acts as a mail client and modifies the emails on demand

Downside: the mail client still connects to the remote IMAP server so it will still receive unprocessed malicious attachments if imapfw didn't have time to update the email.

Solution 2

imapfw does the same as solution 1 but with a local storage.

Downside: If the email client connects directly to imapfw, it will only see the emails in the local storage and not the ones on the server.

Solution 3 is most definitely the best one, because even if it is a bit slower at fetching the emails, the mail client will still do all the caching it was doing before (let's say the last 30 days and all the subjects) so the extra hop isn't critical.

nicolas33 · 2016-05-21T01:49:03Z

> Solution 3
>
> The mail client uses imapfw as an actual proxy and connects to it to get the emails, imapfw is the only one connecting to the IMAP server.

Same goes for 1 & 2.

> Every email passing through is sent to the sanitizing module. If it has an attachment, it is sanitized (optionally: the original email is sent to quarantine), the sanitized email is tagged as sanitized, pushed back to the server and passed to the email client.

Same goes for 1 & 2.

> Solution 1
>
> imapfw acts as a mail client and modifies the emails on demand

imapfw is an IMAP client in all the alternatives.

> Downside: the mail client still connects to the remote IMAP server

No, the mail client connects to the proxy.

> so it will still receive unprocessed malicious attachments if imapfw didn't have time to update the email.

Same goes for 2 & 3.

> Solution 2
>
> imapfw does the same as solution 1 but with a local storage.

No, it doesn't do the same as solution 1. Solution 1 is about using IMAP as a language for the remote database. Solution 2 is about syncing both server and proxy.

> Downside: If the email client connects directly to imapfw, it will only see the emails in the local storage and not the ones on the server.

True, the latest emails on the server must be processed at regular intervals to update the local and remote databases.

> Solution 3 is most definitely the best one, because even if it is a bit slower at fetching the emails, the mail client will still do all the caching it was doing before (let's say the last 30 days and all the subjects) so the extra hop isn't critical.

Caching can be done with solution 1, too.

However, I don't think solution 3 will be "a bit" slower. I think this can become a lot slower. For example, if the mail client only requests the list of UIDs, the proxy must first download ALL the unknown emails to process them and then return the correct list of numbers.

I don't know which option is the best. I'd say it depends on what users expect. Solutions 1 and 3 are hard because IMAP is client side while the purpose is to apply changes on the server. Each solution has downsides. Proxying IMAP is "easy" as long as no modifications are made on the emails.

Rafiot · 2016-05-21T13:42:54Z

Ok, I understand now.

What the users expect is to receive their messages in their email client without changing their habits. They also have multiple devices (PCs, phone, ...) and use a webmail from time to time so having a local cache isn't the goal, and we need to sync the changes back to the server (so the other clients also have the sanitized version).

Rafiot · 2016-05-21T14:22:01Z

Now a very practical question: is it something you think will be doable with imapfw in the near-ish future? Or should I look at another library?

I'd be very happy to participate in the development but right now, I don't really understand where I should look in the code, as the framework seems very extensive.

nicolas33 · 2016-05-21T17:41:59Z

I can't tell how much time this would require since it depends on contributions (myself included). Also, this depends on your own knowledge of Python and how "production ready" you expect it to be.

For now, imapfw is still at an early stage so you should expect to write quite some code. OTOH, this means you have more degrees of freedom to implement what you want.

I think imapfw has the best extensibility compared to any other library due to the design and the Python metaprogramming capabilities.

I'd say imapfw can be a good long-term solution if you have enough time to spend on the code.

For a start, I'd first look at the screencast. Next, you should look at the code and ask me on Gitter. You can ask any question, as much as you want, so you can get a better picture of the current state and have a better overview of what you could do with imapfw.

nicolas33 · 2016-05-21T17:50:41Z

> What the users expect is to receive their messages in their email client without changing their habits. They also have multiple devices (PCs, phone, ...) and use a webmail from time to time so having a local cache isn't the goal, and we need to sync the changes back to the server (so the other clients also have the sanitized version).

Pushing back improves safety but most email clients will need 2 different accounts (one for the real remote and another for the proxy), so the local caching of the clients won't be useful when switching between both.

Whatever the solution, accessing the real server directly would expose users to unchecked emails.

Rafiot · 2016-05-22T22:06:14Z

Great, I'll dig into imapfw more and look for a way to implement it. I have decent skills in Python so I'll definitely contribute as much as I can.

FYI, I wrote a quick&dirty script that takes a mail as input and returns a sanitized version: https://github.com/Rafiot/PyCIRCLean/blob/mail/bin/mail.py

It is far from being production ready, but this is the idea.

Regarding the 2 different accounts, I still don't get it :/

To me, it should work that way:

Of course, if the users use any other device at the same time they use the proxy, they will get the un-sanitized version but as soon as the sanitizing is done, the only version that stays on the server is the sanitized one.
And on the machine where the proxy is running, they only get to see the mail when the sanitizing is done.

nicolas33 · 2016-05-23T09:14:49Z

nicolas33 · 2016-05-23T09:18:23Z

Things are worse because the clients can connect more than once and this should not trigger the same checks twice.

nicolas33 · 2016-05-23T10:07:40Z

The more I think about this, the more I'm convinced you should use both a proxy and a monitor. The proxy would only hide unchecked UIDs while the monitor (IDLE mode?) would sanitize the emails.
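The split between the two roles can be sketched in a few lines. This is only the bookkeeping, with no IMAP traffic: `monitor_pass`, `visible_uids` and the `sanitize_uid` callback are hypothetical names, and the shared set stands in for whatever storage the proxy and the monitor would really share.

```python
from typing import Callable, Iterable, List, Set


def monitor_pass(server_uids: Iterable[int], checked: Set[int],
                 sanitize_uid: Callable[[int], int]) -> None:
    """One monitor round: sanitize every UID that was not checked yet.

    `sanitize_uid(uid)` is a placeholder for the real work. It returns
    the UID under which the message lives afterwards, which differs
    from `uid` when the email had to be re-uploaded.
    """
    for uid in list(server_uids):
        if uid in checked:
            continue  # already handled, never scan twice
        checked.add(sanitize_uid(uid))


def visible_uids(server_uids: Iterable[int], checked: Set[int]) -> List[int]:
    """What the proxy exposes to the client: checked UIDs only."""
    return [uid for uid in server_uids if uid in checked]
```

With this split the proxy never waits for the sanitizer; an email simply stays invisible to the client until the monitor has processed it.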

Rafiot · 2016-05-23T11:31:26Z

Okay, my idea was to have no database at the proxy's level and just look at the content of each email passing through, but your approach is probably more efficient.

I would still prefer, or at least like to have the possibility, to look at the original email in a quarantine folder, but that's a detail at this point.

nicolas33 · 2016-05-23T13:41:20Z

> Okay, my idea was to have no database at the proxy's level and just look at the content of each email passing through, but your approach is probably more efficient.

But you have to know which emails are already checked to avoid scanning them more than once.
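One possible shape for such a record, as a sketch using the standard `sqlite3` module. The class name `CheckedStore` and the table layout are assumptions. The key includes the mailbox's UIDVALIDITY value because IMAP UIDs are only meaningful together with it: if the server changes UIDVALIDITY, previously stored UIDs no longer refer to the same messages.

```python
import sqlite3


class CheckedStore:
    """Persistent record of messages that were already sanitized."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS checked ("
            " mailbox TEXT NOT NULL,"
            " uidvalidity INTEGER NOT NULL,"
            " uid INTEGER NOT NULL,"
            " PRIMARY KEY (mailbox, uidvalidity, uid))"
        )

    def mark_checked(self, mailbox: str, uidvalidity: int, uid: int) -> None:
        # The connection context manager commits on success.
        with self._db:
            self._db.execute(
                "INSERT OR IGNORE INTO checked VALUES (?, ?, ?)",
                (mailbox, uidvalidity, uid),
            )

    def is_checked(self, mailbox: str, uidvalidity: int, uid: int) -> bool:
        row = self._db.execute(
            "SELECT 1 FROM checked"
            " WHERE mailbox = ? AND uidvalidity = ? AND uid = ?",
            (mailbox, uidvalidity, uid),
        ).fetchone()
        return row is not None
```

Since a sanitized email is re-uploaded under a new UID, it is the new UID that would be marked as checked, not the one that was deleted.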

> I would still prefer, or at least like to have the possibility, to look at the original email in a quarantine folder, but that's a detail at this point.

This is something I'd suggest at some point. Blindly trusting a sanitizer is crazy. ,-)

Rafiot · 2016-05-26T12:17:47Z

Sounds great, I now have a beta version of the mail parsing script: https://github.com/CIRCL/PyCIRCLean/blob/mail/bin/mail.py (it needs some refactoring)
I tested it on junk mails (~50k) and it works properly. Now we need to get the proxy together :)

Can you tell me what imapfw can do and can't do right now based on your last graph? This will help me to prepare my roadmap.

nicolas33 · 2016-05-26T14:23:38Z

You should look at the code. For IMAP sessions, see https://github.com/OfflineIMAP/imapfw/blob/master/imapfw/imap/imap.py#L108

Rafiot · 2016-07-07T13:36:39Z

A very simple script to process a directory of emails: https://github.com/Rafiot/imapfw/blob/msghook/rascals/dev.messagehook.rascal

nicolas33 added the question label May 20, 2016
