Parsing mbox files? #11

Closed
onthebridgetonowhere opened this issue Dec 27, 2021 · 5 comments

Comments

@onthebridgetonowhere

The library is great, thanks for putting it together! I was wondering if there are plans to add a parser for reading and parsing files in mbox format?

@mdecimus
Member

Hi, thanks! I wasn't planning on supporting mbox but it could be a nice addition. I'll keep the issue open and as soon as I have some spare time I'll add the functionality. It shouldn't take too much time to implement it.

@onthebridgetonowhere
Author

Thanks for the update, that'd be fantastic. If you start on it, I can maybe help with some PRs as well. Let me know!
Just for context, I need to parse emails from the Apache mailing lists, which I have stored as mbox files.

@mdecimus
Member

Hi, I just pushed the mbox parser to master, along with an example under the examples/ folder. Could you please give it a quick test before I release it to crates.io? Thanks!

@onthebridgetonowhere
Author

@mdecimus - fantastic, thank you! I've tried it out on more than 20k mbox files, totaling over 4M emails. It seems to work fine, at least for my case. There are some weird and infrequent corner cases where people copy emails (including all the metadata) and paste them into the body of the email; however, these are really hard to detect, so this is good enough for me. I hope mail-parser becomes more popular and the standard way of parsing emails in Rust; it works really nicely so far!

@mdecimus
Member

> There are some weird and infrequent corner cases where people copy emails (including all the metadata) and paste them into the body of the email

That is probably a bug in the process that generated the mbox file. When writing a message to an mbox file, any line beginning with "From " should be escaped with a > symbol (for details, see the specification). Another possibility is that your mbox file was escaped using a different scheme (for example, one that records the number of bytes to be read somewhere in the From header). If that is the case, please send me a few samples and I'll add support for it.
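To illustrate the escaping rule, here is a minimal sketch using only the Rust standard library. It is not mail-parser's code: `escape_from_lines` and `count_separators` are hypothetical helper names, and the sketch assumes the simplest convention, where only body lines starting with exactly "From " get a leading `>`.

```rust
/// Escape a message body before appending it to an mbox file.
/// Assumption: simplest quoting convention, where any line that starts
/// with "From " gets a leading '>' so it is not read as a separator.
fn escape_from_lines(body: &str) -> String {
    let mut out = String::with_capacity(body.len() + 16);
    // split_inclusive keeps the trailing '\n' on each line, so the
    // original line endings are preserved unchanged.
    for line in body.split_inclusive('\n') {
        if line.starts_with("From ") {
            out.push('>');
        }
        out.push_str(line);
    }
    out
}

/// Count message separators the way a naive mbox reader would:
/// every line starting with "From " is taken as the start of a message.
fn count_separators(mbox: &str) -> usize {
    mbox.lines().filter(|l| l.starts_with("From ")).count()
}
```

If a pasted email is written unescaped, `count_separators` sees an extra "From " line and a reader would split one message into two; running the body through `escape_from_lines` first keeps it as a single message.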

> I hope mail-parser becomes more popular and the standard way of parsing emails in Rust; it works really nicely so far!

Thanks, I hope that too!
