Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Greenmail to have persistent storage #166

Closed
msaladin opened this issue Oct 31, 2016 · 11 comments
Closed

Allow Greenmail to have persistent storage #166

msaladin opened this issue Oct 31, 2016 · 11 comments

Comments

@msaladin
Copy link

Greenmail is very fast and this is cool for embedding it into automated tests. Sometimes, we just want to have a mail server which is started on a server, and the SUT is sending all the mails to this mail server. Currently, when the mail server is Greenmail, after a restart, all the messages would be lost because Greenmail implements only a InMemoryStore.

Is it possible to make the InMemoryStore configurable from outside and provide different implementations. I would like to contribute a FileMemoryStore which would then store all the mailboxes on the file system (location can be configured). This would make Greenmail a mail server which can be stopped and restarted without data loss.

Is this something that is of interest?

Thanks for any feedback.
Cheers Michael

@camann9
Copy link
Contributor

camann9 commented Oct 31, 2016

@marcelmay What do you think?
I think there is little to be said against this. We'd just have to think about how we configure this

@buildscientist
Copy link
Member

@msaladin how are you planning on storing the data? Are you going to serialize the emails? That might be a bit problematic if you're expecting to send in a lot of emails with attachments.

I think a better option would be to have a flag to write out the raw email with a base64 encoded attachment out to disk - so each email would be written to an individual file.

@msaladin
Copy link
Author

@buildscientist: I did not yet think about the implementation details too much. But my impression is that to create a file for each mail is not really good, because some OS have difficulties with directories which contain thousands of files. But I agree that handling all the mails including attachment data in a single file has disadvantages as well. So an idea which needs more elaboration would be to use the single-file-approach format, and only create additional files (maybe distributing the attachments over several directories instead of a single directory) for mails with attachments. But I would go for plain textfiles in any case, and binary data would be base64-encoded.

Thunderbird does it like this: My thunderbird has very large files containing all my mails, separated by an empty line, see https://support.mozilla.org/en-US/questions/1046974.

@buildscientist
Copy link
Member

@msaladin I understand that thousands of files might be an issue for some older filesystems - but most modern filesystems have pretty much overcome that issue.

Mail.app on OS X/macOS actually implements storage of emails as such. I would think the divide and conquer approach would be much more optimal than stuffing your data into a single file.

The other alternative is to store the data as a serialized BLOB in an embedded db like SQLlite - but I don't like the idea of adding an embedded db to GreenMail. Much easier for someone to navigate a filesystem with individual files/emails.

@marcelmay & @camann9 - any thoughts?

@camann9
Copy link
Contributor

camann9 commented Oct 31, 2016

I agree. Unless we anticipate having thousands of files on there I think we
should go with the single file approach.

On Mon, Oct 31, 2016, 18:34 Youssuf ElKalay notifications@github.com
wrote:

@msaladin https://github.com/msaladin I understand that thousands of
files might be an issue for some older filesystems - but most modern
filesystems have pretty much overcome that issue.

Mail.app on OS X/macOS actually implements storage of emails as such. I
would think the divide and conquer approach would be much more optimal than
stuffing your data into a single file.

The other alternative is to store the data as a serialized BLOB in an
embedded db like SQLlite - but I don't like the idea of adding an embedded
db to GreenMail. Much easier for someone to navigate a filesystem with
individual files/emails.

@marcelmay https://github.com/marcelmay & @camann9
https://github.com/camann9 - any thoughts?


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
#166 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGiuae8712yyz_ogzTb3O-QJKODDmRxaks5q5ibBgaJpZM4Kk-5d
.

@msaladin
Copy link
Author

msaladin commented Nov 3, 2016

There is even a standard for this single-file-approach:
https://tools.ietf.org/html/rfc4155

@buildscientist
Copy link
Member

@camann9 Well that's kind of the point isn't it - how can you anticipate "not" having thousands of files.

I do recall @marcelmay mentioning that Alfresco uses GreenMail across their entire test suite and generates more than thousands of emails.

@msaladin I don't have a problem with the mbox format - all I'm saying is that from a testability standpoint it's much easier to have the emails stored individually. For example - someone might use their tests or application to generate an email and for whatever reason they may want to reload it. With a single file - we'd have to provide an API for them to retrieve that individual email.

@msaladin
Copy link
Author

OK, I seem to have messed up with my pull-requests, i wanted to create two individual pull requests, one for issue 165 and one for issue 166, but it seems that my subsequent commits were automatically pushed to the initiated pull requests. Sorry for this.

Anyhow, the pull request is now ready for you guys to evaluate it... it was actually a lot of changes, but I tried to do it as defensibly as possible. The main changes were that I just added a new class which implements a MailFolder (FileHierarchicalFolder) and a new class which implements a Store (MBoxFileStore). You then can easily set this store from outside like this:

-Dgreenmail.mailstore.impl.class=com.icegreen.greenmail.filestore.MBoxFileStore
-Dgreenmail.filestore.rootdir=/var/opt/greenmail/store

Additionally, I added a new JUnit rule for almost all the tests, because then the tests are actually run twice, once with the existing InMemoryStore, and once with the MBoxFileStore. Those unit tests then look like this:

@Rule
public final GreenMailRuleWithStoreChooser greenMail = new
GreenMailRuleWithStoreChooser(ServerSetupTest.SMTP_IMAP);
``
@Test
`@StoreChooser(store="file,memory")`
`public void testFetchUidsAndSize() throws MessagingException {...`

This means that this test (testFetchUidsAndSize) is run twice, once with the default and existing InMemoryStore, and once with the new filestore (where the root.dir is a temporary directory).

As as side note: I tested it locally on my RHEL linux with 130'000 messages, and it works quite fine. I played around with the MBOX format as well (currently, the MBoxFileStore store each message in a individual file). The performance was equal, independent of whether I used a single MBOX file store with random access or individual files. So I think that modern FS (I think ext3) can handle this quite good. And therefore the design decision was to make it as simple as possible:

  • Deletion is much more simpler when just deleting a file is enough (instead of marking a message as deleted and then compacting the MBOX file from time to time)
  • It is much more intuitive... When people delete a message on the filesystem, the message is deleted from the mailbox as well. Quite easy to do maintenance (e.g. delete all messages which are older than X days can be easily implemented with a "find -mtime +30 -exec rm -rf {}" command as cronjob.

Inhouse, we are already using the new implementation. I am looking forward to hearing from you whether you like it or not, and whether you can merge the pull-request. Thanks a lot.

@camann9
Copy link
Contributor

camann9 commented Nov 26, 2016 via email

@buildscientist
Copy link
Member

@camann9 I can have a look at this. I think the more code reviews we do the better.

@buildscientist
Copy link
Member

I believe this was discussed in the PR. The bottom line is - while we don't have any issues with the functionality, it needs to be pulled out of core and introduced as it's own GreenMail module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants