Add ability to export from gmvault's internal storage to standard formats #80
- Inbox is exposed
- Starred messages are exposed
- Folder names are escaped properly
- Chats are exported
- Progress messages are printed
This is great (if it works, I have not tested). Please test & merge! |
@vasi Coming back from my holiday. Thanks, I will check your add-on and let you know about it. |
@vasi sorry for the delay, very busy at the moment. I am getting to the integration of your format translators. |
No problem, I know the feeling! :) |
mbox clearly can't share mails across label-based mailboxes, but along with the tag cache I mentioned in #92, Maildir++ could. |
I don't quite understand, could you explain further? |
What I want to store for each tag is the list of gmail_ids that have the tag. That way you can check whether a mail has multiple tags or not, and hard link them if you want. |
@keltia OK I get what you want. You want some kind of index. I will think about it because if it is in Json it can be a pretty long list. |
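The per-tag index discussed in the two comments above could look roughly like the sketch below. It is an illustration only: the function names are hypothetical, and how `(gmail_id, labels)` pairs are read from the gmvault database is deliberately left out.

```python
import os
from collections import defaultdict


def build_label_index(messages):
    """Map each label to the sorted list of gmail_ids carrying it.

    `messages` is an iterable of (gmail_id, labels) pairs (assumed input).
    """
    index = defaultdict(list)
    for gmail_id, labels in messages:
        for label in set(labels):  # ignore a label repeated on one mail
            index[label].append(gmail_id)
    return {label: sorted(ids) for label, ids in index.items()}


def multi_label_ids(index):
    """Return the gmail_ids that appear under more than one label."""
    counts = defaultdict(int)
    for ids in index.values():
        for gmail_id in ids:
            counts[gmail_id] += 1
    return sorted(g for g, n in counts.items() if n > 1)


def link_into_label(source_path, label_dir, filename):
    """Hard-link an already exported mail into another label's directory."""
    os.makedirs(label_dir, exist_ok=True)
    target = os.path.join(label_dir, filename)
    if not os.path.exists(target):
        os.link(source_path, target)  # same file on disk, no second copy
    return target
```

Only mails returned by `multi_label_ids` are candidates for hard linking. Serialized as JSON, the index holds one entry per mail and label, which is the size concern raised above.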
Anything I can do to help this along? |
@vasi yes, I have had no time to try and test it. Does it come with unit test files? |
No tests yet, I'll take a look at adding some. Currently I'm basing off of master, is that the right thing to do? Or should I base off of gmv-perf-1.7.2? |
Base on gmv-perf-1.7.2. Then, once the tests are there, we will integrate. |
@vasi I will go on vacation until the 28th of Dec but I can be contacted. When you are ready with the testing contact me by email as we will integrate your tests in the Jenkins suite I have created for Gmvault. Guillaume |
I will also be on vacation until the new year. We'll work on this later. Enjoy your holiday! |
@vasi You too. Contact me next year once you have the test suite. I really would like to release a new version beginning of next year. |
Just used these patches to export 14,000 mails to mbox format and it worked very well. Thanks. |
@sjuxax Great! Version 1.7.2 will contain these patches. We are working on it. |
@gaubert Ok, I'm back from holiday and I've rebased on gmv-perf-1.7.2, you can see the result in the 'export2' branch: https://github.com/vasi/gmvault/tree/export2 I'm ready to look into adding tests now, but I currently can't successfully run any of the existing tests I tried because they all look for specific paths like '/homespace/gaubert/.ssh'. Is there something I'm doing wrong? |
Also, I'm not 100% sure what we want to test, for the purposes of export. Obviously 'we can ask for an export, and nothing crashes' is a nice baseline, but just doing nothing passes that test ;) Testing that the output is correct is trickier. A diff with known-good output isn't good enough, since an export format may have multiple ways of correctly representing the same database. For example the hostname part of maildir IDs can be ignored, but will cause exports of different machines to have different pathnames. Maybe the right solution is to add import as well, and verify that export-then-import yields the identical database? Since the purpose of export is to be able to use gmvault databases with other programs, the real best test would be to actually use these other programs, but that's hard to automate! We should at least specify in the docs what external programs are supported by export, currently I've targeted mbox at Thunderbird, and maildir at dovecot. |
Yes, the tests look for OAuth token files stored in /homespace/gaubert/.ssh on the test machine. The best for you is to ignore these tests for now and build unit tests for the mail export part. Once that is done, I will integrate them in the main gmvault src tree and in the Jenkins I use for validation. One of the current test ideas is to have a reference test mail account containing a limited set of emails (100 at most) with peculiarities (tricky labels, badly formatted emails, ...) and to check that Gmvault can back them up and restore them, with some checks to validate that the restored mailbox is identical to the original one. Another test file validates the command line interface. |
@vasi, you could indeed add the import, which would be a good feature as well. Or keep a reference mbox and maildir export known to work with Thunderbird and dovecot, then create the export from the Gmvault-db and compare the result against those references. The reference exports should be small but representative. |
@vasi I am testing. Does it work with labels containing non-ASCII (UTF-8) characters (French with accents, German, Japanese, ...)? |
@gaubert My test gmail account already has labels with forbidden characters, like tilde, but testing UTF-8 too is a good idea!
As I was trying to say before, this doesn't work. Maildir includes paths that look like "1357207594.M717564P16249Q11615.myhost.mydomain", but obviously "myhost.mydomain" will be different depending on the system. I don't see any obvious way in the Python 'mailbox' module to have it use a mock hostname instead of the real one :( |
@vasi If I launch the export command a second time, will it redo the whole export or only what is new? Do we need a --resume mode in case of failure, like we have with sync or restore? It might be useful to be able to restart from where you were. |
@vasi regarding myhost.mydomain: in your tests, you could anticipate that and get it from the system the same way mailbox does (probably using socket.gethostname()), then assert the right parts. As I said, we want validation tests here, so a hundred emails at most should be enough. |
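A minimal sketch of that suggestion: split the key that Python's `mailbox.Maildir` generates and assert only on the host part, taken from `socket.gethostname()`. The helper names are made up for illustration. The key layout (`<seconds>.M<microseconds>P<pid>Q<counter>.<hostname>`) is how the standard library builds names, not something gmvault defines.

```python
import mailbox
import os
import re
import socket
import tempfile

# Layout of the unique names produced by mailbox.Maildir.
_KEY_RE = re.compile(r"^(\d+)\.M(\d+)P(\d+)Q(\d+)\.(.+)$")


def maildir_hostname():
    """Hostname as mailbox.Maildir embeds it ('/' and ':' are escaped)."""
    return socket.gethostname().replace("/", r"\057").replace(":", r"\072")


def split_maildir_key(key):
    """Split a Maildir key into (seconds, micros, pid, counter, host)."""
    match = _KEY_RE.match(key)
    if match is None:
        raise ValueError("unexpected Maildir key: %r" % key)
    return match.groups()


def assert_key_from_this_host(key):
    """Check the machine-dependent part of the key and ignore the rest."""
    host = split_maildir_key(key)[-1]
    assert host == maildir_hostname(), (host, maildir_hostname())


def sample_key():
    """Add one message to a throwaway Maildir and return its key."""
    with tempfile.TemporaryDirectory() as tmp:
        box = mailbox.Maildir(os.path.join(tmp, "export"), create=True)
        return box.add("From: a@example.com\nSubject: test\n\nbody\n")
```

A validation test can then compare two exports on message content while checking file names only through `split_maildir_key`, so the timestamp, pid and counter never need to match.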
Hmm, so I do see the "From MAILER-DAEMON" stuff in the mbox file, but in my case Thunderbird still uses the proper "From:" line. I guess we can try calling set_from() on mbox messages and seeing if that helps your case. But I would still like to understand what's going on, and why Thunderbird interprets your messages so strangely. Good luck narrowing it down! |
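For reference, here is a small sketch of what calling `set_from()` changes in the file. It relies only on Python's standard `mailbox` module. The helper name and the sample message are invented, and nothing here establishes whether Thunderbird reads this separator line.

```python
import mailbox
import os
import tempfile

RAW = "From: Alice <alice@example.com>\nSubject: hi\n\nhello\n"


def first_line_of_export(raw, sender=None):
    """Write one message to a throwaway mbox and return its 'From ' line."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Inbox")
        box = mailbox.mbox(path)
        msg = mailbox.mboxMessage(raw)
        if sender is not None:
            # time_=True appends the current UTC time in asctime format
            msg.set_from(sender, time_=True)
        box.add(msg)
        box.close()
        with open(path, "rb") as handle:
            return handle.readline().decode("ascii").rstrip()
```

Without a sender, the separator line starts with `From MAILER-DAEMON`. With `sender="alice@example.com"`, it starts with `From alice@example.com`. The `From:` header inside the message is the same in both cases.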
Oh, and so I don't forget: Using the import method I showed above, Thunderbird will not import subdirectories that don't have an associated main mailbox. We should probably create an empty main mailbox for each subdirectory when exporting to mbox! |
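One way to do that as a post-processing step, sketched under the assumption that the export uses the `<name>.sbd` directory convention described in this pull request. The function name is hypothetical and this is not the code that was merged.

```python
import os


def ensure_parent_mailboxes(export_root):
    """Create an empty mbox file next to every '<name>.sbd' directory.

    Returns the sorted list of files that had to be created.
    """
    created = []
    for dirpath, dirnames, _filenames in os.walk(export_root):
        for dirname in dirnames:
            if not dirname.endswith(".sbd"):
                continue
            mailbox_path = os.path.join(dirpath, dirname[: -len(".sbd")])
            if not os.path.exists(mailbox_path):
                # A zero-length file is a valid, empty mbox.
                open(mailbox_path, "ab").close()
                created.append(mailbox_path)
    return sorted(created)
```

Running it a second time on the same tree creates nothing, so it is safe to call after every export.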
@vasi I could not do it last night. I will try this week-end but I have a busy schedule. In the meantime, you could try to reproduce the error by adding >From to one of your emails and seeing if it breaks mbox. |
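A quick way to try this with Python's `mailbox` module alone is sketched below. The sample message and helper name are invented. It shows how body lines starting with `From ` and `>From ` end up in the file and whether the mbox still parses as a single message.

```python
import mailbox
import os
import tempfile

RAW = (
    "From: Alice <alice@example.com>\n"
    "Subject: mangling test\n"
    "\n"
    "First line of the body.\n"
    "From here on, this line looks like a message separator.\n"
    ">From an already quoted line.\n"
)


def export_and_reload(raw):
    """Store `raw` in a throwaway mbox; return (file bytes, message count)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "Inbox")
        box = mailbox.mbox(path)
        box.add(raw)
        box.close()
        with open(path, "rb") as handle:
            data = handle.read()
        reopened = mailbox.mbox(path)
        count = len(reopened)
        reopened.close()
        return data, count
```

The module writes the body line as `>From here on, ...` and leaves the already quoted line untouched, so the reloaded mailbox still contains one message. A reader that strips the `>` cannot tell those two lines apart afterwards, which is the usual limitation of the mboxo variant.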
Ok, I've improved logging.
Some notes:
|
A couple more things:
|
I did some more testing against different importers.
These can be worked around using OfflineIMAP's |
@vasi Was away this week-end and could not dedicate time to Gmvault.
Need more time to think about the rest |
@vasi regarding the OfflineIMAP and dovecot issues: I agree with you. Having options on the command line for that should do the trick. What about --type maildir --flavour dovecot or --flavour offlineimap, with one chosen by default? Tell me which is best. The flavour option would only apply with --type maildir. |
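To make the proposal concrete, here is a sketch of how such a pair of options could be validated with `argparse`. It is not gmvault's actual command-line parser: the program name, the helper functions, and the choice of default flavour are assumptions made for illustration.

```python
import argparse

# First entry is used as the default flavour (an assumption of this sketch).
FLAVOURS = ("offlineimap", "dovecot")


def build_parser():
    parser = argparse.ArgumentParser(prog="export-sketch")
    parser.add_argument("--type", choices=("maildir", "mbox"), required=True)
    parser.add_argument("--flavour", choices=FLAVOURS, default=None)
    return parser


def parse(argv):
    """Parse argv and enforce that --flavour only goes with maildir."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.flavour is not None and args.type != "maildir":
        parser.error("--flavour only applies to --type maildir")
    if args.type == "maildir" and args.flavour is None:
        args.flavour = FLAVOURS[0]
    return args
```

With this shape, `parse(["--type", "maildir"])` falls back to the default flavour, while `--type mbox --flavour dovecot` is rejected with a usage error.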
@vasi for this one:
Instead, you could use the internal date which is in the meta info and then forget about the directory. |
|
@vasi
Ok Let me know if I have missed something and where we are with these tasks. |
I think the only thing we missed is testing for Windows compatibility, including import. We will have to do that at some point. |
@vasi I would like to merge your branch. Where are you with the different flavours ? I think I will take over from where you are if you don't mind. Let me know |
Sorry for the lack of progress, I've had the flu :( Feel free to take over if you like. |
@vasi Could you still implement the --type dovecot and --type offlineimap flavours? I will take over from there, as you were almost there. This would allow me to include export in the next release. All the remaining bugs in the other features have been solved. |
Ok, I will finish implementing the --type options. |
Hi Dave, when do you think you can have it done? Thanks. |
@vasi any progress ? |
Yup, starting it out. |
@vasi add the support for OfflineIMAP and the other flavours in --type and I will deal with it afterwards. |
Ok, I've added the flavour support, though it was harder than I thought. Dovecot and OfflineIMAP flavours both appear to work now. I did some testing, but not exhaustively. |
@vasi ok thanks. I will pull your add-ons and include them in the export branch for the next release. I really would like to release the new Gmvault within 2-3 weeks now. |
Great! Let me know if you need any help. |
@vasi Why did you decide to put offlineimap as the default ? |
@vasi I have briefly tested all the modes (with label selection and hierarchical labels) and verified that it could be then used with Thunderbird. I am going to merge it in my dev branch for the final testing and because I want to release the next version. Many thanks for your help and efforts |
I used OfflineIMAP as the default because I assumed people would use export to switch to another IMAP provider. But maybe Thunderbird makes more sense, I don't know. Thanks for all your work on gmvault. |
FYI I regularly export my emails just to save a local copy. I then delete everything from the server, import the exported files into Thunderbird, compress and encrypt the exported files and distribute them such that I can recover them if necessary. This way, I don't have a lot of emails sitting on Gmail waiting to be pwned and/or subpoenaed. |
@sjuxax thanks for detailing how you use that functionality, this helps. |
Fixes issue 68: #68.
The command to export is: gmvault export -d DB_DIR -t FORMAT OUTPUT_DIR
Currently valid formats are 'maildir' and 'mbox'. The maildir variant is 'Maildir++' as used by dovecot, and indeed the exported maildir can be used as-is as a dovecot backend. The original 'mboxo' variant of mbox is used, with '.sbd' as a suffix for directories as used by Thunderbird.
As mentioned in the issue, the maildir (or mbox) format duplicates emails, once per label. This is clearly not optimal, but there's no way to avoid it. Users should continue to use gmvault's internal format for everyday use, only exporting to maildir when they need it.