google takeout support #24

Closed
vzaliva opened this issue Dec 24, 2020 · 6 comments

vzaliva commented Dec 24, 2020

I want to use this script to import an mbox file created via the Google Takeout service. Takeout exports all mail as a single mbox file. However, each message has an additional header specifying which folders (labels) the message belongs to, e.g.:

X-Gmail-Labels: Archived,Sent

It would be great if this script could handle these headers.
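
A minimal sketch of how such a header could be read with Python's standard `mailbox` and `email` modules. The helper names `labels_of` and `iter_labels` are hypothetical and not part of imap-upload; the sketch assumes plain ASCII, comma-separated label values and does not decode RFC 2047 encoded labels.

```python
import email
import mailbox


def labels_of(message):
    """Return the Gmail labels of one parsed message as a list of strings."""
    raw = str(message.get("X-Gmail-Labels", ""))
    return [label.strip() for label in raw.split(",") if label.strip()]


def iter_labels(mbox_path):
    """Yield (Message-ID, labels) for every message in an mbox file."""
    for message in mailbox.mbox(mbox_path):
        yield message.get("Message-ID"), labels_of(message)


# Small in-memory check using the header from the example above.
sample = email.message_from_string(
    "X-Gmail-Labels: Archived,Sent\nSubject: hello\n\nbody\n"
)
print(labels_of(sample))  # prints ['Archived', 'Sent']
```

For a real export, `iter_labels("path/to/takeout.mbox")` would walk the whole file; the path here is only a placeholder.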

@rgladwell
Owner

This would be a useful enhancement, possibly behind a CLI parameter flag.

Unfortunately, I don't have much time to dedicate to this project at the moment. If you need it sooner and have time yourself, I'd be happy to give advice and review code.

@adriangibanelbtactic
Contributor

adriangibanelbtactic commented Mar 24, 2022

To avoid overlapping work, you should know that I am adding this feature in our branch: https://github.com/btactic/imap-upload/tree/google_takeout

The currently implemented functionality saves each email only in its first (or last) label/folder, not in every one of them as you might want.

Note: I ported and adapted functionality from https://github.com/ldidry/gmail-mbox-to-imap which was based on an older version of imap-upload.

@rgladwell
Owner

Fantastic! Feel free to raise a draft PR here. I'd love to see what you're doing.

adriangibanelbtactic added a commit to btactic/imap-upload that referenced this issue Mar 25, 2022
* Categories labels are ignored.
* Special IMAP_ folders are ignored.
* Extra: Escape surrogate characters from messages.
* Use Imap UTF-7 encoding for saving Imap folders.

Defaults:
* Open label is ignored
* Unseen label is ignored
* Inbox label is ignored if email is in other labels.
* Sent label is ignored if email is in other labels.

* Messages not having an 'Unseen' label have their
  flag set to: 'Seen'.

* Folders from 'Important' label are not imported to
  'Important' folder. Instead their flag is set to:
  'Flagged'.

* Messages are uploaded to every one of their
  label equivalent folders except for former rules.

* Basic multi language support

Closes rgladwell#24
adriangibanelbtactic added a commit to btactic/imap-upload that referenced this issue Mar 25, 2022
* Categories labels are ignored.
* Special IMAP_ folders are ignored.
* Extra: Escape surrogate characters from messages.
* Use Imap UTF-7 encoding for saving Imap folders.

Defaults:
* Open label is ignored
* Unseen label is ignored
* Inbox label is ignored if email is in other labels.
* Sent label is ignored if email is in other labels.

* Messages not having an 'Unseen' label have their
  flag set to: 'Seen'.

* Folders from 'Important' label are not imported to
  'Important' folder. Instead their flag is set to:
  'Flagged'.

* Messages are uploaded to every one of their
  label equivalent folders except for former rules.

* Multi language support

* Use 'box' folder as the base folder when uploading emails.

Closes rgladwell#24

Please note that this new feature adds a new requirement: the imapclient Python module.
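
The default label rules listed in the commit messages above could be sketched roughly as follows. This is one possible reading, not the merged implementation: the function name `plan_upload` and both constants are hypothetical, the label names are copied from the commit message (the strings in an actual Takeout export may differ), and the handling of a message labelled only Inbox and Sent is an assumption.

```python
# Labels that never become folders (names taken from the commit message).
ALWAYS_IGNORED = {"Open", "Unseen", "Important"}
# Labels used as folders only when the message has no other folder label.
FALLBACK_ONLY = {"Inbox", "Sent"}


def plan_upload(labels):
    """Map a message's Gmail labels to (folders, IMAP flags)."""
    flags = []
    if "Unseen" not in labels:
        flags.append("\\Seen")      # no 'Unseen' label: mark as read
    if "Important" in labels:
        flags.append("\\Flagged")   # 'Important' becomes a flag, not a folder

    folders = [label for label in labels if label not in ALWAYS_IGNORED]
    specific = [label for label in folders if label not in FALLBACK_ONLY]
    # Assumption: Inbox/Sent are dropped only if another folder remains.
    return (specific or folders), flags


print(plan_upload(["Inbox", "Important", "Work", "Open"]))
```

Category labels, special IMAP_ folders and the multi-language label names are left out of this sketch.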
@adriangibanelbtactic
Contributor

My latest pull request adds Google Takeout functionality: #35.
Waiting for your feedback there.

Thank you.

@rgladwell
Owner

Thanks for fixing this. Merged!

@adriangibanelbtactic
Contributor

> Thanks for fixing this. Merged!

Thanks for your merge.

  • Feel free to rewrite this part:

    imap-upload/imap_upload.py

    Lines 300 to 318 in 9273178

    labels_without_categories = []
    for i in range(len(labels)):
        if (not (re.match(gmail_category_str,labels[i]))):
            labels_without_categories.append(labels[i])
    labels = labels_without_categories
    labels_without_special_imap_dirs = []
    for i in range(len(labels)):
        if (not (re.match(gmail_imap_str,labels[i]))):
            labels_without_special_imap_dirs.append(labels[i])
    labels = labels_without_special_imap_dirs
    sanitized_labels = []
    for i in range(len(labels)):
        sanitized_label = re.sub(r":", "_", labels[i])
        sanitized_labels.append(sanitized_label)
    labels = sanitized_labels
    in a more pythonic way (or without auxiliary list variables) if you think it's needed.

  • Also this function:

    imap-upload/imap_upload.py

    Lines 590 to 600 in 9273178

    def create_folders(self, boxes):
        i = 1
        while i <= len(boxes):
            google_takeout_box = "/".join(boxes[0:i])
            google_takeout_box_imap_command = '"' + google_takeout_box + '"'
            if google_takeout_box != "INBOX":
                try:
                    self.imap.create(imap_utf7.encode(google_takeout_box_imap_command))
                except:
                    print ("Cannot create box %s" % google_takeout_box)
            i += 1
    should be renamed from def create_folders(self, boxes): to def create_folder(self, boxpath):.

It doesn't actually deal with several folders but with a single one, which can have parent folders in its path.
You might even end up renaming it to create_box, because what I call an 'IMAP folder' you call an 'IMAP box', which I guess is the more traditional name.
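
As an illustration of the suggested rename, here is a hedged sketch of a single-folder version. It is not the merged code: `box_paths` is a hypothetical helper, and the `create` callable stands in for the `self.imap.create(imap_utf7.encode(...))` call of the original so that the sketch runs without an IMAP connection.

```python
from itertools import accumulate


def box_paths(boxpath, delimiter="/"):
    """Return the path of every parent of a box, ending with the box itself."""
    return list(accumulate(boxpath, lambda parent, name: parent + delimiter + name))


def create_folder(create, boxpath):
    """Create one box and any missing parents; never try to create INBOX."""
    for path in box_paths(boxpath):
        if path == "INBOX":
            continue
        try:
            create(path)
        except Exception as error:  # narrower than the bare except above
            print("Cannot create box %s: %s" % (path, error))


created = []
create_folder(created.append, ["Work", "Clients", "Acme"])
print(created)  # prints ['Work', 'Work/Clients', 'Work/Clients/Acme']
```

Building the parent paths in a separate function keeps the loop counter out of the creation logic and makes the path handling testable on its own.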

  • This other part is not too pythonic either:

    imap-upload/imap_upload.py

    Lines 356 to 358 in 9273178

    for i in range(len(labels)):
        box = re.sub(r"\?", "", labels[i])
        msg.boxes.append(box.split("/"))

What I mean is that, although my code works, Python experts could improve it a lot.
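
One way the label filtering and the box splitting quoted above might be written without index loops or auxiliary lists. This is only a sketch: the two regular expressions are hypothetical stand-ins for `gmail_category_str` and `gmail_imap_str`, whose real values are not shown in this thread, and the function names are made up for illustration.

```python
import re

# Hypothetical stand-ins for gmail_category_str and gmail_imap_str.
GMAIL_CATEGORY_RE = re.compile(r"Category ")
GMAIL_IMAP_RE = re.compile(r"IMAP_")


def clean_labels(labels):
    """Drop category and special IMAP_ labels, then replace ':' with '_'."""
    return [
        label.replace(":", "_")
        for label in labels
        if not GMAIL_CATEGORY_RE.match(label) and not GMAIL_IMAP_RE.match(label)
    ]


def labels_to_boxes(labels):
    """Strip '?' from each label and split it into its path components."""
    return [label.replace("?", "").split("/") for label in labels]


labels = clean_labels(["Category Promotions", "IMAP_Trash", "Work: 2020", "Sent"])
print(labels)  # prints ['Work_ 2020', 'Sent']
print(labels_to_boxes(["Work/Clients?", "Sent"]))  # prints [['Work', 'Clients'], ['Sent']]
```

Since both substitutions replace a literal character, `str.replace` behaves the same as the original `re.sub` calls here.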
