Data structures: Emails and Mailboxes

Jack Dodds edited this page Jan 6, 2021 · 8 revisions

This page is based on reading code from commit c9bd001, dated 2020-11-11, and the files generated by it. There may be errors or omissions!

The Mailpile instance on which the page is based receives incoming emails from three IMAP mail sources. It also includes old emails imported one time, when the instance was created, from a Thunderbird Local Folders mbox directory structure. The mail received from all the sources is stored in the local Mailpile directory. The Mailpile code supports the retrieval of email using other types of mail sources and mailboxes (see code at mailpile/mail-source and mailpile/mailboxes). This page does not document these additional types of data structures.

External format - Emails

Email messages are stored in RFC822 format, one message per file, in a directory hierarchy under subdirectory mail of the homedir. As with most Mailpile files, the RFC822 plaintext of each file is optionally encrypted.

A message file name consists of a key, which is a 10-digit lowercase hexadecimal number. The key is sometimes followed by the suffix !2,s. The key is referenced in the message location pointer (field 2) of the Metadata Index record for the message.

When the suffix !2,s is present, there may be two files with different numbers in their file names. The two files are sometimes, but not always, identical, and both file names are listed (without the suffix) in the same Metadata Index entry.
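The naming rule above can be sketched as a small parser. This is only an illustration of the pattern described on this page, not code taken from Mailpile; the function name split_message_filename is hypothetical, and the sketch assumes that !2,s is the only suffix that occurs.

```python
import re

# Assumed pattern, taken from the description above: a key of exactly
# 10 lowercase hex digits, optionally followed by the literal suffix "!2,s".
_NAME_RE = re.compile(r"(?P<key>[0-9a-f]{10})(?P<suffix>!2,s)?")


def split_message_filename(name):
    """Return (key, has_suffix) for a message file name (hypothetical helper)."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError("not a recognised message file name: %r" % name)
    return match.group("key"), match.group("suffix") is not None
```

For example, split_message_filename("00a1b2c3d4!2,s") returns the key "00a1b2c3d4" together with a flag saying that the suffix was present. The key is the part that would be looked up in the Metadata Index.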

External format - Mailboxes

A mailbox is represented by a first level subdirectory of subdirectory mail in the Mailpile homedir. The subdirectory name consists of a 5-digit lowercase hexadecimal number. Each first level subdirectory in turn contains subdirectories cur, new and tmp, which contain the email message files.

The first level subdirectories also contain a file wervd.ver which, in this version of the software, always contains "0" (see mailpile/mailboxes/wervd.pyc and wiki page WERVD Storage). In addition, mail itself contains first level subdirectories cur, new and tmp, which appear to be unused.
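As an illustration of this layout, the following sketch lists the mailbox directories under mail and reports which of cur, new and tmp exist in each. It is not Mailpile code: the function name list_mailbox_dirs is hypothetical, and the 5-hex-digit naming rule is taken from the description above.

```python
import os
import re

# Assumed naming rule from the text above: 5 lowercase hex digits.
_MAILBOX_DIR_RE = re.compile(r"[0-9a-f]{5}")


def list_mailbox_dirs(homedir):
    """Map each mailbox directory name under <homedir>/mail to a dict
    saying whether its cur, new and tmp subdirectories exist."""
    mail_root = os.path.join(homedir, "mail")
    found = {}
    for entry in sorted(os.listdir(mail_root)):
        path = os.path.join(mail_root, entry)
        # Skips the apparently unused mail/cur, mail/new and mail/tmp,
        # because their names do not match the mailbox naming rule.
        if os.path.isdir(path) and _MAILBOX_DIR_RE.fullmatch(entry):
            found[entry] = {
                sub: os.path.isdir(os.path.join(path, sub))
                for sub in ("cur", "new", "tmp")
            }
    return found
```

The sketch only inspects directory names. It does not read wervd.ver or open any message file, so it works the same way whether or not the messages are encrypted.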

Mailboxes are defined by entries in mailpile.cfg. Each account has a 12 digit identifier. Each mailbox in the account has an entry config/sources/[account id]/mailbox/[mailbox id]. The [mailbox id] is 4 characters from the set (0..9,a..z) and is used to identify the mailbox in the message location pointer (field 2) of the Metadata Index. The mailpile.cfg entry contains three parameters. Parameter local is the virtual file system path /Mailpile$/mail/[subdirectory] to the mailbox in the subdirectory mail of the homedir. Parameter name is the user's name for the mailbox. Parameter path is either a reference to another mailpile.cfg entry that identifies a mail source (e.g. for mailboxes receiving emails from an IMAP server) or a file path from which the mailbox was imported (e.g. in the case of a Thunderbird mbox file).
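The relationship between the local parameter and the real directory can be sketched as follows. This is a minimal illustration that assumes the prefix /Mailpile$/ simply stands for the homedir, as the description above suggests. The helper names and the example homedir are hypothetical, and Mailpile's own virtual file system code may behave differently.

```python
import os
import re

VFS_PREFIX = "/Mailpile$/"  # virtual prefix as it appears in mailpile.cfg

# Assumed rule from the text above: 4 characters from 0..9 and a..z.
_MAILBOX_ID_RE = re.compile(r"[0-9a-z]{4}")


def is_mailbox_id(value):
    """True if value looks like a mailbox id as described on this page."""
    return _MAILBOX_ID_RE.fullmatch(value) is not None


def resolve_local_path(local, homedir):
    """Turn a virtual path such as /Mailpile$/mail/0000a into a real
    path under homedir (hypothetical helper, not Mailpile's resolver)."""
    if not local.startswith(VFS_PREFIX):
        raise ValueError("not a /Mailpile$/ virtual path: %r" % local)
    relative = local[len(VFS_PREFIX):]
    return os.path.join(homedir, *relative.split("/"))
```

With a homedir of /home/user/.local/share/Mailpile/default (an example value only), resolve_local_path("/Mailpile$/mail/0000a", homedir) yields the mailbox directory mail/0000a inside that homedir.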

Mailboxes may also be represented by pickled data structures in files in the homedir with file names pickled-mailbox.[mailbox id]. The first line of each of these files indicates that the corresponding internal data structure is of class mailpile.mailboxes.wervd.MailpileMailbox. Attribute _toc of the pickled object contains a dictionary (using the email keys as in the Metadata Index) giving, for each email in the mailbox, its file path relative to the mailbox's subdirectory. Attribute _path of the pickled object contains the absolute path to the mail subdirectory associated with the mailbox, and attribute _paths contains absolute paths to its cur, new and tmp subdirectories. The paths in the _path and _paths attributes are absolute paths corresponding to the relative virtual file system paths defined in mailpile.cfg, in attribute local of config/sources/[account id]/mailbox/[mailbox id].

Internal format - Mailboxes

There are multiple classes called MailpileMailbox. Some relate to POP3 mail sources (not documented here) or to sources identified as "obsolete, handled as local" in comments in defaults.py. Class mailboxes.maildir.MailpileMailbox is used only for these POP3 or obsolete mail sources.

This leaves class mailboxes.wervd.MailpileMailbox, which is derived from the Python library class mailbox.Maildir using the factory class mailpile.mailboxes.UnorderedPicklable. It is the class of the objects in the pickled-mailbox files described above.
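Because the class derives from mailbox.Maildir, the attributes _toc, _path and _paths seen in the pickled files can be explored with the standard library alone. The sketch below uses a stand-in subclass, not the real wervd class. The name BangMaildir is hypothetical, and setting colon to "!" is an assumption inferred from the !2,s suffix. The underscore attributes are private details of CPython's mailbox module and may change between Python versions.

```python
import mailbox
import os
import tempfile


class BangMaildir(mailbox.Maildir):
    """Stand-in for illustration only: a Maildir whose info separator
    is "!" instead of ":" (assumed from the "!2,s" file name suffix)."""
    colon = "!"


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # The directory must not exist yet, otherwise create=True
        # does not build the cur, new and tmp subdirectories.
        root = os.path.join(tmp, "0000a")
        box = BangMaildir(root, factory=None, create=True)
        # Write an unencrypted test message the way it would appear on disk.
        with open(os.path.join(root, "cur", "0123456789!2,s"), "wb") as fh:
            fh.write(b"Subject: test\n\nbody\n")
        keys = box.keys()          # forces a rescan of cur and new
        toc = dict(box._toc)       # key -> path relative to the mailbox
        same_root = box._path == os.path.abspath(root)
        subdirs = sorted(box._paths)
        return keys, toc, same_root, subdirs
```

Calling demo() returns the list of keys, a copy of _toc, a flag confirming that _path is the absolute mailbox path, and the names held in _paths. The real pickled objects cannot be loaded this way without Mailpile itself being importable, and message files may be encrypted, so this only mirrors the structure of the attributes.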