Skip to content

planetaryescape/mail-threading

Repository files navigation

mail-threading

mail-threading reconstructs email conversation threads from parsed RFC 5322 message metadata. It is for mail clients, archives, importers, support tools, and mailbox analysis jobs that need local threading when a provider does not offer trustworthy native thread IDs.

The crate implements the client-side THREAD=REFERENCES behavior from RFC 5256 and follows Jamie Zawinski's threading algorithm lineage. It returns flat thread membership for application code, not an IMAP wire-protocol THREAD response.

Install

[dependencies]
mail-threading = "0.1"

Enable serde when public input/output types need to cross a JSON, IPC, or storage boundary:

[dependencies]
mail-threading = { version = "0.1", features = ["serde"] }

Why this exists

Rust has mature email parsers and transport clients, but it has not had a small maintained crate focused on local RFC 5256/JWZ thread reconstruction. Projects that need this behavior have typically had to embed an app-specific implementation.

mail-threading is extracted from mxr, where local threading is required for providers without native thread IDs. The shared JSON conformance corpus is part of the package so other implementations, including a future TypeScript port, can test against the same behavior.

Scope

Callers pass already-parsed message fields:

  • caller-stable message ID
  • optional RFC 5322 Message-ID
  • optional In-Reply-To
  • ordered References
  • message date
  • subject

The crate does not parse raw email, MIME, encoded words, IMAP commands, or provider API payloads. Use a parser such as mail-parser first, then pass the decoded metadata here.

Provider-native thread IDs should still win when they are available and trustworthy. This crate is for local reconstruction and fallback behavior.

Quick example

use chrono::{TimeZone, Utc};
use mail_threading::{thread_messages, Message};

let messages = vec![
    Message {
        id: "root".to_string(),
        message_id: Some("<root@example>".to_string()),
        in_reply_to: None,
        references: vec![],
        date: Utc.with_ymd_and_hms(2026, 5, 15, 9, 0, 0).unwrap(),
        subject: "Hello".to_string(),
    },
    Message {
        id: "reply".to_string(),
        message_id: Some("<reply@example>".to_string()),
        in_reply_to: Some("<root@example>".to_string()),
        references: vec!["<root@example>".to_string()],
        date: Utc.with_ymd_and_hms(2026, 5, 15, 9, 5, 0).unwrap(),
        subject: "Re: Hello".to_string(),
    },
];

let threads = thread_messages(&messages);

assert_eq!(threads.len(), 1);
assert_eq!(threads[0].root_message_id, "root");
assert_eq!(threads[0].messages, vec!["root", "reply"]);

API

Most callers need one of these two functions:

  • thread_messages(&[Message]) -> Vec<Thread>
  • thread_messages_with(&[Message], &ThreadingOptions) -> Vec<Thread>

The first uses defaults. The second lets you configure subject fallback, phantom pruning, and recognized subject prefixes.

Output uses Message::id. The RFC Message-ID header is used for ancestry matching, but it is not treated as the application's only message identity. That matters in real mailboxes, where Message-ID can be missing or duplicated.

Flexibility

The crate gives you the main policy knobs without making you fork it:

  • Set subject_merge: false if you only want header-based threading.
  • Leave prune_phantoms: true if you want compact public threads.
  • Set prune_phantoms: false if you need to surface missing ancestors.
  • Add or replace subject_prefixes if your mailflow uses different reply markers.

If you want something more opinionated than the default behavior, build on top of the returned threads. The output is flat on purpose, so you can keep using this crate for the hard part and then group, label, or reshape the threads in your own application code.

Missing ancestors

Referenced messages are often outside the local mailbox or current query window. The algorithm creates phantom containers for those missing ancestors. By default phantoms are pruned from public output, so the first visible message becomes the public thread root.

use chrono::{TimeZone, Utc};
use mail_threading::{thread_messages, Message};

let messages = vec![Message {
    id: "reply".to_string(),
    message_id: Some("<reply@example>".to_string()),
    in_reply_to: Some("<root@example>".to_string()),
    references: vec!["<root@example>".to_string()],
    date: Utc.with_ymd_and_hms(2026, 5, 15, 9, 5, 0).unwrap(),
    subject: "Re: Hello".to_string(),
}];

let threads = thread_messages(&messages);

assert_eq!(threads[0].root_message_id, "reply");
assert_eq!(threads[0].messages, vec!["reply"]);

Set ThreadingOptions::prune_phantoms to false when you need to inspect missing ancestors explicitly.

Subject fallback

Some clients strip References and In-Reply-To. With default options, headerless messages with the same normalized subject are merged. This is a practical fallback, not a guarantee: subject-only threading can over-merge unrelated conversations.

use chrono::{TimeZone, Utc};
use mail_threading::{thread_messages_with, Message, ThreadingOptions};

let messages = vec![
    Message {
        id: "a".to_string(),
        message_id: Some("<a@example>".to_string()),
        in_reply_to: None,
        references: vec![],
        date: Utc.with_ymd_and_hms(2026, 5, 15, 9, 0, 0).unwrap(),
        subject: "Lunch".to_string(),
    },
    Message {
        id: "b".to_string(),
        message_id: Some("<b@example>".to_string()),
        in_reply_to: None,
        references: vec![],
        date: Utc.with_ymd_and_hms(2026, 5, 15, 9, 5, 0).unwrap(),
        subject: "Re: Lunch".to_string(),
    },
];

let merged = thread_messages_with(&messages, &ThreadingOptions::default());
assert_eq!(merged.len(), 1);

let strict = thread_messages_with(
    &messages,
    &ThreadingOptions {
        subject_merge: false,
        ..ThreadingOptions::default()
    },
);
assert_eq!(strict.len(), 2);

Default subject normalization handles common RFC 5256 artifacts such as Re:, Fwd:, [Fwd: ...], (fwd), subject blobs/list tags, repeated whitespace, and several localized reply prefixes. Custom prefixes can be supplied through ThreadingOptions::subject_prefixes.

In real life

Thread a mailbox import

use chrono::{TimeZone, Utc};
use mail_threading::{thread_messages, Message};

let messages = vec![
    Message {
        id: "root".to_string(),
        message_id: Some("<root@example>".to_string()),
        in_reply_to: None,
        references: vec![],
        date: Utc.with_ymd_and_hms(2026, 5, 15, 9, 0, 0).unwrap(),
        subject: "Hello".to_string(),
    },
    Message {
        id: "reply".to_string(),
        message_id: Some("<reply@example>".to_string()),
        in_reply_to: Some("<root@example>".to_string()),
        references: vec!["<root@example>".to_string()],
        date: Utc.with_ymd_and_hms(2026, 5, 15, 9, 5, 0).unwrap(),
        subject: "Re: Hello".to_string(),
    },
];
let threads = thread_messages(&messages);

What you get: a flat list of conversation threads you can store, index, or group in a UI.

Keep fallback behavior but turn off subject merging

use chrono::{TimeZone, Utc};
use mail_threading::{thread_messages_with, Message, ThreadingOptions};

let messages = vec![
    Message {
        id: "a".to_string(),
        message_id: Some("<a@example>".to_string()),
        in_reply_to: None,
        references: vec![],
        date: Utc.with_ymd_and_hms(2026, 5, 15, 9, 0, 0).unwrap(),
        subject: "Lunch".to_string(),
    },
    Message {
        id: "b".to_string(),
        message_id: Some("<b@example>".to_string()),
        in_reply_to: None,
        references: vec![],
        date: Utc.with_ymd_and_hms(2026, 5, 15, 9, 5, 0).unwrap(),
        subject: "Re: Lunch".to_string(),
    },
];
let threads = thread_messages_with(
    &messages,
    &ThreadingOptions {
        subject_merge: false,
        ..ThreadingOptions::default()
    },
);

What you get: threads built only from Message-ID, References, and In-Reply-To, without subject-only merges.

Check the published corpus

cargo test -p mail-threading --all-features --tests

What you get: the crate's conformance suite, including the shared JSON corpus and RFC coverage matrix checks.

Message-ID policy

Every input message has a caller-stable id and an optional RFC 5322 Message-ID. Thread output returns caller IDs.

RFC 5256 says the first message with a duplicated Message-ID keeps that ID for threading, while later duplicates are assigned unique IDs. mail-threading does the same internally; duplicate messages remain visible in output under their caller IDs.

Missing or invalid Message-ID headers are assigned internal synthetic identities. They can still participate in subject fallback and can still appear as normal output messages.

Conformance

The conformance suite lives in testdata/conformance and ships with the crate. Fixtures are JSON, so another implementation can run the same cases without translating them first.

The RFC coverage matrix lives in testdata/rfc5256-coverage.md. It maps covered RFC 5256 behavior to fixture IDs and labels partial, out-of-scope, and intentionally divergent behavior.

Passing the corpus means the implementation matches the covered RFC 5256/JWZ behaviors. It does not claim to test raw email parsing, provider APIs, IMAP server behavior, or every possible malformed input.

Some choices are deliberate. The crate keeps flat output instead of IMAP response trees, and it avoids forcing same-subject header-backed roots to merge when that would create false positives. The exact cases are documented in the coverage matrix.

Non-goals

mail-threading does not implement:

  • IMAP SORT
  • IMAP command parsing
  • IMAP THREAD=ORDEREDSUBJECT
  • nested IMAP THREAD response formatting
  • raw RFC 5322 or MIME parsing
  • RFC 2047 encoded-word decoding
  • full i;unicode-casemap collation

Those belong in surrounding parser, IMAP, or application layers.

Complexity

Thread construction is hash-map based and intended to be linear in the number of messages plus references, aside from deterministic sorting of output threads and members.

Feature flags

  • serde: derives Serialize and Deserialize for public input and output types.

Minimum supported Rust version

The current MSRV is Rust 1.88.

Versioning

The crate follows semantic versioning. Public API changes, MSRV bumps, default option changes, and conformance-output changes are semver-significant.

See also

About

Spec-aligned client-side email threading using RFC 5256/JWZ references.

Resources

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages