Skip to content

Fix OOM on large backup restores#58

Merged
FernandoCelmer merged 2 commits into
developfrom
feature/31
Apr 16, 2026
Merged

Fix OOM on large backup restores#58
FernandoCelmer merged 2 commits into
developfrom
feature/31

Conversation

@FernandoCelmer
Copy link
Copy Markdown
Member

Summary

  • Refactored Restore.orchestrate() to collect only message IDs grouped by mailbox instead of holding all RawSerializer objects (with full RFC-822 content) in memory
  • restore_mailbox() now receives message_ids + storage and loads each message on-demand from storage, keeping at most one email body in memory per iteration
  • Explicit del raw after each use to help the garbage collector reclaim memory promptly

Closes #31

Test plan

  • All 149 existing tests pass (pytest tests/ -x -q)
  • Manual test with a large backup (>10 GB) to confirm memory stays bounded
  • Verify restore still correctly skips duplicates and creates missing mailboxes

Copy link
Copy Markdown
Member Author

@FernandoCelmer FernandoCelmer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ” Code Review

Code issues found: 1

See inline comment below.

box_name: str, message_ids: list[str], retries: int = 3
) -> int:
task = None

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion]

Problem β€” The signature of restore_mailbox() changed from (session, box_name, raws, skip_duplicates) to (session, box_name, message_ids, storage, skip_duplicates). The new storage parameter was inserted before skip_duplicates, shifting positional arguments.

Failure scenario β€” Any external caller using positional args:

# Old API:
Restore.restore_mailbox(session, "INBOX", raws, True)
#                                              ^^^^ skip_duplicates=True

# New API β€” same call now passes True as storage:
Restore.restore_mailbox(session, "INBOX", ids, True)
#                                             ^^^^ storage=True (wrong!)
# skip_duplicates defaults to True, so no TypeError
# But True.get(mid) -> AttributeError at runtime

Since restore_mailbox() is a public @staticmethod (no leading underscore), this is a breaking change to the API surface.

Fix β€” Make storage keyword-only to prevent positional misuse:

@staticmethod
def restore_mailbox(
    session: ImapClient,
    box_name: str,
    message_ids: list[str],
    *,  # keyword-only after this
    storage: StorageABC,
    skip_duplicates: bool = True,
) -> dict[str, int]:

@FernandoCelmer FernandoCelmer merged commit d8ac9df into develop Apr 16, 2026
11 checks passed
@FernandoCelmer FernandoCelmer deleted the feature/31 branch April 16, 2026 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant