# Chapter 34: MIME Types

This notebook covers MIME (Multipurpose Internet Mail Extensions) types and the `mimetypes` module. You will learn how to detect file types from extensions, look up extensions from MIME types, understand MIME structure, and build multipart email messages with proper content types.

## Key Concepts
- **MIME types**: Standardized identifiers for content types (e.g., `text/plain`, `image/png`)
- **`mimetypes.guess_type()`**: Detect a file's MIME type from its name or path
- **`mimetypes.guess_extension()`**: Find a file extension for a given MIME type
- **Multipart messages**: Emails composed of multiple parts with different content types
- **MIME structure**: The `maintype/subtype` format and how it organizes content

## Section 1: Understanding MIME Type Structure

A MIME type consists of a **maintype** and a **subtype** separated by a slash. Common maintypes include `text`, `image`, `application`, `audio`, and `video`. The `multipart` maintype is used for messages that bundle several parts together.

In [None]:
# MIME types follow the pattern: maintype/subtype
mime_examples: list[str] = [
    "text/plain",
    "text/html",
    "image/png",
    "image/jpeg",
    "application/json",
    "application/pdf",
    "audio/mpeg",
    "video/mp4",
]

print(f"{'MIME Type':<25} {'Main Type':<15} {'Sub Type'}")
print("-" * 55)
for mime in mime_examples:
    maintype, subtype = mime.split("/")
    print(f"{mime:<25} {maintype:<15} {subtype}")
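In real headers, a MIME type is often followed by parameters after a semicolon (for example `charset=utf-8`). Splitting on `/` alone then leaves the parameter attached to the subtype. The sketch below uses an illustrative header value to show the problem, and shows how `email.message.EmailMessage` separates the pieces instead:

```python
from email.message import EmailMessage

# A Content-Type value may carry parameters after a semicolon (illustrative value)
header_value: str = "text/html; charset=utf-8"

# Naive splitting leaves the parameter glued to the subtype
naive_main, naive_sub = header_value.split("/", 1)
print(f"Naive split:  maintype={naive_main!r}, subtype={naive_sub!r}")

# EmailMessage parses the header and exposes each component separately
msg: EmailMessage = EmailMessage()
msg["Content-Type"] = header_value
print(f"Content type: {msg.get_content_type()}")
print(f"Maintype:     {msg.get_content_maintype()}")
print(f"Subtype:      {msg.get_content_subtype()}")
print(f"Charset:      {msg.get_content_charset()}")
```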

## Section 2: Guessing MIME Types from Filenames

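The `mimetypes.guess_type()` function returns a tuple of `(type, encoding)` based on a file's name or path. The encoding is `None` unless the name ends in a recognized compression suffix, such as `.gz` (reported as `'gzip'`) or `.bz2` (`'bzip2'`). It describes how the file is compressed, not its character set.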

In [None]:
import mimetypes

# Guess type from filename
mime_pdf, enc_pdf = mimetypes.guess_type("document.pdf")
print(f"document.pdf  -> type={mime_pdf}, encoding={enc_pdf}")
assert mime_pdf == "application/pdf"

mime_png, enc_png = mimetypes.guess_type("image.png")
print(f"image.png     -> type={mime_png}, encoding={enc_png}")
assert mime_png == "image/png"

mime_py, enc_py = mimetypes.guess_type("script.py")
print(f"script.py     -> type={mime_py}, encoding={enc_py}")
assert mime_py == "text/x-python"

print("\nAll assertions passed.")

In [None]:
import mimetypes

# More file types
filenames: list[str] = [
    "report.docx",
    "data.csv",
    "config.yaml",
    "archive.tar.gz",
    "song.mp3",
    "movie.mp4",
    "style.css",
    "app.js",
]

print(f"{'Filename':<20} {'MIME Type':<40} {'Encoding'}")
print("-" * 70)
for name in filenames:
    mime, encoding = mimetypes.guess_type(name)
    print(f"{name:<20} {str(mime):<40} {encoding}")

## Section 3: Common MIME Types

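Certain MIME types appear frequently in web and email contexts. The `mimetypes` module ships with a built-in table of common types and also merges in the system's MIME type files where available, so everyday formats are recognized even without any system configuration.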

In [None]:
import mimetypes

# Verify common MIME types
assert mimetypes.guess_type("file.json")[0] == "application/json"
assert mimetypes.guess_type("file.csv")[0] == "text/csv"
assert mimetypes.guess_type("file.txt")[0] == "text/plain"

# Additional common types
common: dict[str, str | None] = {
    "file.html": mimetypes.guess_type("file.html")[0],
    "file.xml": mimetypes.guess_type("file.xml")[0],
    "file.zip": mimetypes.guess_type("file.zip")[0],
    "file.gif": mimetypes.guess_type("file.gif")[0],
    "file.svg": mimetypes.guess_type("file.svg")[0],
}

for filename, mime in common.items():
    print(f"{filename:<15} -> {mime}")

print("\nAll assertions passed.")
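The tables behind these lookups are exposed as module attributes. `mimetypes.types_map` maps extensions to MIME types (the standard, "strict" table). `mimetypes.encodings_map` maps compression suffixes to the encoding names that `guess_type()` reports. The sketch below inspects both. The exact counts vary by machine, because system MIME files are merged in:

```python
import mimetypes

# Load the tables explicitly; guess_type() would otherwise do this on first use
mimetypes.init()

# types_map: extension -> MIME type
print(f"Extensions in types_map: {len(mimetypes.types_map)}")
for ext in [".html", ".json", ".csv", ".svg"]:
    print(f"{ext:<6} -> {mimetypes.types_map.get(ext)}")

# encodings_map: compression suffix -> encoding name reported by guess_type()
print(f"\nencodings_map: {mimetypes.encodings_map}")
```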

## Section 4: Guessing Extensions from MIME Types

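`mimetypes.guess_extension()` performs the reverse lookup. Given a MIME type string, it returns a suitable file extension (including the leading dot), or `None` if the type is unknown.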

In [None]:
import mimetypes

# Guess file extension from MIME type
ext_html: str | None = mimetypes.guess_extension("text/html")
print(f"text/html           -> {ext_html}")
assert ext_html in (".html", ".htm")

ext_json: str | None = mimetypes.guess_extension("application/json")
print(f"application/json    -> {ext_json}")

ext_png: str | None = mimetypes.guess_extension("image/png")
print(f"image/png           -> {ext_png}")

ext_pdf: str | None = mimetypes.guess_extension("application/pdf")
print(f"application/pdf     -> {ext_pdf}")

# Unknown MIME type returns None
ext_unknown: str | None = mimetypes.guess_extension("application/x-unknown-type")
print(f"\napplication/x-unknown-type -> {ext_unknown}")

print("\nAll assertions passed.")
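A MIME type taken from an HTTP or email header often carries parameters, and it may also be unknown. The hypothetical `extension_for()` helper below is a sketch that strips parameters before calling `guess_extension()`. When the lookup returns `None`, it falls back to `.bin`, which is an arbitrary choice:

```python
import mimetypes


def extension_for(content_type: str, default: str = ".bin") -> str:
    """Pick a file extension for a Content-Type value (hypothetical helper).

    Parameters such as "; charset=utf-8" are stripped first, because
    guess_extension() expects a bare "maintype/subtype" string.
    """
    bare_type: str = content_type.split(";", 1)[0].strip().lower()
    ext: str | None = mimetypes.guess_extension(bare_type)
    # Fall back to a generic extension when the type is unknown
    return ext if ext is not None else default


print(extension_for("application/pdf"))
print(extension_for("IMAGE/PNG; name=chart"))
print(extension_for("application/x-unknown-type"))
```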

In [None]:
import mimetypes

# guess_all_extensions returns all known extensions for a type
all_html: list[str] = mimetypes.guess_all_extensions("text/html")
print(f"All extensions for text/html: {all_html}")

all_jpeg: list[str] = mimetypes.guess_all_extensions("image/jpeg")
print(f"All extensions for image/jpeg: {all_jpeg}")

## Section 5: The MimeTypes Class

For more control, you can create a `mimetypes.MimeTypes` instance. This lets you add custom type mappings without modifying the global state.

In [None]:
import mimetypes

# Create a custom MimeTypes instance
mime_db: mimetypes.MimeTypes = mimetypes.MimeTypes()

# Standard lookups work the same
result: tuple[str | None, str | None] = mime_db.guess_type("report.pdf")
print(f"report.pdf -> {result[0]}")

# Add a custom extension mapping
mime_db.add_type("application/x-custom", ".custom")
custom_result: tuple[str | None, str | None] = mime_db.guess_type("data.custom")
print(f"data.custom -> {custom_result[0]}")

# The global function does not know about our custom type
global_result: tuple[str | None, str | None] = mimetypes.guess_type("data.custom")
print(f"\nGlobal guess for data.custom: {global_result[0]}")
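A `MimeTypes` instance can also load a whole table at once with `readfp()`, as an alternative to calling `add_type()` one entry at a time. The input uses the standard `mime.types` format: a MIME type followed by extensions written without the dot. The sketch below feeds it an in-memory file. The types and extensions are made up for illustration:

```python
import io
import mimetypes

# mime.types format: a MIME type, then extensions without the leading dot;
# everything after "#" is a comment. These entries are hypothetical.
table = io.StringIO(
    "# hypothetical project-specific types\n"
    "application/x-notebook-cell  cell\n"
    "text/x-changelog             changes chg\n"
)

db: mimetypes.MimeTypes = mimetypes.MimeTypes()
db.readfp(table)

print(db.guess_type("today.changes"))
print(db.guess_extension("text/x-changelog"))
print(db.guess_all_extensions("text/x-changelog"))

# The module-level functions normally know nothing about these entries
print(mimetypes.guess_type("today.changes"))
```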

## Section 6: Multipart Email Messages

Multipart MIME messages contain multiple sections, each with its own content type. This is how emails carry both a text body and attachments.

In [None]:
from email.message import EmailMessage

# Build a multipart message
msg: EmailMessage = EmailMessage()
msg["Subject"] = "Report with Data"
msg["From"] = "analyst@example.com"
msg["To"] = "team@example.com"
msg.set_content("Please find the data attached.")

# Before attachment: not multipart
print("Before attachment:")
print(f"  Content type: {msg.get_content_type()}")
print(f"  Is multipart: {msg.is_multipart()}")

# Add a CSV attachment
csv_data: str = "name,value\nalpha,1\nbeta,2\n"
msg.add_attachment(
    csv_data,
    subtype="csv",
    filename="data.csv",
)

# After attachment: multipart/mixed
print("\nAfter attachment:")
print(f"  Content type: {msg.get_content_type()}")
print(f"  Is multipart: {msg.is_multipart()}")
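The next sketch shows the multipart structure as it appears on the wire. It rebuilds the same message as the cell above, serializes it with `as_string()`, and parses it back with `email.message_from_string()`. The boundary string is generated at serialization time, so it differs from run to run:

```python
import email
from email import policy
from email.message import EmailMessage

# Rebuild the message from the previous cell
msg: EmailMessage = EmailMessage()
msg["Subject"] = "Report with Data"
msg["From"] = "analyst@example.com"
msg["To"] = "team@example.com"
msg.set_content("Please find the data attached.")
msg.add_attachment("name,value\nalpha,1\nbeta,2\n", subtype="csv", filename="data.csv")

# Serialize to wire format; boundary lines now separate the parts
wire: str = msg.as_string()
print(wire)

# Parse it back with the modern policy so we get EmailMessage objects again
parsed = email.message_from_string(wire, policy=policy.default)
attachments = list(parsed.iter_attachments())
for part in attachments:
    print(f"{part.get_filename()} -> {part.get_content_type()}")
```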

In [None]:
from email.message import EmailMessage

# Inspecting multipart structure
msg: EmailMessage = EmailMessage()
msg["Subject"] = "Multi-attachment"
msg.set_content("Body text here.")

# Add various attachment types
msg.add_attachment(
    b"binary data", maintype="application", subtype="octet-stream", filename="file.bin"
)
msg.add_attachment("log output here", subtype="plain", filename="output.log")
msg.add_attachment(
    b"\x89PNG\r\n", maintype="image", subtype="png", filename="chart.png"
)

print(f"Top-level type: {msg.get_content_type()}")
print(f"Number of parts: {len(list(msg.iter_parts()))}")
print()

for i, part in enumerate(msg.iter_parts()):
    ct: str = part.get_content_type()
    fn: str | None = part.get_filename()
    disp: str | None = part.get_content_disposition()
    print(f"Part {i}: type={ct}, filename={fn}, disposition={disp}")

## Section 7: MIME Types in Practice

This section combines `mimetypes` detection with `EmailMessage`, so each file attachment automatically gets an appropriate MIME type based on its filename.

In [None]:
import mimetypes
from email.message import EmailMessage


def attach_file(
    msg: EmailMessage,
    filename: str,
    data: bytes,
) -> None:
    """Attach data to a message, auto-detecting the MIME type from filename."""
    mime_type: str | None
    encoding: str | None
    mime_type, encoding = mimetypes.guess_type(filename)

    # Unknown types and compressed files (e.g. "logs.tar.gz", where encoding
    # is "gzip") are sent as a generic bag of bytes, not as the inner type
    if mime_type is None or encoding is not None:
        mime_type = "application/octet-stream"

    maintype, subtype = mime_type.split("/", 1)

    msg.add_attachment(
        data,
        maintype=maintype,
        subtype=subtype,
        filename=filename,
    )


# Build a message with auto-detected types
msg: EmailMessage = EmailMessage()
msg.set_content("Files attached.")

attach_file(msg, "photo.jpg", b"\xff\xd8\xff\xe0")
attach_file(msg, "data.json", b'{"key": "value"}')
attach_file(msg, "mystery.xyz", b"unknown format")

for part in msg.iter_attachments():
    print(f"{part.get_filename():<15} -> {part.get_content_type()}")
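In practice, attachment data usually comes from a file on disk. The hypothetical `attach_path()` helper below sketches that variant of `attach_file()`. It reads the bytes with `Path.read_bytes()` and uses `path.name` as the attachment filename. Following the pattern in the Python `email` documentation examples, it sends unknown or compressed files (where `guess_type()` reports an encoding) as `application/octet-stream`. Temporary files stand in for real ones:

```python
import mimetypes
import tempfile
from email.message import EmailMessage
from pathlib import Path


def attach_path(msg: EmailMessage, path: Path) -> None:
    """Attach a file from disk, choosing its MIME type from the name (hypothetical helper)."""
    mime_type, encoding = mimetypes.guess_type(path)
    # Unknown or compressed files are sent as opaque bytes
    if mime_type is None or encoding is not None:
        mime_type = "application/octet-stream"
    maintype, subtype = mime_type.split("/", 1)
    msg.add_attachment(
        path.read_bytes(), maintype=maintype, subtype=subtype, filename=path.name
    )


msg: EmailMessage = EmailMessage()
msg.set_content("Files attached.")

# Create throwaway files so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    for name, data in [("notes.txt", b"hello\n"), ("logs.tar.gz", b"\x1f\x8b\x08\x00")]:
        file_path = Path(tmp) / name
        file_path.write_bytes(data)
        attach_path(msg, file_path)

for part in msg.iter_attachments():
    print(f"{part.get_filename():<12} -> {part.get_content_type()}")
```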

In [None]:
import mimetypes


def categorize_mime(filename: str) -> str:
    """Categorize a file by its MIME maintype."""
    mime: str | None = mimetypes.guess_type(filename)[0]
    if mime is None:
        return "unknown"
    maintype: str = mime.split("/")[0]
    categories: dict[str, str] = {
        "text": "document",
        "image": "media",
        "audio": "media",
        "video": "media",
        "application": "binary",
    }
    return categories.get(maintype, "other")


files: list[str] = ["readme.txt", "logo.png", "song.mp3", "app.exe", "data.xyz"]
for f in files:
    print(f"{f:<15} -> {categorize_mime(f)}")
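The same lookup can also drive a simple allowlist, for example to filter uploads or outgoing attachments. The `ALLOWED_TYPES` set and the `is_allowed()` function below are hypothetical. Because only the filename is inspected, a renamed file would still pass, so this is a convenience check rather than a security control:

```python
import mimetypes

# Hypothetical allowlist for an upload or attachment filter
ALLOWED_TYPES: frozenset[str] = frozenset({"image/png", "image/jpeg", "application/pdf"})


def is_allowed(filename: str) -> bool:
    """Return True if the name maps to an allowed, uncompressed MIME type.

    Only the filename is inspected, so a renamed file would still pass.
    """
    mime, encoding = mimetypes.guess_type(filename)
    return mime in ALLOWED_TYPES and encoding is None


for name in ["scan.pdf", "logo.PNG", "script.py", "scan.pdf.gz", "mystery.xyz"]:
    print(f"{name:<12} -> {'allowed' if is_allowed(name) else 'rejected'}")
```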

## Summary

### MIME Type Structure
- Format: **`maintype/subtype`** (e.g., `text/plain`, `image/png`, `application/json`)
- Common maintypes: `text`, `image`, `audio`, `video`, `application`, `multipart`

### The `mimetypes` Module
- **`mimetypes.guess_type(filename)`**: Returns `(type, encoding)` tuple from a filename
- **`mimetypes.guess_extension(mime_type)`**: Returns a file extension for a MIME type
- **`mimetypes.guess_all_extensions(mime_type)`**: Returns all known extensions for a type
- **`mimetypes.MimeTypes()`**: Create a custom instance with isolated type mappings

### Multipart Messages
- Adding an attachment converts a message to **`multipart/mixed`**
- **`msg.iter_parts()`**: Iterate over all parts (body + attachments)
- **`msg.iter_attachments()`**: Iterate over attachment parts only
- **`part.get_content_type()`**: Get the MIME type of a specific part
- **`part.get_content_disposition()`**: Returns `'attachment'`, `'inline'`, or `None` when the part has no `Content-Disposition` header (as with the body part above)

### Important Notes
- `guess_type()` works on filenames and paths -- it does **not** read file contents
- Returns `(None, None)` for unrecognized extensions
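- Use `guess_type()` with email attachment helpers to auto-detect content types, falling back to `application/octet-stream` when the type is unknown
- The module-level functions load system MIME type files (such as `/etc/mime.types` on Unix, or the registry on Windows) the first time they are used, so results can differ between machines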