python-ooxml-crypto

Python decryption for password-protected OOXML documents (.xlsx / .docx / .pptx). It sits below the loadfix/python-xlsx, loadfix/python-docx, and loadfix/python-pptx authoring libraries: all three hand raw bytes here, receive plain inner-ZIP bytes back, and continue their existing parse path unchanged.

The JavaScript sibling lives at loadfix/ooxml-crypto-js with an identical algorithm surface.

Status

All three MS-OFFCRYPTO algorithm families are implemented on the decrypt side: Agile (Office 2010+), Standard (Office 2007), and RC4 CryptoAPI (Office 2002/2003). encrypt() is Agile-only by design: the library decrypts legacy-format files for backward compatibility, but producing new ones would be a security regression. Round trips are verified against fixtures produced by Office and LibreOffice, and against msoffcrypto-tool as an external oracle. See TODO.md.

Installation

pip install python-ooxml-crypto

Usage

Decrypt

from pathlib import Path
from ooxml_crypto import decrypt, detect_encryption

data = Path("secret.xlsx").read_bytes()
info = detect_encryption(data)
if info is None:
    # Plain ZIP — hand directly to python-xlsx / python-docx / python-pptx.
    ...
else:
    plain = decrypt(data, password="hunter2")
    # `plain` is the inner ZIP bytes.
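
Since detect_encryption() returns None for any non-CFB input, a caller may want to confirm that the bytes really are a ZIP before handing them on, and to peek inside the decrypted result. The following is a minimal standard-library sketch, not part of this library; sniff_container and list_parts are hypothetical helper names, and the check looks only at the leading signature bytes.

```python
from __future__ import annotations

import io
import zipfile

CFB_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # Compound File Binary signature
ZIP_MAGIC = b"PK\x03\x04"  # local file header that starts a non-empty ZIP


def sniff_container(data: bytes) -> str:
    """Return "cfb", "zip", or "unknown" based on the leading bytes only."""
    if data.startswith(CFB_MAGIC):
        return "cfb"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def list_parts(plain: bytes) -> list[str]:
    """List the part names inside plain (already decrypted) OOXML ZIP bytes."""
    with zipfile.ZipFile(io.BytesIO(plain)) as zf:
        return zf.namelist()
```

A well-formed OOXML package is expected to contain a `[Content_Types].xml` part, so checking for it in the list_parts() result is a cheap sanity test after decrypt().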

Encrypt (Agile only)

from pathlib import Path
from ooxml_crypto import encrypt

plain_zip = Path("report.xlsx").read_bytes()  # plain inner ZIP bytes
blob = encrypt(plain_zip, password="hunter22")
Path("report-protected.xlsx").write_bytes(blob)

API

  • detect_encryption(data) → EncryptionInfo | None — returns None for non-CFB input, or an EncryptionInfo(kind, params) dataclass identifying the algorithm family.
  • decrypt(data, password, options=None) → bytes — returns the plain inner ZIP bytes. Raises WrongPasswordError, UnsupportedAlgorithmError, IntegrityCheckError, MalformedContainerError, or InputTooLargeError (64 MiB cap). options is a DecryptOptions(ignore_integrity=False).
  • encrypt(plaintext, password, options=None) → bytes — returns a CFB container wrapping the plaintext inner ZIP under Agile Encryption. Raises WeakPasswordError if the password is shorter than 8 chars (override via EncryptOptions(allow_weak_password=True)) or InvalidEncryptOptionsError for out-of-range spin counts. options is an EncryptOptions(spin_count=100_000, allow_weak_password=False).

All exceptions derive from OoxmlCryptoError and carry a stable .code attribute for programmatic dispatch.
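
Dispatching on .code rather than on exception classes might look like the sketch below. So that the snippet runs without the library installed, it defines stand-in exception classes; the code strings ("wrong_password", "input_too_large") and the helpers describe_failure and try_decrypt are assumptions made for illustration, so check the real values in the API reference before relying on them.

```python
class OoxmlCryptoError(Exception):
    """Stand-in for the library's base exception; carries a .code attribute."""

    code = "ooxml_crypto_error"  # hypothetical value


class WrongPasswordError(OoxmlCryptoError):
    code = "wrong_password"  # hypothetical value


class InputTooLargeError(OoxmlCryptoError):
    code = "input_too_large"  # hypothetical value


# Map codes to user-facing text instead of matching on exception classes.
USER_MESSAGES = {
    "wrong_password": "The password is incorrect.",
    "input_too_large": "The file exceeds the size limit.",
}


def describe_failure(exc: OoxmlCryptoError) -> str:
    """Turn a library error into a message, falling back on the raw code."""
    return USER_MESSAGES.get(exc.code, f"Could not open the file ({exc.code}).")


def try_decrypt(decrypt_fn, data: bytes, password: str):
    """Call decrypt_fn and return (plain, None) or (None, message)."""
    try:
        return decrypt_fn(data, password=password), None
    except OoxmlCryptoError as exc:
        return None, describe_failure(exc)
```

In real use, decrypt_fn would be ooxml_crypto.decrypt and the exception classes would be imported from the library instead of being defined locally.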

Algorithm coverage

| Algorithm | Office era | Decrypt | Encrypt |
|---|---|---|---|
| Agile Encryption | 2010 to today | implemented | implemented |
| Standard Encryption | 2007 | implemented | out of scope |
| RC4 CryptoAPI | 2002 / 2003 | implemented | out of scope |

The encrypt path is deliberately Agile-only — producing new legacy-format encrypted files would be a security regression.

See MS-OFFCRYPTO for the full algorithm spec.
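
To give a feel for what the spin count controls, here is a rough sketch of the iterated password hashing that Agile Encryption describes. It is not this library's code: derive_agile_key is a hypothetical name, SHA-512 is assumed as the hash, and only the truncation case (key no longer than the digest) is handled. Treat MS-OFFCRYPTO as the authoritative definition.

```python
import hashlib
import struct

# Block key used when deriving the key that protects the package key,
# as given in MS-OFFCRYPTO; verify against the spec before relying on it.
BLOCK_KEY_ENCRYPTED_KEY_VALUE = bytes.fromhex("146e0be7abacd0d6")


def derive_agile_key(
    password: str,
    salt: bytes,
    spin_count: int,
    block_key: bytes,
    key_bits: int = 256,
) -> bytes:
    """Derive a key from a password by iterated SHA-512 hashing (sketch)."""
    if key_bits // 8 > hashlib.sha512().digest_size:
        raise ValueError("padding for keys longer than the digest is not sketched")
    # H0 = H(salt + password), with the password encoded as UTF-16LE.
    digest = hashlib.sha512(salt + password.encode("utf-16-le")).digest()
    # Hn = H(iterator + Hn-1), iterator being a 32-bit little-endian counter.
    for i in range(spin_count):
        digest = hashlib.sha512(struct.pack("<I", i) + digest).digest()
    # Final step mixes in the purpose-specific block key, then truncates.
    digest = hashlib.sha512(digest + block_key).digest()
    return digest[: key_bits // 8]
```

Each unit of spin count adds one hash evaluation per password guess, which is why a high default such as 100,000 slows brute-force attempts while staying cheap for a single legitimate open.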

Security

Primitives come from the cryptography package — no hand-rolled crypto. Passwords and derived keys are never logged.

Both decrypt() and encrypt() are supported so that the python-xlsx / python-docx / python-pptx authoring stack can produce password-protected files. The encrypt path is Agile-only (no legacy Standard / RC4 encoders) and enforces a minimum password length to keep weak-crypto footguns out of the happy path.

Documentation

Full docs are in docs/ and render via Sphinx:

pip install -e '.[docs]'
cd docs && make html
open .build/html/index.html

docs/user/ covers install, quickstart, recipes, error handling, integration with the authoring libraries, a full MS-OFFCRYPTO algorithm walkthrough, and the security model. docs/api/ is the autodoc reference.

License: Apache-2.0.
