Python decryption for password-protected OOXML documents
(.xlsx / .docx / .pptx). Sits below the loadfix/python-xlsx,
loadfix/python-docx, loadfix/python-pptx authoring libraries — all
three hand raw bytes here, receive plain inner-ZIP bytes back, and
continue their existing parse path unchanged.
The JavaScript sibling lives at
loadfix/ooxml-crypto-js
with an identical algorithm surface.
All three MS-OFFCRYPTO algorithm families are implemented on the
decrypt side: Agile (Office 2010+), Standard (Office 2007),
and RC4 CryptoAPI (Office 2002/2003). encrypt() is Agile-only
by design: the library decrypts legacy-format files for backward
compatibility, but producing new ones would be a security
regression. Round trips are verified against fixtures produced by
Office and LibreOffice, and against msoffcrypto-tool as an external
oracle. See TODO.md.
```sh
pip install python-ooxml-crypto
```

```python
from pathlib import Path
from ooxml_crypto import decrypt, detect_encryption

data = Path("secret.xlsx").read_bytes()
info = detect_encryption(data)
if info is None:
    # Plain ZIP: hand directly to python-xlsx / python-docx / python-pptx.
    ...
else:
    plain = decrypt(data, password="hunter2")
    # `plain` is the inner ZIP bytes.
```

```python
from pathlib import Path
from ooxml_crypto import encrypt

plain_zip = Path("report.xlsx").read_bytes()  # plain inner ZIP bytes
blob = encrypt(plain_zip, password="hunter22")
Path("report-protected.xlsx").write_bytes(blob)
```

- `detect_encryption(data) → EncryptionInfo | None`: returns `None` for non-CFB input, or an `EncryptionInfo(kind, params)` dataclass identifying the algorithm family.
- `decrypt(data, password, options=None) → bytes`: returns the plain inner ZIP bytes. Raises `WrongPasswordError`, `UnsupportedAlgorithmError`, `IntegrityCheckError`, `MalformedContainerError`, or `InputTooLargeError` (64 MiB cap). `options` is a `DecryptOptions(ignore_integrity=False)`.
- `encrypt(plaintext, password, options=None) → bytes`: returns a CFB container wrapping the plaintext inner ZIP under Agile Encryption. Raises `WeakPasswordError` if the password is shorter than 8 chars (override via `EncryptOptions(allow_weak_password=True)`) or `InvalidEncryptOptionsError` for out-of-range spin counts. `options` is an `EncryptOptions(spin_count=100_000, allow_weak_password=False)`.

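Encrypted OOXML files are CFB (Compound File Binary) containers, while unprotected ones are ordinary ZIP archives, so the two can be told apart from their leading bytes. The sketch below shows that idea with the standard library only. `sniff_container` and `make_demo_zip` are hypothetical helpers for illustration, not part of `ooxml_crypto`, and a CFB signature alone does not prove a file is encrypted (legacy `.xls` / `.doc` files are CFB too), which is why the real check belongs to `detect_encryption`.

```python
import io
import zipfile

# Well-known file signatures.
CFB_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # Compound File Binary header
ZIP_MAGIC = b"PK\x03\x04"                      # ZIP local file header


def sniff_container(data: bytes) -> str:
    """Classify raw bytes as 'cfb', 'zip', or 'unknown' by signature only."""
    if data.startswith(CFB_MAGIC):
        return "cfb"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def make_demo_zip() -> bytes:
    """Build a tiny in-memory ZIP so the example needs no files on disk."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
    return buf.getvalue()


print(sniff_container(make_demo_zip()))          # a plain ZIP archive
print(sniff_container(CFB_MAGIC + bytes(504)))   # bytes carrying the CFB signature
print(sniff_container(b"hello"))                 # neither
```

A cheap pre-check like this can be handy for logging or routing, but it should not replace the library call.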
All exceptions derive from OoxmlCryptoError and carry a stable
.code attribute for programmatic dispatch.
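One way to use the `.code` attribute is to map codes to user-facing messages in a single place. The sketch below runs without the package installed: the exception classes are local stand-ins that mirror the names above, the `.code` string values are hypothetical (the real values are not listed here), and `try_decrypt`, `describe_failure` and `fake_decrypt` are illustrative helpers.

```python
class OoxmlCryptoError(Exception):
    """Local stand-in for the library's base exception (illustration only)."""
    code = "ooxml_crypto_error"  # hypothetical value


class WrongPasswordError(OoxmlCryptoError):
    code = "wrong_password"  # hypothetical value


class IntegrityCheckError(OoxmlCryptoError):
    code = "integrity_check_failed"  # hypothetical value


# Dispatch table keyed on the stable code rather than on class identity.
USER_MESSAGES = {
    "wrong_password": "The password is incorrect.",
    "integrity_check_failed": "The file appears to be corrupted or modified.",
}


def describe_failure(exc: OoxmlCryptoError) -> str:
    """Translate an error into a message, with a generic fallback."""
    return USER_MESSAGES.get(exc.code, f"Could not open the file ({exc.code}).")


def try_decrypt(decrypt_fn, data: bytes, password: str):
    """Return (plain_bytes, None) on success or (None, message) on failure."""
    try:
        return decrypt_fn(data, password=password), None
    except OoxmlCryptoError as exc:
        return None, describe_failure(exc)


def fake_decrypt(data: bytes, password: str) -> bytes:
    """Demo stand-in for decrypt(): accepts exactly one password."""
    if password != "hunter2":
        raise WrongPasswordError("password did not verify")
    return b"PK\x03\x04"


print(try_decrypt(fake_decrypt, b"", "nope"))
print(try_decrypt(fake_decrypt, b"", "hunter2"))
```

Catching the base class keeps the handler stable if new error types are added later; unknown codes fall through to the generic message.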
| Algorithm | Office era | Decrypt | Encrypt |
|---|---|---|---|
| Agile Encryption | 2010 — today | implemented | implemented |
| Standard Encryption | 2007 | implemented | out of scope |
| RC4 CryptoAPI | 2002 / 2003 | implemented | out of scope |
The encrypt path is deliberately Agile-only — producing new legacy-format encrypted files would be a security regression.
See MS-OFFCRYPTO for the full algorithm spec.
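To give a feel for what the spin count does, here is a rough sketch of the Agile password-to-key derivation as described in MS-OFFCRYPTO: hash the salt with the UTF-16LE password, re-hash `spin_count` times with a little-endian iteration counter prepended, then hash once more with a purpose-specific block key and truncate. This is not the library's code. `derive_agile_key` is a hypothetical name, SHA-512 with a 256-bit key is an assumed parameter set, the block key in the demo is a placeholder rather than one of the spec's constants, and the spec's padding rule for keys longer than the digest is left out.

```python
import hashlib
import struct


def derive_agile_key(password: str, salt: bytes, spin_count: int,
                     block_key: bytes, key_bits: int = 256,
                     hash_name: str = "sha512") -> bytes:
    """Sketch of the Agile key derivation (truncation case only)."""

    def h(data: bytes) -> bytes:
        return hashlib.new(hash_name, data).digest()

    # H0 = H(salt + password), password encoded as UTF-16LE.
    digest = h(salt + password.encode("utf-16-le"))

    # Hn = H(iterator + Hn-1), iterator is a 32-bit little-endian counter.
    for i in range(spin_count):
        digest = h(struct.pack("<I", i) + digest)

    # Hfinal = H(Hn + blockKey), then cut down to the key size.
    final = h(digest + block_key)
    key_len = key_bits // 8
    if key_len > len(final):
        raise ValueError("padding for keys longer than the digest is not sketched here")
    return final[:key_len]


demo_salt = bytes(range(16))   # demo value; real salts come from the file
demo_block_key = bytes(8)      # placeholder, not a spec constant
key = derive_agile_key("hunter22", demo_salt, 1000, demo_block_key)
print(len(key), key.hex())
```

Each guess an attacker makes has to pay for the whole loop, which is why a default of 100,000 iterations is much costlier to brute-force than a low spin count.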
Primitives come from the cryptography
package — no hand-rolled crypto. Passwords and derived keys are never
logged.
Both decrypt() and encrypt() are supported so that the
python-xlsx / python-docx / python-pptx authoring stack can
produce password-protected files. The encrypt path is Agile-only
(no legacy Standard / RC4 encoders) and enforces a minimum password
length to keep weak-crypto footguns out of the happy path.
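An application may want to warn about a short password before it calls encrypt(), for example while the user is still typing. A minimal sketch of such a pre-check follows; `password_problem` is a hypothetical helper, and the only rule it mirrors is the 8-character minimum stated above. The library remains the authority on what it accepts.

```python
from typing import Optional

MIN_PASSWORD_CHARS = 8  # minimum length stated in this README


def password_problem(password: str, allow_weak: bool = False) -> Optional[str]:
    """Return a human-readable problem, or None if the password passes."""
    if allow_weak:
        # Mirrors the idea of an explicit opt-out; nothing is checked.
        return None
    if len(password) < MIN_PASSWORD_CHARS:
        missing = MIN_PASSWORD_CHARS - len(password)
        return f"Password is {missing} character(s) too short."
    return None


print(password_problem("hunter2"))
print(password_problem("hunter22"))
```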
Full docs are in docs/ and render via Sphinx:
```sh
pip install -e '.[docs]'
cd docs && make html
open .build/html/index.html
```

docs/user/ covers install, quickstart, recipes, error handling,
integration with the authoring libraries, a full MS-OFFCRYPTO
algorithm walkthrough, and the security model. docs/api/ is the
autodoc reference.
See also:
- CHANGELOG.md: per-release change history.
- SECURITY.md: disclosure policy and threat model.
- CONTRIBUTING.md: dev setup and algorithm-addition guide.

Related projects:

- python-xlsx — XLSX authoring
- python-docx — DOCX authoring
- python-pptx — PPTX authoring
- ooxml-crypto-js — JS sibling

License: Apache-2.0.