Strict, dependency-free BitTorrent bencode encoder and decoder for
Python 3.8+. Round-trips are exact: the decoder rejects every
non-canonical encoding (leading zeros, `i-0e`, dict keys out of order or
duplicated, trailing data after a value), and the encoder emits dict
keys sorted lexicographically by raw bytes.

```sh
pip install bencode
```

Or from a clone:

```sh
pip install -e .
```

```python
from bencode import encode, decode

torrent = {
    b"announce": b"http://tracker.example/announce",
    b"info": {
        b"length": 12345,
        b"name": b"file.iso",
        b"piece length": 16384,
        b"pieces": b"\x00" * 40,
    },
}

raw = encode(torrent)
assert decode(raw) == torrent
```

The four bencode types map to Python types one-to-one:

| Bencode | Python |
|---|---|
| `i<n>e` | `int` |
| `<len>:<bytes>` | `bytes` |
| `l<...>e` | `list` |
| `d<key><value>...e` | `dict[bytes, ...]` (keys are `bytes`) |

`encode` converts a value to canonical bencode bytes.

- Accepts `int`, `bytes`/`bytearray`/`memoryview`, `list`/`tuple`, and `dict` with `bytes` keys.
- Rejects `bool` (would silently encode as `int`), `str`, `float`, `None`, sets, and any other type with `EncodeError`.
- Dict keys must be `bytes`. The encoder sorts them lexicographically by raw bytes, which is the canonical bencode requirement.

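
The rules above fit in a few lines. The following is a minimal sketch, not this package's implementation: `sketch_encode` and `SketchEncodeError` are hypothetical names introduced here, and the real `EncodeError` carries more detail than this stand-in does.

```python
from typing import Any


class SketchEncodeError(ValueError):
    """Hypothetical stand-in for the package's EncodeError."""


def sketch_encode(value: Any) -> bytes:
    """Encode ints, bytes-likes, lists/tuples and bytes-keyed dicts."""
    # bool is a subclass of int, so it has to be rejected before the int check.
    if isinstance(value, bool):
        raise SketchEncodeError("bool is not a bencode type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, (bytes, bytearray, memoryview)):
        raw = bytes(value)
        return b"%d:%s" % (len(raw), raw)
    if isinstance(value, (list, tuple)):
        return b"l" + b"".join(sketch_encode(item) for item in value) + b"e"
    if isinstance(value, dict):
        for key in value:
            if not isinstance(key, bytes):
                raise SketchEncodeError(
                    f"dict key must be bytes, got {type(key).__name__}"
                )
        # sorted() on bytes compares raw byte values: the canonical key order.
        body = b"".join(
            sketch_encode(key) + sketch_encode(value[key]) for key in sorted(value)
        )
        return b"d" + body + b"e"
    raise SketchEncodeError(f"unsupported type: {type(value).__name__}")
```

Testing `bool` ahead of `int` is the one ordering detail that matters here, because `isinstance(True, int)` is true in Python.
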

```python
encode(0)                    # b'i0e'
encode(-7)                   # b'i-7e'
encode(b"spam")              # b'4:spam'
encode([])                   # b'le'
encode({})                   # b'de'
encode({b"b": 1, b"a": 2})   # b'd1:ai2e1:bi1ee'
```

`decode` decodes a single complete bencode value. It raises `TrailingDataError` if
extra bytes remain after it.


```python
decode(b"i42e")           # 42
decode(b"4:spam")         # b'spam'
decode(b"l4:spami42ee")   # [b'spam', 42]
decode(b"d1:ai1ee")       # {b'a': 1}
```

`decode_partial` decodes a single value starting at `offset` and returns `(value, end)`,
where `end` is the position immediately after the consumed value.
It is useful for hand-rolling concatenated-value parsers.

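
To make the `(value, end)` contract concrete, here is a minimal sketch of a strict partial decoder. It is an illustration under assumptions, not the package's code: `sketch_decode_partial` is a hypothetical name, and it raises plain `ValueError` where the package raises its own error classes.

```python
from typing import Any, Tuple


def sketch_decode_partial(data: bytes, offset: int = 0) -> Tuple[Any, int]:
    """Decode one value at `offset`; return (value, index just past it)."""
    if offset >= len(data):
        raise ValueError(f"truncated input at offset {offset}")
    lead = data[offset:offset + 1]

    if lead == b"i":
        end = data.find(b"e", offset)
        if end == -1:
            raise ValueError(f"unterminated integer at offset {offset}")
        body = data[offset + 1:end]
        digits = body[1:] if body.startswith(b"-") else body
        # Canonical form: digits only, no leading zeros, no negative zero.
        if not digits.isdigit() or (digits[:1] == b"0" and body != b"0"):
            raise ValueError(f"invalid integer at offset {offset}")
        return int(body), end + 1

    if lead.isdigit():
        colon = data.find(b":", offset)
        if colon == -1:
            raise ValueError(f"missing ':' in string at offset {offset}")
        prefix = data[offset:colon]
        if not prefix.isdigit() or (len(prefix) > 1 and prefix[:1] == b"0"):
            raise ValueError(f"invalid string length at offset {offset}")
        end = colon + 1 + int(prefix)
        if end > len(data):
            raise ValueError(f"truncated string at offset {offset}")
        return data[colon + 1:end], end

    if lead == b"l":
        items = []
        pos = offset + 1
        while data[pos:pos + 1] != b"e":
            item, pos = sketch_decode_partial(data, pos)
            items.append(item)
        return items, pos + 1

    if lead == b"d":
        result = {}
        last_key = None
        pos = offset + 1
        while data[pos:pos + 1] != b"e":
            key, pos = sketch_decode_partial(data, pos)
            if not isinstance(key, bytes):
                raise ValueError(f"non-bytes dict key before offset {pos}")
            # Strictly increasing keys rule out both disorder and duplicates.
            if last_key is not None and key <= last_key:
                raise ValueError(f"dict keys not increasing before offset {pos}")
            last_key = key
            value, pos = sketch_decode_partial(data, pos)
            result[key] = value
        return result, pos + 1

    raise ValueError(f"unexpected byte at offset {offset}")
```

Because every branch returns the index just past what it consumed, containers can recurse on the same buffer without slicing it.
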

```python
data = b"i7e0:"
value, end = decode_partial(data)               # (7, 3)
value, end = decode_partial(data, offset=end)   # (b'', 5)
```

`iter_decode` yields each bencode value in `data` until the input is exhausted.
It raises immediately on the first malformed value.

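
One way to picture this behaviour is a loop that keeps calling a partial decoder from wherever the previous value ended. The sketch below is an assumption about the shape of such a loop, not the package's source. `sketch_iter_decode` takes the partial decoder as a parameter so the example stays self-contained, and `decode_int_only` is a hypothetical, deliberately lenient stand-in that understands only integers.

```python
from typing import Any, Callable, Iterator, Tuple

PartialDecoder = Callable[[bytes, int], Tuple[Any, int]]


def sketch_iter_decode(data: bytes, decode_one: PartialDecoder) -> Iterator[Any]:
    """Yield back-to-back values until `data` is exhausted."""
    pos = 0
    while pos < len(data):
        # Any error raised by decode_one propagates to the caller right away.
        value, pos = decode_one(data, pos)
        yield value


def decode_int_only(data: bytes, offset: int) -> Tuple[int, int]:
    """Hypothetical stand-in: parses `i<n>e` only, without canonical checks."""
    end = data.find(b"e", offset)
    if data[offset:offset + 1] != b"i" or end == -1:
        raise ValueError(f"malformed value at offset {offset}")
    return int(data[offset + 1:end]), end + 1


values = list(sketch_iter_decode(b"i1ei2ei3e", decode_int_only))  # [1, 2, 3]
```

Since this is a generator, values that precede a malformed one are still yielded before the error surfaces.
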

```python
list(iter_decode(b"i1e0:le"))   # [1, b'', []]
```

All errors descend from `BencodeError`, which subclasses `ValueError`,
so `except ValueError` will catch them all.

| Class | Raised when |
|---|---|
| `EncodeError` | `encode` got an unsupported value or dict-key type. |
| `DecodeError` | Base class for decode errors. Carries `.offset`. |
| `TruncatedError` | Input ended mid-token. |
| `InvalidIntegerError` | Bad integer body (`ie`, `i01e`, `i-0e`, ...). |
| `InvalidStringError` | Bad string length prefix or truncated body. |
| `InvalidDictError` | Non-bytes key, or keys not strictly increasing. |
| `TrailingDataError` | `decode` saw bytes after the first complete value. |

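
A hierarchy like this lets callers choose how specific their handlers are. The classes below are local stand-ins that mirror a few names from the table so the example runs on its own. Their constructor signatures and the `summarize` helper are assumptions made for illustration, not the package's definitions.

```python
class BencodeError(ValueError):
    """Stand-in root class; subclassing ValueError follows the text above."""


class EncodeError(BencodeError):
    """Stand-in for the encoder-side error."""


class DecodeError(BencodeError):
    """Stand-in decode error that records where parsing failed."""

    def __init__(self, message: str, offset: int) -> None:  # assumed signature
        super().__init__(f"{message} (offset {offset})")
        self.offset = offset


class TruncatedError(DecodeError):
    """Stand-in for 'input ended mid-token'."""


def summarize(exc: ValueError) -> str:
    """Turn any error from the hierarchy into a short log line."""
    if isinstance(exc, DecodeError):
        return f"decode failed at byte {exc.offset}: {exc}"
    return f"bencode error: {exc}"


try:
    raise TruncatedError("input ended mid-token", offset=5)
except ValueError as exc:  # the broad handler still catches the subclass
    report = summarize(exc)
```

Catching `DecodeError` instead of `ValueError` would narrow the handler to decoder failures while keeping access to `.offset`.
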
`EncodeError.path` is a tuple describing where in the input structure
the bad value lives, e.g. `(b"info", b"name", 3)` for the fourth element
of `value[b"info"][b"name"]`.

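
A path of this kind can be built by extending a tuple with each key or index while recursing. The sketch below is only an assumption about how such tracking might work: `SketchEncodeError` and `check_encodable` are hypothetical names, and reporting a non-bytes key as the last path element is a choice made here, not documented behaviour.

```python
from typing import Any, Tuple


class SketchEncodeError(ValueError):
    """Hypothetical stand-in that carries the path to the offending value."""

    def __init__(self, message: str, path: Tuple[Any, ...]) -> None:
        super().__init__(f"{message} at path {path!r}")
        self.path = path


def check_encodable(value: Any, path: Tuple[Any, ...] = ()) -> None:
    """Raise SketchEncodeError for the first unsupported value found."""
    supported = (int, bytes, bytearray, memoryview, list, tuple, dict)
    if isinstance(value, bool) or not isinstance(value, supported):
        raise SketchEncodeError(f"unsupported type {type(value).__name__}", path)
    if isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            check_encodable(item, path + (index,))
    elif isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, bytes):
                raise SketchEncodeError("dict key must be bytes", path + (key,))
            check_encodable(item, path + (key,))


try:
    check_encodable({b"info": {b"name": [b"a", b"b", b"c", 1.5]}})
except SketchEncodeError as exc:
    bad_path = exc.path  # (b'info', b'name', 3)
```

Building a new tuple per level keeps sibling branches from seeing each other's path entries.
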
Bencode is a canonical format: the same value must encode to exactly the same bytes, and the same bytes must decode to exactly the same value, on every implementation. Strict validation makes violations of that invariant detectable at decode time:

- Leading zeros (`i01e`, `02:ab`) and `i-0e` are not legal: they would give two encodings for one number.
- Dict keys must be sorted ascending byte-strings; otherwise the same dict can encode in any of n! orders.
- `bool` is rejected on encode because `True == 1`, so `encode(True)` could silently produce `i1e` and round-trip to `1`, which is easy to miss.

If you want a forgiving parser, this isn't it.
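
Each of these ambiguities can be reproduced with plain Python, independently of this package. In the short demonstration below, `CANONICAL_INT` and `is_canonical_int` are hypothetical helpers showing one possible way to express the integer rule, not the package's own check.

```python
import re
from itertools import permutations

# bool is a subclass of int, so an encoder has to test for bool first.
assert isinstance(True, int) and True == 1

# A lenient integer parser maps several spellings to the same number.
assert int(b"1") == int(b"01") == 1
assert int(b"0") == int(b"-0") == 0

# Canonical bodies: a lone zero, or an optional minus with no leading zero.
CANONICAL_INT = re.compile(rb"0|-?[1-9][0-9]*")


def is_canonical_int(body: bytes) -> bool:
    """Return True only for the single accepted spelling of an integer."""
    return CANONICAL_INT.fullmatch(body) is not None


# Without a fixed key order, a 3-key dict has 3! = 6 possible encodings.
keys = [b"b", b"a", b"c"]
orderings = len(set(permutations(keys)))

# Sorting by raw bytes picks exactly one of them.
canonical_order = sorted(keys)
```
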

```sh
pip install pytest pytest-cov mypy
PYTHONPATH=src pytest --cov=bencode --cov-branch
mypy --strict src/bencode
```

The bundled suite has 106 tests and reaches 100% line and 100% branch coverage on all four source modules.

MIT — see LICENSE.