A serde-based binary serialization format that preserves lexicographic ordering. When two values satisfy a < b, their serialized byte representations satisfy bytes(a) < bytes(b) under standard byte-wise comparison.
Preserving lexicographic ordering is useful when storing keys in ordered key-value stores and databases (LSM trees, B-trees, etc.) where sort order must be maintained at the byte level.
Note that lexcode does not support preserving the ordering of explicit implementations of Ord or PartialOrd. The ordering lexcode will use for e.g. a struct is to first compare the first field, then the second field etc., regardless of any Ord and PartialOrd implementation.
Serialize and deserialize any serde-compatible type:
use serde::{Serialize, Deserialize};
#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct Record {
category: u32,
name: String,
score: i64,
}
let record = Record { category: 1, name: "alice".into(), score: -42 };
// Serialize
let bytes = lexcode::to_bytes(&record).unwrap();
// Deserialize
let decoded: Record = lexcode::from_bytes(&bytes).unwrap();
assert_eq!(record, decoded);The encoded bytes preserve ordering — records sort first by category, then by name, then by score, matching the struct field order:
let a = lexcode::to_bytes(&Record { category: 1, name: "alice".into(), score: 10 }).unwrap();
let b = lexcode::to_bytes(&Record { category: 1, name: "bob".into(), score: 5 }).unwrap();
let c = lexcode::to_bytes(&Record { category: 2, name: "alice".into(), score: 99 }).unwrap();
assert!(a < b); // same category, "alice" < "bob"
assert!(b < c); // category 1 < category 2| Type | Encoding |
|---|---|
bool |
1 byte (0x00 / 0x01) |
u8, u16, u32, u64, u128 |
Variable-length unsigned varint |
i8, i16, i32, i64, i128 |
Variable-length signed varint (sign bit + magnitude) |
f32, f64 |
IEEE 754 with sign-aware bit flipping (fixed-width) |
char |
Variable-length unsigned varint of code point |
String / &str |
Sentinel-escaped with 0x00 terminator |
&[u8] |
Sentinel-escaped with 0x7F terminator |
FixedBytes<N> |
Raw bytes, exactly N bytes (zero overhead) |
Option<T> |
0x00 for None, 0x01 + value for Some |
Vec<T>, sequences |
0x01 + element per entry, 0x00 terminator |
| Maps | 0x01 + key + value per entry, 0x00 terminator |
| Tuples, structs | Fields concatenated in order (fixed-length) |
| Enums | Varint variant index + variant data |
(), unit structs |
Zero bytes |
Integers use a variable-length encoding that preserves lexicographic ordering while compressing small values into fewer bytes.
The number of leading 1-bits in the first byte (and optionally a second header byte for large values) determines how many additional data bytes follow. A 0-bit terminates the unary prefix. The remaining header bits plus the extra bytes store the value in big-endian.
| First byte pattern | Total bytes | Data bits | Value range |
|---|---|---|---|
0xxxxxxx |
1 | 7 | 0 – 127 |
10xxxxxx + 1 byte |
2 | 14 | 128 – 16,511 |
110xxxxx + 2 bytes |
3 | 21 | 16,512 – 2,113,663 |
1110xxxx + 3 bytes |
4 | 28 | … |
11110xxx + 4 bytes |
5 | 35 | |
111110xx + 5 bytes |
6 | 42 | |
1111110x + 6 bytes |
7 | 49 | |
11111110 + 7 bytes |
8 | 56 |
When the first byte is 0xFF, a second header byte extends the scheme in the same way, supporting up to 18 total bytes for the full u128 range.
Within each level, values are stored as an offset from the level's base, ensuring that all values at a shorter encoding are strictly less than all values at a longer encoding.
Bit 7 of the first byte is the sign bit (1 = non-negative, 0 = negative). The remaining 7 bits begin the same unary-prefix scheme for the magnitude:
- Non-negative values encode the magnitude directly after the sign bit.
- Negative values encode
|v| − 1as the magnitude, then bitwise-complement all bytes (except the sign bit stays 0). This reverses the ordering so that more-negative values produce lexicographically smaller byte sequences.
Small values near zero (both positive and negative) are encoded compactly in 1 byte. For example, values −64 to 63 fit in a single byte.
IEEE 754 floats use sign-aware bit manipulation: positive floats have their sign bit flipped; negative floats have all bits flipped. This produces a total ordering over all non-NaN values.
Variable-length types use sentinel encoding to allow unambiguous termination without length prefixes. Each occurrence of the sentinel byte within the data is escaped by doubling it (sentinel, 0x01), and the sequence is terminated with sentinel, 0x00.
Variable-length collections prefix each element with 0x01 and end with 0x00. This preserves element-wise lexicographic comparison.
For encoding a known number of bytes without encoding overhead, lexcode provides the type
FixedBytes<N> — the output is exactly N bytes, identical to the input array. This is ideal for hashes, UUIDs, and other fixed-size binary data where space efficiency matters.
Unlike [u8; N] (which varint-encodes each byte, averaging ~1.5× the size) or &[u8] / Vec<u8> (which uses sentinel encoding with a 2-byte terminator), FixedBytes<N> has zero overhead.
use lexcode::FixedBytes;
#[derive(serde::Serialize, serde::Deserialize)]
struct Record {
id: u64,
hash: FixedBytes<32>, // exactly 32 bytes in output
}The test suite uses proptest for property-based testing, verifying both roundtrip correctness and order preservation across all supported types.
cargo testBecause integers use a variable-length encoding that always promotes to u128/i128 internally, the same numeric value produces identical bytes regardless of the Rust integer width. For example, 42u8, 42u16, and 42u64 all serialize to the same bytes. This makes several type changes forward- and backward-compatible.
Integer widening (e.g. u8 → u32, i16 → i64): Always safe. Old data decodes correctly into the wider type since the byte encoding is identical.
Integer narrowing (e.g. u32 → u16, i64 → i32): Safe if all stored values fit in the narrower type. Out-of-range values produce an overflow error on deserialization rather than silent truncation.
char ↔ unsigned integers: char uses the same unsigned varint encoding as u8–u128, so these types are interchangeable. char → u32/u64/u128 always works. u32 → char works if the value is a valid Unicode code point.
Newtype struct wrapping/unwrapping: struct Wrapper(T) serializes identically to T, so adding or removing a newtype wrapper is always safe.
Widening a field inside a struct or container: Since field encodings are width-independent, changing a struct field from u8 to u32 (or an element type from i16 to i64, etc.) is safe — old data decodes correctly.
Appending enum variants: Adding new variants at the end of an enum is safe for reading old data, since existing variant indices are unchanged.
Unsigned ↔ signed: u8–u128 and i8–i128 use structurally different varint schemes (different header formats), so changing signedness silently produces wrong values.
Float width changes: f32 and f64 use fixed-width encodings (4 and 8 bytes respectively), so they are not interchangeable.
Integer ↔ float: Integers use variable-length varint; floats use fixed-width bit-flipped IEEE 754. These are completely different encodings.
String ↔ &[u8]: Strings use sentinel byte 0x00; byte slices use 0x7F. Data encoded as one cannot be decoded as the other.
T ↔ Option<T>: Option adds a 0x00/0x01 prefix byte, so wrapping or unwrapping Option changes the encoding.
Adding or removing struct/tuple fields: Fields are concatenated without framing, so changing the field count causes subsequent fields to be misaligned.
Reordering or removing enum variants: Variants are identified by their positional index, so reordering or removing variants changes the meaning of existing encoded data.
[u8; N] ↔ &[u8] / Vec<u8>: Fixed-size byte arrays are serialized as tuples (each element varint-encoded, concatenated without framing), while byte slices use sentinel encoding. These are completely different formats.
deserialize_anyis not supported. Since lexcode is a non-self-describing binary format, the deserializer must know the expected type at compile time. This means dynamically-typed values likeserde_json::Valuecannot be deserialized from lexcode.
See COMPARISON.md for a detailed comparison with bytekey2, ordcode, memcomparable, and other binary serialization formats.
This library is in no way associated with Lex Fridman. This library cannot conduct a 4-hour podcast interview with your data structures.
Licensed under the MIT License.
