A comprehensive C library implementing error-correcting codes and source coding algorithms for educational and practical use.
| Codec | Type | Status | Description |
|---|---|---|---|
| Huffman | Source Coding | Complete | Dynamic Huffman coding with frequency analysis |
| Hamming | Channel Coding | Complete | Hamming(n,k) codes with single-error correction |
| BCH | Channel Coding | Partial | BCH codes with Berlekamp-Massey decoder (encoder works) |
| Goppa | Channel Coding | Partial | Binary Goppa codes with syndrome computation (Patterson decoder structure) |
- GF(2): Binary field operations, vector/matrix arithmetic, Gaussian elimination
- GF(2^m): Finite field operations (m=2..16) with log/antilog tables
- Polynomials: Arithmetic over GF(2) and GF(2^m), includes GCD, evaluation, modular operations
- Bit I/O: Efficient bit-level streaming for codec implementations
Coding-theory/
├── include/ # Public header files
│ ├── codectk.h # Main API: codec registry and error codes
│ ├── bitio.h # Bit-level I/O (header-only)
│ ├── gf2.h # Binary field GF(2) operations
│ ├── gf2m.h # Galois field GF(2^m) operations
│ ├── poly.h # Polynomial arithmetic
│ ├── huffman.h # Huffman coding
│ ├── hamming.h # Hamming codes
│ ├── bch.h # BCH codes (stub)
│ └── goppa.h # Goppa codes (stub)
├── src/ # Implementation files
│ ├── registry.c # Codec registry and error messages
│ ├── gf2.c # GF(2) implementation
│ ├── gf2m.c # GF(2^m) with table generation
│ ├── poly.c # Polynomial operations
│ ├── huffman.c # Huffman encoder/decoder
│ ├── hamming.c # Hamming encoder/decoder
│ ├── bch.c # BCH (stub)
│ └── goppa.c # Goppa (stub)
├── tests/ # Comprehensive test suite
│ ├── test_main.c
│ ├── test_bitio.c
│ ├── test_gf2.c
│ ├── test_gf2m.c
│ ├── test_poly.c
│ ├── test_hamming.c
│ └── test_huffman.c
├── tools/ # Command-line utilities
│ └── pipe.c # Encode/decode tool
├── CMakeLists.txt # Build configuration
└── claude.md # Detailed implementation plan
- CMake 3.10 or later
- GCC or Clang with C11 support
- Standard C library (libc)
# Create build directory
mkdir build && cd build
# Configure with CMake
cmake ..
# Build library and tests
make
# Run tests
./test_codectk
# Build demo tool
make pipe_toolDebug mode (default if not specified):
cmake -DCMAKE_BUILD_TYPE=Debug ..
makeIncludes debugging symbols, AddressSanitizer, and UBSan.
Release mode:
cmake -DCMAKE_BUILD_TYPE=Release ..
makeOptimized with -O3, no sanitizers.
This section shows how to demonstrate each implemented component.
The test suite provides comprehensive validation of all components:
cd build
./test_codectkExpected Output:
==============================================
Coding Theory Toolkit - Test Suite
==============================================
Running bitio tests...
[1] Write and read individual bits: PASS
[2] Partial byte handling: PASS
[3] Multi-byte write and read: PASS
[4] EOF handling: PASS
bitio: 4/4 tests passed
Running GF(2) tests...
[1] Vector init and free: PASS
[2] Vector set and get: PASS
[3] Vector XOR: PASS
[4] Vector dot product: PASS
[5] Vector Hamming weight: PASS
[6] Matrix init and free: PASS
[7] Matrix row reduction: PASS
gf2: 7/7 tests passed
Running GF(2^m) tests...
[1] Field context initialization: PASS
[2] Field axioms (associativity, commutativity): PASS
[3] Multiplicative inverse: PASS
[4] Exponentiation: PASS
gf2m: 4/4 tests passed
Running polynomial tests...
[1] GF(2) polynomial addition: PASS
[2] GF(2) polynomial multiplication: PASS
[3] GF(2^m) polynomial evaluation: PASS
[4] GF(2^m) polynomial GCD: PASS
poly: 4/4 tests passed
Running Hamming code tests...
[1] Hamming(7,4) encode/decode: PASS
[2] Hamming single-bit error correction: PASS
[3] Multiple code sizes: PASS
hamming: 3/3 tests passed
Running Huffman coding tests...
[1] Huffman encode/decode round-trip: PASS
[2] Huffman with single repeated symbol: PASS
[3] Huffman with varied symbol frequencies: PASS
[4] Huffman compression ratio check: PASS
huffman: 4/4 tests passed
==============================================
ALL TESTS PASSED
==============================================
What This Demonstrates:
- All mathematical primitives (GF(2), GF(2^m), polynomials) work correctly
- Bit I/O handles edge cases (partial bytes, EOF, multi-byte operations)
- Hamming codes successfully encode, decode, and correct single-bit errors
- Huffman coding achieves compression and perfect reconstruction
Demonstrate lossless data compression with Huffman coding:
# Create test file
echo "The quick brown fox jumps over the lazy dog. The dog was not amused." > input.txt
# Show original size
ls -lh input.txt
# Encode with Huffman
./pipe_tool encode huffman input.txt compressed.bin
# Show compressed size
ls -lh compressed.bin
# Decode back
./pipe_tool decode huffman compressed.bin output.txt
# Verify perfect reconstruction
diff input.txt output.txt
echo $? # Should output 0 (files identical)Expected Output Format:
Encoding with huffman...
Encoded: 70 bytes -> 187 bytes (267.14% of original)
Success!
Decoding with huffman...
Decoded: 187 bytes -> 70 bytes
Success!
# diff returns 0 (files identical)
What This Demonstrates:
- Huffman encoding works end-to-end
- Perfect reconstruction (lossless)
- Note: Huffman overhead from frequency table storage can exceed compression gains on small files
For better compression ratio, try a larger file:
# Create larger file with repeated content
for i in {1..100}; do echo "Lorem ipsum dolor sit amet consectetur adipiscing elit"; done > large.txt
./pipe_tool encode huffman large.txt large_compressed.bin
# Should show < 100% (actual compression)Demonstrate single-bit error correction (requires building a custom test program):
Create demo_hamming.c:
#include "include/hamming.h"
#include "include/codectk.h"
#include <stdio.h>
#include <string.h>
int main(void) {
const codectk_codec *codec = hamming_codec();
hamming_params params = {.m = 3}; // Hamming(7,4)
// Original data: 4 bits = 1010 binary
uint8_t data[1] = {0x0A};
uint8_t encoded[10];
uint8_t decoded[10];
size_t enc_bits = sizeof(encoded) * 8;
size_t dec_bits = sizeof(decoded) * 8;
// Encode
codec->encode(¶ms, data, 4, encoded, &enc_bits);
printf("Original: 0x%02X (4 bits: 1010)\n", data[0] & 0x0F);
printf("Encoded: 0x%02X (7 bits)\n\n", encoded[0] & 0x7F);
// Introduce single-bit error
printf("Introducing error: flipping bit 2\n");
encoded[0] ^= 0x04; // Flip bit 2
printf("Corrupted: 0x%02X\n\n", encoded[0] & 0x7F);
// Decode with error correction
size_t corrected = 0;
codectk_err err = codec->decode(¶ms, encoded, enc_bits,
decoded, &dec_bits, &corrected);
printf("Decode result: %s\n", codectk_strerror(err));
printf("Errors corrected: %zu\n", corrected);
printf("Recovered: 0x%02X (matches original: %s)\n",
decoded[0] & 0x0F,
((decoded[0] & 0x0F) == (data[0] & 0x0F)) ? "YES" : "NO");
return 0;
}Build and run:
gcc -o demo_hamming demo_hamming.c -Lbuild -lcodectk -lm -I.
./demo_hammingExpected Output:
Original: 0x0A (4 bits: 1010)
Encoded: 0x6A (7 bits)
Introducing error: flipping bit 2
Corrupted: 0x6E
Decode result: Success
Errors corrected: 1
Recovered: 0x0A (matches original: YES)
What This Demonstrates:
- Hamming codes add redundancy (4 bits -> 7 bits)
- Single-bit errors are detected and corrected automatically
- Perfect recovery of original data despite corruption
Demonstrate field operations (create demo_math.c):
#include "include/gf2m.h"
#include "include/poly.h"
#include <stdio.h>
int main(void) {
// Initialize GF(2^4) field with primitive polynomial x^4 + x + 1
gf2m_ctx ctx;
gf2m_ctx_init(&ctx, 4, 0x13);
printf("=== GF(2^4) Field Operations ===\n\n");
// Demonstrate field arithmetic
uint16_t a = 3, b = 5;
printf("Let a = %u, b = %u\n\n", a, b);
uint16_t sum = gf2m_add(a, b);
uint16_t product = gf2m_mul(&ctx, a, b);
uint16_t inv_a = gf2m_inv(&ctx, a);
uint16_t a_squared = gf2m_sqr(&ctx, a);
printf("a + b = %u (XOR in GF(2^m))\n", sum);
printf("a * b = %u\n", product);
printf("a^-1 = %u\n", inv_a);
printf("a^2 = %u\n", a_squared);
// Verify inverse property: a * a^-1 = 1
uint16_t check = gf2m_mul(&ctx, a, inv_a);
printf("\nVerify: a * a^-1 = %u (should be 1)\n", check);
// Polynomial operations
printf("\n=== Polynomial Operations over GF(2^4) ===\n\n");
poly_gf2m_t p, q, result;
poly_gf2m_init(&p, &ctx, 10);
poly_gf2m_init(&q, &ctx, 10);
poly_gf2m_init(&result, &ctx, 20);
// p(x) = 3x + 2
poly_gf2m_set_coeff(&p, 0, 2);
poly_gf2m_set_coeff(&p, 1, 3);
// q(x) = 5
poly_gf2m_set_coeff(&q, 0, 5);
printf("p(x) = 3x + 2\n");
printf("q(x) = 5\n\n");
// Evaluate p(1)
uint16_t p_at_1 = poly_gf2m_eval(&p, 1);
printf("p(1) = %u\n", p_at_1);
// Multiply polynomials
poly_gf2m_mul(&result, &p, &q);
printf("p(x) * q(x) = ");
for (int i = result.deg; i >= 0; i--) {
if (result.coeff[i] != 0) {
if (i < result.deg) printf(" + ");
printf("%u", result.coeff[i]);
if (i > 0) printf("x");
if (i > 1) printf("^%d", i);
}
}
printf("\n");
// Cleanup
poly_gf2m_free(&p);
poly_gf2m_free(&q);
poly_gf2m_free(&result);
gf2m_ctx_free(&ctx);
return 0;
}What This Demonstrates:
- GF(2^m) field operations: addition, multiplication, inversion, squaring
- Field axioms are satisfied (associativity, commutativity, inverses)
- Polynomial arithmetic over finite fields
- Mathematical foundation for BCH and Goppa codes
Create benchmark.sh to measure throughput:
#!/bin/bash
echo "=== Codec Performance Benchmark ==="
echo
# Create test files of various sizes
for size in 1K 10K 100K 1M; do
dd if=/dev/urandom of=test_${size}.bin bs=$size count=1 2>/dev/null
done
# Benchmark Huffman
echo "Huffman Encoding Throughput:"
for size in 1K 10K 100K 1M; do
echo -n " $size: "
time -p ./pipe_tool encode huffman test_${size}.bin out.bin 2>&1 | grep real | awk '{print 1/'$2' " MB/s"}'
done
echo
echo "Hamming(15,11) would be tested similarly with appropriate parameters"
# Cleanup
rm -f test_*.bin out.bin#include "include/codectk.h"
// Get codec
const codectk_codec *codec = codectk_get("huffman");
// Encode
uint8_t input[100] = {...};
uint8_t output[1000];
size_t output_bits = sizeof(output) * 8;
codectk_err err = codec->encode(NULL, input, sizeof(input) * 8,
output, &output_bits);
if (err != CODECTK_OK) {
fprintf(stderr, "Error: %s\n", codectk_strerror(err));
}
// Decode
uint8_t decoded[1000];
size_t decoded_bits = sizeof(decoded) * 8;
size_t num_corrected = 0;
err = codec->decode(NULL, output, output_bits,
decoded, &decoded_bits, &num_corrected);#include "include/gf2m.h"
// Initialize GF(2^8) with AES polynomial
gf2m_ctx ctx;
gf2m_ctx_init(&ctx, 8, 0x11B);
// Field operations
uint16_t a = 0x53, b = 0xCA;
uint16_t sum = gf2m_add(a, b); // XOR
uint16_t product = gf2m_mul(&ctx, a, b); // Using log tables
uint16_t inverse = gf2m_inv(&ctx, a); // Multiplicative inverse
gf2m_ctx_free(&ctx);- Build system with CMake
- GF(2) binary field operations
- GF(2^m) with runtime table generation
- Polynomial arithmetic (both GF(2) and GF(2^m))
- Bit I/O streaming
- Error message system
- Huffman dynamic coding with frequency analysis
- Hamming codes with parameterizable m
- Comprehensive test suite for all components
- BCH codes: ✓ Implemented (569 lines)
- Generator polynomial from minimal polynomials (LCM computation)
- Berlekamp-Massey algorithm for error locator polynomial
- Chien search for finding error positions
- Encoder works, decoder needs syndrome debugging for small codes
- Binary Goppa codes: ✓ Implemented (350 lines)
- Parity-check matrix H construction from (L, g)
- Systematic encoder structure
- Patterson decoder: syndrome computation, polynomial inversion
- Note: Quadratic splitting algorithm for full error correction pending
- ASM backends (PCLMULQDQ, PMULL)
- CLI tool with pipeline support
- Fuzzing and property tests
- CI/CD integration
- Performance optimization
- Clean Compilation:
-Wall -Wextra -Wconversionwith zero warnings - Memory Safety: All tests pass under AddressSanitizer and UBSan
- Documentation: Comprehensive comments explaining algorithms
- Testing: Unit tests for all components with edge case coverage
- Frequency analysis of input symbols
- Min-heap construction for tree building
- Code assignment via tree traversal
- Header format: HUF1 magic + 257-entry frequency table
- Bit-level encoding and decoding
- Runtime computation of (n,k) from parameter m
- Parity bit placement at powers of 2
- Syndrome calculation for error detection
- Single-bit error correction via syndrome
- Log/antilog table generation from primitive polynomial
- Multiplication: log(a*b) = log(a) + log(b)
- Inversion: log(a^-1) = (2^m - 1) - log(a)
- Verification of primitive polynomial during init
- BCH decoder with Berlekamp-Massey algorithm
- Goppa decoder with Patterson's algorithm
- ASM acceleration for x86-64 and ARM64
- Streaming API for large files
- Python/Rust bindings via FFI
Educational use. See source files for details.
- TODO.md: Detailed task breakdown
- claude.md: Complete implementation plan and architecture notes
- Source code comments: Algorithm explanations and optimization notes