AethAlloc

A high-performance memory allocator optimized for network packet processing and memory-constrained workloads.

Overview

AethAlloc is a production-grade memory allocator featuring:

Thread-Local Caching: Lock-free per-thread free lists with 14 size classes (16B - 64KB)
SIMD-Safe Alignment: All allocations are 16-byte aligned for AVX/SSE safety
O(1) Anti-Hoarding: Batch transfer to global pool prevents memory bloat in producer-consumer patterns
Zero Fragmentation: 11x better memory efficiency than glibc in long-running workloads

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Thread N                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ ThreadLocalCache                                         │   │
│  │  heads[14] ──► Free List (size class 0-13)              │   │
│  │  counts[14] ──► Cached block counts                      │   │
│  └─────────────────────────────────────────────────────────┘   │
│                           │                                     │
│              Anti-Hoarding Threshold (4096)                     │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │ GlobalFreeList[14]                                       │   │
│  │  Lock-free Treiber Stack (O(1) batch push)              │   │
│  └─────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                      PageAllocator                               │
│  mmap/munmap backend with 4KB page granularity                  │
│  PageHeader: magic + num_pages + requested_size                 │
└─────────────────────────────────────────────────────────────────┘

Crates

Crate	Description
`aethalloc-core`	Core algorithms (page allocator, size classes, lock-free stack)
`aethalloc-abi`	C ABI exports for LD_PRELOAD injection

Deployment Targets

m720q Gateway (Multi-WAN Routing)

Producer-Consumer: 622K ops/s (+26% vs glibc)
Guarantees line-rate packet inspection
Asynchronous SKB payload handoff between NIC and firewall workers

X1 Yoga Workstation (Desktop)

Fragmentation RSS: 24 MB (9x better than glibc)
Prevents memory bloat in long-running desktop environments
Preserves NVMe lifespan and battery capacity

Building

# Build the shared library
nix build

# Or with cargo
cargo build --release -p aethalloc-abi

Usage

# LD_PRELOAD injection
LD_PRELOAD=./target/release/libaethalloc_abi.so ./your-program

# With Nix wrapper
nix run .#suricata-aeth

Feature Flags

Feature	Description	Default
`simple-cache`	Thread-local free-list per size class	Yes
`magazine-caching`	Hoard-style magazines (experimental)	No

Benchmarks

Test System: Intel Core i5-8365U (4 cores, 8 threads) @ 1.60GHz, 16 GB RAM
Last updated: 2026-03-18

Summary

Benchmark	glibc	AethAlloc	Ratio
Packet Churn	238K ops/s	245K ops/s	103%
KV Store	328K ops/s	365K ops/s	111%
Producer-Consumer	493K ops/s	622K ops/s	126%
Multithread (8T)	6.7M ops/s	6.5M ops/s	97%
Fragmentation RSS	220 MB	24 MB	9x better

Packet Churn (Network Processing)

64-byte packet allocations simulating network processing workload.

glibc:       238K ops/s
AethAlloc:   245K ops/s (+3%)
P50 latency: 3.4 µs
P99 latency: 7.7 µs

KV Store (Redis-like Workload)

Variable-sized keys (8-64B) and values (16-64KB) with SET/GET/DEL operations.

glibc:       328K ops/s
AethAlloc:   365K ops/s (+11%)
SET latency: 3.6 µs
GET latency: 0.6 µs
DEL latency: 0.9 µs

Producer-Consumer (Cross-Thread Frees)

Thread A allocates, Thread B frees. Simulates network packet handoff.

glibc:       493K ops/s
AethAlloc:   622K ops/s (+26%)
Memory:      Stable with anti-hoarding active

Multithread Churn (8 Threads)

Concurrent allocations (16B - 4KB) across 8 threads with heavy contention.

glibc:       6.7M ops/s
AethAlloc:   6.5M ops/s (97%)
Avg latency: 834 ns

Fragmentation (Long-running Server)

Mixed allocation sizes (16B - 1MB) over 1M iterations.

glibc:       220 MB RSS
AethAlloc:   24 MB RSS (9x better)
Throughput:  232K ops/s

Tail Latency (P99/P99.9)

80,000 operations across 8 threads measuring per-operation latency.

glibc:       P50=84ns  P99=103ns  P99.9=127ns  P99.99=26µs  Max=2.3ms
AethAlloc:   P50=93ns  P99=116ns  P99.9=600ns  P99.99=10µs  Max=2.1ms

Massive Allocations (>1GB)

Huge contiguous allocations with high alignment.

glibc:       256MB, 512MB, 1GB, 1GB@2MB-align, 2GB - all PASS
AethAlloc:   256MB, 512MB, 1GB, 1GB@2MB-align, 2GB - all PASS

Technical Implementation

SIMD Alignment

All allocations return 16-byte aligned pointers:

const CACHE_HEADER_SIZE: usize = 16; // Ensures AVX/SSE safety

O(1) Batch Push

Anti-hoarding uses single CAS for entire batch:

// Walk local list to find tail
while walked < flush_count {
    batch_tail = (*batch_tail).next;
}

// Single atomic swap for entire batch
GLOBAL_FREE_LISTS[class].push_batch(batch_head, batch_tail);

Size Classes

14 power-of-two size classes from 16 bytes to 64KB:

Class	Size	Class	Size
0	16B	7	2KB
1	32B	8	4KB
2	64B	9	8KB
3	128B	10	16KB
4	256B	11	32KB
5	512B	12	64KB
6	1KB	13	(reserved)

Testing

# Run all tests
cargo test --all

# Run benchmarks
gcc -O3 -pthread benches/packet_churn.c -o /tmp/packet_churn
LD_PRELOAD=./target/release/libaethalloc_abi.so /tmp/packet_churn

# Run stress tests
gcc -O3 benches/corruption_test.c -o /tmp/corruption_test
LD_PRELOAD=./target/release/libaethalloc_abi.so /tmp/corruption_test

Status

Component	Status
Core allocator	✅ Complete
Thread-local caching	✅ Complete
SIMD alignment	✅ Complete
O(1) anti-hoarding	✅ Complete
Lock-free global pool	✅ Complete
Benchmarks	✅ Complete
Stress tests	✅ Complete
CI/CD	✅ Complete

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
.cargo		.cargo
.engram		.engram
.github/workflows		.github/workflows
aethalloc-abi		aethalloc-abi
aethalloc-amo		aethalloc-amo
aethalloc-core		aethalloc-core
aethalloc-hess		aethalloc-hess
aethalloc-metrics		aethalloc-metrics
aethalloc-vmpc		aethalloc-vmpc
benches		benches
tests/integration		tests/integration
CHANGELOG.md		CHANGELOG.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
flake.lock		flake.lock
flake.nix		flake.nix

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AethAlloc

Overview

Architecture

Crates

Deployment Targets

m720q Gateway (Multi-WAN Routing)

X1 Yoga Workstation (Desktop)

Building

Usage

Feature Flags

Benchmarks

Summary

Packet Churn (Network Processing)

KV Store (Redis-like Workload)

Producer-Consumer (Cross-Thread Frees)

Multithread Churn (8 Threads)

Fragmentation (Long-running Server)

Tail Latency (P99/P99.9)

Massive Allocations (>1GB)

Technical Implementation

SIMD Alignment

O(1) Batch Push

Size Classes

Testing

Status

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

AethAlloc

Overview

Architecture

Crates

Deployment Targets

m720q Gateway (Multi-WAN Routing)

X1 Yoga Workstation (Desktop)

Building

Usage

Feature Flags

Benchmarks

Summary

Packet Churn (Network Processing)

KV Store (Redis-like Workload)

Producer-Consumer (Cross-Thread Frees)

Multithread Churn (8 Threads)

Fragmentation (Long-running Server)

Tail Latency (P99/P99.9)

Massive Allocations (>1GB)

Technical Implementation

SIMD Alignment

O(1) Batch Push

Size Classes

Testing

Status

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages