PA5 is a fixed-size C++ heap allocator that implements explicit free/used metadata, next-fit allocation, block splitting, and bidirectional coalescing, with a Linux x86 (-m32) workflow that preserves the original assignment behavior.
This repository includes the allocator implementation, assignment-provided test harnesses, Windows reference output, and a Linux-friendly build and benchmarking workflow.
The allocator manages a single 50 KiB region (Mem::TotalSize) using in-heap metadata (Heap, Free, Used) and explicit linked-list management for both free and used blocks.
This repository focuses on behavior preservation first and measured optimization second.
Building and validating this allocator exercises:
- low-level memory layout and metadata design
- fragmentation behavior, splitting, and coalescing tradeoffs
- allocator performance under stress workloads
- portability constraints between MSVC/Windows and
g++/Linux
- Custom allocator implementation in
PA5/Mem.cpp - Next-fit style free-block search with a single wrap
- Address-sorted free-list insertion for deterministic coalescing checks
- Block splitting with a minimum residual threshold (
sizeof(Free) + 8) - Upward and downward coalescing of physically adjacent free blocks
- Incremental heap statistics tracking for used/free blocks and memory
- Linux x86 build path with
-m32and fair stress benchmark target - Compatibility shims in
Framework/for MSVC/Windows-specific assumptions
Allocator instrumentation options:
| Macro | Purpose |
|---|---|
MEM_PROFILE |
Counts allocator operations (calls, traversal steps, merge counts) |
MEM_PROFILE_TIMING |
Adds timing accumulation for key allocator paths |
MEM_VERIFY_FREE_STATS |
Verifies free-list stats against list traversal (debug validation) |
Managed region: Mem::TotalSize = 50 KiB
low address
+--------------------------------------------------------------+
| Heap header (Heap, 32 bytes) |
| pUsedHead | pFreeHead | pNextFit | used/free statistics |
+--------------------------------------------------------------+ <-- first block
Each block in heap order:
+-------------------------------+------------------------------+
| Block header (Free or Used) | Payload |
| mData (size + flags) | user bytes |
| pNext | ... |
| pPrev | [last 4 bytes] secret ptr |
+-------------------------------+------------------------------+
mData bit layout (uint32_t):
- bit 0: Type (1 = Free, 0 = Used)
- bit 1: AboveFree (1 = previous/above block is free)
- bits 31..2: payload size (mask with ~0x3)
high address
.
├── PA5/
│ ├── Mem.cpp / Mem.h # allocator logic and public API
│ ├── Heap.cpp / Heap.h # heap state metadata
│ ├── Free.cpp / Free.h # free block metadata
│ └── Used.cpp / Used.h # used block metadata
├── PA5_ReadOnly/
│ ├── main.cpp
│ └── MemReadOnly.cpp # assignment-provided runtime/harness
├── Test/
│ ├── Mem_Test_01.cpp ... Mem_Test_17.cpp
│ └── Mem_Test_Stress.cpp
├── Framework/ # framework utilities + Linux compatibility headers
├── Makefile # Linux build and benchmark targets
└── Reference_Output.txt # Windows reference benchmark output
You need a 32-bit capable C++ toolchain.
Example for Debian/Ubuntu:
sudo apt update
sudo apt install -y build-essential g++-multilib libc6-dev-i386make full && ./build/pa5_debug_x86make stress-release && ./build/pa5_stress_release_x86For fair Linux benchmarking, this repository uses:
-m32-O3-DNDEBUG-flto-fno-builtin-malloc-fno-builtin-free
- Functional assignment tests are in
Test/Mem_Test_01.cppthroughTest/Mem_Test_17.cpp - The stress benchmark is in
Test/Mem_Test_Stress.cpp - Recommended workflow:
- run full debug tests
- run the fair stress-release benchmark
- use median over multiple runs for performance comparisons
| Mode | System allocator (Mem Lib) |
Custom allocator (Custom Memory) |
Reported ratio (times faster) |
|---|---|---|---|
| Windows x86 Debug | 4118.881748 ms |
3374.176539 ms |
1.220707x |
| Windows x86 Release | 871.623310 ms |
249.820803 ms |
3.488994x |
| Mode | Custom allocator median | Ratio median (supplemental) |
|---|---|---|
| Linux x86 stress-release | ~195.99 ms |
~0.72x |
| Baseline | Custom allocator time |
|---|---|
| Windows x86 Release custom baseline | 249.820803 ms |
| Linux fair stress-release custom median | ~195.99 ms |
The Linux custom allocator raw benchmark time is about 21.5% faster than the Windows x86 Release custom baseline.
The times faster ratio is computed against each platform's system allocator baseline.
Because Linux malloc/free and the Windows CRT allocator behave differently, Windows and Linux ratios are not directly apples-to-apples.
For cross-platform comparison, use raw custom allocator time as the primary metric:
- ratio is useful as a within-platform indicator
- raw custom time is the Windows-vs-Linux baseline comparison
#include "Mem.h"
int main()
{
Mem mem(Mem::Guard::Type_A);
mem.initialize();
void* p = mem.malloc(256);
if (p != nullptr)
{
mem.free(p);
}
return 0;
}- Keep optimization scope narrow and measurable; preserve allocator semantics before changing allocator policy.
- Add an optional heap-integrity pass that validates header flags, list links, block sizes, and adjacency invariants after stress runs.
- Evaluate alternate placement policies (for example best-fit or size-segregated free lists) behind compile-time switches while keeping assignment behavior as a regression gate.
- Replace ad hoc benchmark sampling with scripted multi-run statistics (median, p95, variance) and fixed CPU affinity to reduce noise.
- Improve reproducibility with CI jobs that build and run both
fullandstress-releaseon a 32-bit-capable Linux toolchain.
This project began as an assignment-style custom memory allocator exercise and was later extended with a Linux build path, benchmarking workflow, and measured performance improvements.