PseudoForge v0.1.4
This release is a major update to deterministic cleanup quality, focused on large Windows kernel decompilation corpora. It expands layout provenance, domain-aware identity recovery, corpus quality reporting, replay planning, and fail-closed validation, while keeping IDB writes limited to explicitly selected, validator-gated renames.
Compared with v0.1.3, this release includes 239 commits across 107 files.
Highlights
- Added trusted temp-base provenance reporting for decompiler temporary layout bases.
- Added Windows kernel domain identity packs for common subsystem-specific structures and roles.
- Added corpus quality, replay planning, quality comparison, and cleanup integrity tooling.
- Expanded field layout hints, layout rewrite previews, blocker queues, and provenance comments.
- Improved NTSTATUS/status literal cleanup and residue reporting.
- Hardened LLM rename handling, stale fallback cleanup, and risky rename suppression.
- Updated README and release validation documentation for the new corpus-quality workflow.
Trusted Temp-Base Provenance
PseudoForge now tracks decompiler temporary layout bases through source origin, lifetime stability, merge shape, guard dominance, and mutation risk.
New evidence comments include:
- inferred_offset_temp_provenance_trace
- inferred_offset_trusted_temp_source
- inferred_offset_temp_promotion_blocked
- inferred_offset_same_family_merge_provenance
- inferred_offset_call_result_parameter_dominance
- inferred_offset_post_access_mutation_blocker
Trust classes now distinguish stable, review-only, and blocked candidates such as:
- trusted_stable_source
- trusted_stable_temp
- stable_review_only
- same_family_merge_review
- call_result_parameter_review
- call_result_temporary_review
- branch_merge_blocked
- reassignment_blocked
- mutation_blocked
- opaque_source_blocked
- weak_or_unknown_source_blocked
Canonical layout rewrites remain fail-closed. Opaque call results, mixed call-result/parameter branches without dominance proof, bugcheck/debug parameters without domain identity, globals/MMIO-looking bases, array cursors, post-access writes, address-taken uses, and pointer mutations stay report-only or blocked.
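As a rough illustration of what fail-closed gating over these trust classes could look like, here is a minimal Python sketch. The trust class names come from the list above; the `Action` enum, the `action_for` helper, and the grouping of classes into actions are assumptions made for this example and are not taken from PseudoForge's code.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for a layout rewrite candidate."""
    REWRITE_ELIGIBLE = "rewrite_eligible"
    REVIEW_ONLY = "review_only"
    BLOCKED = "blocked"


# Class names are from the release notes. Which action each class maps to
# is an assumption inferred from the name suffixes.
_ELIGIBLE = frozenset({"trusted_stable_source", "trusted_stable_temp"})
_REVIEW = frozenset({
    "stable_review_only",
    "same_family_merge_review",
    "call_result_parameter_review",
    "call_result_temporary_review",
})


def action_for(trust_class: str) -> Action:
    """Map a trust class to an action, blocking anything not explicitly allowed."""
    if trust_class in _ELIGIBLE:
        return Action.REWRITE_ELIGIBLE
    if trust_class in _REVIEW:
        return Action.REVIEW_ONLY
    # Fail closed: explicit *_blocked classes and unrecognized classes
    # both end up blocked.
    return Action.BLOCKED
```

The point of the sketch is the final branch: a class that is missing from both allow-lists, including one added in a later version, is blocked by default and never rewritten by accident.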
Domain Identity Packs
Added a domain identity framework and subsystem packs for Windows kernel analysis, including:
- Object Manager
- Process and thread lifecycle
- Process/thread notify callbacks
- Token and security objects
- Handle tables
- Memory Manager and VAD-related flows
- Registry configuration
- I/O Manager
- PnP and power paths
- File cache and section objects
- ALPC ports
- ETW/WMI telemetry
- Executive async patterns
- Dispatcher synchronization
- Security descriptors and ACLs
- Image code integrity
- Trap and processor state contexts
These profiles improve role naming and layout evidence without turning weak identity hints into unsafe rewrites.
Layout And Structural Analysis
Expanded deterministic layout analysis with:
- Field layout hints and field alias previews
- Stable base source evidence
- Generic base evidence and trust candidates
- Subfield overlay and narrow subfield evidence
- Bitfield mask and bitfield alias hints
- Hot field cluster evidence
- Base stability and relocation-sensitive RHS samples
- Merged layout base evidence
- Allocation/null merge dominance
- Call-result merge equivalence
- Parameter merge provenance
- Bugcheck merge identity
- Indexed callback table evidence
- Dense structural hints
Validated layout rewrite previews are now exported as reviewable artifacts, with canonical and partial rewrite paths gated behind explicit validation.
Corpus Quality And Replay Planning
Added and expanded corpus-level tooling:
- tools/pseudoforge_corpus_quality.py
- tools/pseudoforge_replay_plan.py
- tools/pseudoforge_quality_compare.py
- tools/pseudoforge_cleanup_integrity.py
The quality report now tracks residue metrics, source identity blockers, layout blocker queues, preview validation state, temp-base provenance, source origins, branch merge shapes, dominance state, and blocked/review-only candidates.
Replay planning can rank high-value functions and emit focused EA lists for targeted no-LLM reruns.
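The ranking step might be pictured roughly as follows. This is a sketch under stated assumptions, not the logic of tools/pseudoforge_replay_plan.py: the `FunctionQuality` record, its fields, the scoring weights, and the `focused_ea_list` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionQuality:
    """Hypothetical per-function record derived from a corpus quality report."""
    ea: int                      # function start address
    layout_actionability: float  # assumed 0.0 to 1.0, higher is more actionable
    residue_count: int           # assumed count of leftover cleanup residue


def replay_score(fn: FunctionQuality) -> float:
    # Assumed weighting, for illustration only.
    return fn.layout_actionability * 10.0 + fn.residue_count


def focused_ea_list(functions, limit):
    """Return the top `limit` functions as hex EA strings, highest score first."""
    # Ties are broken by address so the output is deterministic.
    ranked = sorted(functions, key=lambda f: (-replay_score(f), f.ea))
    return [f"0x{f.ea:X}" for f in ranked[:limit]]
```

Sorting on an explicit `(score, address)` key keeps the emitted list stable between runs, which matters when the list feeds a deterministic rerun.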
Status Cleanup Improvements
Expanded deterministic NTSTATUS and status-like cleanup, including:
- Profiled status argument cleanup
- Status alias comparison cleanup
- Guard-dispatch status aliases
- Logical OR status aliases
- Status pointer store literals
- Nested status pointer stores
- Low-DWORD status carriers
- Bitmask-guarded status comparisons
- Small enum and debug-exception residue split queues
- More detailed NTSTATUS residue review metadata
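To make the idea of status literal cleanup concrete, the following Python sketch maps a raw literal, as a decompiler might print it in signed or unsigned form, to a symbolic NTSTATUS name. The table holds a handful of well-known values from the Windows headers; the `status_name` helper and its range check are assumptions for illustration and do not reflect PseudoForge's actual implementation.

```python
# A small subset of well-known NTSTATUS values.
NTSTATUS_NAMES = {
    0x00000000: "STATUS_SUCCESS",
    0xC0000001: "STATUS_UNSUCCESSFUL",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000000D: "STATUS_INVALID_PARAMETER",
    0xC0000022: "STATUS_ACCESS_DENIED",
    0xC000009A: "STATUS_INSUFFICIENT_RESOURCES",
}


def status_name(literal: int):
    """Return the symbolic name for a 32-bit status literal, or None."""
    # Only accept values that fit in 32 bits, signed or unsigned.
    # Wider values are left alone rather than truncated.
    if not -(1 << 31) <= literal < (1 << 32):
        return None
    value = literal & 0xFFFFFFFF  # normalize signed form to unsigned
    return NTSTATUS_NAMES.get(value)
```

For example, `status_name(-1073741811)` and `status_name(0xC000000D)` both resolve to `STATUS_INVALID_PARAMETER`, while an unknown value returns `None` and would be left as residue for review.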
Batch And Validation Workflow
Improved headless IDA and corpus validation workflows:
- Deterministic IDA replay mode
- LLM candidate cache replay mode
- Better source-identity replay queues
- Better replay scoring for layout actionability and residue saturation
- Cleanup integrity QA gate
- Release validation documentation for deterministic replay and corpus quality comparisons
- Kernel corpus relocation/package workflow updates
LLM And Rename Safety
Improved rename handling with:
- Evidence-backed dispatcher LLM rename salvage
- Pascal underscore normalization
- Risky unassigned LLM rename suppression
- Stale LLM fallback artifact cleanup
- Filtered warning artifact export
- Better diagnostics for rename quality and weak candidate residue
LLM suggestions remain advisory and must pass deterministic validation.
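A deterministic gate for advisory rename suggestions could look something like the sketch below. The specific rules (identifier syntax, a small keyword list, rejection of decompiler-style placeholder names, and collision checks) and the `accept_rename` helper are assumptions chosen for illustration; the release notes do not describe the actual validator.

```python
import re

# Assumed rules for this sketch; not PseudoForge's real validator.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_PLACEHOLDER = re.compile(r"[va]\d+")  # decompiler-style names such as v12 or a1
_C_KEYWORDS = frozenset({
    "break", "case", "char", "default", "do", "else", "for", "if", "int",
    "return", "struct", "switch", "void", "while",
})


def accept_rename(candidate: str, existing_names) -> bool:
    """Accept a suggested name only if every deterministic check passes."""
    if not _IDENT.fullmatch(candidate):
        return False  # not a valid C identifier
    if candidate in _C_KEYWORDS:
        return False  # would shadow a language keyword
    if _PLACEHOLDER.fullmatch(candidate):
        return False  # no better than the decompiler default
    if candidate in existing_names:
        return False  # would collide with a name already in scope
    return True
```

Every check is a pure function of the candidate and the current name set, so the same suggestion always gets the same verdict regardless of which model produced it.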
Documentation
Updated documentation for:
- Trusted temp-base provenance reporting
- Corpus quality and replay planning workflow
- Quality comparison commands
- Release validation workflow
- Kernel corpus package installation and runbook guidance
Compatibility Notes
- Existing corpus artifacts can still be read, but new provenance metrics require rerunning analysis with this version.
- Kernel Corpus data packages remain separate from PseudoForge plugin releases.
- New quality and replay metrics are additive.
- IDB writes remain limited to selected, validator-gated local and argument renames.