Conversation
…ot, not fully verified, but works for tests
added benchmark
There was a problem hiding this comment.
Pull request overview
This PR significantly expands ARM Thumb (ARMv8-M) backend infrastructure and documentation while removing unused ARM64 stubs, aiming to support a global register allocator and related optimizations, benchmarks, and test workflows.
Changes:
- Added ARM Thumb backend components (scratch register management, call-site tracking, ABI classification, arch/FPU configs).
- Added extensive design/validation documentation and benchmark analysis notes for IR/codegen optimizations.
- Added container + CI scaffolding and repo tooling metadata (Dockerfile, workflows, submodules, editor configs); removed ARM64 dummy assembler/linker code.
Reviewed changes
Copilot reviewed 48 out of 536 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| asm_token_fix_plan.md | Documents migration plan for new ASM token layout in Thumb assembler |
| arm64-link.c | Removes AArch64 linker/relocation implementation from this branch |
| arm64-asm.c | Removes ARM64 dummy assembler stub |
| arm-tok.h | Removes legacy ARM token definitions header |
| arm-thumb-scratch.c | Adds scratch register allocation/save/restore helpers for Thumb codegen |
| arm-thumb-defs.h | Introduces Thumb target defs, ABI enums, generator state structs, suffix parsing types |
| arm-thumb-callsite.c | Adds call-site table + IR scan utilities for ABI call layout reconstruction |
| arch/fpu/arm/fpv5-sp-d16.h | Declares FPv5-SP-D16 floating-point config |
| arch/fpu/arm/fpv5-sp-d16.c | Defines FPv5-SP-D16 floating-point capabilities/config |
| arch/armv8m.c | Adds ARMv8-M ArchitectureConfig defaults |
| arch/arm_aapcs.c | Adds AAPCS-like argument classification + call layout capacity helpers |
| TCC_GCC_CODEGEN_COMPARISON.md | Documents codegen gaps vs GCC and targeted optimizations |
| TAL_TOKSTR_OPTIMIZATION.md | Documents plan to reduce peak memory by changing tok_str allocation strategy |
| REFACTORING_SUMMARY.md | Summarizes “architecture independence” refactor and new machine/opt modules |
| POSTINC_EMBEDDED_DEREF_PLAN.md | Documents plan for post-inc fusion when deref is embedded in expressions |
| OPTIMIZATION_VALIDATION_REPORT.md | Records which optimization phases are implemented/validated |
| OPTIMIZATION_QUICK_WIN_1_FP_CACHE.md | Documents FP-offset cache implementation and tests |
| OPTIMIZATION_PLAN_V2.md | Updated optimization plan and analysis (LR usage, indexed loads, post-inc, etc.) |
| OPTIMIZATION_FP_CACHE_IMPROVEMENT_PLAN.md | Documents FP cache “register-agnostic” approach and limitations |
| MLA_WITH_DEREFERENCES_PLAN.md | Documents plan to enable MLA fusion when MUL operands are dereferences |
| LICM_IMPLEMENTATION_STATUS.md | Status/report on LICM implementation and known bugs |
| LICM_IMPLEMENTATION_PLAN.md | Detailed LICM design plan |
| LAZY_SECTION_LOADING_ALL_PLAN.md | Notes reverted attempt to defer-load all sections and failure reasons |
| LAZY_SECTION_LOADING.md | Documents lazy section loading design/implementation |
| FP_CACHE_IR_LEVEL_PLAN.md | Proposes IR-level FP offset caching as a cleaner approach |
| Dockerfile | Adds Ubuntu-based build/test container with ARM toolchain + QEMU |
| BUBBLE_SORT_COMPARISON.md | Detailed bubble-sort codegen diff analysis and optimization priorities |
| AGENTS.md | Adds project agent/guide documentation, structure, build/test notes |
| .vscode/tasks.json | Adds QEMU run/stop tasks |
| .vscode/settings.json | Adds terminal auto-approve configuration for tooling |
| .vscode/launch.json | Adds debug configs (tcc under gdb, QEMU gdb attach) |
| .vscode/c_cpp_properties.json | Adds IntelliSense settings/defines for ARM Thumb target |
| .gitmodules | Adds new submodules (newlib, pico-sdk, mibench) |
| .github/workflows/docker-build.yml | Adds workflow to build/push container image to GHCR |
| .github/workflows/ci.yml | Adds CI workflow running tests inside GHCR container |
| .github/prompts/plan-aliasAnalysisStoreLoadOptimizations.prompt.md | Adds prompt doc for alias-analysis/store-load optimization plan |
| .dockerignore | Adds docker ignore rules for repo build artifacts and tooling dirs |
| .clang-format | Updates formatting rules (including brace style + column limit) |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| #pragma once | ||
|
|
||
| #define USING_GLOBALS |
There was a problem hiding this comment.
Defining USING_GLOBALS in a header is very likely to cause multiple-definition / ODR-style linker errors because every translation unit including this header will pull in global definitions from tcc.h. Move #define USING_GLOBALS into exactly one .c file (typically the implementation file) or remove it entirely from the header and include only what’s needed (e.g., forward-declare FloatingPointConfig or include a smaller header that doesn’t depend on USING_GLOBALS).
| #define USING_GLOBALS |
| if (layout->capacity < needed) | ||
| { | ||
| int new_cap = layout->capacity ? layout->capacity : 8; | ||
| while (new_cap < needed) | ||
| new_cap *= 2; | ||
| // layout->locs = (TCCAbiArgLoc *)tcc_realloc(layout->locs, sizeof(TCCAbiArgLoc) * new_cap); | ||
| /* Zero new tail for determinism. */ | ||
| // memset(&layout->locs[layout->capacity], 0, sizeof(TCCAbiArgLoc) * (new_cap - layout->capacity)); | ||
| layout->capacity = new_cap; | ||
| } |
There was a problem hiding this comment.
layout->capacity is increased without reallocating the backing arrays (the tcc_realloc/memset calls are commented out). This can desynchronize capacity from actual allocated sizes and can also cause tcc_abi_call_layout_ensure_capacity() to incorrectly skip reallocation later, leading to out-of-bounds writes. Replace this block with a call to tcc_abi_call_layout_ensure_capacity(layout, needed) (or fully re-enable correct realloc/zero-init here) and keep capacity consistent with the real allocations.
| if (layout->capacity < needed) | |
| { | |
| int new_cap = layout->capacity ? layout->capacity : 8; | |
| while (new_cap < needed) | |
| new_cap *= 2; | |
| // layout->locs = (TCCAbiArgLoc *)tcc_realloc(layout->locs, sizeof(TCCAbiArgLoc) * new_cap); | |
| /* Zero new tail for determinism. */ | |
| // memset(&layout->locs[layout->capacity], 0, sizeof(TCCAbiArgLoc) * (new_cap - layout->capacity)); | |
| layout->capacity = new_cap; | |
| } | |
| tcc_abi_call_layout_ensure_capacity(layout, needed); |
| for (int i = 0; i < count; ++i) | ||
| { | ||
| if (regs_to_save & (1u << result.regs[i])) | ||
| { | ||
| if (scratch_push_count < 128) | ||
| scratch_push_stack[scratch_push_count++] = result.regs[i]; | ||
| else | ||
| tcc_error("compiler_error: scratch register push stack overflow (>128 pushes without restore)"); | ||
| } | ||
| } |
There was a problem hiding this comment.
When you emit a single th_push(regs_to_save) with a register mask, the architectural stack layout is in register-number order, but scratch_push_stack records registers in allocation order (result.regs[]). If allocation order differs from ascending register number order, later POP validation/deferral and the “auto restore” path can pop registers in an order that doesn’t match what’s actually on the stack, restoring wrong values. A robust fix is to track pushes as masks on a push-mask stack (LIFO of uint32_t masks) instead of per-register, or record the saved registers in the exact order they are stored on the stack (ascending register number) for each mask push.
|
|
||
| if (can_restore_all && check_count > 0) | ||
| { | ||
| fprintf(stderr, "[SCRATCH] Popping registers (mask=0x%x) in single instruction\n", allocs->saved_mask); |
There was a problem hiding this comment.
These fprintf(stderr, ...) calls are unconditional (not behind #ifdef ARM_THUMB_DEBUG_SCRATCH) and will spam stderr in normal builds/tests. Wrap them in the same debug macro used elsewhere in this file (or route through a project logging macro that can be compiled out).
| } | ||
| else | ||
| { | ||
| fprintf(stderr, "[SCRATCH] WARNING: restore_scratch_regs out of order; deferring POP\n"); |
There was a problem hiding this comment.
These fprintf(stderr, ...) calls are unconditional (not behind #ifdef ARM_THUMB_DEBUG_SCRATCH) and will spam stderr in normal builds/tests. Wrap them in the same debug macro used elsewhere in this file (or route through a project logging macro that can be compiled out).
.vscode/settings.json
Outdated
| "make": true, | ||
| "heaptrack_print": true |
There was a problem hiding this comment.
Committing editor settings that auto-approve terminal commands can create a footgun for contributors (commands may run without explicit confirmation). Consider removing this from the repo, scoping it to a local/user settings example, or setting these to false by default with documentation on how to opt-in locally.
| "make": true, | |
| "heaptrack_print": true | |
| "make": false, | |
| "heaptrack_print": false |
| BasedOnStyle: LLVM | ||
| ColumnLimit: 120 | ||
| AllowShortFunctionsOnASingleLine: None | ||
| BreakBeforeBraces: Allman |
There was a problem hiding this comment.
BreakBeforeBraces: Allman conflicts with the brace style documented in AGENTS.md (K&R style). This will cause churn and inconsistent formatting expectations. Align the documented style and .clang-format settings (either update the docs to Allman or adjust .clang-format to match the documented K&R approach).
| BreakBeforeBraces: Allman | |
| BreakBeforeBraces: Attach |
and a lot of other things + benchmarks and tests