Skip to content

Conversation

jserv
Copy link
Collaborator

@jserv jserv commented Sep 4, 2025

This pull request does a cohesive ELF improvement initiative that:

  • Establishes proper read-only memory sections (.rodata)
  • Implements C const qualifier semantics with proper memory placement
  • Enhances memory protection by separating read-only from writable data

Summary by cubic

Adds proper ELF .rodata and .bss sections and implements the C const qualifier. Strings and const-initialized globals are now read-only; uninitialized globals go to .bss, improving memory protection.

  • New Features
    • ELF: add .rodata and .bss; update section counts/indices, shstrtab, program header sizes, and alignment; write .rodata to file, count .bss in memsz only.
    • Data placement: string literals and const-initialized globals to .rodata; uninitialized globals to .bss; others to .data.
    • Language/codegen: add const keyword; track is_const_qualified/is_initialized; ARM/RISC-V load addresses from elf_rodata_start; compute data/rodata/bss bases.
    • Tests/docs: add const global tests; remove “missing .rodata” from README known issues.

cubic-dev-ai[bot]

This comment was marked as outdated.

@jserv jserv force-pushed the improve-elf branch 5 times, most recently from 373109a to cf6eb19 Compare September 5, 2025 14:50
The ARM backend had two conflicting definitions of the __bx function,
causing compilation errors. This commit removes the incorrect duplicate
and keeps the proper BX (Branch and Exchange) instruction implementation.

The correct implementation uses: (cond << 28) | 0x012FFF10 | rm
which properly encodes the ARM BX instruction.
Copy link
Collaborator

@DrXiao DrXiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the C const qualifier is only "partially" supported, since there are still certain cases that shecc cannot handle.

For example, the improved shecc cannot parse a constant pointer such as:

int * const ptr; /* constant pointer */

Therefore, how about clarifying this in the commit message to indicate that the C const qualifier is only "partially" supported?

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Sep 7, 2025
@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Sep 7, 2025
@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Sep 7, 2025
Copy link

cubic-dev-ai bot commented Sep 7, 2025

@cubic-dev-ai Review

@jserv I've started the AI code review. It'll take a few minutes to complete.

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 issues found across 17 files

React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai to give feedback, ask questions, or re-run the review.


elf_code = strbuf_create(MAX_CODE);
elf_data = strbuf_create(MAX_DATA);
elf_rodata = strbuf_create(MAX_DATA);
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elf_rodata is allocated but never freed in global_release, causing a resource leak

Prompt for AI agents
Address the following comment on src/globals.c at line 1234:

<comment>elf_rodata is allocated but never freed in global_release, causing a resource leak</comment>

<file context>
@@ -1227,10 +1231,12 @@ void global_init(void)
 
     elf_code = strbuf_create(MAX_CODE);
     elf_data = strbuf_create(MAX_DATA);
+    elf_rodata = strbuf_create(MAX_DATA);
     elf_header = strbuf_create(MAX_HEADER);
     elf_symtab = strbuf_create(MAX_SYMTAB);
</file context>
Fix with Cubic

try_ 35 << EOF
int main() {
const int value = 35;
int *ptr = &value;
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implicitly discards const qualifiers by assigning &value (const int) to int*, contradicting the test intent and C const semantics; use const int *ptr instead.

Prompt for AI agents
Address the following comment on tests/driver.sh at line 1829:

<comment>Implicitly discards const qualifiers by assigning &amp;value (const int) to int*, contradicting the test intent and C const semantics; use const int *ptr instead.</comment>

<file context>
@@ -1775,6 +1775,197 @@ int main()
+try_ 35 &lt;&lt; EOF
+int main() {
+    const int value = 35;
+    int *ptr = &amp;value;
+    return *ptr;
+}
</file context>
Suggested change
int *ptr = &value;
const int *ptr = &value;
Fix with Cubic

if (lex_accept(T_const)) {
is_const = true;
/* After const, we expect a type */
if (!lex_peek(T_identifier, token))
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rejects valid declarations like 'const struct/union ...' by requiring an identifier after const

Prompt for AI agents
Address the following comment on src/parser.c at line 4367:

<comment>Rejects valid declarations like &#39;const struct/union ...&#39; by requiring an identifier after const</comment>

<file context>
@@ -4332,14 +4360,22 @@ basic_block_t *read_body_statement(block_t *parent, basic_block_t *bb)
+    if (lex_accept(T_const)) {
+        is_const = true;
+        /* After const, we expect a type */
+        if (!lex_peek(T_identifier, token))
+            error(&quot;Expected type after const&quot;);
+    }
</file context>
Fix with Cubic

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 issues found across 18 files

React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai to give feedback, ask questions, or re-run the review.

OP_load_constant, /* load constant */
OP_load_data_address, /* lookup address of a constant in data section */
OP_load_constant, /* load constant */
OP_load_data_address, /* lookup address of a constant in data section */
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment for OP_load_data_address is misleading now that constants reside in .rodata; clarify it refers to a writable data symbol, not a constant.

Prompt for AI agents
Address the following comment on src/defs.h at line 265:

<comment>Comment for OP_load_data_address is misleading now that constants reside in .rodata; clarify it refers to a writable data symbol, not a constant.</comment>

<file context>
@@ -260,8 +261,9 @@ typedef enum {
-    OP_load_constant,     /* load constant */
-    OP_load_data_address, /* lookup address of a constant in data section */
+    OP_load_constant,       /* load constant */
+    OP_load_data_address,   /* lookup address of a constant in data section */
+    OP_load_rodata_address, /* lookup address of a constant in rodata section */
 
</file context>
Suggested change
OP_load_data_address, /* lookup address of a constant in data section */
OP_load_data_address, /* lookup address of a symbol in data section */
Fix with Cubic

bool is_func;
bool is_global;
bool address_taken; /* true if variable address was taken (&var) */
bool is_const_qualified; /* true if variable has const qualifier */
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar boolean names for different concepts can be confusing; distinguish qualifier-const from constant-value more clearly to avoid misuse.

Prompt for AI agents
Address the following comment on src/defs.h at line 358:

<comment>Similar boolean names for different concepts can be confusing; distinguish qualifier-const from constant-value more clearly to avoid misuse.</comment>

<file context>
@@ -353,11 +355,14 @@ struct var {
     bool is_func;
     bool is_global;
-    bool address_taken; /* true if variable address was taken (&amp;var) */
+    bool is_const_qualified; /* true if variable has const qualifier */
+    bool address_taken;      /* true if variable address was taken (&amp;var) */
     int array_size;
</file context>
Fix with Cubic

/* Hash table constants */
#define NUM_DIRECTIVES 11
#define NUM_KEYWORDS 16
#define NUM_KEYWORDS 17
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid hardcoding NUM_KEYWORDS; derive the keyword count from the keywords[] array to prevent desync that can cause out-of-bounds access or missing mappings.

Prompt for AI agents
Address the following comment on src/lexer.c at line 15:

<comment>Avoid hardcoding NUM_KEYWORDS; derive the keyword count from the keywords[] array to prevent desync that can cause out-of-bounds access or missing mappings.</comment>

<file context>
@@ -12,7 +12,7 @@
 /* Hash table constants */
 #define NUM_DIRECTIVES 11
-#define NUM_KEYWORDS 16
+#define NUM_KEYWORDS 17
 
 /* Token mapping structure for elegant initialization */
</file context>
Fix with Cubic

try_ 35 << EOF
int main() {
const int value = 35;
int *ptr = &value;
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning address of const int to int* discards qualifiers; use const int* to preserve const-correctness.

Prompt for AI agents
Address the following comment on tests/driver.sh at line 1829:

<comment>Assigning address of const int to int* discards qualifiers; use const int* to preserve const-correctness.</comment>

<file context>
@@ -1775,6 +1775,197 @@ int main()
+try_ 35 &lt;&lt; EOF
+int main() {
+    const int value = 35;
+    int *ptr = &amp;value;
+    return *ptr;
+}
</file context>
Fix with Cubic

}

mask = hash >> 31;
const int mask = hash >> 31;
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right shift of a negative signed int is implementation-defined; compute mask portably using a comparison.

Prompt for AI agents
Address the following comment on src/globals.c at line 382:

<comment>Right shift of a negative signed int is implementation-defined; compute mask portably using a comparison.</comment>

<file context>
@@ -368,14 +372,14 @@ void arena_free(arena_t *arena)
     }
 
-    mask = hash &gt;&gt; 31;
+    const int mask = hash &gt;&gt; 31;
     return ((hash ^ mask) - mask) &amp; (size - 1);
 }
</file context>
Suggested change
const int mask = hash >> 31;
const int mask = (hash < 0) ? -1 : 0;
Fix with Cubic

int hashmap_hash_index(int size, char *key)
{
int hash = 0x811c9dc5, mask;
int hash = 0x811c9dc5;
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning 0x811c9dc5 to a signed int is implementation-defined; use a representable signed literal to avoid overflow/portability issues.

Prompt for AI agents
Address the following comment on src/globals.c at line 375:

<comment>Assigning 0x811c9dc5 to a signed int is implementation-defined; use a representable signed literal to avoid overflow/portability issues.</comment>

<file context>
@@ -368,14 +372,14 @@ void arena_free(arena_t *arena)
 int hashmap_hash_index(int size, char *key)
 {
-    int hash = 0x811c9dc5, mask;
+    int hash = 0x811c9dc5;
 
     for (; *key; key++) {
</file context>
Fix with Cubic


elf_code = strbuf_create(MAX_CODE);
elf_data = strbuf_create(MAX_DATA);
elf_rodata = strbuf_create(MAX_DATA);
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elf_rodata is allocated but never freed in global_release; add strbuf_free(elf_rodata) alongside other section frees to avoid leaking this buffer.

Prompt for AI agents
Address the following comment on src/globals.c at line 1234:

<comment>elf_rodata is allocated but never freed in global_release; add strbuf_free(elf_rodata) alongside other section frees to avoid leaking this buffer.</comment>

<file context>
@@ -1227,10 +1231,12 @@ void global_init(void)
 
     elf_code = strbuf_create(MAX_CODE);
     elf_data = strbuf_create(MAX_DATA);
+    elf_rodata = strbuf_create(MAX_DATA);
     elf_header = strbuf_create(MAX_HEADER);
     elf_symtab = strbuf_create(MAX_SYMTAB);
</file context>
Fix with Cubic

elf_rodata->size; /* size in file - includes .rodata */
phdr.p_memsz = elf_code->size + elf_data->size + elf_rodata->size +
elf_bss_size; /* size in memory - includes .bss */
phdr.p_flags = 7; /* flags */
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single RWX PT_LOAD segment means .rodata isn’t actually read-only; split into RX and RW PT_LOAD segments to enforce protection.

Prompt for AI agents
Address the following comment on src/elf.c at line 190:

<comment>Single RWX PT_LOAD segment means .rodata isn’t actually read-only; split into RX and RW PT_LOAD segments to enforce protection.</comment>

<file context>
@@ -176,14 +179,16 @@ void elf_generate_header(void)
+                    elf_rodata-&gt;size; /* size in file - includes .rodata */
+    phdr.p_memsz = elf_code-&gt;size + elf_data-&gt;size + elf_rodata-&gt;size +
+                   elf_bss_size; /* size in memory - includes .bss */
+    phdr.p_flags = 7;            /* flags */
+    phdr.p_align = 4;            /* alignment */
     elf_write_blk(elf_header, &amp;phdr, sizeof(elf32_phdr_t));
</file context>
Fix with Cubic

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 18 files

React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai to give feedback, ask questions, or re-run the review.

/* Hash table constants */
#define NUM_DIRECTIVES 11
#define NUM_KEYWORDS 16
#define NUM_KEYWORDS 17
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Manual keyword count requires sync with keywords[]; mismatch risks OOB access during insertion and incorrect allocation. Derive the count from the array instead of a hardcoded macro.

Prompt for AI agents
Address the following comment on src/lexer.c at line 15:

<comment>Manual keyword count requires sync with keywords[]; mismatch risks OOB access during insertion and incorrect allocation. Derive the count from the array instead of a hardcoded macro.</comment>

<file context>
@@ -12,7 +12,7 @@
 /* Hash table constants */
 #define NUM_DIRECTIVES 11
-#define NUM_KEYWORDS 16
+#define NUM_KEYWORDS 17
 
 /* Token mapping structure for elegant initialization */
</file context>
Fix with Cubic

try_ 35 << EOF
int main() {
const int value = 35;
int *ptr = &value;
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning the address of a const int to an int* discards const; use const int* to preserve const correctness.

Prompt for AI agents
Address the following comment on tests/driver.sh at line 1829:

<comment>Assigning the address of a const int to an int* discards const; use const int* to preserve const correctness.</comment>

<file context>
@@ -1775,6 +1775,197 @@ int main()
+try_ 35 &lt;&lt; EOF
+int main() {
+    const int value = 35;
+    int *ptr = &amp;value;
+    return *ptr;
+}
</file context>
Suggested change
int *ptr = &value;
const int *ptr = &value;
Fix with Cubic


elf_code = strbuf_create(MAX_CODE);
elf_data = strbuf_create(MAX_DATA);
elf_rodata = strbuf_create(MAX_DATA);
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elf_rodata buffer is allocated but never freed in global_release, causing a memory leak; add strbuf_free(elf_rodata) alongside other frees.

Prompt for AI agents
Address the following comment on src/globals.c at line 1234:

<comment>elf_rodata buffer is allocated but never freed in global_release, causing a memory leak; add strbuf_free(elf_rodata) alongside other frees.</comment>

<file context>
@@ -1227,10 +1231,12 @@ void global_init(void)
 
     elf_code = strbuf_create(MAX_CODE);
     elf_data = strbuf_create(MAX_DATA);
+    elf_rodata = strbuf_create(MAX_DATA);
     elf_header = strbuf_create(MAX_HEADER);
     elf_symtab = strbuf_create(MAX_SYMTAB);
</file context>
Fix with Cubic

type = find_type(token, find_type_flag);
if (type) {
var = require_typed_var(parent, type);
var->is_const_qualified = is_const;
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Const qualifier on a struct global applies only to the first declarator; subsequent comma-separated variables are not marked const.

Prompt for AI agents
Address the following comment on src/parser.c at line 4157:

<comment>Const qualifier on a struct global applies only to the first declarator; subsequent comma-separated variables are not marked const.</comment>

<file context>
@@ -4127,6 +4154,7 @@ basic_block_t *read_body_statement(block_t *parent, basic_block_t *bb)
         type = find_type(token, find_type_flag);
         if (type) {
             var = require_typed_var(parent, type);
+            var-&gt;is_const_qualified = is_const;
             read_partial_var_decl(var, NULL);
             add_insn(parent, bb, OP_allocat, var, NULL, NULL, 0, NULL);
</file context>
Fix with Cubic

Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 18 files

React with 👍 or 👎 to teach cubic. You can also tag @cubic-dev-ai to give feedback, ask questions, or re-run the review.

try_ 35 << EOF
int main() {
const int value = 35;
int *ptr = &value;
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning a const int’s address to an int* discards const; this should be a compile-time error. Use a const-qualified pointer instead.

Prompt for AI agents
Address the following comment on tests/driver.sh at line 1829:

<comment>Assigning a const int’s address to an int* discards const; this should be a compile-time error. Use a const-qualified pointer instead.</comment>

<file context>
@@ -1775,6 +1775,197 @@ int main()
+try_ 35 &lt;&lt; EOF
+int main() {
+    const int value = 35;
+    int *ptr = &amp;value;
+    return *ptr;
+}
</file context>
Suggested change
int *ptr = &value;
const int *ptr = &value;
Fix with Cubic


elf_code = strbuf_create(MAX_CODE);
elf_data = strbuf_create(MAX_DATA);
elf_rodata = strbuf_create(MAX_DATA);
Copy link

@cubic-dev-ai cubic-dev-ai bot Sep 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elf_rodata is allocated but never freed in global_release, causing a memory leak.

Prompt for AI agents
Address the following comment on src/globals.c at line 1234:

<comment>elf_rodata is allocated but never freed in global_release, causing a memory leak.</comment>

<file context>
@@ -1227,10 +1231,12 @@ void global_init(void)
 
     elf_code = strbuf_create(MAX_CODE);
     elf_data = strbuf_create(MAX_DATA);
+    elf_rodata = strbuf_create(MAX_DATA);
     elf_header = strbuf_create(MAX_HEADER);
     elf_symtab = strbuf_create(MAX_SYMTAB);
</file context>
Fix with Cubic

The implementation allows shecc to parse and handle const qualifiers in
C code, improving type safety and C standard compliance.

The parser can now handle const qualifiers after pointer asterisks,
enabling declarations like 'int * const ptr' where the pointer itself
is const (but the pointed-to value can change).

Note: This is still partial support as patterns like 'int const *ptr'
are not yet handled. Full const qualifier support would require tracking
const-ness of pointers separately from const-ness of pointed-to data.
@jserv jserv force-pushed the improve-elf branch 3 times, most recently from 6c3ac14 to f6f2c57 Compare September 7, 2025 10:15
This commit adds infrastructure for .rodata (read-only data) and .bss
(uninitialized data) sections and implements OP_load_rodata_address for
addressing constants in .rodata. Then, it moves string literals from
.data to .rodata section for better memory protection.

TODO: implement optimizations for global variable initialization with
compile-time constants.
Add 20 test cases for const qualifier support covering:
- Basic const local and global variables
- Const function parameters
- Const pointer operations
- Const arrays and strings
- Mixed const/non-const scenarios
- Const with various C constructs
This removes unused 'func_t *func' declaration in code_generate function
that was causing compiler warnings. The variable is not needed as 'func'
is only used in emit_ph2_ir() where it has its own local declaration.
This updates snapshots to reflect the new ELF structure with .rodata and
.bss sections. The changes are due to the modified section layout and
new opcode for addressing constants in the read-only data section.
@jserv jserv merged commit 3bd86da into master Sep 7, 2025
12 checks passed
@jserv jserv deleted the improve-elf branch September 7, 2025 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants