[clang][ssaf] Add EntityId and EntityIdTable for efficient entity handling #1

jkorous-apple · 2025-11-22T00:52:31Z

Introduce EntityId and EntityIdTable to provide efficient, lightweight handles for working with EntityNames in the Scalable Static Analysis Framework (SSAF).

Introduces two key components:

EntityId: Lightweight opaque handle representing an entity in an EntityIdTable
EntityIdTable: Entity name interning table that maps unique EntityNames to EntityIds

The interning table ensures each EntityName maps to exactly one EntityId, providing fast equality comparisons and lookups. EntityIds are index-based and remain stable for the lifetime of their table. This enables efficient entity tracking and comparison operations needed for whole-program analysis.

steakhal · 2025-11-25T12:53:35Z

clang/include/clang/Analysis/Scalable/Model/EntityId.h

+/// EntityIds are created by EntityIdTable. Equality and ordering comparisons
+/// are well-defined for EntityIds created by the same EntityIdTable.


So, Entities also have their ordering. If I recall they are based of USR, Suffix, Namespace.
My point is that if I have EntityName A and B, and they are ordered A < B.
Does that also imply that EntityId of A and B are also ordered the same way?

I didn't consider this. Suppose the ordering is different, do you see that as a problem?

steakhal · 2025-11-25T13:00:05Z

clang/include/clang/Analysis/Scalable/Model/EntityIdTable.h

+class EntityIdTable {
+  std::set<EntityName> Entities;
+
+  std::vector<const EntityName*> IdToEntity;


This table owns these entities in the Entities std::set, which is a stable associative container.
Could we just hand out pointers to the elements instead of keeping a separate index -> pointer mapping in the form of IdToEntity vector?

This is an interesting question because an EntityId is basically as large as a pointer (due to the size_t member). Pointers also have natural EQ and LT operators, and their semantics are clear.

I prefer to not surface const EntityName* to the users as that can be misleading and it's hard to communicate that only entities from a given table can be compared against each other. But to avoid that createEntityId could just cast the pointer to size_t.

Another issue is deserialization. With the current design, if we take serialized EntityTable and deserialize entities in the same order they ware added, we keep the IDs valid and won't have to update references to EntityId-s in the summaries.

steakhal · 2025-11-25T13:12:26Z

clang/unittests/Analysis/Scalable/EntityIdTest.cpp

+  EXPECT_TRUE(Id1 < Id2 || Id2 < Id1);
+  EXPECT_TRUE(Id1 < Id3 || Id3 < Id1);
+  EXPECT_TRUE(Id2 < Id3 || Id3 < Id2);
+
+  // Transitivity: if a < b and b < c, then a < c
+  if (Id1 < Id2 && Id2 < Id3) {
+    EXPECT_TRUE(Id1 < Id3);
+  }


If feels arbitrary that we didn't want to express a specific ordering among Id1, Id2, Id3 - (except that they are inequality to each other), and here we suddenly only check transitivity if a very specific ordering applies, such as: Id1 < Id2 < Id3.

I'd suggest either pin a specific ordering in both cases or keep it relaxed too.

steakhal · 2025-11-25T13:13:44Z

clang/lib/Analysis/Scalable/Model/EntityIdTable.cpp

+void EntityIdTable::forEach(std::function<void(const EntityName&, EntityId)> Callback) const {
+  for (size_t Index = 0; Index < IdToEntity.size(); ++Index) {
+    EntityId EId(Index);
+    const EntityName& Name = *IdToEntity[Index];
+    Callback(Name, EId);
+  }
+}


I don't think we had a test for this API, while the rest was thoroughly tested.

@printf

… errors (llvm#169989) We can see the following while running clang-repl in C mode ``` anutosh491@vv-nuc:/build/anutosh491/llvm-project/build/bin$ ./clang-repl --Xcc=-x --Xcc=c --Xcc=-std=c23 clang-repl> printf("hi\n"); In file included from <<< inputs >>>:1: input_line_1:1:1: error: call to undeclared library function 'printf' with type 'int (const char *, ...)'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 1 | printf("hi\n"); | ^ input_line_1:1:1: note: include the header <stdio.h> or explicitly provide a declaration for 'printf' error: Parsing failed. clang-repl> #include <stdio.h> hi ``` In debug mode while dumping the generated Module, i see this ``` clang-repl> printf("hi\n"); In file included from <<< inputs >>>:1: input_line_1:1:1: error: call to undeclared library function 'printf' with type 'int (const char *, ...)'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] 1 | printf("hi\n"); | ^ input_line_1:1:1: note: include the header <stdio.h> or explicitly provide a declaration for 'printf' error: Parsing failed. clang-repl> #include <stdio.h> === compile-ptu 1 === [TU=0x55556cfbf830, M=0x55556cfc13a0 (incr_module_1)] [LLVM IR] ; ModuleID = 'incr_module_1' source_filename = "incr_module_1" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" @.str = private unnamed_addr constant [4 x i8] c"hi\0A\00", align 1 @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @_GLOBAL__sub_I_incr_module_1, ptr null }] define internal void @__stmts__0() #0 { entry: %call = call i32 (ptr, ...) @printf(ptr noundef @.str) ret void } declare i32 @printf(ptr noundef, ...) #1 ; Function Attrs: noinline nounwind uwtable define internal void @_GLOBAL__sub_I_incr_module_1() swiftlang#2 section ".text.startup" { entry: call void @__stmts__0() ret void } attributes #0 = { "min-legal-vector-width"="0" } attributes #1 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } attributes swiftlang#2 = { noinline nounwind uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" } !llvm.module.flags = !{!0, !1, !2, !3, !4} !llvm.ident = !{!5} !0 = !{i32 1, !"wchar_size", i32 4} !1 = !{i32 8, !"PIC Level", i32 2} !2 = !{i32 7, !"PIE Level", i32 2} !3 = !{i32 7, !"uwtable", i32 2} !4 = !{i32 7, !"frame-pointer", i32 2} !5 = !{!"clang version 22.0.0git (https://github.com/anutosh491/llvm-project.git 81ad8fb)"} === end compile-ptu === execute-ptu 1: [TU=0x55556cfbf830, M=0x55556cfc13a0 (incr_module_1)] hi ``` Basically I see that CodeGen emits IR for a cell before we know whether DiagnosticsEngine has an error. For C code like `printf("hi\n");` without <stdio.h>, Sema emits a diagnostic but still produces a "codegen-able" `TopLevelStmt`, so the `printf` call is IR-generated into the current module. Previously, when `Diags.hasErrorOccurred()` was true, we only cleaned up the PTU AST and left the CodeGen module untouched. The next successful cell then called `GenModule()`, which returned that same module (now also containing the next cell’s IR), causing side effects from the failed cell (e.g. printf)

…dling Introduce EntityId and EntityIdTable to provide efficient, lightweight handles for working with EntityNames in the Scalable Static Analysis Framework (SSAF). Introduces two key components: - EntityId: Lightweight opaque handle representing an entity in an EntityIdTable - EntityIdTable: Entity name interning table that maps unique EntityNames to EntityIds The interning table ensures each EntityName maps to exactly one EntityId, providing fast equality comparisons and lookups. EntityIds are index-based and remain stable for the lifetime of their table. This enables efficient entity tracking and comparison operations needed for whole-program analysis.

jkorous-apple · 2025-12-10T17:46:06Z

I didn't realize I can't retarget this PR to the upstream repository.
Let's continue here:
llvm#171660

steakhal reviewed Nov 25, 2025

View reviewed changes

jkorous-apple mentioned this pull request Dec 6, 2025

[clang][ssaf] Introduce entity abstraction for SSAF llvm/llvm-project#169131

Merged

jkorous-apple added 3 commits December 8, 2025 17:24

[clang][ssaf][NFC] Improve transitivity test for EntityId

6cbe27a

[clang][ssaf][NFC] Add unit tests for EntityTable::forEach

d88f692

jkorous-apple force-pushed the ssaf2 branch from 2928855 to d88f692 Compare December 9, 2025 01:53

jkorous-apple closed this Dec 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[clang][ssaf] Add EntityId and EntityIdTable for efficient entity handling #1

[clang][ssaf] Add EntityId and EntityIdTable for efficient entity handling #1

Uh oh!

jkorous-apple commented Nov 22, 2025

Uh oh!

steakhal Nov 25, 2025

Uh oh!

jkorous-apple Dec 8, 2025

Uh oh!

steakhal Nov 25, 2025

Uh oh!

jkorous-apple Dec 9, 2025 •

edited

Loading

Uh oh!

steakhal Nov 25, 2025

Uh oh!

steakhal Nov 25, 2025

Uh oh!

jkorous-apple commented Dec 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		/// EntityIds are created by EntityIdTable. Equality and ordering comparisons
		/// are well-defined for EntityIds created by the same EntityIdTable.

[clang][ssaf] Add EntityId and EntityIdTable for efficient entity handling #1

[clang][ssaf] Add EntityId and EntityIdTable for efficient entity handling #1

Uh oh!

Conversation

jkorous-apple commented Nov 22, 2025

Uh oh!

steakhal Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

jkorous-apple Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

steakhal Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

jkorous-apple Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

steakhal Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

steakhal Nov 25, 2025

Choose a reason for hiding this comment

Uh oh!

jkorous-apple commented Dec 10, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jkorous-apple Dec 9, 2025 •

edited

Loading