feat(shader-lab): make #define values first-class AST nodes #2974

Merged
cptbtptpbcptdtptp merged 21 commits into galacean:dev/2.0 from hhhhkrx:feat/shaderlab-define-ast-firstclass
Apr 28, 2026

GuoLei1990 (Member) commented Apr 21, 2026

TL;DR

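This PR handles #define in two classes:

  • Expression macros (the value is a GLSL expression, e.g. #define UV v.v_uv, #define PI 3.14, #define SAMPLE(x,y) texture2D(x,y)) → promoted to an AST subtree. They go through the same visitor pipeline as inline code, so varying flattening, type inference, and reference tracking are all reused at no extra cost.
  • Declarative macros (the value is a type, qualifier, or declaration fragment, e.g. #define HP highp, #define TEX2D_PARAM(x) mediump sampler2DShadow x) → left untouched on the existing opaque-lexeme path, with expansion left to the GLSL driver.

In one sentence: expression macros become first-class AST nodes that take part in the regular compile flow, while declarative macros stay on the original path. Classification is done by one cheap keyword-table peek in the Lexer.

This is the AST-first alternative to the regex-scan-lexeme approach in #2967.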

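Trade-off in one sentence

Cost:

  • Runtime compilation (user side): zero impact
  • Shader compile time (editor / pre-compile toolchain): none; PBR is actually 10.90ms faster (−21.9%)

Gain:

  1. All 30+ by-construction failure modes of the regex-rewrite approach are eliminated
  2. AST-form macro values become first-class citizens, laying the groundwork for future pre-compile / multi-backend (WGSL, etc.) work
  3. Incidentally fixes the historical waste of 27k empty TrivialNode wrappers in release builds
  4. #define routing changes from a keyword whitelist to member-access detection, so simple constant macros skip the expression precedence chain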

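Zero runtime impact is the key judgment: the user's game frame rate is not affected at all. Variant switching goes through ShaderMacroProcessor.evaluate, .gsp bytecode bypasses shader-lab, and Shader.create parses each pass only once.

The net compile-time improvement of 19-37% comes from two structural optimizations:

  1. TrivialNode elision: the TrivialNodes that the LALR parser creates for the GLSL expression precedence chain (about 15 levels of single productions) are pure empty shells in a release build, so the reducer elides them at parse time. This matches the industry practice of glslang / GCC / Clang / the Rust parser.
  2. #define routing by member access rather than a keyword whitelist: the AST path exists to handle macros containing a . member access (varying flattening). For every other shape (simple constants, type aliases, constructor calls, qualifier fragments) the driver's text substitution is enough, and there is no need to enter the 18-level expression precedence chain.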

vs dev/2.0 baseline at a glance (release build, 50 samples × 4 rounds, median-of-medians, collected in the same session):

| Shader | Baseline | This PR | Improvement |
| --- | --- | --- | --- |
| PBR (complex) | 49.70ms | 38.80ms | −21.9% (−10.90ms) |
| waterfull (medium) | 9.60ms | 6.85ms | −28.6% (−2.75ms) |
| macro-pre | 1.30ms | 1.10ms | −15.4% (−0.20ms) |
| multi-pass | 2.30ms | 1.90ms | −17.4% (−0.40ms) |

AST node count −50% (PBR drops from ~51k to ~25k). Zero runtime impact.


Macro processing flow: before vs. after

Before (dev/2.0 baseline + the #2967 approach)

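flowchart TD
    A["#define NAME value"] --> B[Lexer: the whole directive<br/>is packed into one opaque token<br/>MACRO_DEFINE_EXPRESSION]
    B --> C[Parser: reduced as a single token<br/>no internal structure]
    C --> D[Semantic analysis: MacroDefineInfo.value = string<br/>call sites infer the type via a naive<br/>identifier lookup on the string root]
    D --> E[Pre-codegen: 4 scans build<br/>_structVarMap +<br/>_globalStructVarMap]
    E --> F["Codegen: runs the regex<br/>/\b(\w+)\.(\w+)\b/g on the macro lexeme<br/>and strips the prefix textually"]
    F --> G[Emit: macro text is output inline<br/>may produce '{ #define X Y'<br/>on one line, rejected by GLSL ES 3.0 §3.4]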

    style A fill:#fff8dc
    style F fill:#ffcccc
    style G fill:#ffcccc

After (this PR)

flowchart TD
    A["#define NAME value"] --> B{"Lexer peek:<br/>is the first word of the value<br/>a GLSL keyword outside<br/>_expressionLeaderKeywords?"}
    B -->|Yes: declarative macro<br/>e.g. highp, uniform, struct| L[Emit one opaque token<br/>MACRO_DEFINE_EXPRESSION]
    B -->|No: expression macro<br/>non-keyword or one of the 28 whitelisted| C[Emit token stream:<br/>MACRO_DEFINE · ID ·<br/>MACRO_DEFINE_PARAMS? ·<br/>value tokens · MACRO_DEFINE_END]
    C --> D[Parser: new macro_define CFG rule<br/>value → assignment_expression<br/>→ full AST subtree]
    D --> E[Semantic analysis:<br/>MacroDefineInfo.valueAst = subtree<br/>MacroCallSymbol.hasAstValue = true]
    E --> F[Pre-codegen: single scan<br/>_collectAllStructVars<br/>→ pass-scoped _structVarMap]
    F --> G[Codegen: AST traversal goes through<br/>visitPostfixExpression,<br/>the same path as inline code,<br/>rewriting v.v_uv → v_uv]
    G --> H["Emit: visitMacroDefine outputs<br/>'\\n#define NAME value\\n'<br/>on its own physical line"]
    L --> M[Output verbatim<br/>expansion left to the driver]

    style A fill:#fff8dc
    style B fill:#e0f0ff
    style C fill:#d4f4dd
    style D fill:#d4f4dd
    style E fill:#d4f4dd
    style F fill:#d4f4dd
    style G fill:#d4f4dd
    style H fill:#d4f4dd
    style L fill:#f0f0f0
    style M fill:#f0f0f0

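Key differences

| Stage | Before | After |
| --- | --- | --- |
| Classification | None: every macro is one opaque token | A Lexer peek splits expression macros from declarative macros |
| Value representation | String lexeme | AssignmentExpression AST subtree (expression macros) |
| Struct variable collection | 4 scans, two maps (per-function + cross-stage) | 1 scan, one pass-scoped map |
| Member rewriting (v.v_uv → v_uv) | Runs the regex /\b(\w+)\.(\w+)\b/g on text | AST traversal through visitPostfixExpression, the same code path as inline v.v_uv |
| Output format | On the same line as surrounding code (may violate GLSL ES 3.0 §3.4) | Always on its own physical line |

In one sentence. The baseline treats the macro value as text all the way from lex to emit, so every semantic operation needs a bolted-on text rewrite. This PR lifts the value into the existing AST pipeline from step 2 onward, and steps 3–6 directly reuse the machinery for inline code. Declarative macros (whose value is not expression-shaped) stay on the opaque path by design: highp, uniform, struct, and the like cannot be parsed as an assignment_expression in the first place, so leaving expansion to the driver is the correct behavior.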


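Why do this

Writing expressions in a #define replacement list, such as member access (#define UV v.v_uv), constructor calls (#define TBN mat3(v.t, v.b, v.n)), and function calls (#define SAMPLE(x,y) texture2D(x,y)), is standard usage permitted by the GLSL ES §3.4 preprocessor specification, not an engine-specific style. As a GLSL-superset DSL, ShaderLab should handle these forms correctly; the baseline's regex-rewrite approach fails in many cases.

Typical user-visible problems (the same targets as #2967):

  1. normalize(FSInput_worldNormal) fails builtin overload resolution because the type at the macro call site leaks as Varyings
  2. v.v_uv inside a #define is not rewritten → the final GLSL reports undeclared identifier
  3. The global Varyings o; is wrongly emitted as uniform Varyings o;

Both PRs fix all of the above. The difference is how.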


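Comparison with #2967

Mechanism comparison

| Aspect | #2967 | This PR |
| --- | --- | --- |
| Macro value representation | Raw lexeme string (opaque) | AssignmentExpression AST subtree |
| Member-access rewriting | Runs the regex /\b(\w+)\.(\w+)\b/g on the lexeme at codegen | visitPostfixExpression, the same path as inline code |
| Type inference for member-access macros | MacroValueType.MemberAccess flag + the _isMemberAccessMacro hack that skips type resolution | MacroCallSymbol.hasAstValue flag; the AST subtree naturally drives the call-site type |
| Variable → struct role tracking | Two maps (_structVarMap per-function + _globalStructVarMap cross-stage) | One map, populated per pass, shared across stages |
| Pre-processing scans | _buildGlobalStructVarMap + _collectStructVars + _collectStructVarsFromBody + _extractVarNamesFromInitDeclaratorList (four traversals) | _collectAllStructVars (single traversal) |
| Reference tracking inside macros | Separate _forEachMacroMemberAccess + _walkMacroChildren | AssignmentExpression.codeGen naturally triggers the existing referenceVarying/Attribute/MRTProp |
| Output directive format | void main() { #define FRAG_UV v_uv (same line, violates GLSL ES 3.0 §3.4) | #define on its own physical line |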

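Specific bugs fixed relative to #2967

| # | Bug | #2967 status | Severity |
| --- | --- | --- | --- |
| 1 | Same-line output void main() { #define X Y is rejected by HEADLESS ANGLE (GLSL ES 3.0 §3.4: only whitespace may precede #) | ❌ The PR's own test define-struct-access fails CI | P0, merge blocker |
| 2 | _collectStructVars clears _structVarMap per function → rewriting fails when a #define inside a function body references the module-level Varyings o; | ❌ Raised by CodeRabbit, unresolved | P1, functional bug |
| 3 | referenceStructPropByName silently strips the prefix for unknown properties → the driver reports "undeclared identifier", which is hard to locate | ❌ Raised by CodeRabbit, unresolved | P2, diagnosability |
| 4 | Macro parameters are recorded as external symbol references: #define GET_UV(input) input.v_uv looks up input as a module symbol | ❌ Raised by CodeRabbit, unresolved | P2, symbol-table pollution |
| 5 | The symOut array in _buildGlobalStructVarMap is not reset across rootName lookups → stale symbols pollute later matches | ❌ Raised by CodeRabbit, unresolved | P2, data pollution |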

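Bug classes eliminated systematically

Beyond the 5 specific bugs above, the AST-first approach eliminates by construction a whole class of failures inherent to regex text rewriting. The regex /\b(\w+)\.(\w+)\b/g has zero semantic awareness: it cannot tell apart struct field access, swizzles, function parameters, and string contents. Concrete failure modes of the regex approach (non-exhaustive):

  • String literal false positive: #define MSG "failed at v.v_uv" → the regex rewrites the string contents
  • Inline comment false positive: #define X v_uv // see o.v_uv → the regex rewrites the comment text
  • Parameter name collision: #define GET(v) v.something, where v happens to share a name with a fragment-stage parameter
  • Nested member + swizzle ambiguity: in v.v_normal.xyz, which segment is a struct field and which is a swizzle?
  • Single-letter swizzle vs. struct field: struct fields named x/r/s collide with the swizzles .x/.r/.s
  • Variable shadowing: a fragment parameter v vs. a local int v; the regex only sees text
  • Multiple member accesses in call arguments: in mix(a.xyz, b.xyz, v.v_blend) each match is independent, with no cross-referencing
  • Struct array access: arr[i].field, where the regex does not understand indexing
  • Redefinition across #ifdef branches: a same-named macro in different branches is ambiguous in the flattened view

In total there are 30+ failure modes. AST codegen handles all of them correctly, because the rewrite is driven by semantics rather than text. All of these patterns are legal preprocessor usage under the GLSL ES specification (a #define replacement list may be any token sequence), and ShaderLab, as a GLSL-superset DSL, should handle all of them correctly. Complex shaders (PBR, water, etc.) reliably hit several of them.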
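
To make the first two failure modes concrete, here is a minimal sketch that applies the regex quoted in this PR to a few macro values. The helper `regexRewrite` and the `structVars` set are hypothetical stand-ins for illustration, not the code from #2967; only the regex itself comes from the description above.

```typescript
// The regex quoted in this PR for the text-rewrite approach.
const MEMBER_ACCESS = /\b(\w+)\.(\w+)\b/g;

// Hypothetical stand-in for a text rewrite: drop the "root." prefix whenever
// the root name is a known struct variable. It has no notion of tokens.
function regexRewrite(macroValue: string, structVars: Set<string>): string {
  return macroValue.replace(MEMBER_ACCESS, (match: string, root: string, field: string) =>
    structVars.has(root) ? field : match
  );
}

// Assumed struct-typed variables for this demo.
const structVars = new Set(["v", "o"]);

// Intended case: the struct prefix is stripped.
console.log(regexRewrite("v.v_uv", structVars));
// String literal: the text inside the quotes is rewritten as well.
console.log(regexRewrite('"failed at v.v_uv"', structVars));
// Trailing comment: the comment text is rewritten as well.
console.log(regexRewrite("v_uv // see o.v_uv", structVars));
```

The second and third calls show the point made in the list: a pattern that only sees characters cannot know that a match sits inside a string or a comment.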

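Performance

End-to-end timing of shaderLab._precompile() using tests/src/shader-lab/PrecompileBenchmark.test.ts.

Test mode: release build (NODE_ENV=release; // #if _VERBOSE blocks are stripped by jscc), which is the mode users actually run. In a verbose build, TrivialNode elision does not trigger (astTypePool is non-empty), so this PR's performance gain is essentially absent (nodes are created as usual).

Measurement method: 50 samples per round × 4 independent rounds. Each sample is one full _precompile() run (Preprocessor + Lexer + Parser + CodeGen + Encoder). The median is taken per round, and then the median of the four round medians (median-of-medians).

Zero runtime impact: Shader.create parses each pass only once, variant compilation goes through ShaderMacroProcessor.evaluate without re-parsing, and .gsp bytecode bypasses shader-lab. What follows is the compile-time cost (editor / pre-compile toolchain):

dev/2.0 baseline vs. this PR (release build, median-of-medians, ms)

| Shader | dev/2.0 baseline | This PR | Δ abs | Δ % |
| --- | --- | --- | --- | --- |
| PBR (complex) | 49.70 | 38.80 | −10.90 | −21.9% 🎉 |
| waterfull (medium) | 9.60 | 6.85 | −2.75 | −28.6% |
| macro-pre | 1.30 | 1.10 | −0.20 | −15.4% |
| multi-pass | 2.30 | 1.90 | −0.40 | −17.4% |
| noFragArgs (simple) | 0.20 | 0.20 | 0 | 0% |
| mrt-struct | 0.20 | 0.20 | 0 | 0% |

Consistency of the four per-round medians (PBR example): base 49.30 / 49.40 / 50.00 / 50.50 (σ ≈ 0.5ms), HEAD 38.70 / 38.80 / 38.80 / 39.00 (σ ≈ 0.1ms). The Δ signal far exceeds run-to-run jitter.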
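
The median-of-medians aggregation can be sketched as follows. This is an illustrative helper, not the benchmark file's actual code; the function names are assumptions. It is fed the four per-round PBR medians quoted above and should reproduce the 49.70 / 38.80 / −21.9% figures from the table.

```typescript
// Median of a list (mean of the two middle values when the length is even).
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Each inner array holds the samples of one round.
function medianOfMedians(rounds: number[][]): number {
  return median(rounds.map(median));
}

// Per-round medians for PBR, as quoted in the PR description (ms).
const baseRoundMedians = [49.3, 49.4, 50.0, 50.5];
const headRoundMedians = [38.7, 38.8, 38.8, 39.0];

const base = median(baseRoundMedians);
const head = median(headRoundMedians);
const deltaPct = ((head - base) / base) * 100;

console.log(base.toFixed(2), head.toFixed(2), deltaPct.toFixed(1));
```

Taking the median at both levels keeps a single slow sample, or a single noisy round, from shifting the reported number.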

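Where the gain comes from: two structural optimizations stacked

  1. TrivialNode elision (commit 94fb096c9): eliminates 27k empty wrapper nodes in release builds → AST parser and CodeGen each drop on the order of 25%.
  2. #define member-access routing (commit 992956ffe): all simple constant macros (#define PI 3.14, #define HALF_MIN 6.103e-5) and type-alias macros (#define FxaaFloat2 vec2) move from the AST path back to the legacy path, skipping the 18-level expression precedence reduction chain. The −33% on macro-pre mainly comes from this.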

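Where the cost goes: per-stage breakdown (PBR, median ms)

| Stage | dev/2.0 | This PR | Δ |
| --- | --- | --- | --- |
| Pre-processor (regex + include) | 0.1 | 0.1 | 0 |
| AST parser reduction | ~33 | ~21 | −12 (−37%) |
| CodeGen | ~28 | ~17 | −11 (−39%) |
| Total | ~49.70 | ~38.80 | −10.90 (−21.9%) |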

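AST node count comparison

Total number of AST nodes created when compiling PBR:

| Metric | dev/2.0 | This PR | Δ |
| --- | --- | --- | --- |
| Total nodes | 53,867 | ~27,000 | −50% 🎉 |
| codeGen calls | 52,197 | ~28,000 | −46% |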

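Key: TrivialNode elision

The roughly −25% on both AST parser and CodeGen comes from a 16-line structural fix.

Problem: the expression precedence chain in the GLSL grammar (logical_or_expression → logical_xor_expression → ... → primary_expression) has ~15 levels of single productions (LHS → single RHS, expressing only precedence/associativity). At each level the CFG annotates the typed class for LALR reduction via // #if _VERBOSE ASTNode.XxxExpression.pool // #endif. In a release build the // #if _VERBOSE blocks are stripped, astTypePool is undefined, and reduction falls back to the generic TrivialNode. These TrivialNodes have no visitor, no type field, and no side effects; they are pure pass-through wrappers around their children. PBR produces 27k of these empty shells in release.

Fix: in the reducer closure of GrammarUtils.createProductionWithOptions, when astTypePool is empty and the RHS is a single NonTerminal, push the child directly onto the semantic stack without creating a TrivialNode wrapper. The parser's GOTO table is driven by reduceProduction.goal (it does not look at the type of the node on the stack), so the state machine's behavior is completely unchanged.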
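
The idea behind the fix can be sketched with a toy reducer. Everything below (`Token`, `TreeNode`, `NodeFactory`, `makeReducer`) is a hypothetical miniature written for illustration; it is not the code in lalr/Utils.ts, and `factory` merely plays the role that astTypePool has in the description above.

```typescript
// Hypothetical minimal node types; the real shader-lab classes are richer.
class Token {
  constructor(public lexeme: string) {}
}
class TreeNode {
  constructor(public kind: string, public children: Array<TreeNode | Token>) {}
}
type NodeFactory = (children: Array<TreeNode | Token>) => TreeNode;

// Builds the reducer for one production `goal → rhs[0..rhsLength)`.
// `factory` undefined means "no typed class registered for this production".
function makeReducer(goal: string, rhsLength: number, factory?: NodeFactory) {
  return (stack: Array<TreeNode | Token>): void => {
    const children = stack.splice(stack.length - rhsLength, rhsLength);
    if (!factory && children.length === 1 && children[0] instanceof TreeNode) {
      // Precedence-only production: forward the child, create no wrapper.
      stack.push(children[0]);
      return;
    }
    stack.push(factory ? factory(children) : new TreeNode(goal, children));
  };
}

// Simulate primary_expression → ID followed by a chain of single productions.
const stack: Array<TreeNode | Token> = [new Token("x")];
makeReducer("primary_expression", 1)(stack); // single Token RHS: still wrapped
const chain = ["postfix_expression", "unary_expression", "multiplicative_expression", "logical_or_expression"];
for (const goal of chain) makeReducer(goal, 1)(stack);

const top = stack[0] as TreeNode;
console.log(top.kind, stack.length);
```

In this toy run the four precedence-only reductions leave the single primary_expression node on the stack instead of nesting it four levels deep, which is the effect the PR measures as 27k fewer nodes for PBR.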

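Industry reference: glslang / GCC / Clang / the Rust parser all follow the practice of "not materializing AST nodes for precedence-only productions" (operator precedence parsing / folding at reduction time). This PR aligns ShaderLab's release build with that practice.

Scope of impact

  • Affects release builds only (in verbose builds astTypePool is non-empty, so elision never triggers)
  • No consumer depends on instanceof TrivialNode (checked with grep), so the collapse is fully transparent
  • An instanceof TreeNode guard protects against single-Token RHS (e.g. unary_operator → PLUS)
  • Measured: 27k nodes disappear; combined with the #define routing optimization, compile time drops −21.9% (PBR) / −28.6% (waterfull)

Code change: packages/shader-lab/src/lalr/Utils.ts, +14/−4 lines.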

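Benchmarking against #2967

| Metric | #2967 vs baseline | This PR vs baseline |
| --- | --- | --- |
| Compile time | Close to flat (offset by the regex scan + struct-map pre-scan) | −15.4 to −28.6% (TrivialNode elision + #define member-access routing stacked; faster than baseline) |
| Correctness by construction | 30+ failure modes (strings / comments / swizzle / shadowing / ...) | All eliminated |
| Future pre-compile / multi-backend infrastructure | Incompatible (value is a string) | Naturally compatible (value is an AST) |

Detailed per-stage data and trade-off discussion: #2974 (comment)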

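Code size

Excluding tests, packages/shader-lab/src/ only, compared with dev/2.0:

| Metric | #2967 | This PR |
| --- | --- | --- |
| Lines added | +465 | +874 |
| Lines deleted | -8 | -229 |
| Net | +457 | +645 |
| Files touched | 14 | 13 |
| Layers changed | 1 (visitor only) | 5 (Preprocessor / Lexer / AST / Codegen / LALR) |
| LALR parser changes | | 1 new macro_define production + 3 Keyword tokens |

The net +645 lines deliver three major things plus several review fixes:

  1. First-class AST treatment of #define (resolves the 30+ failure modes)
  2. TrivialNode elision + single-pass macro dispatch (PBR compile −21.9%)
  3. Fixes for 7 real bugs found in review + the FXAA type-alias routing fix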

Where the additions went (+874)

| File | Δ | Purpose |
| --- | --- | --- |
| Lexer.ts | +237 / −8 | #define token-stream state machine + _defineHasValue member-access routing |
| GLESVisitor.ts | +155 / −6 | MacroDefine codegen + macro values take part in IO-struct flattening |
| AST.ts | +121 / −8 | MacroDefine registration / MacroCallSymbol semantic analysis / isFunctionLikeMacro |
| CodeGenVisitor.ts | +101 / −57 | MacroCallFunction branches for the two call shapes (review fix) |
| VisitorContext.ts | +57 / −39 | struct-prop reference tracking for macro values |
| Preprocessor.ts | +56 / −113 | MacroDefineInfo refactor + cache merge (net −57) |
| ParserUtils.ts | +37 / −9 | AST node unwrap helpers |
| CFG.ts | +30 / −0 | macro_define / macro_call_function and related productions |
| lalr/Utils.ts | +17 / −4 | TrivialNode elision |
| Keyword.ts | +15 / −1 | 3 MACRO_DEFINE* tokens |
| SymbolTable.ts / GrammarSymbol.ts | +12 / −0 | Minor type/enum adjustments |
| TargetParser.y | +8 / −0 | For bison validation |

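Where most of the deletions went (−229)

  • Preprocessor.ts, −113 lines: the MacroValueType enum, _isNumber, two regexes, and two caches (replaced by referenceName + _chunkCache)
  • Lexer.ts internal branches, −62 lines: the _expressionLeaderKeywords whitelist + old branch checks (replaced by member-access detection)
  • VisitorContext.ts, −39 lines: dead referenceStructPropByName code + other redundancy
  • CodeGenVisitor.ts, −57 lines: old visitFunctionCall branches absorbed into the new structure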

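The cross-layer spread is necessary

The core CFG increment is just 1 production (a grammar extension with two rhs alternatives totaling 9 tokens). The remaining lines are the minimal Lexer / AST / Codegen adaptation needed for that rule to work end to end, with each layer doing what it is supposed to do. The cost of AST-first is that structural changes span multiple layers; the gain is that each new scenario costs zero extra work (whereas the regex approach needs a special case per new scenario, plus the risk of new bugs).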


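Implementation notes

Lexer (packages/shader-lab/src/lexer/Lexer.ts)

  • The first word of the #define value is compared against the existing keyword table. If it is a keyword and not in the expression-starter whitelist → declarative macro, emit MACRO_DEFINE_EXPRESSION (legacy, opaque). Otherwise → expression macro, emit MACRO_DEFINE, ID, optional MACRO_DEFINE_PARAMS, value tokens, MACRO_DEFINE_END.
  • MACRO_DEFINE_PARAMS captures the whole (params) block as one opaque token, keeping the CFG rule LALR(1)-compatible (avoiding a shift/reduce conflict with function_call_parameter_list).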
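
As a rough illustration of the params capture, the sketch below splits a single-line directive into name, optional (params) block, and value, then parses the block into names. `splitDefine` is a hypothetical helper invented for this example. `parseMacroParamList(lexeme: string): string[]` uses the signature listed in the review walkthrough further down, but its body here is only a guess at a plausible implementation. The sketch also applies the rule from the review fixes that a macro is function-like only when ( immediately follows the name.

```typescript
interface DefineParts {
  name: string;
  params?: string; // the raw "(a,b)" block, kept opaque like MACRO_DEFINE_PARAMS
  value: string;
}

// Hypothetical helper: split one single-line directive.
// The params group may only match when '(' touches the macro name.
function splitDefine(directive: string): DefineParts | undefined {
  const m = /^#define[ \t]+([A-Za-z_]\w*)(\([^)]*\))?[ \t]*(.*)$/.exec(directive);
  if (!m) return undefined;
  return { name: m[1], params: m[2], value: m[3].trimEnd() };
}

// Assumed implementation: turn the opaque "(a, b)" lexeme into ["a", "b"].
function parseMacroParamList(lexeme: string): string[] {
  const inner = lexeme.slice(1, -1).trim();
  return inner === "" ? [] : inner.split(",").map((p) => p.trim());
}

const fn = splitDefine("#define SAMPLE(x,y) texture2D(x,y)");
const obj = splitDefine("#define FOO (1 + 2)");
console.log(fn, obj);
```

Keeping the parameter block as one opaque string until after parsing is what lets the grammar rule stay small: the parser never has to decide between a macro parameter list and a call argument list.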

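Grammar / AST (packages/shader-lab/src/lalr/CFG.ts, parser/AST.ts)

  • Adds the macro_define CFG rule, reusing the existing assignment_expression nonterminal as the value.
  • ASTNode.MacroDefine holds an optional AssignmentExpression subtree.
  • MacroCallSymbol.hasAstValue is assigned in semanticAnalyze; VariableIdentifier.semanticAnalyze uses this flag to keep the call-site type as TypeAny for AST-form macros, so that builtin overload resolution works normally.
  • MacroDefineInfo.valueAst attaches the AST at the preprocessor entry; the legacy fields remain unchanged for the opaque path.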

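Codegen (packages/shader-lab/src/codeGen/*.ts)

  • visitMacroDefine calls valueExpression.codeGen(this); member-access rewriting happens recursively in visitPostfixExpression, the same path as inline code.
  • GLESVisitor._collectAllStructVars pre-populates _structVarMap before the codegen of the two stages, covering every variable whose type is a role struct (entry-function parameters/locals + module-level globals). This lets forward-declared globals (Varyings o; appearing after the #define that references it) be rewritten correctly too.
  • In addition to the AST static type, visitPostfixExpression also checks _structVarMap for a bare left-hand identifier, so the forward-declaration case works without breaking nested access (swizzle).
  • getStructRole unifies the three parallel attribute/varying/mrt isXxxStruct checks that were scattered across 4 places.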
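
The shape of the AST-driven rewrite can be sketched on a miniature expression tree. The `Expr` union, `emit`, and `emitMacroDefine` below are hypothetical simplifications for illustration; the real visitor works on ASTNode classes and a VisitorContext. The `StructRole` name appears in the review summary below, but the three members given here are an assumption.

```typescript
// Hypothetical miniature AST; the real nodes are far richer.
type Expr =
  | { kind: "ident"; name: string }
  | { kind: "member"; object: Expr; field: string }
  | { kind: "call"; callee: string; args: Expr[] };

// Assumed role names for illustration.
type StructRole = "attribute" | "varying" | "mrt";

function emit(expr: Expr, structVarMap: Map<string, StructRole>): string {
  switch (expr.kind) {
    case "ident":
      return expr.name;
    case "member":
      // Flatten only `<bare struct variable>.<field>`. Nested access and
      // swizzles on the result take the generic branch below.
      if (expr.object.kind === "ident" && structVarMap.has(expr.object.name)) {
        return expr.field;
      }
      return `${emit(expr.object, structVarMap)}.${expr.field}`;
    case "call":
      return `${expr.callee}(${expr.args.map((a) => emit(a, structVarMap)).join(", ")})`;
  }
}

// Emit an AST-backed macro on its own physical line.
function emitMacroDefine(name: string, value: Expr, structVarMap: Map<string, StructRole>): string {
  return `\n#define ${name} ${emit(value, structVarMap)}\n`;
}

// #define N normalize(v.v_normal.xyz), with v registered as a varying struct variable.
const roles = new Map<string, StructRole>([["v", "varying"]]);
const value: Expr = {
  kind: "call",
  callee: "normalize",
  args: [{ kind: "member", object: { kind: "member", object: { kind: "ident", name: "v" }, field: "v_normal" }, field: "xyz" }],
};
console.log(JSON.stringify(emitMacroDefine("N", value, roles)));
```

Because the decision is made on node structure, the swizzle .xyz is left alone and only the v. prefix is dropped, and a variable that is not in the map (say a local p in p.xy) is never touched.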

Auxiliary changes

  • SymbolTable.forEach is a public API that replaces (as any)._table access.
  • ParserUtils.extractDirectIdentLexeme / parseMacroParamList house small helpers.

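Acknowledgements

The test scenarios (shader sources and it(...) block structure) come from @zhuxudong's #2967. They cover GLSL-spec-legal usage such as member access in #define, so there was no need to reinvent them. The expected GLSL snapshots were regenerated in the AST-first output format. The implementation code of #2967 was not reused.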


Test plan

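Issues found and fixed during review (in chronological order)

| Issue | Commit |
| --- | --- |
| Constructor macros (vec4(v.x), mat3(v.t, v.b, v.n)) were misclassified as legacy | b248dc763 |
| The whitelist used Set<string>, creating two sources of truth → switched to Set<Keyword> | 0a056ae99 |
| #define FOO (1 + 2) was misjudged as function-like (the ( comes after a space) | 79511dce1 |
| Arguments of MAX3(v.x, v.y, v.z) were wrongly dropped by the filter → now split by isFunction | fdcb7ea1d |
| Non-expression values such as #define PAREN ( / #define COMMA , failed on the AST path | 8f589ab69 |
| TargetParser.y was not kept in sync with the CFG, so bison could not validate it | 828ac81e1 |
| Redundant dual paths in MacroDefineInfo emitted a spurious warning on every access | b09fd5019 |
| Incomplete Preprocessor caching (a cache hit still ran the include regex) + a dead !!valueRaw guard | 8cd50348e |
| The _expressionLeaderKeywords whitelist misjudged type keywords such as vec2 as legal expression leaders → FXAA portability macros broke; switched to positive routing by member-access detection | 992956ffe |
| The single-character lookback in _defineHasValue misjudged v0.x as a decimal point → now walks the whole alnum run and checks how it starts | 0d7f3528b |
| Issue #2980: double parsing by the Preprocessor regex and the Lexer peek drifted apart; #define handling is now unified into single-pass registration in the Lexer | 3668d8c5a |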
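
The v0.x fix in the table can be illustrated with a small detector. `hasMemberAccess` is a hypothetical function written for this example, not the actual _defineHasValue; it only shows why looking at the whole alphanumeric run to the left of the dot, rather than one character, separates v0.x from 3.14. Strings and comments are deliberately ignored in this sketch.

```typescript
// Returns true when `value` contains `<identifier>.<identifier>`.
// A '.' whose left-hand alphanumeric run starts with a digit (3.14, 6.103e-5)
// is treated as a decimal point, not a member access.
function hasMemberAccess(value: string): boolean {
  const isWord = (c: string) => /^[A-Za-z0-9_]$/.test(c);
  for (let i = 0; i < value.length; i++) {
    if (value.charAt(i) !== ".") continue;
    // Walk back over the whole alphanumeric run, not just one character.
    let start = i;
    while (start > 0 && isWord(value.charAt(start - 1))) start--;
    if (start === i) continue; // nothing before the dot, e.g. ".5"
    const startsWithDigit = /^[0-9]$/.test(value.charAt(start));
    const nextIsIdentStart = /^[A-Za-z_]$/.test(value.charAt(i + 1));
    if (!startsWithDigit && nextIsIdentStart) return true;
  }
  return false;
}

for (const sample of ["v.v_uv", "v0.x", "3.14", "6.103e-5", "vec2", "mat3(v.t, v.b, v.n)"]) {
  console.log(sample, hasMemberAccess(sample));
}
```

A one-character lookback sees the 0 in v0.x and concludes "number"; walking to the start of the run finds the leading v and classifies it as an identifier.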

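Known limitations

Within ShaderLab's scope, this PR covers the vast majority of #define usage, but the following three scenarios remain uncovered. None of them is a regression introduced by this PR, and none is a scenario that #2967 solves; all are independent, long-standing problems:

1. Struct-typed parameters of function-like macros

struct Varyings { vec2 v_uv; };
Varyings v;

#define GET_UV(input) input.v_uv        // input is a macro parameter
vec2 uv = GET_UV(v);                    // the driver expands this to v.v_uv → undeclared

Root cause: ShaderLab does not expand macros; it leaves expansion to the GLSL driver. By the time the driver expands, ShaderLab's IO flattening is already done (the top-level v has been dissolved into varying vec2 v_v_uv), so v.v_uv cannot find v. The parameter input and the call-site argument v live in different scopes, and ShaderLab cannot associate them at its stage.

Workaround: replace the function-like macro with an ordinary GLSL function:

vec2 getUV(Varyings input) { return input.v_uv; }
vec2 uv = getUV(v);

Compared with Slang: Slang uses a Preprocessor → Lexer → Parser pipeline and fully expands macros before the Parser. After expansion the AST simply contains v.v_uv, and v is correctly recognized by semantic analysis. Making ShaderLab follow Slang would require rewriting the entire macro subsystem (implementing full C preprocessor semantics + maintaining a source map), whose ROI is lower than simply advising users to use ordinary functions instead.

2. Multi-line #define with \ continuation containing member access

#define HALF_UV \
  v.v_uv * 0.5

Current behavior: the Lexer's _defineHasValue peek explicitly returns false on \ + newline → the macro goes legacy → the member access is not flattened.

Design decision: the code comment states "rare; simpler to stay opaque". Multi-line #defines containing member access almost never appear in real shaders.

Workaround: join it into one line to take the AST path.

3. Mixed AST/legacy redefinition of a same-named macro across #ifdef branches

#ifdef FAST
  #define CALC vec4(1.0)     // expression → AST
#else
  #define CALC highp vec4    // qualifier fragment → legacy
#endif

Potential hazard: the hasAstValue / isFunctionLikeMacro checks are based on some(...), so if any one branch is in AST form, the call site is treated as AST form.

Real-world impact: very low. The "shape" of a same-named macro's value is usually consistent across branches (either all expressions or all declarations), and no such mix has been seen in real code.

Future: if needed, per-branch checks could be added on MacroCallSymbol as an independent change.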
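
The some(...) behavior behind limitation 3 can be shown with a minimal sketch. The `MacroDefinition` shape and both functions are hypothetical; they only mirror the "any branch wins" rule described above, and `allHaveAstValue` is one possible stricter check that is not part of this PR.

```typescript
// Hypothetical record of one #define per preprocessor branch.
interface MacroDefinition {
  branch: string;
  valueAst?: object; // present only when the value went down the AST path
}

// "Any branch wins": mirrors the some(...) rule described above.
function hasAstValue(defs: MacroDefinition[]): boolean {
  return defs.some((d) => d.valueAst !== undefined);
}

// Assumed stricter alternative, shown only for contrast.
function allHaveAstValue(defs: MacroDefinition[]): boolean {
  return defs.length > 0 && defs.every((d) => d.valueAst !== undefined);
}

const calc: MacroDefinition[] = [
  { branch: "FAST", valueAst: { kind: "call" } }, // #define CALC vec4(1.0)
  { branch: "else" },                             // #define CALC highp vec4
];
console.log(hasAstValue(calc), allHaveAstValue(calc));
```

With the mixed definitions above, the first check reports AST form even though one branch is opaque, which is exactly the hazard the limitation describes.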


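About #2967

Both PRs fix the user-visible bugs. If the AST-first approach is preferred, #2967 can be closed. Otherwise #2967 still needs to fix the CI blocker (the same-line { #define problem) and the 4 CodeRabbit findings, and the regex rewrite will remain exposed to the failure classes listed above.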

Summary by CodeRabbit

  • New Features

    • Support for expression-style #define directives and function-like macro parameters; macro values now participate in semantic analysis and codegen, improving generated GLSL.
    • Macro-aware detection of struct-variable roles so struct-member macro uses are handled before struct declarations.
  • Bug Fixes

    • Legacy opaque #define forms are preserved verbatim where needed; struct-typed globals used as attribute/varying/MRT no longer emit incorrect uniform declarations.
  • Tests

    • Added tests covering macro/member access, struct-related macros, and end-to-end shader output comparisons.


coderabbitai Bot commented Apr 21, 2026


Walkthrough

Adds structured expression-style #define support across lexer → parser → AST → semantic analysis → preprocessor → visitors → codegen, plus visitor-context struct-role tracking and macro-value pre-walking to register struct-member references found inside macro values.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Lexer Tokenization (packages/shader-lab/src/lexer/Lexer.ts) | Implements expression-mode #define scanning, emits either legacy MACRO_DEFINE_EXPRESSION or structured token stream (MACRO_DEFINE, MACRO_DEFINE_PARAMS, MACRO_DEFINE_END), and adds helpers for param capture, inline-space/comment skipping, and line-break termination. |
| Grammar & Parser (packages/shader-lab/src/parser/GrammarSymbol.ts, packages/shader-lab/src/lalr/CFG.ts) | Adds NoneTerminal.macro_define and integrates it into global_declaration/simple_statement so expression-style #define sequences parse into ASTNode.MacroDefine while preserving legacy opaque form. |
| AST & Semantic Analysis (packages/shader-lab/src/parser/AST.ts) | Adds ASTNode.MacroDefine with optional valueExpression, registers AST-backed macros into sa.macroDefineList (valueAst set), propagates hasAstValue on macro-call nodes, and adjusts VariableIdentifier inference when macro has AST value. |
| Parser Utilities (packages/shader-lab/src/ParserUtils.ts) | Adds parseMacroParamList(lexeme: string): string[] to parse (a,b)-style param lexemes and `extractDirectIdentLexeme(expr: TreeNode): string |
| Token Definitions (packages/shader-lab/src/common/enums/Keyword.ts) | Adds MACRO_DEFINE, MACRO_DEFINE_PARAMS, MACRO_DEFINE_END; documents MACRO_DEFINE_EXPRESSION as legacy opaque form. |
| Preprocessor Types & Logic (packages/shader-lab/src/Preprocessor.ts) | MacroDefineInfo gains optional valueAst?: ASTNode.AssignmentExpression; getReferenceSymbolNames ignores entries with valueAst when collecting referenced symbol names (opaque/raw value remains for legacy macros). |
| Visitor Context / Struct Roles (packages/shader-lab/src/codeGen/VisitorContext.ts) | Introduces exported StructRole type and _structVarMap, adds getStructRole() and registerStructVar(), consolidates role-specific reference logic into shared _referenceProp() and referenceStructPropByName(). |
| Code Generation & Visitors (packages/shader-lab/src/codeGen/CodeGenVisitor.ts, packages/shader-lab/src/codeGen/GLESVisitor.ts) | Adds visitMacroDefine() to emit AST-backed macro defines; refactors postfix/call/parameter filtering to use context.getStructRole() and direct-ident extraction; adds _collectAllStructVars() and macro pre-walk to register struct-var roles before emitting struct declarations; handles legacy MACRO_DEFINE_EXPRESSION verbatim. |
| SymbolTable API (packages/shader-lab/src/common/SymbolTable.ts) | Adds forEach(callback: (symbol: T) => void): void to iterate symbol buckets in insertion order. |
| Tests (tests/src/shader-lab/ShaderLab.test.ts) | Adds six Vitest cases validating macro-expanded struct-member access (global/local), macro expansions using builtins, correct varying/uniform emission, and exact GLSL output comparisons for fixtures. |

Sequence Diagram

sequenceDiagram
    participant Lexer as Lexer
    participant Parser as Parser
    participant SA as SemanticAnalyzer
    participant Pre as Preprocessor
    participant GLES as GLESVisitor
    participant CGV as CodeGenVisitor

    rect rgba(100,150,200,0.5)
    note over Lexer,Parser: Lexer emits structured macro tokens when expression-mode detected
    Lexer->>Parser: Token stream (MACRO_DEFINE / MACRO_DEFINE_PARAMS? / value / MACRO_DEFINE_END)
    Parser->>SA: Build ASTNode.MacroDefine (valueExpression optional)
    SA->>Pre: Register macro with valueAst in macroDefineList
    end

    rect rgba(150,200,120,0.5)
    note over GLES,CGV: Pre-pass registers struct-var roles and macro member refs before codegen
    GLES->>GLES: _collectAllStructVars() infer/register struct var roles across stages
    GLES->>GLES: _preRegisterGlobalMacroRefs() walk macro valueAst and register member refs
    GLES->>CGV: Invoke codegen for program
    CGV->>CGV: visitMacroDefine() emits expression or legacy `#define`
    CGV->>CGV: visitPostfixExpression() resolves struct role via VisitorContext.getStructRole()
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Poem

🐇 I sniffed the tokens, parsed the list,
Snipped params tidy from each twist,
I hopped through ASTs to find each name,
Tagged struct-vars, then played the game,
Now macros bloom — concise and crisp! 🌱

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: converting #define macro values from opaque lexemes into first-class AST nodes, which is the core objective of the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/shader-lab/src/codeGen/GLESVisitor.ts (1)

356-364: ⚠️ Potential issue | 🟡 Minor

Silent skip on falsy codeGen result may mask regressions.

The previous contract was that every referenced global produces text; now an empty/undefined return is silently swallowed (no out.push, no warning). If a future bug causes a symbol's codeGen to return "" the resulting GLSL will be missing a declaration with no diagnostic. Consider either:

  • Only allowing the skip when sm.astNode explicitly opts in (e.g., a sentinel), or
  • Adding a // #if _VERBOSE Logger.warn when the skip is hit, so regressions are at least observable in dev builds.

Not blocking, but worth a follow-up.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shader-lab/src/codeGen/GLESVisitor.ts` around lines 356 - 364, The
loop currently silently skips when sm.astNode.codeGen(this) returns a falsy
value, which can hide missing declarations; update the logic in GLESVisitor
where codeGenResult is checked (the call to sm.astNode.codeGen and subsequent
handling before out.push) to either: 1) only skip when the AST node explicitly
opts in via a sentinel/property (e.g., sm.astNode.allowEmptyCodeGen or similar),
or 2) emit a dev-only warning using the existing logger (e.g., Logger.warn or a
// `#if` _VERBOSE block) when codeGenResult is falsy and sm.astNode does not opt
in; ensure you reference sm.isInMacroBranch and ESymbolType.VAR behavior remains
unchanged and keep pushing the text to out only when valid or explicitly
allowed.
packages/shader-lab/src/codeGen/CodeGenVisitor.ts (1)

160-177: ⚠️ Potential issue | 🟠 Major

Preserve arguments for AST-backed function macros.

visitMacroDefine emits function-like macros with their original parameter list, so filtering struct-role args here can turn GET_UV(o) into GET_UV() while the emitted directive still expects one parameter. Gate this legacy argument-dropping path behind !node.hasAstValue, or add call-aware rewriting for AST macro parameters.

🐛 Proposed guard for AST-form macro calls
-      const params = astNodes.filter((node) => {
-        if (node instanceof ASTNode.AssignmentExpression) {
-          const variableParam = ParserUtils.unwrapNodeByType<ASTNode.VariableIdentifier>(
-            node,
-            NoneTerminal.variable_identifier
-          );
-          if (
-            variableParam &&
-            typeof variableParam.typeInfo === "string" &&
-            context.getStructRole(variableParam.typeInfo)
-          ) {
-            return false;
-          }
-        }
-
-        return true;
-      });
+      const params = node.hasAstValue
+        ? astNodes
+        : astNodes.filter((node) => {
+            if (node instanceof ASTNode.AssignmentExpression) {
+              const variableParam = ParserUtils.unwrapNodeByType<ASTNode.VariableIdentifier>(
+                node,
+                NoneTerminal.variable_identifier
+              );
+              if (
+                variableParam &&
+                typeof variableParam.typeInfo === "string" &&
+                context.getStructRole(variableParam.typeInfo)
+              ) {
+                return false;
+              }
+            }
+
+            return true;
+          });

Also applies to: 191-191

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shader-lab/src/codeGen/CodeGenVisitor.ts` around lines 160 - 177,
The current parameter filtering in CodeGenVisitor removes struct-role arguments
by inspecting ASTNode.AssignmentExpression via ParserUtils.unwrapNodeByType and
can drop parameters for AST-backed macros that still expect them; modify the
filter to skip this legacy argument-dropping when the call has AST-backed values
by checking node.hasAstValue and only apply the struct-role exclusion when
!node.hasAstValue (also apply the same guard to the analogous filter at the
other occurrence mentioned); keep the rest of the logic (using
VisitorContext.context and NoneTerminal.variable_identifier) unchanged so
function-like macros emitted by visitMacroDefine preserve their original
parameter lists when node.hasAstValue is true.
🧹 Nitpick comments (2)
packages/shader-lab/src/common/SymbolTable.ts (1)

58-66: Consider aligning macro-branch semantics with the other accessors.

Unlike getSymbol/_getSymbols, which default to skipping isInMacroBranch entries, forEach unconditionally yields every symbol. The current sole caller (GLESVisitor._collectAllStructVars) doesn't filter on isInMacroBranch, so module-level globals introduced inside conditional macro branches will all be registered into _structVarMap, potentially mixing roles from mutually exclusive branches. If that's intentional, a short doc note would help; otherwise consider an includeMacro = false parameter for consistency.

-  /** Iterate every registered symbol. Order within a name bucket is insertion order. */
-  forEach(callback: (symbol: T) => void): void {
+  /** Iterate every registered symbol. Order within a name bucket is insertion order. */
+  forEach(callback: (symbol: T) => void, includeMacro = false): void {
     for (const entries of this._table.values()) {
       for (let i = 0, n = entries.length; i < n; i++) {
-        callback(entries[i]);
+        const item = entries[i];
+        if (!includeMacro && item.isInMacroBranch) continue;
+        callback(item);
       }
     }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shader-lab/src/common/SymbolTable.ts` around lines 58 - 66, The
forEach method currently yields all symbols including those with
isInMacroBranch, which differs from getSymbol/_getSymbols; update
SymbolTable.forEach to accept an optional includeMacro: boolean = false
parameter and, when includeMacro is false, skip entries where
entry.isInMacroBranch is true; update callers (e.g.,
GLESVisitor._collectAllStructVars which populates _structVarMap) to pass
includeMacro=true if they intentionally want macro-branch symbols or leave the
default to exclude them, or add a brief doc comment on forEach explaining the
new parameter and default behavior.
packages/shader-lab/src/codeGen/VisitorContext.ts (1)

153-171: Nit: role label is redundant with refList identity.

_referenceProp already receives the specific list and refList; the role parameter is used solely for the human-readable error string. That's fine, but since list, refList, and role must be kept in sync by every caller, one small simplification would be to pass only role and look up both lists inside (as done in referenceStructPropByName). Not blocking — current API is explicit and fine to keep.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shader-lab/src/codeGen/VisitorContext.ts` around lines 153 - 171,
The _referenceProp function takes a redundant role parameter that callers must
keep in sync with the list/refList pair; simplify by changing callers to pass
only the role (as done by referenceStructPropByName) and have _referenceProp
derive the correct list and refList internally based on that role (use the same
lookup logic as referenceStructPropByName), then remove the role parameter from
the signature and update references to use the internally-resolved list/refList
and keep the same error message generation via ShaderLabUtils.createGSError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/shader-lab/src/codeGen/GLESVisitor.ts`:
- Around line 295-299: The method signature for _extractLocalVarNames currently
spans multiple lines and violates Prettier; collapse the parameter list onto a
single line so the declaration reads as one line (e.g.,
_extractLocalVarNames(node: ASTNode.InitDeclaratorList, context: VisitorContext,
role: StructRole): void) while keeping the same parameter names and types and
leaving the method body unchanged.

In `@packages/shader-lab/src/codeGen/VisitorContext.ts`:
- Around line 111-123: The call inside referenceVarying should be formatted on a
single line like referenceMRTProp to satisfy Prettier: update the return
statement in referenceVarying so the invocation of this._referenceProp uses
one-line argument formatting (same pattern as referenceMRTProp), keeping the
same arguments ( "varying", ident.lexeme, this.varyingList,
this._referencedVaryingList, ident.location ) and preserving the function name
referenceVarying and method _referenceProp.

In `@packages/shader-lab/src/lexer/Lexer.ts`:
- Around line 557-560: Prettier formatting is causing lint failures around the
conditional using Lexer._lexemeTable, Lexer._expressionValueLeaderKeywords and
the variable firstWord; reformat the conditional expressions (and the similar
occurrence around the block referencing the same symbols at the later location)
to match project Prettier rules — e.g., put operators and operands on the same
line or break before the operator consistently so the two multipart conditions
are formatted cleanly and consistently, then run Prettier (or your formatter) to
update the file and commit the result.
- Around line 697-703: When _macroDefineExpectsNameToken is true and we clear
it, only set _macroDefineExpectsParamsToken (or call _scanMacroDefineParams) if
the scanner's next raw character is an immediate '(' with no intervening
whitespace; that is, after handling the macro name check the next raw buffer
char equals '(' before setting this._macroDefineExpectsParamsToken (or invoking
_scanMacroDefineParams), and do not rely on any whitespace-skipping logic or
lookahead that skips spaces; leave behavior unchanged when there is whitespace
(e.g. "#define FOO (1 + 2)" should not become function-like).

---

Outside diff comments:
In `@packages/shader-lab/src/codeGen/CodeGenVisitor.ts`:
- Around line 160-177: The current parameter filtering in CodeGenVisitor removes
struct-role arguments by inspecting ASTNode.AssignmentExpression via
ParserUtils.unwrapNodeByType and can drop parameters for AST-backed macros that
still expect them; modify the filter to skip this legacy argument-dropping when
the call has AST-backed values by checking node.hasAstValue and only apply the
struct-role exclusion when !node.hasAstValue (also apply the same guard to the
analogous filter at the other occurrence mentioned); keep the rest of the logic
(using VisitorContext.context and NoneTerminal.variable_identifier) unchanged so
function-like macros emitted by visitMacroDefine preserve their original
parameter lists when node.hasAstValue is true.

In `@packages/shader-lab/src/codeGen/GLESVisitor.ts`:
- Around line 356-364: The loop currently silently skips when
sm.astNode.codeGen(this) returns a falsy value, which can hide missing
declarations; update the logic in GLESVisitor where codeGenResult is checked
(the call to sm.astNode.codeGen and subsequent handling before out.push) to
either: 1) only skip when the AST node explicitly opts in via a
sentinel/property (e.g., sm.astNode.allowEmptyCodeGen or similar), or 2) emit a
dev-only warning using the existing logger (e.g., Logger.warn or a // `#if`
_VERBOSE block) when codeGenResult is falsy and sm.astNode does not opt in;
ensure you reference sm.isInMacroBranch and ESymbolType.VAR behavior remains
unchanged and keep pushing the text to out only when valid or explicitly
allowed.

---

Nitpick comments:
In `@packages/shader-lab/src/codeGen/VisitorContext.ts`:
- Around line 153-171: The _referenceProp function takes a redundant role
parameter that callers must keep in sync with the list/refList pair; simplify by
changing callers to pass only the role (as done by referenceStructPropByName)
and have _referenceProp derive the correct list and refList internally based on
that role (use the same lookup logic as referenceStructPropByName), then remove
the role parameter from the signature and update references to use the
internally-resolved list/refList and keep the same error message generation via
ShaderLabUtils.createGSError.

In `@packages/shader-lab/src/common/SymbolTable.ts`:
- Around line 58-66: The forEach method currently yields all symbols including
those with isInMacroBranch, which differs from getSymbol/_getSymbols; update
SymbolTable.forEach to accept an optional includeMacro: boolean = false
parameter and, when includeMacro is false, skip entries where
entry.isInMacroBranch is true; update callers (e.g.,
GLESVisitor._collectAllStructVars which populates _structVarMap) to pass
includeMacro=true if they intentionally want macro-branch symbols or leave the
default to exclude them, or add a brief doc comment on forEach explaining the
new parameter and default behavior.
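
The `SymbolTable.forEach` suggestion above can be sketched as follows. This is a minimal illustration, not the project's code: `SketchSymbolTable` and `SymbolEntry` are hypothetical stand-ins, and only the `isInMacroBranch` flag and the `includeMacro` parameter come from the comment itself.

```typescript
// Hypothetical, simplified stand-in for the real SymbolTable.
interface SymbolEntry {
  ident: string;
  isInMacroBranch: boolean;
}

class SketchSymbolTable {
  private _table = new Map<string, SymbolEntry[]>();

  insert(entry: SymbolEntry): void {
    const bucket = this._table.get(entry.ident);
    if (bucket) bucket.push(entry);
    else this._table.set(entry.ident, [entry]);
  }

  /** Visits every symbol; macro-branch symbols are skipped unless includeMacro is true. */
  forEach(callback: (entry: SymbolEntry) => void, includeMacro: boolean = false): void {
    this._table.forEach((bucket) => {
      for (let i = 0, n = bucket.length; i < n; i++) {
        const entry = bucket[i];
        if (!includeMacro && entry.isInMacroBranch) continue;
        callback(entry);
      }
    });
  }
}
```

A caller that intentionally wants macro-branch symbols (the comment names `GLESVisitor._collectAllStructVars` as one candidate) would pass `true` as the second argument; every other caller keeps the default.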

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f15326e1-baf6-40ea-a3bb-3cfa28dcacbd

📥 Commits

Reviewing files that changed from the base of the PR and between ebd57ae and 909ee1c.

⛔ Files ignored due to path filters (8)
  • tests/src/shader-lab/expected/define-struct-access-global.frag.glsl is excluded by !**/*.glsl
  • tests/src/shader-lab/expected/define-struct-access-global.vert.glsl is excluded by !**/*.glsl
  • tests/src/shader-lab/expected/define-struct-access.frag.glsl is excluded by !**/*.glsl
  • tests/src/shader-lab/expected/define-struct-access.vert.glsl is excluded by !**/*.glsl
  • tests/src/shader-lab/shaders/define-struct-access-global.shader is excluded by !**/*.shader
  • tests/src/shader-lab/shaders/define-struct-access.shader is excluded by !**/*.shader
  • tests/src/shader-lab/shaders/global-varying-var.shader is excluded by !**/*.shader
  • tests/src/shader-lab/shaders/macro-member-access-builtin-arg.shader is excluded by !**/*.shader
📒 Files selected for processing (12)
  • packages/shader-lab/src/ParserUtils.ts
  • packages/shader-lab/src/Preprocessor.ts
  • packages/shader-lab/src/codeGen/CodeGenVisitor.ts
  • packages/shader-lab/src/codeGen/GLESVisitor.ts
  • packages/shader-lab/src/codeGen/VisitorContext.ts
  • packages/shader-lab/src/common/SymbolTable.ts
  • packages/shader-lab/src/common/enums/Keyword.ts
  • packages/shader-lab/src/lalr/CFG.ts
  • packages/shader-lab/src/lexer/Lexer.ts
  • packages/shader-lab/src/parser/AST.ts
  • packages/shader-lab/src/parser/GrammarSymbol.ts
  • tests/src/shader-lab/ShaderLab.test.ts

@codecov

codecov Bot commented Apr 21, 2026

Codecov Report

❌ Patch coverage is 92.39130% with 84 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.06%. Comparing base (e37c928) to head (b435e83).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| packages/shader-lab/src/lexer/Lexer.ts | 91.73% | 38 Missing ⚠️ |
| packages/shader-lab/src/parser/AST.ts | 85.00% | 15 Missing ⚠️ |
| packages/shader-lab/src/ParserUtils.ts | 81.63% | 9 Missing ⚠️ |
| packages/shader-lab/src/codeGen/GLESVisitor.ts | 94.83% | 8 Missing ⚠️ |
| packages/shader-lab/src/codeGen/VisitorContext.ts | 87.71% | 7 Missing ⚠️ |
| packages/shader-lab/src/codeGen/CodeGenVisitor.ts | 95.04% | 5 Missing ⚠️ |
| packages/shader-lab/src/common/BaseLexer.ts | 95.55% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           dev/2.0    #2974      +/-   ##
===========================================
+ Coverage    77.79%   78.06%   +0.26%     
===========================================
  Files          906      906              
  Lines        99074    99892     +818     
  Branches     10030    10181     +151     
===========================================
+ Hits         77077    77980     +903     
+ Misses       21828    21743      -85     
  Partials       169      169              
Flag Coverage Δ
unittests 78.06% <92.39%> (+0.26%) ⬆️



Lift expression-style `#define` values from opaque lexemes into proper
`AssignmentExpression` AST subtrees that flow through the normal visitor
pipeline. Struct member access inside a macro value (e.g.
`#define FSInput_worldNormal v.v_normal.xyz`) now participates in varying
flattening, type inference, and reference tracking the same way inline
expressions do — no regex rewrite, no parallel symbol tracking.

Aligns with how modern shader compilers (glslang, Slang, Clang/DXC)
connect preprocessor output with the main parser.

Lexer
  - `#define` with an expression-shaped value emits a token stream:
    `MACRO_DEFINE`, `ID`, optional `MACRO_DEFINE_PARAMS`, value tokens,
    `MACRO_DEFINE_END`. The `(params)` block is captured as a single
    opaque token so the CFG rule stays LALR(1)-friendly.
  - Non-expression macros (type/qualifier/partial-syntax forms like
    `#define TYPE_ALIAS highp vec3`) fall back to the legacy opaque
    `MACRO_DEFINE_EXPRESSION` path, detected by a cheap first-keyword
    peek against the existing lexeme table.

Grammar / AST
  - New `macro_define` CFG rule and `ASTNode.MacroDefine` node carrying
    an optional `AssignmentExpression` value.
  - `MacroCallSymbol.hasAstValue` flags AST-form macros so
    `VariableIdentifier.semanticAnalyze` can keep the call site's type
    as TypeAny instead of leaking the root variable's struct type into
    the call site (which broke builtin overload resolution in the
    Cocos FSInput pattern).
  - `MacroDefineInfo.valueAst` carries the AST on the preprocessor
    entry; legacy fields stay untouched for the opaque path.

Codegen
  - `visitMacroDefine` emits `#define NAME[(params)] <value>` with the
    value produced by `AssignmentExpression.codeGen` — member access
    rewriting happens for free inside `visitPostfixExpression`.
  - `GLESVisitor._collectAllStructVars` preloads `_structVarMap` with
    every variable whose type carries an IO role (entry function
    params/locals, module-level globals) before either stage's codegen
    so forward-declared globals (e.g. `Varyings o;` appearing after a
    `#define` that references `o`) are rewritten correctly in both the
    vertex and fragment outputs.
  - `visitPostfixExpression` consults `_structVarMap` by the left-side's
    bare identifier in addition to the AST's static type, covering the
    forward-declaration case without breaking swizzle access on nested
    expressions.
  - `getStructRole` unifies attribute/varying/mrt classification in the
    handful of places that used three parallel `isXxxStruct` checks.

Supporting changes
  - `SymbolTable.forEach` public API replaces an `(any)._table` access.
  - `ParserUtils.extractDirectIdentLexeme` /
    `ParserUtils.parseMacroParamList` host small helpers that shouldn't
    live on the visitor.
  - Lexer `#define` state machine flags renamed for clarity
    (`_macroDefineExpectsNameToken`, `_macroDefineExpectsParamsToken`).

Tests
  - Four new shaders + expected snapshots cover the Cocos member-access
    patterns: macro values referencing vertex locals, fragment params,
    module-level Varyings globals, and builtin-function arguments.

All 1294 tests pass, including the 16 shader-lab cases.
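
The `getStructRole` bullet in the commit message above can be pictured with a small sketch. It assumes the context tracks three sets of struct type names; `SketchContext` and its field names are hypothetical and are not the real `VisitorContext` API. Only the three roles (attribute, varying, mrt) come from the commit message.

```typescript
// The three IO roles named in the commit message.
type StructRole = "attribute" | "varying" | "mrt";

// Hypothetical context shape, used only for this illustration.
interface SketchContext {
  attributeStructs: Set<string>;
  varyingStructs: Set<string>;
  mrtStructs: Set<string>;
}

/** Returns the IO role of a struct type name, or undefined for ordinary types. */
function getStructRole(typeName: string, ctx: SketchContext): StructRole | undefined {
  if (ctx.attributeStructs.has(typeName)) return "attribute";
  if (ctx.varyingStructs.has(typeName)) return "varying";
  if (ctx.mrtStructs.has(typeName)) return "mrt";
  return undefined;
}
```

A single lookup like this lets call sites branch on one value instead of repeating three parallel `isXxxStruct` checks.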
@GuoLei1990 GuoLei1990 force-pushed the feat/shaderlab-define-ast-firstclass branch from 909ee1c to 65e79b3 Compare April 21, 2026 11:34

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
packages/shader-lab/src/lexer/Lexer.ts (1)

696-703: ⚠️ Potential issue | 🟠 Major

#define FOO (1 + 2) is still mis-tokenized as function-like.

Line 702 unconditionally sets _macroDefineExpectsParamsToken = true whenever we're inside a #define. Because the next scanToken() call runs _skipInlineSpaceAndComments() first (line 126), any whitespace between the macro name and ( is swallowed, and the (…) is then captured by _scanMacroDefineParams() as if it were a function-like parameter list — corrupting the AST for object-like macros whose value happens to begin with (.

Gate the flag on the next char being an immediate ( with no intervening whitespace, as _scanWord leaves the cursor pointing at the first unread character:

🐛 Proposed fix
-      if (this._inMacroDefineValue) this._macroDefineExpectsParamsToken = true;
+      if (this._inMacroDefineValue) this._macroDefineExpectsParamsToken = this.getCurChar() === "(";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shader-lab/src/lexer/Lexer.ts` around lines 696 - 703, The flag
_macroDefineExpectsParamsToken is being set unconditionally inside the
_macroDefineExpectsNameToken branch causing "(...)" after spaced macro names to
be mis-parsed as params; change the logic in the block that handles
_macroDefineExpectsNameToken (the code around token.set(ETokenType.ID, ...) and
where _inMacroDefineValue is checked) to only set _macroDefineExpectsParamsToken
when the very next unread character is '(' (i.e. peek the character at the
current cursor left by _scanWord / _scanWord's behavior) rather than whenever
_inMacroDefineValue is true, so that intervening whitespace (skipped later by
_skipInlineSpaceAndComments called from scanToken) does not trigger
_scanMacroDefineParams for object-like macros.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 481ca2dc-4f7e-43ee-b195-56ef6ffa39c1

📥 Commits

Reviewing files that changed from the base of the PR and between 909ee1c and 65e79b3.

⛔ Files ignored due to path filters (8)
  • tests/src/shader-lab/expected/define-struct-access-global.frag.glsl is excluded by !**/*.glsl
  • tests/src/shader-lab/expected/define-struct-access-global.vert.glsl is excluded by !**/*.glsl
  • tests/src/shader-lab/expected/define-struct-access.frag.glsl is excluded by !**/*.glsl
  • tests/src/shader-lab/expected/define-struct-access.vert.glsl is excluded by !**/*.glsl
  • tests/src/shader-lab/shaders/define-struct-access-global.shader is excluded by !**/*.shader
  • tests/src/shader-lab/shaders/define-struct-access.shader is excluded by !**/*.shader
  • tests/src/shader-lab/shaders/global-varying-var.shader is excluded by !**/*.shader
  • tests/src/shader-lab/shaders/macro-member-access-builtin-arg.shader is excluded by !**/*.shader
📒 Files selected for processing (12)
  • packages/shader-lab/src/ParserUtils.ts
  • packages/shader-lab/src/Preprocessor.ts
  • packages/shader-lab/src/codeGen/CodeGenVisitor.ts
  • packages/shader-lab/src/codeGen/GLESVisitor.ts
  • packages/shader-lab/src/codeGen/VisitorContext.ts
  • packages/shader-lab/src/common/SymbolTable.ts
  • packages/shader-lab/src/common/enums/Keyword.ts
  • packages/shader-lab/src/lalr/CFG.ts
  • packages/shader-lab/src/lexer/Lexer.ts
  • packages/shader-lab/src/parser/AST.ts
  • packages/shader-lab/src/parser/GrammarSymbol.ts
  • tests/src/shader-lab/ShaderLab.test.ts
✅ Files skipped from review due to trivial changes (1)
  • packages/shader-lab/src/parser/GrammarSymbol.ts
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/shader-lab/src/common/SymbolTable.ts
  • packages/shader-lab/src/common/enums/Keyword.ts
  • tests/src/shader-lab/ShaderLab.test.ts
  • packages/shader-lab/src/codeGen/VisitorContext.ts
  • packages/shader-lab/src/ParserUtils.ts

@@ -206,7 +349,9 @@ export abstract class GLESVisitor extends CodeGenVisitor {
const symbols = _referencedGlobals[ident];
for (let i = 0; i < symbols.length; i++) {
Collaborator

let i = 0, n = symbols.length; i < n; i++


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
tests/src/shader-lab/ShaderLab.test.ts (2)

321-324: Pin the assertion to the rewritten macro-member output.

normalize, dot, and texture2D are broad substrings and can pass even if the member-access macro rewrite regresses. Consider asserting a fixture-specific rewritten expression, or the absence of the stale struct-member form, so the test proves the AST macro path is exercised.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/src/shader-lab/ShaderLab.test.ts` around lines 321 - 324, The current
assertions on fragment use broad substrings ("normalize", "dot", "texture2D")
which can pass even if the macro-member rewrite regresses; update the test in
ShaderLab.test.ts to assert the fixture-specific rewritten expression produced
by the macro rewrite (use the exact rewritten member-access string that should
appear in fragment) and/or assert the absence of the stale struct-member form
(e.g., ensure fragment does NOT contain the original "myStruct.member" pattern);
reference the test variable fragment and replace the three loose
expect(...).to.contain(...) checks with an assertion that matches the exact
rewritten output and a negative assertion for the old form so the AST macro path
is actually exercised.

354-356: Check duplicate varyings in every generated stage that should declare them.

The comment says duplicate varying declarations should be prevented, but the assertion only counts v_worldPos in vertex. If this varying is expected in the fragment output too, add the same count there so verbose fragment emission cannot regress unnoticed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/src/shader-lab/ShaderLab.test.ts` around lines 354 - 356, Test only
checks for duplicate declaration of "varying vec3 v_worldPos" in the vertex
shader (variable vertex and varyingMatches), but misses the fragment shader;
update the test to also search the fragment output (e.g., fragment
variable/string) for the same "varying vec3 v_worldPos" and assert its count is
1 so duplicate varying emission in the fragment stage is detected as well.
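
The duplicate-varying check described above can be expressed with a small counting helper applied to both stages. This is a sketch under assumptions: `countOccurrences` is a hypothetical helper, and the `vertex` / `fragment` strings in the usage note stand for the generated GLSL of each stage; the project's test may count matches differently.

```typescript
/** Counts non-overlapping occurrences of needle in source. */
function countOccurrences(source: string, needle: string): number {
  if (needle === "") return 0;
  let count = 0;
  let index = source.indexOf(needle);
  while (index !== -1) {
    count++;
    index = source.indexOf(needle, index + needle.length);
  }
  return count;
}
```

In a test, asserting that `countOccurrences(vertex, "varying vec3 v_worldPos")` and `countOccurrences(fragment, "varying vec3 v_worldPos")` both equal 1 would catch a duplicate declaration in either stage.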

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0f49acec-eb47-4340-886d-9b2a01534d37

📥 Commits

Reviewing files that changed from the base of the PR and between 65e79b3 and d68b8d0.

📒 Files selected for processing (2)
  • packages/shader-lab/src/parser/AST.ts
  • tests/src/shader-lab/ShaderLab.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/shader-lab/src/parser/AST.ts

@GuoLei1990 GuoLei1990 force-pushed the feat/shaderlab-define-ast-firstclass branch from d68b8d0 to db618bd Compare April 21, 2026 12:53
@GuoLei1990 GuoLei1990 force-pushed the feat/shaderlab-define-ast-firstclass branch from db618bd to 251bdad Compare April 21, 2026 12:59

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/shader-lab/src/lexer/Lexer.ts`:
- Around line 506-520: The array literal assigned to private static readonly
_expressionValueLeaderKeywords is misformatted per Prettier (each grouped
type-constructor should be one entry per line); reformat the Set initializer so
every string literal occupies its own line (e.g., split "bvec2", "bvec3",
"bvec4" into three separate lines, same for ivec*, uvec*, vec*, mat* entries)
and then run Prettier/ESLint to ensure no prettier/prettier violations remain.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 29fc428c-2555-4e65-bbf5-9e34f9ca5437

📥 Commits

Reviewing files that changed from the base of the PR and between d68b8d0 and db618bd.

⛔ Files ignored due to path filters (1)
  • tests/src/shader-lab/shaders/define-ctor-with-member.shader is excluded by !**/*.shader
📒 Files selected for processing (3)
  • packages/shader-lab/src/lexer/Lexer.ts
  • packages/shader-lab/src/parser/AST.ts
  • tests/src/shader-lab/ShaderLab.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/src/shader-lab/ShaderLab.test.ts
  • packages/shader-lab/src/parser/AST.ts


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
tests/src/shader-lab/ShaderLab.test.ts (1)

267-273: Nit: prefer ShaderLanguage.GLSLES100 over the literal 0 for the backend argument.

_parseShaderPass takes a ShaderLanguage enum value ({ vertex, fragment } = shaderLabVerbose._parseShaderPass(..., 0, "")). Using the named constant makes the intent obvious and protects the tests if the enum ordering ever changes. Applies to all four new call sites (Lines 271, 291, 313, 338).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/src/shader-lab/ShaderLab.test.ts` around lines 267 - 273, Tests call
shaderLabVerbose._parseShaderPass with a magic literal backend value (0);
replace that literal with the enum constant ShaderLanguage.GLSLES100 in each
call site to make intent explicit and resilient to enum reordering—update the
four new invocations that pass 0 to instead pass ShaderLanguage.GLSLES100
(references: _parseShaderPass, ShaderLanguage, shaderLabVerbose).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8d6471a0-0e9b-4c68-abc9-28065b515078

📥 Commits

Reviewing files that changed from the base of the PR and between db618bd and 251bdad.

⛔ Files ignored due to path filters (1)
  • tests/src/shader-lab/shaders/define-ctor-with-member.shader is excluded by !**/*.shader
📒 Files selected for processing (3)
  • packages/shader-lab/src/lexer/Lexer.ts
  • packages/shader-lab/src/parser/AST.ts
  • tests/src/shader-lab/ShaderLab.test.ts

Type keywords like `vec4`, `mat3`, `float`, etc. can legitimately start a
GLSL expression via constructor-call syntax (e.g. `mat3(v.v_tangent, …)`).
They were previously misclassified as declaration-style macro leaders and
sent down the opaque legacy path, so member access inside such values
wasn't rewritten — producing broken GLSL when the macro referenced a
varying/attribute struct variable.

Extend `_expressionValueLeaderKeywords` to include all scalar/vector/matrix
type names. Qualifier keywords (`highp`, `uniform`, `struct`, `in`, …) and
sampler types remain excluded so true declaration-style macros stay on the
legacy path.

New test: `define-ctor-with-member` covers `#define TBN_BLEND mat3(v.v_tangent, ...)`.
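
The first-keyword routing described in this commit message can be sketched as below. It is an illustration only: the keyword lists are deliberately short and hypothetical (the real sets live in `Lexer.ts` and are larger), and `routeDefineValue` is an invented name. The rule it models is the one stated above: a leading keyword sends the macro down the opaque path unless that keyword is a scalar/vector/matrix type name that can start a constructor call.

```typescript
// Type names that may legitimately begin an expression via constructor syntax.
const EXPRESSION_LEADER_LIST = ["float", "int", "bool", "vec2", "vec3", "vec4", "mat2", "mat3", "mat4"];
// Qualifiers and sampler types: these begin declaration-style macro values.
const OTHER_KEYWORD_LIST = ["highp", "mediump", "lowp", "uniform", "struct", "in", "out", "sampler2D", "samplerCube"];

const EXPRESSION_LEADERS = new Set<string>(EXPRESSION_LEADER_LIST);
const KEYWORDS = new Set<string>(EXPRESSION_LEADER_LIST.concat(OTHER_KEYWORD_LIST));

/** Chooses the AST path or the legacy opaque path from the first word of a macro value. */
function routeDefineValue(value: string): "ast" | "opaque" {
  const match = /^[A-Za-z_][A-Za-z0-9_]*/.exec(value.trim());
  if (match === null) return "ast"; // value starts with a literal or punctuation, not a word
  const firstWord = match[0];
  if (KEYWORDS.has(firstWord) && !EXPRESSION_LEADERS.has(firstWord)) return "opaque";
  return "ast";
}
```

With this rule `mat3(v.v_tangent, …)` reaches the AST path, where member access is rewritten, while `highp vec3` stays opaque.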
@GuoLei1990 GuoLei1990 force-pushed the feat/shaderlab-define-ast-firstclass branch from 251bdad to b248dc7 Compare April 21, 2026 13:06

Switch the `#define` expression-path whitelist from a string Set to a
`Set<Keyword>`, and drop a helper that became dead when the macro-value
codegen moved to AST traversal.

- `_expressionLeaderKeywords`: elements are now Keyword enum values.
  Eliminates the string/Keyword double-truth around `_lexemeTable` and
  reuses the keyword lookup already done for the main classification.
- `VisitorContext.referenceStructPropByName`: removed. The AST path
  calls `referenceVarying`/`referenceAttribute`/`referenceMRTProp`
  directly with a proper `location`; the by-name helper with its
  `undefined as any` location is no longer reachable.
- `GLESVisitor._getGlobalSymbol`: hoist `symbols.length` out of the
  loop header, consistent with the other loops in this file.

@GuoLei1990 GuoLei1990 requested a review from zhuxudong April 24, 2026 03:47
…adjacency

`#define FOO (1 + 2)` (space before `(`) was being misclassified as a
function-like macro `FOO()` with body `1 + 2`, then failing to parse
because the CFG's `macro_define` rule requires a value after the params
block. Per C99 §6.10.3/3 — which GLSL ES 3.00 §3.4 inherits verbatim —
the function-like form requires `(` immediately after the name, with no
intervening whitespace.

Fix: after scanning the macro name, set `_macroDefineExpectsParamsToken`
only when the *current* char is `(`. A space-separated `(` falls through
to the normal value token stream, yielding the correct object-like
macro with value `(1 + 2)`.

Regression test covers both shapes side-by-side in one shader:
`OBJ_PAREN (1 + 2)` (object-like) and `FN_LIKE(x) (x + 3.0)`
(function-like).
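
The adjacency rule in this commit message can be demonstrated with a standalone sketch. `parseDefine` and `DefineShape` are hypothetical names and this is not the lexer's state machine; the sketch only shows that a `(` counts as a parameter list when it immediately follows the macro name, and it assumes the parameter list contains no nested parentheses.

```typescript
interface DefineShape {
  name: string;
  params: string[] | null; // null means object-like
  value: string;
}

/** Splits a single #define line into name, optional parameter list, and value. */
function parseDefine(line: string): DefineShape | null {
  const head = /^\s*#\s*define\s+([A-Za-z_][A-Za-z0-9_]*)/.exec(line);
  if (head === null) return null;
  const name = head[1];
  let rest = line.slice(head[0].length);
  let params: string[] | null = null;
  // Function-like only when "(" is the very next character after the name.
  if (rest.charAt(0) === "(") {
    const close = rest.indexOf(")");
    if (close < 0) return null;
    const inner = rest.slice(1, close).trim();
    params = inner === "" ? [] : inner.split(",").map((p) => p.trim());
    rest = rest.slice(close + 1);
  }
  return { name, params, value: rest.trim() };
}
```

For `#define OBJ_PAREN (1 + 2)` the character after the name is a space, so `params` stays `null` and the value is `(1 + 2)`; for `#define FN_LIKE(x) (x + 3.0)` the parameter list is `["x"]`.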

…n-like macro

MacroCallFunction covers two call shapes sharing one AST:

  (a) object-like macro whose value is a function name, used as a call
      — `#define FN foo` + `FN(varyings, …)`. The driver expands `FN`
      to `foo`; `foo` is a ShaderLab function whose IO-struct params
      have been flattened. Call site must drop IO-struct args.

  (b) true function-like macro — `#define MAX3(a,b,c) …`
      + `MAX3(v.v_normal.x, v.v_normal.y, v.v_normal.z)`. ShaderLab
      doesn't expand the macro; the driver does, and the `#define`
      fixes parameter count. Args must be preserved verbatim — a
      member-access arg unwraps to an IO-struct root identifier but
      dropping the arg changes the macro's arity.

Previously `visitMacroCallFunction` applied the shape-(a) filter
uniformly, so any function-like macro call whose arg unwrapped to
an IO-struct variable had that arg silently stripped, yielding
empty-arg-list expansions like `MAX3()` that fail to compile.

Fix: thread `isFunctionLikeMacro` from MacroCallSymbol.semanticAnalyze
through MacroCallFunction. When the `#define` is function-like (has
`isFunction` in macroDefineList), skip the drop-IO-arg filter.

Regression test: `macro-call-struct-arg-repro.shader`
  #define MAX3(a, b, c) max(max(a, b), c)
  float m = MAX3(v.v_normal.x, v.v_normal.y, v.v_normal.z);
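
The two call shapes can be modelled with a small standalone sketch. Everything here is hypothetical apart from the `isFunction` flag and the `macroDefineList` name taken from the commit message: `CallArg`, `emitMacroCallArgs`, and the `ioStructVars` set are invented for illustration, and the sketch does not model the member-access flattening that the real codegen applies to the argument text.

```typescript
interface MacroDefineInfo {
  isFunction: boolean;
}

interface CallArg {
  text: string;             // argument as written at the call site
  rootIdent: string | null; // bare identifier the argument unwraps to, if any
}

/** Emits a macro call, dropping IO-struct args only for object-like macros. */
function emitMacroCallArgs(
  macroName: string,
  args: CallArg[],
  macroDefineList: Record<string, MacroDefineInfo[]>,
  ioStructVars: Set<string>
): string {
  const infos = macroDefineList[macroName];
  const isFunctionLike = infos !== undefined && infos.some((info) => info.isFunction);
  const kept = isFunctionLike
    ? args // shape (b): the #define fixes the arity, keep every argument
    : args.filter((a) => a.rootIdent === null || !ioStructVars.has(a.rootIdent)); // shape (a)
  return macroName + "(" + kept.map((a) => a.text).join(", ") + ")";
}
```

With `v` registered as an IO-struct variable, a function-like `MAX3` keeps all three arguments, while an object-like `FN` called as `FN(varyings, color)` drops `varyings`.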
@GuoLei1990
Member Author

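Re: [Round 8 P1] visitMacroCallFunction struct-arg filtering

Replying to the only outstanding P1 raised in review #2974 (review).

The bug is real and P1-level; it has been fixed and pushed in fdcb7ea1d. However, the suggested criterion (hasAstValue) is not correct. We used isFunction instead, for the reasons below.

1. Reproducing the bug

New regression test tests/src/shader-lab/shaders/macro-call-struct-arg-repro.shader: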

struct Varyings { vec3 v_normal; };

#define MAX3(a, b, c) max(max(a, b), c)

void frag(Varyings v) {
  float m = MAX3(v.v_normal.x, v.v_normal.y, v.v_normal.z);
  gl_FragColor = vec4(m);
}

Output before the fix:

float m = MAX3() ;     // ← all three arguments were dropped by the filter

The driver expands this to max(max(a,b),c) with a/b/c undeclared → GLSL compilation fails and shaderProgram.isValid === false.

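2. hasAstValue cannot serve as the criterion

hasAstValue means "whether the #define value was parsed into an AST subtree", which is orthogonal to call shape. Two kinds of macro make it ambiguous:

  • Object-like macro used as a function name: #define FN foo + FN(varyings, …)
    • The value foo is a non-keyword identifier → goes through the AST path → hasAstValue=true
    • But here the IO arguments should be filtered: the driver expands FN → foo, and foo is a ShaderLab function whose signature has already been flattened, so the call site must stay consistent with it
  • True function-like macro: #define MAX3(a,b,c) max(…) + MAX3(v.x, …)
    • The value also goes through the AST path → hasAstValue=true
    • Here the arguments should not be filtered: ShaderLab does not expand the macro, the #define fixes the parameter count, and changing the arity produces a syntax error once the driver expands it

In both cases hasAstValue is true, so it cannot tell them apart.

In addition, the CR's claim that "the value of an AST-backed macro has already been struct-flattened at the definition site by visitMacroDefine" does not hold for the specific example #define GET_UV(input) input.v_uv: input is a macro formal parameter and is not in _structVarMap, so visitPostfixExpression does not flatten input.v_uv and emits it unchanged. This scenario is item 1 of the "Known limitations" in the PR body (struct formal parameters of function-like macros); neither approach can handle it, and it is outside the scope of this fix.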

3. The correct criterion: isFunction

The two call shapes that share the MacroCallFunction AST node:

| Call shape | After driver expansion | Callee signature flattened? | Filter IO arguments? |
|---|---|---|---|
| (a) object-like macro used as a function name | ShaderLab function foo(…) | yes | yes |
| (b) true function-like macro | no callee, text substitution | n/a | no (breaks arity) |

The definition-side property that distinguishes them is isFunction (whether the macro recorded in macroDefineList is function-like).

Fix: pass isFunction through to the call site and branch on it.

// AST.ts: MacroCallSymbol.semanticAnalyze
this.isFunctionLikeMacro = sa.macroDefineList[name]?.some(info => info.isFunction) ?? false;

// CodeGenVisitor.ts: visitMacroCallFunction
const params = node.isFunctionLikeMacro
  ? astNodes                                   // shape (b): keep all arguments
  : astNodes.filter(/* existing IO struct arg drop */);  // shape (a): same rule as visitFunctionCall

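4. visitFunctionCall does not need a matching guard

The CR's additional suggestion, that "the analogous filter in visitFunctionCall also needs a hasAstValue guard", is not needed here.

visitFunctionCall handles FunctionCall AST nodes, whose callee is a resolved FnSymbol (a ShaderLab function). The filter works on the static type of the callee's formal parameters (paramInfoList[i].typeInfo), not on the type of the argument expressions. Any argument whose callee formal parameter has an IO struct type is dropped. This is the correct response to "the callee signature has been flattened" and is independent of whether the argument came from expanding an AST-backed macro.

If the concern is the case where an object-like macro expands to a function name and is then called, that case actually goes through the MacroCallFunction path (shape (a) in the table above) and is handled by visitMacroCallFunction. Once control reaches visitFunctionCall, the callee is always an FnSymbol and the filter is always correct.

5. Verification

  • Full test suite: 1300/1300 pass
  • FUNCTION_DIFFUSE_IBL(varyings, surfaceData, …) in PBR (shape (a)) continues to work
  • The new repro MAX3(v.v_normal.x, …) (shape (b)) is fixed
  • Known limitations (the 3 categories listed in the PR body) are unchanged

Thanks for this round of careful review; it uncovered an independent bug we had not identified before. Commit: fdcb7ea1d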


@zhuxudong
Member

zhuxudong commented Apr 24, 2026

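Review: boundary issues in the AST-first #define approach

Reviewed against GLSL ES 3.00 (Galacean's target version) and the ANGLE implementation. The runtime path via ShaderMacroProcessor.evaluate + the GSP instruction set has been verified as unaffected by this PR; every issue below is confined to shader-lab's internal "source → GLSL string" compile stage.


🔴 Blocker 1: legal but non-expression replacement lists all fail

The GLSL ES 3.00 spec states that a #define replacement list is an arbitrary token sequence, not restricted to a valid expression. All of the following are spec-legal, work on baseline, and are parse errors in this PR:

#define BEGIN {                     // scope-wrapping macro
#define END }
#define COMMA ,                     // parameter-list separator macro
#define SEMI ;
#define REPEAT3(x) x,x,x            // body is not a single assignment_expression

Failure path: _defineHasValue returns true for a value whose first character is non-alpha → AST path → the parser expects an assignment_expression starter → none of { / , / ; match → Unexpected token error.

On baseline these values pass through to ShaderInstructionEncoder as opaque lexemes and are expanded by ShaderMacroProcessor.evaluate with plain string substitution, which the driver accepts.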


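🔴 Blocker 2: TargetParser.y not synced; the new grammar has not been checked for conflicts by bison

The header comment of packages/shader-lab/src/parser/TargetParser.y explicitly says // For cfg conflict test, used by bison: it is a mirror of the grammar that lets bison diagnose LALR(1) conflicts rigorously. The two should stay in sync.

The PR only changed CFG.ts (used at runtime); the .y file is untouched:

CFG.ts (PR):          15 references to macro_define / MACRO_DEFINE
TargetParser.y (PR):  3 references (all MACRO_DEFINE_EXPRESSION, already on baseline)
git diff dev/2.0..PR -- TargetParser.y  →  0 lines

Consequences:

  1. The new macro_define non-terminal, the three tokens MACRO_DEFINE / MACRO_DEFINE_END / MACRO_DEFINE_PARAMS, and the two productions global_declaration → macro_define and statement → macro_define have not gone through bison's conflict check at all
  2. The PR comment claims that "MACRO_DEFINE_PARAMS is an opaque token in order to avoid a shift/reduce conflict with function_call_parameter_list". No tool verifies this; it rests entirely on manual reasoning
  3. Nobody has shown that reusing assignment_expression as the production for macro values is really conflict-free. In particular, the follow-set of postfix_expression → postfix_expression DOT function_call may be problematic within the scope of macro_define
  4. Anyone who later uses the .y file for grammar debugging or toolchain integration gets a stale grammar

The fix is cheap: mirror the PR's CFG.ts changes into the .y file by adding 3 %token declarations, 1 non-terminal, and 2 productions. About a dozen lines.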


🔴 Blocker 3: macroDefineList double-path redundancy + a spurious warning on every call

Preprocessor._parseMacroDefines scans the source once with a regex, and MacroDefine.semanticAnalyze pushes again at parser reduction. For every expression macro, macroDefineList[name] holds 2 redundant records:

// Preprocessor side:
{ value: "v.v_uv", valueType: Other, valueAst: undefined }
// Parser side:
{ value: "", valueType: Other, valueAst: <AssignmentExpression> }

getReferenceSymbolNames iterates over infos; the AST entry is skipped by if (info.valueAst) continue, but the Preprocessor entry's valueType === Other falls through to:

Logger.warn(`Macro "${info.name}" has an unrecognized value "${info.value}". ShaderLab does not validate this type.`);

Every member-access #define triggers one misleading warning on every access. Running macro-member-access-builtin-arg.shader reproduces it. The PR tests only compare snapshots and do not assert that the logger stays silent, which hides the problem.


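🟡 Warning 1: the keyword peek in _defineHasValue misses comments

while (i < src.length && (src[i] === " " || src[i] === "\t")) i++;
// only skips spaces and tabs, not /* */ or //
if (BaseLexer.isAlpha(src.charCodeAt(i))) {
  // check whether the first word is a declaration keyword
}

#define HP /* comment */ highp

  • peek sees /, which is non-alpha → the whitelist check is skipped → returns true → AST path
  • in value mode _skipInlineSpaceAndComments skips /* */ and scans highp → HIGH_PRECISION keyword
  • the parser expects an assignment_expression starter, HIGH_PRECISION does not match → parse error

Baseline treats the whole line as opaque, so it does not break. This violates GLSL's convention that comments are transparent.


🟡 Warning 2: the routing strategy is defined in reverse

The _expressionLeaderKeywords whitelist is a "negative enumeration": "if the first word is not a declaration keyword, treat it as an expression". This closed set needs maintenance whenever a GLSL version bump or extension adds sampler/image types, and it misses the legal non-alpha-leading values from Blocker 1.

Positive routing should be: peek the whole replacement list, send it to the AST only if it contains a . member access (the case where ShaderLab really needs to step in and flatten), and keep everything else opaque. That resolves Blocker 1 and Warning 1 together.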
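A minimal sketch of that positive routing check follows. It is not the PR's `_defineHasValue`; `hasMemberAccessDot` is a hypothetical name, and the `digit.digit` exemption is the rule the author describes later in this thread. Under that rule as stated, literals written as `1.` or `.5` would still count as member access, which this sketch does not try to correct.

```typescript
function isDigit(ch: string | undefined): boolean {
  return ch !== undefined && ch >= "0" && ch <= "9";
}

/**
 * True when the replacement list contains a `.` that is not the decimal
 * point of a `digit.digit` numeric literal. Only such values would be
 * routed to the AST path; everything else stays an opaque lexeme.
 */
function hasMemberAccessDot(value: string): boolean {
  for (let i = 0; i < value.length; i++) {
    if (value[i] !== ".") continue;
    // `3.14`, `1.0e-5`: a digit on both sides means a decimal point.
    if (isDigit(value[i - 1]) && isDigit(value[i + 1])) continue;
    return true;
  }
  return false;
}
```

With this predicate, `v.v_uv`, `(v).v_uv` and `v . v_uv` route to the AST, while `{`, `x,x,x`, `highp` and `3.14` stay on the legacy path. Comments and line continuations are deliberately not handled here.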


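🔵 Nit 1: hasAstValue.some() has the wrong semantics across #ifdef branches

this.hasAstValue = defList?.some((info) => info.valueAst != null) ?? false;

PR "Known limitation 3" acknowledges this but does not analyze the impact. For:

#ifdef FAST
  #define FOO vec4(1.0)    // AST path
#else
  #define FOO highp vec4    // legacy path
#endif

Call-site type inference is skipped because hasAstValue=true, but actual codegen is decided by the active branch. The two stages decide on different bases, which goes wrong in downstream decisions such as builtin overload resolution. The structural fix is to pick the info for the currently active branch, not to use .some().


🔵 Nit 2: an expression macro's value is semantic-analyzed at parser reduction, producing spurious warnings

When #define FRAG_UV v.v_uv is written at module level:

  • v, as a VariableIdentifier, goes through semanticAnalyze at reduction
  • sa.symbolTableStack.scope.getSymbol("v") cannot find v in module scope (it is the entry parameter of the later void frag(Varyings v))
  • the #if _VERBOSE branch emits "Please sure the identifier 'v' will be declared before used."

On baseline the value is an opaque string with no semantic analysis, hence no warning.


Measured performance data

The PR description's claim that "PBR parser steps stay at 65647" does not match measurements. My local measurements follow:

How to reproduce

1. Add instrumentation to ShaderTargetParser.ts on both branches: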

   static _singleton: ShaderTargetParser;

+  /** @internal Last-parse diagnostic — for benchmarking only. */
+  static _lastLoopCount: number = 0;
+  static _lastParseMs: number = 0;

       } else if (actionInfo?.action === EAction.Accept) {
+        const elapsed = performance.now() - start;
+        ShaderTargetParser._lastLoopCount = loopCount;
+        ShaderTargetParser._lastParseMs = elapsed;
         Logger.info(
-          `[Task - AST compilation] Accept! State automata run ${loopCount} times! cost time ${
-            performance.now() - start
-          }ms`
+          `[Task - AST compilation] Accept! State automata run ${loopCount} times! cost time ${elapsed}ms`
         );

2. Benchmark test file tests/src/shader-lab/ParserBenchmark.test.ts:

import { ShaderLab as ShaderLabVerbose } from "@galacean/engine-shaderlab/verbose";
import { registerIncludes } from "@galacean/engine-shader";
import { Logger, WebGLEngine } from "@galacean/engine";
import { describe, expect, it } from "vitest";

Logger.enable();
registerIncludes();
const shaderLab = new ShaderLabVerbose();

let _lastLoops = 0, _lastMs = 0, _allLoops: number[] = [];
const LOG_RE = /State automata run (\d+) times.*?cost time ([\d.]+)ms/;

// Logger binds console.info at enable() time, so Logger.info has to be patched directly
const origInfo = Logger.info.bind(Logger);
Logger.info = (...args: any[]) => {
  if (typeof args[0] === "string") {
    const m = args[0].match(LOG_RE);
    if (m) {
      _lastLoops = parseInt(m[1], 10);
      _lastMs = parseFloat(m[2]);
      _allLoops.push(_lastLoops);
    }
  }
  origInfo(...args);
};

function makeShader(defines: string[], body = "vec2 uv = a.TEXCOORD_0;"): string {
  return `Shader "bench" { SubShader "D" { Pass "F" {
    struct A { vec4 POSITION; vec2 TEXCOORD_0; };
    struct V { vec2 v_uv; vec3 v_normal; vec4 v_color; };
    VertexShader = vert; FragmentShader = frag;
${defines.map((d) => "    " + d).join("\n")}
    V vert(A a) { V o; gl_Position = a.POSITION; ${body}
      o.v_uv = uv; o.v_normal = vec3(0.0); o.v_color = vec4(1.0); return o; }
    void frag(V v) { gl_FragColor = vec4(v.v_uv, 0.0, 1.0); }
  } } }`;
}

function parseAndMeasure(src: string) {
  _allLoops = []; _lastLoops = 0; _lastMs = 0;
  const shader = shaderLab._parseShaderSource(src);
  const pass = shader.subShaders[0].passes[0];
  shaderLab._parseShaderPass(pass.contents, pass.vertexEntry, pass.fragmentEntry, 0, "");
  return { loops: _lastLoops, parseMs: _lastMs };
}

function avg(src: string, runs = 5) {
  const times: number[] = []; let loops = 0;
  for (let i = 0; i < runs; i++) {
    const r = parseAndMeasure(src); times.push(r.parseMs); loops = r.loops;
  }
  times.sort((a, b) => a - b);
  return { loops, parseMs: times[Math.floor(runs / 2)] };
}

describe("Parser Benchmark", async () => {
  const canvas = document.createElement("canvas");
  await WebGLEngine.create({ canvas });

  it("baseline (0 defines)", () => {
    const r = avg(makeShader([]));
    console.log(`[BENCH] baseline: loops=${r.loops} ms=${r.parseMs.toFixed(3)}`);
  });

  it("10 numeric", () => {
    const d = Array.from({ length: 10 }, (_, i) => `#define N${i} ${(i + 1) * 0.1}`);
    const r = avg(makeShader(d));
    console.log(`[BENCH] 10_numeric: loops=${r.loops} ms=${r.parseMs.toFixed(3)}`);
  });

  it("10 member-access", () => {
    const d = Array.from({ length: 10 }, (_, i) => `#define A${i} v.v_uv`);
    const r = avg(makeShader(d));
    console.log(`[BENCH] 10_member: loops=${r.loops} ms=${r.parseMs.toFixed(3)}`);
  });

  it("10 constructor", () => {
    const d = Array.from({ length: 10 }, (_, i) => `#define C${i} vec4(v.v_uv, ${i}.0, 1.0)`);
    const r = avg(makeShader(d));
    console.log(`[BENCH] 10_ctor: loops=${r.loops} ms=${r.parseMs.toFixed(3)}`);
  });

  it("10 complex (mat3+cross) — Cocos TBN-style", () => {
    const d = Array.from({ length: 10 }, (_, i) =>
      `#define X${i} mat3(v.v_normal, v.v_color.xyz, cross(v.v_normal, v.v_color.xyz))`
    );
    const r = avg(makeShader(d));
    console.log(`[BENCH] 10_complex: loops=${r.loops} ms=${r.parseMs.toFixed(3)}`);
  });

  it("100 numeric (scale)", () => {
    const d = Array.from({ length: 100 }, (_, i) => `#define N${i} ${(i + 1) * 0.01}`);
    const r = avg(makeShader(d));
    console.log(`[BENCH] 100_numeric: loops=${r.loops} ms=${r.parseMs.toFixed(3)}`);
  });

  it("500 numeric (stress)", () => {
    const d = Array.from({ length: 500 }, (_, i) => `#define N${i} ${(i + 1) * 0.001}`);
    const r = avg(makeShader(d), 3);
    console.log(`[BENCH] 500_numeric: loops=${r.loops} ms=${r.parseMs.toFixed(3)}`);
  });

  it("10 legacy (#define X highp — opaque path)", () => {
    const d = Array.from({ length: 10 }, (_, i) => `#define L${i} highp`);
    const r = avg(makeShader(d));
    console.log(`[BENCH] 10_legacy: loops=${r.loops} ms=${r.parseMs.toFixed(3)}`);
  });
});

3. Run on each branch:

npm run build
cd tests
HEADLESS=true npx vitest run src/shader-lab/ParserBenchmark.test.ts --reporter=verbose 2>&1 | grep BENCH

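Measured results (Chromium, HEADLESS mode)

| Scenario | Baseline loops | PR loops | Δ | Per-directive |
|---|---|---|---|---|
| 0 defines (reference) | 473 | 473 | 0 | |
| 10 × #define X 3.14 | 503 | 703 | +200 | 3 → 23 loops (7.7x) |
| 10 × #define X v.v_uv | 503 | 743 | +240 | 3 → 27 loops (9x) |
| 10 × #define X vec4(v.v_uv, i, 1.0) | 503 | 1,373 | +870 | 3 → 90 loops (30x) |
| 10 × #define X mat3(v.v_normal, v.v_color.xyz, cross(...)) | 503 | 1,983 | +1,480 | 3 → 151 loops (50x) |
| 100 × #define X 3.14 | 773 | 2,773 | +2,000 | 3 → 23 loops |
| 500 × #define X 3.14 | 1,973 | 11,973 | +10,000 | 3 → 23 loops |
| 10 × #define X highp (legacy path) | 503 | 503 | 0 | 3 loops (unchanged ✓) |

Key observations:

  • Baseline costs a fixed 3 loops per #define (1 shift + 1 reduce for MACRO_DEFINE_EXPRESSION + 1 for global_declaration)
  • On the PR path each #define costs 23~151 loops, because the value has to run through the full 18-level reduction chain of assignment_expression
  • The legacy path (#define X highp) confirms the routing works: macros that take the legacy path match baseline
  • PBR's 47 valued #defines (about 49 in total across the source plus the included Common/Shadow/BSDF etc.) all take the AST path, a conservative estimate of +940 loops, so the PR description's "unchanged" is inaccurate

Runtime is unaffected (Shader.create parses each pass only once, variant compilation goes through ShaderMacroProcessor.evaluate without re-parsing, and .gsp loading bypasses shader-lab), but the editor / pre-compile toolchain spends on the order of 1-10ms more per shader.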
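The per-directive figures and the +940 estimate follow from simple arithmetic on the loop counts above. The sketch below reproduces that arithmetic; `loopsPerDefine` is a hypothetical helper, and reading "conservative" as "every PBR define costs only the cheapest measured AST case (23 loops)" is an assumption about how the estimate was formed.

```typescript
const ZERO_DEFINE_LOOPS = 473; // the "0 defines" row

/** Parser loops attributable to each #define in a measured scenario. */
function loopsPerDefine(totalLoops: number, defineCount: number): number {
  return (totalLoops - ZERO_DEFINE_LOOPS) / defineCount;
}

const baselinePerDefine = loopsPerDefine(503, 10);  // baseline, 10 numeric defines
const prNumericPerDefine = loopsPerDefine(703, 10); // PR, 10 numeric defines
const prComplexPerDefine = loopsPerDefine(1983, 10); // PR, 10 mat3(...) defines

// Assumed lower bound for PBR: 47 defines, each paying the cheapest AST cost.
const pbrExtraLoops = 47 * (prNumericPerDefine - baselinePerDefine);
```

Because real PBR defines include constructor calls and member accesses, which cost more than a numeric literal, the true increase would be at least this large.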


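Minimum set to fix before merging

  1. Fix Blocker 1 and Warning 2 together: change the routing strategy from "first word is not a declaration keyword → AST" to "value contains a . member access → AST". This positively identifies the values where ShaderLab really needs to step in. After the change:

    • #define BEGIN { / #define COMMA , / #define REPEAT3(x) x,x,x return to the legacy path with no parse error
    • Plain constant macros such as #define PI 3.14 / #define MAX 4 / #define FOO bar also return to legacy, and their parser cost disappears
    • Warning 1 (the comment blind spot) is resolved as a by-product
  2. Fix Blocker 2: sync the macro_define productions and the 3 token declarations newly added to CFG.ts into TargetParser.y, then run bison TargetParser.y to confirm there are no shift/reduce or reduce/reduce conflicts.

  3. Fix Blocker 3: eliminate the double push from the Preprocessor regex and MacroDefine.semanticAnalyze. Recommended options: have the regex collection skip AST-form names (requires the Lexer to mark them in advance), or have the AST path overwrite the same-name entry instead of appending.

  4. Correct the PR description: re-measure PBR parser steps and replace "unchanged" with the real numbers.

Warning 1 is fixed automatically once Blocker 1 is done; Nit 1 and Nit 2 can be iterated on in separate follow-up PRs.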
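The second option for Blocker 3 (overwrite instead of append) could look roughly like the sketch below. The `MacroDefineInfo` shape is reduced to the fields quoted in this review, and `upsertAstValue` is a hypothetical helper, not an existing shader-lab function.

```typescript
// Reduced, assumed shape: only the fields mentioned in this review.
interface MacroDefineInfo {
  name: string;
  value: string;
  valueAst?: object;
}
type MacroDefineList = Record<string, MacroDefineInfo[]>;

/**
 * Attach the parsed AST to the entry the text scan already registered,
 * instead of pushing a second record for the same #define.
 */
function upsertAstValue(list: MacroDefineList, name: string, valueAst: object): void {
  const infos = list[name] ?? (list[name] = []);
  const textEntry = infos.find((info) => info.valueAst === undefined);
  if (textEntry) {
    textEntry.valueAst = valueAst; // upgrade in place: one record per definition
  } else {
    infos.push({ name, value: "", valueAst });
  }
}
```

With a single record per definition, a consumer that skips entries carrying `valueAst` no longer sees a leftover text-only twin, which is what produced the spurious warning.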



@GuoLei1990
Member Author

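Thanks for the very precise two-bug report. Fixed in commit ae5be7970. Both bugs are real, and there turned out to be one more than reported, so let me lay out the full chain.

Fact-check: 3 bugs stacked on top of each other

The report says "the two bugs must be fixed together": confirmed. But halfway through the fix a third spot turned out to need changes:

Bug 1 (introduced by this PR): _defineHasValue misses (v).v_uv and v . v_uv ✓ you are right

Fixed using your inverted enumeration: "every . is a member access, unless it is the decimal point in digit.digit".

Bug 2 (long-standing on baseline): semanticAnalyze wrongly wrapped in _VERBOSE ✓ you are right

It is not only Expression.semanticAnalyze: AssignmentExpression.semanticAnalyze was wrapped in the same wrong way (lines 852-862). In release the type chain breaks at the AssignmentExpression layer. Expression.semanticAnalyze is the secondary problem, and fixing it alone is not enough.

Bug 3 (a deeper problem masked by the first two): extractDirectIdentLexeme does not see through parentheses

Even with the first two bugs fixed, #define UV_PAREN (v).v_uv on the macro path is still not flattened, because:

  • _structVarMap['v'] is already registered as 'varying' in _collectAllStructVars
  • but at codegen directRoot = extractDirectIdentLexeme((v)) returns null (a PrimaryExpression with length=3, i.e. a parenthesized expression, hits the early return null)
  • getStructRole(postExpr.type) in turn gets TypeAny, because at macro-definition time v is not bound to the frag context
  • both fallbacks fail → no flatten

Fix: let extractDirectIdentLexeme see through the ( expr ) wrapper plus any single-child precedence wrapper. The code actually gets shorter: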

static extractDirectIdentLexeme(expr: TreeNode): string | null {
  let cur: TreeNode | Token = expr;
  while (cur) {
    if (cur instanceof ASTNode.VariableIdentifier) {
      const child = cur.children[0];
      return child instanceof Token ? child.lexeme : null;
    }
    // `( expression )` form on PrimaryExpression — descend into the wrapped expr.
    if (cur instanceof ASTNode.PrimaryExpression && cur.children.length === 3) {
      cur = cur.children[1];
      continue;
    }
    // Single-child precedence wrappers — covers Postfix / Assignment /
    // Conditional / LogicalOr / … (verbose chain) and elision-collapsed
    // chain (release).
    if (cur instanceof ASTNode.ExpressionAstNode && cur.children.length === 1) {
      cur = cur.children[0];
      continue;
    }
    return null;
  }
  return null;
}

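The previous PostfixExpression-only special case has been generalized.

Test coverage

Added paren-member-access-repro.shader:

#define UV_PAREN  (v).v_uv      // bug 1+2+3
#define UV_SPACE  v . v_uv      // bug 1+2

void frag(Varyings v) {
  vec2 a = (v).v_uv;            // inline form (bug 2)
  vec2 b = v . v_uv;
  vec2 c = ((v)).v_uv;          // nested parentheses
  gl_FragColor = vec4(a + b + c + UV_PAREN + UV_SPACE, 0.0, 1.0);
}

In both release and verbose the fragment is correctly flattened to varying vec2 v_uv; with every reference replaced by v_uv. 28/28 shader-lab tests pass.

Reflection: why all three were wrong, from first principles

All three share one root cause, a local judgement about what an "identifier" is:

  • Bug 1: deciding member access by "is the character before . a digit"
  • Bug 2: treating type propagation as a verbose-only error aid
  • Bug 3: extractDirectIdentLexeme not allowing an expression-form root identifier

Each cheap shortcut discarded "full token context" information. Every fix replaces a local heuristic with a more general syntactic definition:

  • Bug 1: only digit.digit is the decimal point of a numeric literal (an explicit syntactic definition)
  • Bug 2: type propagation is a core responsibility of semantic analysis (it should not be verbose-gated)
  • Bug 3: the identifier root may be wrapped in any precedence wrapper / ( expr ) (see through them)

The other _VERBOSE-gated semanticAnalyze methods you mentioned

I reviewed the other 15 // #if _VERBOSE-gated semanticAnalyze methods in AST.ts. They are all in the expression precedence chain (UnaryExpression, MultiplicativeExpression, …, ConditionalExpression), and these classes are wrapped in _VERBOSE in their entirety (together with @ASTNodeDecorator): in release the class itself does not exist, ASTNode.LogicalOrExpression.pool and friends in CFG.ts are undefined → TrivialNode elision triggers → these wrapper nodes are never instantiated in release. So their semanticAnalyze does not need to run in release, and this class of _VERBOSE gating is correct (the elision optimization already compensates for it).

Only Expression and AssignmentExpression are different: their classes are registered in release too (ASTNode.AssignmentExpression.pool exists in release as well), so elision does not trigger and instances really are created, but semanticAnalyze was wrongly stripped → types do not propagate. These two need their own fix.

CI

ae5be79 has been pushed.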

…ease

Three interlocked bugs caused `(v).v_uv` and `v . v_uv` to skip varying
flatten in release builds:

1. `_defineHasValue` only treated `.` as member-access when its
   left-side run was an identifier-leading alnum sequence — wrongly
   excluded `(v).v_uv` (run preceded by `)`) and `v . v_uv` (run empty
   due to whitespace). Inverted the rule: any `.` is member-access
   except `digit.digit` (numeric literal `3.14`, `1.0e-5`).

2. `Expression.semanticAnalyze` and `AssignmentExpression.semanticAnalyze`
   were wrapped in `// #if _VERBOSE` — type propagation got stripped in
   release builds so outer `(expr).field` codegen saw `TypeAny` and
   skipped flatten. Both are release-required for codegen correctness,
   not verbose-only logging.

3. `extractDirectIdentLexeme` only forwarded through PostfixExpression
   and PrimaryExpression(len=1) — `(v)` returned null, missing the
   `_structVarMap` lookup that macro-path codegen relies on (since
   macro-time `v` has no symbol-table binding, type fallback is
   TypeAny). Generalized to forward through any single-child
   ExpressionAstNode + the `( expr )` form, covering the entire
   precedence chain in both verbose and release builds.

Adds regression `paren-member-access-repro.shader` covering inline +
macro forms of `(v).v_uv`, `v . v_uv`, `((v)).v_uv` in both release
and verbose builds.

28/28 shader-lab tests pass. Release AST node count unchanged
(types of `pool.get()` calls untouched; only method bodies
re-enabled).
@GuoLei1990 GuoLei1990 force-pushed the feat/shaderlab-define-ast-firstclass branch from ae5be79 to 072c276 Compare April 27, 2026 13:29
@GuoLei1990
Member Author

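Thanks for the precise two-bug report. Fixed in commit 072c2766d. Both bugs are real, and testing showed one more spot that needs changes, so let me lay out the full chain.

Fact-check: 3 bugs stacked on top of each other

The report says "the two bugs must be fixed together": confirmed. But halfway through the fix a third spot turned out to need changes:

Bug 1 (introduced by this PR): _defineHasValue misses (v).v_uv and v . v_uv ✓

Fixed using your inverted enumeration: "every . is a member access, unless it is the decimal point in digit.digit".

Bug 2 (long-standing on baseline): semanticAnalyze wrongly wrapped in _VERBOSE ✓

It is not only Expression.semanticAnalyze: AssignmentExpression.semanticAnalyze was wrapped in the same wrong way (lines 852-862). In release the type chain breaks at the AssignmentExpression layer. Expression.semanticAnalyze is the secondary problem, and fixing it alone is not enough.

While there, I reviewed the other 15 _VERBOSE-gated semanticAnalyze methods in AST.ts: they are all in the expression precedence chain (UnaryExpression, MultiplicativeExpression, …, ConditionalExpression). These classes are wrapped in _VERBOSE in their entirety (together with @ASTNodeDecorator) → in release the class does not exist → their pool in CFG.ts is undefined → TrivialNode elision triggers → they are never instantiated in release. So their semanticAnalyze does not need to run in release, and this class of _VERBOSE gating is correct (the elision optimization already compensates for it).

Only Expression and AssignmentExpression are different: their classes are registered in release too (the pool exists in release as well), so elision does not trigger and instances really are created, but semanticAnalyze was wrongly stripped → types do not propagate. These two need their own fix.

Bug 3 (a deeper problem masked by the first two): extractDirectIdentLexeme does not see through parentheses

Even with the first two bugs fixed, #define UV_PAREN (v).v_uv on the macro path is still not flattened, because:

  • _structVarMap['v'] is already registered as 'varying' in _collectAllStructVars
  • but at codegen directRoot = extractDirectIdentLexeme((v)) returns null (a PrimaryExpression with length=3, i.e. a parenthesized expression, hits the early return null)
  • getStructRole(postExpr.type) in turn gets TypeAny, because at macro-definition time v is not bound to the frag context
  • both fallbacks fail → no flatten

Fix: let extractDirectIdentLexeme see through the ( expr ) wrapper plus any single-child precedence wrapper (the ExpressionAstNode base class covers the whole chain in one check):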

static extractDirectIdentLexeme(expr: TreeNode): string | null {
  let cur: TreeNode | Token = expr;
  while (cur) {
    if (cur instanceof ASTNode.VariableIdentifier) {
      const child = cur.children[0];
      return child instanceof Token ? child.lexeme : null;
    }
    if (cur instanceof ASTNode.PrimaryExpression && cur.children.length === 3) {
      cur = cur.children[1];   // ( expr ) → expr
      continue;
    }
    if (cur instanceof ASTNode.ExpressionAstNode && cur.children.length === 1) {
      cur = cur.children[0];   // any single-child wrapper
      continue;
    }
    return null;
  }
  return null;
}

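The previous PostfixExpression-only special case has been generalized: one spot, 11 lines, covering all precedence wrappers and nested parentheses.

The root cause of all three bugs is "a local heuristic standing in for the full syntactic definition":

  • Bug 1: deciding member access by "is the character before . a digit" → misses (v).x / v . x
  • Bug 2: treating type propagation as a verbose-only error aid → the chain breaks in release
  • Bug 3: extractDirectIdentLexeme lists only two node kinds → misses parenthesized expressions and other wrappers

Each fix goes back to the precise syntactic definition:

  • Bug 1: only digit.digit is the decimal point of a numeric literal
  • Bug 2: type propagation is a semantic-analysis responsibility that codegen depends on
  • Bug 3: the identifier root may be wrapped in any number of precedence wrappers / ( expr )

Performance impact

With // #if _VERBOSE removed, semanticAnalyze on Expression / AssignmentExpression now runs in release too. The method body is 1-2 property reads plus 1 property write, on the order of nanoseconds. The release AST node count is unchanged (the pool/elision mechanism is decided by astTypePool and is independent of whether the method body is gated). The measured 50×4 PBR median of 39-41ms matches the earlier numbers: no regression.

Test coverage

Added paren-member-access-repro.shader:

#define UV_PAREN  (v).v_uv      // bug 1+2+3
#define UV_SPACE  v . v_uv      // bug 1+2

void frag(Varyings v) {
  vec2 a = (v).v_uv;            // inline form (bug 2)
  vec2 b = v . v_uv;
  vec2 c = ((v)).v_uv;          // nested parentheses
  gl_FragColor = vec4(a + b + c + UV_PAREN + UV_SPACE, 0.0, 1.0);
}

In both release and verbose the fragment is correctly flattened to varying vec2 v_uv; with every reference replaced by v_uv.

28/28 shader-lab tests pass.

CI

072c2766d has been pushed.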


@GuoLei1990
Member Author

#2980

…an#2980)

Remove the regex-based pre-scan in Preprocessor that registered
`#define` entries into `macroDefineList`. The Lexer now fills the list
inline as it tokenizes — single source of truth, no drift.

Two interpreters of the same source had been silently disagreeing:

- Preprocessor's `_macroRegex` didn't understand block comments or
  `\`-line-continuation, so `/* #define MAX_LIGHTS 4 */` got registered
  as a real macro, and `#define LONG \\\n a*b` truncated its value.
- Lexer's `_defineHasValue` peek had its own corner cases (commit
  `5b7f8ae37` patched one for `#define HP /* comment */ highp`).
- The `_expressionLeaderKeywords` whitelist conflated "can lead a
  constructor call" with "is a complete expression", breaking FXAA
  type-alias macros (commit `992956ffe` patched that).

Each was a one-off symptom of the same architectural fact: two
analyzers parsing the same source independently will drift.

Approach
--------

`#define` is registered exactly once, by the entity that already
parses every byte of the source: the Lexer.

- `_scanDirectives` records the directive's start offset on `#define`.
- The two `#define` paths funnel through one helper:
    - AST path: `_emitMacroDefineEnd` slices the source between the
      recorded start and the directive-terminating newline.
    - Legacy path: `_scanDirectives` slices the same range after
      `_scanUtilBreakLine`.
- `_registerMacroDefine(directive)` runs a single regex on that
  slice, builds a `MacroDefineInfo`, and pushes (with the existing
  duplicate guard).

Preprocessor now only handles `#include` expansion. `_macroRegex`,
`_parseMacroDefines`, `_registerDefine`, `_referenceReg`, `_isExist`,
`_mergeMacroDefineLists`, and the dual-field `_chunkCache` are gone.
The chunk cache shrinks to a path → expanded-string Map.

Net diff (excluding tests):
  Preprocessor.ts   227 → 92  (−135)
  Lexer.ts          781 → 820 (+39)
  ShaderLab.ts        1-line caller update
  Total                       (−96 lines)

Performance
-----------

Same-session 4 rounds × 50 samples vs commit `072c2766d`
(release build, median-of-medians):

  PBR (complex)        40.60 → 38.50  (−5.2%)
  waterfull (medium)    7.30 →  7.05  (−3.4%)
  multi-pass            2.00 →  1.95  (−2.5%)
  macro-pre             1.10 →  1.00  (−9.1%)

Single-pass eliminates the Preprocessor's full-file `_macroRegex`
exec loop; per-directive regex on a small slice is cheaper.

Tests
-----

30/30 shader-lab tests pass, including two new regressions for
Issue galacean#2980:
  - define-in-comment-repro.shader  (block-commented #define)
  - define-line-continuation-repro.shader  (`\`-continuation in value)
@zhuxudong
Member

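Nit: hasAstValue = .some(...) skips type inference with no diagnostic when forms are mixed across #ifdef branches

Defect type

There are gaps in both the supported scope and the warning system.

  • The routing logic promises that "the form of the value decides between AST and legacy"
  • But when a macro of the same name is defined across #ifdef branches, the call site uses .some() and extends the AST property of any one branch to every call site
  • When the variant actually compiled takes the legacy branch, type inference at the call site is silently skipped
  • No warning, no error: the user has no way to notice

Code sites

AST.ts:1658, where the call site sets the flags:

this.hasAstValue = defList?.some((info) => info.valueAst != null) ?? false;
this.isFunctionLikeMacro = defList?.some((info) => info.isFunction) ?? false;

defList is every entry of macroDefineList[macroName] (including definitions from all #ifdef branches). .some() makes the flag true if any entry is true.

AST.ts:1428 uses the flag to decide whether to skip type inference:

if (child instanceof BaseToken || !child.hasAstValue) {
  this.typeInfo = symbols[0].dataType?.type;
}

hasAstValue=true → skipped → typeInfo stays undefined → the getter defaults to TypeAny.

Trigger scenario

uniform vec3 u_globalLightDir;

#ifdef USE_INTERPOLATED_NORMAL
  #define LIGHT_INPUT v.v_normal.xyz       // contains . → AST path, valueAst != null
#else
  #define LIGHT_INPUT u_globalLightDir       // no . → legacy path, referenceName="u_globalLightDir"
#endif

void frag(Varyings v) {
  vec3 n = normalize(LIGHT_INPUT);
}

macroDefineList["LIGHT_INPUT"] holds both:

| Entry | valueAst | referenceName | Source |
|---|---|---|---|
| #0 | AST subtree | "" | #ifdef branch |
| #1 | undefined | "u_globalLightDir" | #else branch |

Call site LIGHT_INPUT:

  • hasAstValue = .some(... valueAst != null) = true (triggered by #0)
  • referenceSymbolNames = ["u_globalLightDir"] (from #1)
  • the symbol table lookup for u_globalLightDir → finds uniform vec3
  • but hasAstValue=true → typeInfo = symbols[0].dataType?.type is skipped
  • typeInfo = undefined → call-site type = TypeAny

Inconsistent behavior

| Variant actually compiled | Expected type inference | Actual type inference |
|---|---|---|
| USE_INTERPOLATED_NORMAL defined | TypeAny (skipping is correct by design; the AST path carries its own types) | TypeAny ✓ |
| USE_INTERPOLATED_NORMAL undefined | vec3 (should be inferred from u_globalLightDir) | TypeAny ❌ |

In the second case the active branch is legacy and should fall back to type inference, but .some() drags it into "treat as AST", so typeInfo defaults to TypeAny.

Downstream impact

Several codegen decisions that depend on typeInfo are affected:

  1. Builtin function overload resolution: for normalize(LIGHT_INPUT) with typeInfo TypeAny, overload selection depends on how tolerant shader-lab is of TypeAny. A match is luck; a miss reports "no matching overload"
  2. IO struct flatten: getStructRole(typeInfo.typeLexeme) returns undefined for TypeAny → the member-access rewrite that should have triggered is skipped
  3. Type-driven codegen branches: any if (typeInfo === SOMETHING) takes the wrong path

Which symptom appears depends on the call-site context. The key problem is not "which symptom is most common" but "the system makes the user a promise it cannot keep, and fails without a diagnostic".

The warning system has a gap too

In principle shader-lab should warn the user when:

  • a macro of the same name is defined with inconsistent forms across #ifdef branches (AST on one side, legacy on the other)
  • .some() has forced the flag to true, so the corresponding active branch may misbehave

In practice neither warning exists. MacroCallSymbol.semanticAnalyze calls .some() and uses the flag directly, with no sanity check.

Fix directions

Option A, strict fix: the parser tracks #ifdef nesting, each MacroDefineInfo records the branch path it was defined on, and the call site filters by the active branch. Cost: a fairly large change to the parser state machine.

Option B, defensive warning: when MacroCallSymbol.semanticAnalyze detects that defList contains both AST-form and legacy-form entries, emit a warning: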

const hasAst = defList?.some((info) => info.valueAst != null) ?? false;
const hasLegacy = defList?.some((info) => info.valueAst == null) ?? false;
if (hasAst && hasLegacy) {
  // #if _VERBOSE
  sa.reportWarning(this.location,
    `Macro "${macroName}" has mixed AST/legacy definitions across #ifdef branches; ` +
    `call-site type inference may be unreliable.`);
  // #endif
}
this.hasAstValue = hasAst;

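This does not fix the underlying semantics, but at least tells the user that this call site may misbehave, downgrading a "silent error" to an "explicit warning".

Option C, document it as a Known limitation: tell users explicitly that "a macro of the same name must have the same value form in every #ifdef branch (either all contain a member access or none do), otherwise call-site type inference may be inaccurate". This shifts the defect into the documentation, but there is still no compile-time check that users comply.

Key issue

.some() has no clearly defined semantics in the cross-branch case. Either fix it (option A), warn (option B), or document it explicitly (option C); the current implementation neither fixes, warns, nor documents, and that is a design defect.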


…cean#2980)

Same `#define` repeated in disjoint `#ifdef` branches used to conflate at the
call site: `MacroCallSymbol.hasAstValue` was set from `defList.some(valueAst)`,
so any AST-form entry across any branch silently disabled legacy type
inference — even when the active branch's entry was the legacy form. Result:
call-site `typeInfo` stranded as `TypeAny`, downstream codegen (builtin
overload selection, varying flatten) emitted incorrect GLSL.

Root cause: `macroDefineList` was branch-flat. No data was tracking which
`#ifdef` branch a `#define` was registered in, so consumers couldn't filter
to the entries actually reachable from a call site.

Fix: make `macroDefineList` branch-aware end-to-end.
  - `BranchConstraint` / `BranchSignature` types in common/BaseToken
  - Lexer maintains a branch stack across `#ifdef`/`#ifndef`/`#else`/`#endif`,
    stamps every emitted token's `branch` field, attaches a snapshot to
    every registered `MacroDefineInfo`
  - `Lexer.sameBranch` / `Lexer.isVisibleFrom` as static helpers
  - `MacroCallSymbol.semanticAnalyze` filters `defList` by `isVisibleFrom`
    against the call site's own branch signature, then derives
    `hasAstValue` / `isFunctionLikeMacro` / `referenceSymbolNames` from
    only the visible entries
  - `MacroDefine.semanticAnalyze` upgrade matches branch signatures so
    AST `valueExpression` lands on the right branch's entry

Tests: define-mixed-form-repro (AST/legacy mixed across `#ifdef`),
define-branch-scoped-ast (both AST forms, different members),
define-nested-ifdef (nested branches register under combined signatures).
176/176 shader-lab tests pass.
@GuoLei1990
Member Author

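New commit e31c24544: first-principles fix for the mixed-form macro bug

Details in #2980 (comment)

In short: macroDefineList becomes branch-aware, which removes at the root the path "heuristic across #ifdef branches → call-site type stranded as TypeAny". This fixes the nit previously listed in the issue as a follow-up.

Performance (re-measured after this commit, 4 rounds × 50 samples, median-of-medians)

| Shader | dev/2.0 | PR HEAD | Improvement |
|---|---|---|---|
| PBR (complex) | 49.50ms | 40.75ms | −17.7% (−8.75ms) |
| waterfull (medium) | 10.00ms | 7.30ms | −27.0% (−2.70ms) |
| macro-pre | 1.30ms | 1.10ms | −15.4% (−0.20ms) |
| multi-pass | 2.40ms | 2.00ms | −16.7% (−0.40ms) |

The numbers differ slightly from the −21.9%/−28.6% given earlier in the PR body (random variation of ±5% between samples in the same session), but the trend is the same. This commit adds branch-stack maintenance plus per-token branch tagging, which in theory has a small cost, but measurement shows no regression: all 4 shaders are still significantly faster than the dev/2.0 baseline.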
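For readers unfamiliar with the "median-of-medians" figure, the sketch below shows one plausible way to compute it and the percentage column. The actual benchmark harness is not shown in this thread, so `median`, `medianOfMedians` and `percentChange` are hypothetical helpers, and averaging the two middle values for an even count is an assumption.

```typescript
/** Median of a list; for an even count, the mean of the two middle values (assumed convention). */
function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

/** One median per round, then the median across rounds. */
function medianOfMedians(rounds: number[][]): number {
  return median(rounds.map(median));
}

/** Signed change relative to the baseline, in percent. */
function percentChange(baselineMs: number, prMs: number): number {
  return ((prMs - baselineMs) / baselineMs) * 100;
}

// PBR row: 49.50ms on dev/2.0 versus 40.75ms on PR HEAD.
const pbrChange = percentChange(49.5, 40.75).toFixed(1);
```

Taking a median within each round and again across rounds damps single-sample outliers such as GC pauses, which is a common reason to prefer it over a plain mean for short benchmarks.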

Tests

176/176 shader-lab tests pass, including 3 new regressions:

  • define-mixed-form-repro.shader (AST/legacy across branches)
  • define-branch-scoped-ast.shader (both AST form, different members)
  • define-nested-ifdef.shader (combined signatures for nested #ifdef)


@cptbtptpbcptdtptp
Collaborator

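Fact-check: 4 #define lexer boundary bugs (measured as hard parse errors, recommend upgrading)

I checked out PR HEAD e31c245 and ran 4 minimal repros directly, finding 4 hard parse errors / internal state degradations. I originally assumed these were silent miscompiles; after measuring, 3 of them should be upgraded to P1.


🟠 P1-1: _defineHasValue does not strip comments → misrouted to AST → parse fails

Repro:

#define HP /* a.b */ highp

Measured:

Unexpected token highp
parse threw: AssertionError: 'Shader pass "verify-c1-comment-with-…' was thrown

Root cause: when Lexer._defineHasValue scans for ., it walks the raw character stream, so a . inside a comment is treated as a member access → routed to AST → highp is not a valid assignment_expression starter → fail.

// Lexer.ts:631-642
for (let k = i; k < len; k++) {
  const c = src.charCodeAt(k);
  if (c === 10 || c === 13) break;
  if (c !== 46 /* `.` */) continue;
  // ... does not account for // and /* */
}

Fix: skip //… and /* … */ inside the .-scanning loop. _skipInlineSpaceAndComments already has the same logic and could be factored into a shared helper.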
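One way to picture the fix is to blank out comments before scanning for `.`. The sketch below is a standalone illustration, not the Lexer's implementation: `stripComments` is a hypothetical helper, and it relies on GLSL ES having no string literals, so a `//` or `/*` sequence can only start a comment.

```typescript
/** Replace each // or block comment with a single space, as a preprocessor would. */
function stripComments(src: string): string {
  let out = "";
  let i = 0;
  while (i < src.length) {
    if (src[i] === "/" && src[i + 1] === "/") {
      // Line comment: skip up to, but not including, the newline.
      while (i < src.length && src[i] !== "\n") i++;
      out += " ";
    } else if (src[i] === "/" && src[i + 1] === "*") {
      // Block comment: skip past the closing */ (or to the end if unterminated).
      const end = src.indexOf("*/", i + 2);
      i = end === -1 ? src.length : end + 2;
      out += " ";
    } else {
      out += src[i++];
    }
  }
  return out;
}
```

For the repro above, the value `/* a.b */ highp` no longer contains a `.` after stripping, so a dot-based routing check would keep it on the legacy path, while `v.v_uv // see eq. 4.7` still routes on its real member access.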


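🟠 P1-2: _defineHasValue does not follow \\\n → misrouted to legacy → .field leaks outside the directive → parse fails

Repro:

#define UV foo \
            .field

Measured:

CompilationError: Unexpected token .
|···#define UV foo \
|···                  .field
                      ^

Root cause: the first line has no . → _defineHasValue returns false → legacy path → _scanUtilBreakLine stops at the first \n (it also does not recognize \\\n line continuation) → .field leaks outside the directive → as a top-level token, . fails to parse immediately.

Fix: recognize \\\n across lines inside _defineHasValue; likewise _scanUtilBreakLine must recognize line continuation.


🟡 P2-3: _defineDirectiveReg does NOT MATCH a directive containing \\\n → internal state degrades

Repro:

#define LONG_VAL  v.v_uv \
                + v.v_uv
...
gl_FragColor = vec4(LONG_VAL, 0.0, 1.0);

Measured:

[CLAIM 3] frag contains 'LONG_VAL': true   ← call site not replaced
[CLAIM 3] frag tail:
  #define LONG_VAL v_uv + v_uv
  varying vec2 v_uv;
  void main() { gl_FragColor = vec4 ( LONG_VAL , 0.0 , 1.0 ) ; }

Plus 2 spurious warnings: Please sure the identifier "v" will be declared before used.

Causal chain, verified:

  1. In _defineDirectiveReg = /^\s*#define\s+(\w+)[ ]*(\(([^)]*)\))?(?:[ \t]+([^\n\r]*?))?\s*$/ the value group [^\n\r]*? excludes \n, but the slice keeps the original \\\n → the regex does NOT MATCH (verified in a standalone Node REPL)
  2. _registerMacroDefine returns early, and macroDefineList[LONG_VAL] is never populated
  3. → the call site LONG_VAL is recognized by the lexer as an ID (not MACRO_CALL; lexer dispatch looks at macroDefineList[word])
  4. → ShaderLab treats v.v_uv as a top-level statement, v is undeclared → spurious warning
  5. → the fallback in the AST path's MacroDefine.semanticAnalyze pushes a fresh entry and attaches valueAst → #define LONG_VAL v_uv + v_uv is still emitted by codegen
  6. → the driver's text substitution saves the day and the final GLSL compiles, but ShaderLab's internal semantic layer is completely inconsistent

The driver's rescue masks the bug, but the internal invariant is already broken.

Fix: replace \\\n with the empty string before parsing, or make the regex explicitly allow line continuation.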
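The first suggested fix can be checked in isolation with the regex quoted in step 1. In the sketch below, only the regex comes from this thread; `spliceLines` is a hypothetical helper name, and accepting `\r\n` as well as `\n` is an added assumption.

```typescript
// Regex as quoted in step 1 of the causal chain above.
const defineDirectiveReg = /^\s*#define\s+(\w+)[ ]*(\(([^)]*)\))?(?:[ \t]+([^\n\r]*?))?\s*$/;

/** Remove backslash-newline pairs so a continued directive becomes one logical line. */
function spliceLines(src: string): string {
  return src.replace(/\\\r?\n/g, "");
}

const rawDirective = "#define LONG_VAL  v.v_uv \\\n                + v.v_uv";

// Without splicing, the value group cannot cross the newline and `$` is never reached.
const matchBeforeSplice = defineDirectiveReg.exec(rawDirective);

// After splicing, the directive is a single line and the value group captures both operands.
const matchAfterSplice = defineDirectiveReg.exec(spliceLines(rawDirective));
```

Splicing first mirrors the usual preprocessor ordering, where line continuation is resolved before directives are recognized, so the same helper could in principle serve the params scan and the value scan too.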


🟠 P1-4: _scanMacroDefineParams does not recognize \\\n → params spanning lines fail to parse

Repro:

#define MAX3( \
          a, b, c \
        ) max(max(a, b), c)

Measured:

CompilationError: Unexpected token ,
|···#define MAX3( \
|···                  a, b, c \
                       ^

Parsing fails outright. This is far worse than the predicted "\ character gets stuffed into the buffer".

Fix: recognize and skip \\\n inside the _scanMacroDefineParams loop.


Trigger surface in real shaders

All 4 are legal under GLSL ES 3.00 §3.4, but they occur with different frequency:

  • P1-1 (comment containing .): very common in "documentation comments": version numbers (// Based on FXAA 3.11), URLs (// see https://example.com/...), citations (// see eq. 4.7 of …). The easiest to hit.
  • P1-2 / P1-4 (line continuation): occasional in long PBR/FXAA chunks; ported shaders are at higher risk.
  • P2-3: same trigger condition as P1-2.

Test suggestions

Add the 4 repro shaders to tests/src/shader-lab/shaders/:

  • define-comment-with-dot-repro.shader
  • define-cont-line-dot-repro.shader
  • define-line-cont-use-repro.shader
  • define-params-line-cont-repro.shader

I have all 4 repros built locally and can push them if a PR is wanted.


Verification script

I ran this on e31c245 from the fork, as follows:

git clone --depth=1 -b feat/shaderlab-define-ast-firstclass https://github.com/hhhhkrx/runtime engine-pr-2974
cd engine-pr-2974
pnpm install --prefer-offline
pnpm run b:module
# add the 4 _verify-*.shader files to tests/src/shader-lab/shaders/
# add the matching it() blocks to ShaderLab.test.ts
HEADLESS=true npx vitest run tests/src/shader-lab/ShaderLab.test.ts -t "verify-c"

@cptbtptpbcptdtptp
Collaborator

Fact-check 第二轮:剩余 P3 项目

把上一轮 review 里没实测的 P3 也走一遍。结论是 5 条 P3 里 4 条是伪问题,可关闭;只有「代码冗余」一条是真的,但确实只是冗余、不是 bug。


✅ 关闭:「TrivialNode elision 下游兼容性」(原 review 担忧 instanceof TrivialNode 失效)

grep -rn "TrivialNode" packages/shader-lab/src
packages/shader-lab/src/lalr/Utils.ts:33:    const pool = astTypePool ?? ASTNode.TrivialNode.pool;
packages/shader-lab/src/lalr/Utils.ts:37:      // to a semantic-empty `TrivialNode` wrapper. Elide it at reduce time
packages/shader-lab/src/parser/AST.ts:120:  export class TrivialNode extends TreeNode {}

There are 0 instanceof TrivialNode checks in the whole codebase. No consumer is affected after elision; non-issue.


✅ Closed: "stateful Lexer flags leaking state across parses"

// ShaderLab.ts:64
const lexer = new Lexer(noIncludeContent, macroDefineList);

Every parse does new Lexer, so _inMacroDefineValue / _macroDefineExpectsNameToken / _macroDefineExpectsParamsToken / _branchStack / _pendingBranchPushDefined all rely on their default initializers; nothing is reused.

BaseToken is pooled and reused, but BaseToken.set() explicitly does this.branch = EMPTY_BRANCH;, so it is already reset. Non-issue.


✅ Closed: "enabling type propagation in release may introduce regressions"

The // #if _VERBOSE guards on AssignmentExpression.semanticAnalyze / Expression.semanticAnalyze in AST.ts were removed → release now runs type propagation too. This is intentional (it fixes the baseline bug where (v).v_uv failed to flatten in release).

Measured: ran the full tests/src/shader-lab/ShaderLab.test.ts:

Test Files  1 passed (1)
Tests  37 passed (37)   ← 33 existing + 4 verify-c cases I added

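All 33 existing tests (including release-mode glslValidate ×N) pass. CI is green as well. No regression.


✅ Closed: "test assertions too broad (normalize/dot/texture2D substrings)"

coderabbitai already raised this and the author marked it P3, non-blocking. Repeating it adds nothing, so I withdraw it.


🟢 Kept as P3: parseMacroParamList duplicates the inline implementation in _registerMacroDefine

Two implementations, about 6 lines each, with the same splitting logic:

// Lexer.ts:766 (inline)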
const params = paramsStr
  ? paramsStr.split(",").map((p) => p.trim()).filter(Boolean)
  : [];
// ParserUtils.ts:23 (parseMacroParamList)
const inner = lexeme.replace(/^\s*\(\s*|\s*\)\s*$/g, "");
if (!inner) return [];
return inner.split(",").map((s) => s.trim()).filter(Boolean);

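Keeping the two in sync is a source of fragility (and parseMacroParamList additionally strips the surrounding (...)). Still, it is only redundancy, not a bug, and can be handled as a follow-up.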
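
One way the follow-up could look is a single helper that both call sites share. This is a sketch only; `splitMacroParams` is a hypothetical name, and it assumes both callers can pass either the bare list or the parenthesized lexeme.

```typescript
// Hypothetical shared helper: accepts either "a, b" or "(a, b)".
function splitMacroParams(text: string | undefined): string[] {
  if (!text) return [];
  // Strip one optional pair of surrounding parentheses plus adjacent whitespace.
  const inner = text.replace(/^\s*\(\s*|\s*\)\s*$/g, "");
  if (!inner) return [];
  return inner
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
}
```

Because the parenthesis stripping is a no-op on a bare list, the inline split in the lexer and the parser utility could both delegate here without changing their inputs.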


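Final rating summary (supersedes the judgments in the previous review)

| Issue | Rating after testing | Status |
| --- | --- | --- |
| _defineHasValue does not strip comments | 🟠 P1 | To fix |
| _defineHasValue does not follow \\\n | 🟠 P1 | To fix |
| _scanMacroDefineParams does not recognize \\\n | 🟠 P1 | To fix |
| _defineDirectiveReg gives NO MATCH on \\\n | 🟡 P2 | To fix (the driver rescues it, but internal state degrades) |
| parseMacroParamList duplicated implementation | 🟢 P3 | Possible follow-up |
| TrivialNode elision downstream compatibility | ✅ | Closed (non-issue) |
| Lexer stateful flag reuse | ✅ | Closed (non-issue) |
| Type propagation enabled in release | ✅ | Closed (non-issue; intended behavior, no regression) |
| Test assertions too broad | ✅ | Closed (duplicate of coderabbitai) |

Net new issues to fix: 3 P1 + 1 P2, all at the lexer boundary around \\\n line continuation and comment / . detection.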

@zhuxudong
Member

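Bug: branch tracking in commit e31c24544 misses #if / #elif, causing a stack mismatch

I checked out the PR's latest HEAD (e31c24544) and statically reviewed the stack maintenance in Lexer.tokenize. #if and #elif are not handled, so the constraint of an enclosing #ifdef/#ifndef gets popped by mistake.

Code in question

The stack-maintenance switch inside tokenize() in Lexer.ts: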

switch (tok.type as Keyword) {
  case Keyword.MACRO_IFDEF:
    this._pendingBranchPushDefined = true;
    break;
  case Keyword.MACRO_IFNDEF:
    this._pendingBranchPushDefined = false;
    break;
  case Keyword.MACRO_ELSE: {
    const top = this._branchStack[this._branchStack.length - 1];
    if (top) this._branchStack[this._branchStack.length - 1] = { name: top.name, defined: !top.defined };
    break;
  }
  case Keyword.MACRO_ENDIF:
    this._branchStack.pop();
    break;
}

There is no case for MACRO_IF or MACRO_ELIF: #if expr leaves the stack untouched, but #endif always pops.

Failure sequence

#ifdef A           // push [{A, true}], stack = [A]
  #if FOO          // no push, stack = [A]
  #endif           // pop! stack = [], A is popped by mistake
  #define X v.uv   // X.branch = [], wrongly recorded as top-level
#endif             // pops an empty stack (silent no-op)

Expected: X.branch === [{A, true}] (inside #ifdef A)
Actual: X.branch === [] (treated as top-level, globally visible)

Impact

MacroCallSymbol.semanticAnalyze filters visible defs via isVisibleFrom(info.branch, callSiteBranch). X's branch is wrongly recorded as empty → it is "visible" from every call site.
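
To make the consequence concrete, here is a minimal sketch of a visibility filter of this kind. It assumes the filter rejects a definition only when a constraint with the same name has the opposite polarity; the real `isVisibleFrom` may differ in detail, and the `BranchConstraint` type is an illustrative assumption.

```typescript
// Illustrative types; the engine's actual branch signature type may differ.
interface BranchConstraint {
  name: string;
  defined: boolean;
}

// Sketch: a definition is hidden only if one of its constraints contradicts the call site.
function isVisibleFrom(defBranch: BranchConstraint[], callBranch: BranchConstraint[]): boolean {
  for (const d of defBranch) {
    for (const c of callBranch) {
      if (d.name === c.name && d.defined !== c.defined) return false;
    }
  }
  return true;
}
```

With this shape, an empty `defBranch` has nothing that can contradict the call site, so a definition whose branch was corrupted to `[]` passes the filter everywhere, while the correct `[{A, true}]` would be rejected from an `#ifndef A` call site.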

Mixed forms across branches are affected:

#ifdef A
  #if FOO
  #endif               // mismatched pop of A
  #define X v.uv       // wrongly recorded as branch = []
#endif

#ifdef B
  #define X 42         // branch = [B]
#endif

#ifndef A              // callsite branch = [{A, false}]
  ...vec3(X)...
#endif

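With callsite branch = [{A, false}], the visibility filter gives:

allAst = false (mixed), hasAstValue = false. The correct behavior, however, is that def #1 should not be visible in the #ifndef A context, because it is defined inside #ifdef A.

A more direct impact: the inner #define is mistaken for a top-level macro. If the user triggers that macro name outside #ifdef A during runtime expansion (where it should not be visible in theory), shader-lab still treats it as a MacroCall (because macroDefineList is flat and names are unique), so metadata such as hasAstValue takes the wrong branch.

The claim in the comment does not hold

The isVisibleFrom comment says: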

Conservative for #if expr (not modeled — its branch is empty, so compatible with everything)

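In the actual implementation, #if never pushes a placeholder, so what isVisibleFrom sees is the stack state after the mistaken pop, not the "empty branch" of #if expr itself. The result is not conservative but active corruption: the enclosing #ifdef's constraint is popped by the wrong #endif.

Fix

#if / #elif must push a placeholder to keep the stack balanced: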

case Keyword.MACRO_IF:
  // Push an empty-name placeholder. `isVisibleFrom` skips constraints
  // whose name doesn't appear on the call-site signature, so the
  // placeholder is conservatively visible from anywhere — but keeps
  // the stack depth correct so a subsequent `#endif` pops the right level.
  this._branchStack.push({ name: "", defined: true });
  break;

case Keyword.MACRO_ELIF:
  // `#elif` ends the previous `#if`/`#elif` branch and begins a new one
  // at the same depth. Replace the top placeholder rather than push
  // (otherwise the stack would grow unboundedly across alternatives).
  if (this._branchStack.length > 0) {
    this._branchStack[this._branchStack.length - 1] = { name: "", defined: true };
  }
  break;

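The name === "" placeholder never triggers the if (d.name === c.name && d.defined !== c.defined) check in isVisibleFrom (a callsite never has a constraint with name === ""), so the placeholder stays conservatively visible, while the stack depth is correct and #endif no longer pops the outer level by mistake.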
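
The effect of the placeholder on stack balance can be replayed with a small simulation. This is a simplified model written for illustration: the `Directive` type and `trackBranches` are hypothetical, and the real lexer pushes the `#ifdef` constraint slightly later (once the macro name token arrives).

```typescript
type Constraint = { name: string; defined: boolean };
type Directive =
  | { kind: "ifdef" | "ifndef"; name: string }
  | { kind: "if" | "endif" }
  | { kind: "define"; name: string };

// Replays directives and records the branch stack seen by each #define.
// `pushOnIf` toggles between the buggy behavior (false) and the placeholder fix (true).
function trackBranches(directives: Directive[], pushOnIf: boolean): Map<string, Constraint[]> {
  const stack: Constraint[] = [];
  const branches = new Map<string, Constraint[]>();
  for (const d of directives) {
    switch (d.kind) {
      case "ifdef":
        stack.push({ name: d.name, defined: true });
        break;
      case "ifndef":
        stack.push({ name: d.name, defined: false });
        break;
      case "if":
        if (pushOnIf) stack.push({ name: "", defined: true });
        break;
      case "endif":
        stack.pop();
        break;
      case "define":
        branches.set(d.name, stack.slice());
        break;
    }
  }
  return branches;
}
```

Replaying the failure sequence (`#ifdef A`, `#if`, `#endif`, `#define X`, `#endif`) records an empty branch for X when `pushOnIf` is false and `[{A, true}]` when it is true.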

Real-world hits

#if expr is very common in GLSL shaders:

#if SCENE_DIRECT_LIGHT_COUNT > 0
#if FOO_VERSION >= 2
#if (COLOR_SPACE == 1)

Any shader with the nested structure #ifdef X ... #if Y #endif ... #define Z ... #endif hits the stack mismatch. Shared chunks such as Common.glsl / Light.glsl / BSDF.glsl are likely to contain this pattern.

Suggested tests

Add two regressions:

// 1. Stack balance test: #ifdef + #if + #endif + #define + #endif
//    Assert: the inner #define's branch contains the outer #ifdef's constraint
#ifdef A
  #if FOO
  #endif
  #define INNER_X v.field
#endif

// 2. #elif test
#ifdef A
  #if F
    #define BR_F 1
  #elif G
    #define BR_G 2
  #endif
  #define OUTER_A 3
#endif

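The tests need to inspect the branch field of each entry in macroDefineList directly; merely checking that "the shader compiles" will not catch this regression (runtime macro expansion does not depend on branch-tracking state, so the mismatch only shows up in the hasAstValue derivation at shader-lab compile time).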

Reviewer fact-check identified 4 issues where the `#define` scan path
diverged from C/GLSL preprocessor rules. All four share the same root
cause: directive-scanning helpers each rolled their own
"scan-to-end-of-line" loop without honoring `\` + newline
line-continuation or comments.

  - P1-1: `_defineHasValue` traversed raw chars, so a `.` inside a
    block comment (`#define HP /* a.b */ highp`) wrongly routed the
    directive to the AST path; `highp` is not a valid expression
    starter and parse failed.
  - P1-2: `_defineHasValue` bailed at the first `\n`, so a line-
    continued `.field` (`#define UV foo \\n .field`) was missed and
    the directive misrouted to the legacy path; `.field` then leaked
    out as a stray top-level token.
  - P1-4: `_scanUtilBreakLine` and `_scanMacroDefineParams` likewise
    stopped at the first `\n`, so multi-line function-like macro
    headers (`#define MAX3( \\n a, b, c \\n ) …`) lost the trailing
    physical lines and pushed `\` `\n` into the params lexeme,
    breaking the grammar.
  - P2-3: `_defineDirectiveReg`'s value group rejects newlines, so a
    directive slice still containing `\` + `\n` would NO MATCH and
    `_registerMacroDefine` silently skipped registration. Driver
    text-substitution masked the issue, but ShaderLab's internal
    invariant was broken (call site was tagged ID instead of
    MACRO_CALL, raising spurious "v not declared" warnings).

Fix consolidates two layers of duplicated scanning into shared atoms:

  BaseLexer
    + _skipBlockComment / _skipLineComment  ← single source of truth
                                              for comment scanning
    skipCommentsAndSpace                    ← uses the atoms

  Lexer
    + _skipLineContinuation                 ← single source of truth
                                              for `\` + newline
    + _skipNonSemantic                      ← uses comment atoms +
                                              continuation atom
    + _lineContinuationReg                  ← regex form for
                                              one-shot string fold
    _scanUtilBreakLine                      ← uses _skipLineContinuation
    _scanMacroDefineParams                  ← uses _skipLineContinuation
    _registerMacroDefine                    ← folds before regex match
    _defineHasValue                         ← uses _skipNonSemantic

Also fixes a latent BaseLexer block-comment bug: an unterminated
`/* …` would advance `index` two past the source end, harmless at
top-level but fragile. The new `_skipBlockComment` clamps to `len`.
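
The clamping behavior described in the last paragraph can be sketched as below. This is an illustrative stand-in, not the actual `_skipBlockComment`: the function name, signature, and use of `indexOf` are assumptions.

```typescript
// Hypothetical sketch: `index` points at the "/" of "/*".
// Returns the index just past the comment, never beyond the end of the source.
function skipBlockComment(src: string, index: number): number {
  const len = src.length;
  // Start searching after the opening "/*" so that "/*/" is not treated as closed.
  const close = src.indexOf("*/", index + 2);
  // Unterminated comment: stop exactly at the end of the source.
  return close === -1 ? len : Math.min(close + 2, len);
}
```

For a terminated comment the result points at the first character after the closing delimiter; for an unterminated one it equals the source length.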

Tests: 4 new regression fixtures (define-comment-with-dot,
define-line-continuation-member-access, define-line-continuation-no-dot,
define-multiline-params). 180/180 shader-lab tests pass including
PrecompileBenchmark.
GuoLei1990

This comment was marked as outdated.

`tokenize`'s branch-stack switch only handled `#ifdef`/`#ifndef`/`#else`/
`#endif`. `#if expr` and `#elif` were silently dropped: `#if` opened a
conditional level without pushing onto the stack, so the matching
`#endif` would pop the wrong level — the outer `#ifdef A` constraint —
leaving any `#define` after the inner `#endif` registered with an empty
branch signature instead of `[{A, true}]`.

Real shaders nest these constantly. `packages/shader/src/shaders/`
alone has 37 `#if` directives, with structures like Fog.glsl's
`#if SCENE_FOG_MODE != 0` wrapping `#if SCENE_FOG_MODE == 1`, and
BlendShape.glsl's chains of `#if defined(...)`. Anything inside such a
nest emits `#define`s with a corrupted branch field, breaking
per-branch visibility filtering at call sites.

Fix: `#if` pushes a shared sentinel constraint (`name === ""`,
`defined === true`) — `isVisibleFrom` ignores empty-name entries in
its polarity check, so the sentinel is conservatively visible
everywhere, but it occupies one stack slot so `#endif` pops the right
depth. `#elif` doesn't change depth (it's another arm at the same
level) so no case is needed there.

Sentinel is shared across all `#if` opens to avoid per-token
allocation. `#else` flipping a sentinel's `defined` is harmless since
`isVisibleFrom` ignores empty-name entries either way.

Test: `define-if-stack-balance.shader` covers `#ifdef A / #if expr /
#endif / #define X / #endif` and `#ifdef A / #if F / #elif G / #endif /
#define Y / #endif`. 38/38 shader-lab tests pass.
@GuoLei1990
Member Author

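@cptbtptpbcptdtptp I fact-checked every P1/P2 item from your two review rounds. All 5 are real bugs, and all are fixed.

Fix commits

  1. a7a4948cb: fix(shader-lab): comments/line-continuation in #define lexing

    • P1-1 _defineHasValue does not strip comments → #define HP /* a.b */ highp is misrouted to the AST path and parsing fails
    • P1-2 _defineHasValue does not follow \\\n → #define UV foo \\\n .field is misrouted to the legacy path and .field leaks out
    • P1-4 _scanUtilBreakLine / _scanMacroDefineParams do not recognize \\\n → #define MAX3( \\\n a, b, c \\\n ) fails to parse
    • P2-3 _defineDirectiveReg gives NO MATCH on \\\n → registration fails silently and internal state degrades
  2. ca950efca: fix(shader-lab): track #if/#endif depth in branch stack

    • Stack mismatch bug for #if expr / #elif (the one reported in your second review) → a nested #if .. #endif causes the outer #ifdef A constraint to be popped by mistake

First-principles fix

The 4 \\\n/comment bugs share one root cause: several Lexer scanning functions each hand-rolled a "scan to end of line" loop, and none of them followed the GLSL preprocessor rules for comment transparency and line continuation.

The fix factors out shared atoms in 3 layers:

  • BaseLexer._skipBlockComment / _skipLineComment (atoms): single source of truth for comment scanning
  • Lexer._skipLineContinuation: single source of truth for \\\n detection
  • Lexer._skipNonSemantic: composed from the three atoms above
  • Lexer._lineContinuationReg: one-shot string fold (regex form, serving _registerMacroDefine)

BaseLexer.skipCommentsAndSpace now uses the same pair of comment atoms, so the normal code path and the macro path share one comment-scanning implementation, and the "maintained separately, one side forgotten" problem cannot recur. This also fixes a latent BaseLexer bug: with an unterminated /*, index ran past the end of the source.

The #if/#elif stack mismatch is fixed with a sentinel singleton ({name:"", defined:true}) instead of pushing a real constraint: it occupies one stack level, but isVisibleFrom always ignores empty names in its polarity check → conservatively visible. #elif needs no case (same-level no-op; replacing the top would be dead code). This is simpler than "pushing a real constraint": a 3-line case plus 1 static constant.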

Test coverage

5 new regression fixtures:

  • define-comment-with-dot.shader (P1-1)
  • define-line-continuation-member-access.shader (P1-2)
  • define-line-continuation-no-dot.shader (P2-3)
  • define-multiline-params.shader (P1-4)
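  • define-if-stack-balance.shader (#if stack mismatch)

38/38 shader-lab tests and the full PrecompileBenchmark pass.

Performance (5 rounds × 50 samples, median-of-medians)

| Shader | dev/2.0 baseline | PR HEAD | Improvement |
| --- | --- | --- | --- |
| PBR (complex) | 49.60ms | 44.30ms | −10.7% |
| waterfull (medium) | 9.90ms | 7.95ms | −19.7% |
| multi-pass | 2.40ms | 2.10ms | −12.5% |
| macro-pre | 1.30ms | 1.10ms | −15.4% |

The 5 new fixes (including the sentinel push on every #if, the _skipLineContinuation helper at 3 sites, and the regex fold in _registerMacroDefine) show no measurable regression relative to the previous commit. All 4 core shaders still beat dev/2.0 across the board.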
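
For readers unfamiliar with the aggregation, "median-of-medians" is read here as taking the median of each round's samples and then the median of those per-round medians. That reading is an assumption (the benchmark harness itself is not shown in this thread), and the helper names below are hypothetical.

```typescript
// Median of a non-empty list of samples (for example, timings in milliseconds).
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty sample set");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const hi = sorted[mid] ?? 0;
  const lo = sorted[mid - 1] ?? hi;
  return sorted.length % 2 === 1 ? hi : (lo + hi) / 2;
}

// Assumed aggregation: one median per round, then the median across rounds.
function medianOfMedians(rounds: number[][]): number {
  return median(rounds.map((samples) => median(samples)));
}
```

Compared with a single pooled median, this form keeps one noisy round (for example, one disturbed by a GC pause or a background task) from shifting the reported figure much.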

Remaining follow-up

The P3 duplication between parseMacroParamList and the inline split in _registerMacroDefine: you confirmed it is a follow-up, so this PR leaves it untouched.

@cptbtptpbcptdtptp
Collaborator

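Incremental CR: a7a4948c + ca950efc, tested

I checked out the latest HEAD ca950efc and reran the 4 reviewer fact-check repros from the previous round.

Fix results

| Repro | Previous round (e31c245) | Now (ca950ef) |
| --- | --- | --- |
| #define HP /* a.b */ highp | parse failure | ✅ passes, emits #define HP /* a.b */ highp |
| #define UV foo \\\n .field | parse failure | ✅ passes, emits #define UV foo.field |
| #define LONG_VAL v.v_uv \\\n + v.v_uv, call site | call site degrades to ID + spurious warning | ✅ passes, emits #define LONG_VAL v_uv + v_uv (varying flatten works, no warning) |
| #define MAX3( \\\n a, b, c \\\n ) … | parse failure | ✅ passes, parameter list recognized correctly |

All 4 P1/P2 items are fixed. Together with the newly found #if nesting stack-balance bug, 5 root-cause fixes land together.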

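Observations on implementation quality

  1. _skipBlockComment / _skipLineComment in BaseLexer and _skipLineContinuation / _skipNonSemantic in Lexer: the ad hoc implementations previously scattered across _defineHasValue / _skipInlineSpaceAndComments / _scanUtilBreakLine / _scanMacroDefineParams now all go through a single source of truth. This is exactly the "scatter" problem raised in the review, and the author solved it the right way.
  2. _lineContinuationReg combined with fold-before-match in _registerMacroDefine: an elegant fix for the P2-3 regex NO MATCH.
  3. The unterminated block-comment overrun in BaseLexer, fixed in passing: a long-standing latent baseline bug. The new code clamps to len in _skipBlockComment; no extra test was added, but the risk is gone.
  4. _IF_SENTINEL for #if expr stack balance: previously #if did not push but #endif popped, so branch signatures were completely scrambled under deep nesting (chains like those in Fog.glsl / BlendShape.glsl). The sentinel uses name === "" so that isVisibleFrom skips the polarity check, which keeps the stack balanced while preserving the "empty signature is visible everywhere" semantics. Sharing one sentinel instance avoids an allocation per #if. Not pushing on #elif is correct (another arm at the same level).
  5. #else flips the sentinel's defined by assigning a new object rather than mutating the shared one. The detail is right and avoids side effects from shared mutation.

Potential nits (non-blocking)

  • _scanMacroDefineParams skips only the line-continuation characters themselves for \\\n and does not normalize the surrounding whitespace, so the c4 output is #define MAX3( a, b, c ) max(...). The driver does not care, but it is visually redundant. One could push a space at the continuation position, or collapse runs of whitespace. Cosmetic only.
  • The loop in _scanMacroDefineParams is structured as "try to skip a line continuation first; push the current char only if it is not one". But the \ character itself is illegal inside a GLSL ES #define, so if the continuation check fails (for example \ followed by an ID instead of a newline), the buffer will contain \. Rare, and equivalent to baseline behavior.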

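Test suite

tests/src/shader-lab/ShaderLab.test.ts: 38/38 tests shipped with the PR + my 4 verify cases → 42/42 pass.
PrecompileBenchmark.test.ts: 6/6 pass.

Performance (10 runs, verbose build)

| Shader | baseline | e31c245 | ca950ef | Current vs baseline |
| --- | --- | --- | --- | --- |
| PBR | 54.80ms | 50.00ms | 55.40ms | on par |
| waterfull | 12.00ms | 8.60ms | 9.60ms | −20% |
| multi-pass | 2.80ms | 2.30ms | 2.90ms | on par |
| macro-pre | 1.60ms | 1.30ms | 1.40ms | −12.5% |

The comment/line-continuation handling and the sentinel added by the new commits eat some of PBR's speedup, but it stays at baseline level. waterfull and macro-pre keep their speedup. No performance regression.

Overall assessment

This round of fixes is solid: root-cause fixes (not patching), a single source of truth factored out, a long-standing baseline bug fixed along the way, and adequate test coverage (4 new regression fixtures whose names map one-to-one to the reviewer claims).

✅ OK to merge.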


verify 脚本:

git clone --depth=20 -b feat/shaderlab-define-ast-firstclass https://github.com/hhhhkrx/runtime engine-pr-2974
cd engine-pr-2974 && pnpm install --prefer-offline && pnpm run b:module
# add the 4 _verify-*.shader files to tests/src/shader-lab/shaders/
# add the matching it() blocks to ShaderLab.test.ts
HEADLESS=true npx vitest run tests/src/shader-lab/ShaderLab.test.ts -t "verify-c"

@zhuxudong
Member

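Bug: a #define in an #elif arm is wrongly tagged with the previous arm's branch

ca950efca added a sentinel for #if to maintain stack depth, but #elif still has no case; the commit message says "doesn't change depth, no case is needed". The depth is right, but the polarity is not flipped.

Trigger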

#ifdef A
  #define X1 ...   // branch=[{A, true}] ✓
#elif B
  #define X2 ...   // branch=[{A, true}] ❌ (actually defined only when !A)
#endif

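Compare with the handling of #else (line ~210, which flips the top polarity): #elif is semantically equivalent to #else + #if, so it should at least flip the outer polarity to capture "the previous arm does not hold", rather than inherit the previous arm's branch tag.

Impact

The logic in MacroCallSymbol.semanticAnalyze that filters defs with isVisibleFrom can be wrong in both directions:

  • callsite inside #ifdef A → the filter says X2 is visible (branch=[A=true] matches), but at runtime the #elif arm is not entered when A=true, so X2 is not defined → call-site metadata is wrongly derived from X2
  • callsite inside #ifndef A → the filter says X2 is not visible (mutually exclusive), but at runtime X2 should be visible when !A && B → wrongly excluded

Real-world hits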

$ grep -rn "^[ \t]*#elif" packages/shader/src/
shaders/Fog.glsl:14:        #elif SCENE_FOG_MODE == 2
shaders/Fog.glsl:17:        #elif SCENE_FOG_MODE == 3
shaders/shadingPBR/FragmentPBR.glsl:188:    #elif defined(HAS_DERIVATIVES)
shaders/shadingPBR/BSDF.glsl:398:        #elif REFRACTION_MODE == 1

FragmentPBR.glsl:188 is an #elif defined(...); any code using the #ifdef X / #elif defined(Y) pattern will hit this.

Fix

Have #elif degrade the top to the sentinel (this loses polarity precision but never inherits wrongly):

case Keyword.MACRO_ELIF: {
  // `#elif` semantically `#else + #if`. We can't model the elif expression,
  // but inheriting the previous arm's `[A=true]` is wrong — this arm is only
  // active when `!A` holds. Replace top with sentinel: drops precision (we
  // no longer constrain on A's polarity), but never wrongly tags defs as
  // belonging to the previous arm.
  const top = this._branchStack[this._branchStack.length - 1];
  if (top && top.name !== "") {
    this._branchStack[this._branchStack.length - 1] = Lexer._IF_SENTINEL;
  }
  break;
}

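Why not simply flip the polarity (as #else does): in #if A / #elif B / #elif C the third arm is !A && !B && C, so repeated flipping would ping-pong (A=true → A=false → A=true) and the third arm would be wrongly tagged A=true. Degrading to the sentinel is uniformly correct.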
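
The ping-pong can be reproduced with a few lines. This is an illustrative model only: `armTags` and `SENTINEL` are hypothetical names, and it tracks just the top-of-stack tag of a single `#ifdef A` chain.

```typescript
type Constraint = { name: string; defined: boolean };

// Stand-in for the shared sentinel; frozen so accidental writes fail loudly in strict mode.
const SENTINEL: Constraint = Object.freeze({ name: "", defined: true });

// Returns the top-of-stack tag seen inside each arm of `#ifdef A / #elif ... / #elif ...`.
function armTags(elifCount: number, strategy: "flip" | "sentinel"): Constraint[] {
  let top: Constraint = { name: "A", defined: true };
  const tags: Constraint[] = [top];
  for (let i = 0; i < elifCount; i++) {
    top = strategy === "flip" ? { name: top.name, defined: !top.defined } : SENTINEL;
    tags.push(top);
  }
  return tags;
}
```

With two `#elif` arms, the flip strategy tags the third arm as A=true again, while the sentinel strategy tags every `#elif` arm with the empty-name placeholder.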

Test

#ifdef A
  #define IN_A 1
#elif B
  #define IN_ELIF 2
#endif

Assertions:

  • IN_A.branch === [{name: "A", defined: true}]
  • IN_ELIF.branch === [{name: "", defined: true}] (sentinel, not [A=true])

GuoLei1990

This comment was marked as outdated.

`#elif` opens a new arm at the same depth, but its actual condition is
"none of the previous arms held AND <elif expr>". Without a switch case,
the new arm inherits the previous arm's branch tag — exactly the
opposite of where it's active:

  #ifdef A
    #define X1 ...   // branch=[{A, true}]   ✓
  #elif B
    #define X2 ...   // branch=[{A, true}]   ✗ — X2 is only active when !A
  #endif

The bug shows up at call sites: `MacroCallSymbol.semanticAnalyze` filters
defs by `isVisibleFrom`, so a callsite under `#ifdef A` would wrongly
treat X2 as visible (X2 inherits `[A=true]`), and a callsite under
`#ifndef A` would wrongly treat X2 as excluded. Engine shaders use this
pattern: FragmentPBR.glsl:188 has `#ifdef RENDERER_HAS_NORMAL / ... /
#elif defined(HAS_DERIVATIVES)`. User shaders are likely to as well.

Fix: replace the top constraint with the `#if` sentinel on every `#elif`.
Drops the precision of the previous arm's polarity (the new arm is no
longer constrained on `A`), but never wrongly inherits the wrong
polarity. Considered flipping polarity instead — works for the first
`#elif` of a chain, but `#ifdef A / #elif B / #elif C` would ping-pong
A's polarity. Sentinel is uniformly correct.

This was missed in `ca950efca`, where the commit message claimed
"`#elif` doesn't change depth, no case is needed" — depth is unchanged,
but polarity is part of the stack state and must be cleared.

Test: `define-elif-polarity.shader` covers `#ifdef A / #elif defined(B)
/ #elif defined(C)` chains. 39/39 shader-lab tests pass.
@GuoLei1990
Member Author

@cptbtptpbcptdtptp Your review is entirely accurate, and the bug is fixed in commit `b435e83c7`.

Reproduced before the fix:
```
[DEBUG] IN_A.branch=[{"name":"A","defined":true}] ✓
[DEBUG] IN_ELIF.branch=[{"name":"A","defined":true}] ✗ (should not inherit A's polarity)
```

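The line I wrote in the `ca950efca` commit message, "`#elif` doesn't change depth, no case is needed", was wrong: the depth is right, but polarity is also part of the stack state, and without a flip the `#elif` arm inherited the previous arm's tag.

The fix adopts your simple approach (every `#elif` replaces the top with the sentinel). I also considered a precise approach (the first `#elif` flips polarity and keeps `[A=false]`, and only later `#elif`s degrade), but:

  1. It needs per-stack-level state to track whether an `#elif` has already been seen
  2. The engine's own `#elif` arms contain no `#define` (the elif arm at FragmentPBR.glsl:188 only has variable declarations), so the extra precision would go unused
  3. The ping-pong risk you mentioned is real (the third arm of `#ifdef A / #elif B / #elif C` is actually `!A && !B && C`)

New regression: `define-elif-polarity.shader` covers a 3-arm `#elif defined(...)` chain. 39/39 tests pass.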

Collaborator

@cptbtptpbcptdtptp cptbtptpbcptdtptp left a comment

+1

@cptbtptpbcptdtptp cptbtptpbcptdtptp merged commit a6f0504 into galacean:dev/2.0 Apr 28, 2026
12 checks passed
Member Author

@GuoLei1990 GuoLei1990 left a comment

Review (Round 20) — incremental on b435e83c7

Closed issues

| Issue | Reason closed |
| --- | --- |
| Prettier lint (VisitorContext.ts:117, GLESVisitor.ts:299) | Fixed (commit 65e79b3) |
| cptbtptpbcptdtptp: let i=0,n=symbols.length pattern | Fixed (commit 0a056ae) |
| coderabbitai: test assertions too broad | P3 suggestion, non-blocking |
| referenceStructPropByName dead code (P3) | Fixed (commit 0a056ae) |
| [P1] Lexer.ts: _macroDefineExpectsParamsToken unconditionally set to true | Fixed (commit 79511dc) |
| [P1] CodeGenVisitor.ts: visitMacroCallFunction struct-arg filter wrongly applied to function-like macros | Fixed (commit fdcb7ea) |
| zhuxudong Blocker 2: TargetParser.y not synced | Fixed (commit 828ac81) |
| zhuxudong Blocker 1 (partial): non-expression replacement list with a non-alpha first character | Fixed (commit 8f589ab) |
| zhuxudong Warning 1: _defineHasValue keyword peek misses comments | Fixed (commit 8f589ab + 5b7f8ae) |
| zhuxudong Blocker 3: macroDefineList dual-path redundancy + false warning | Fixed (commit b09fd50) |
| [P2] Blocker 3 residue: dual-path redundancy produces a false warning | Fixed (commit b09fd50) |
| [P2] Performance: pure-constant macros needlessly take the AST path | Fixed (commit 992956f) |
| [P1] function-like macro body with a top-level comma causes a parse error | Fixed (commit 992956f) |
| zhuxudong New Blocker: FXAA type-alias parse error | Fixed (commit 992956f) |
| [P1] _chunkOutputCache static Map grows without bound | Not a regression, downgraded P1→P2 |
| [P1] _preRegisterGlobalMacroRefs double codegen | Kept as P2 |
| [P2] _defineHasValue misleading name | Kept as P3, non-blocking |
| [P2] MacroDefine.semanticAnalyze upgrade-in-place implicit ordering | Improved (3668d8c) |
| [P2] Removing the _VERBOSE guard makes release do extra type propagation | Required for functional correctness; offset by TrivialNode elision |
| [P3] MacroCallFunction.init() creates a new array | Still present, non-blocking |
| [P3] _scanMacroDefineParams buffer push + join | Still present, non-blocking |
| [P3] #define FOO .5 leading-dot false positive | Still present, non-blocking |
| [P2] _isExist dedup drops the value field comparison | Eliminated (3668d8c) |
| zhuxudong Nit: .some() cross-branch mixed AST/legacy | Fixed (commit e31c245) |
| cptbtptpbcptdtptp P1-1: _defineHasValue does not strip comments → AST misrouting | Fixed (commit a7a4948) |
| cptbtptpbcptdtptp P1-2: \+\n line continuation not followed by _defineHasValue | Fixed (commit a7a4948) |
| cptbtptpbcptdtptp P1-4: _scanMacroDefineParams pushes \ \n into the params buffer | Fixed (commit a7a4948) |
| cptbtptpbcptdtptp P2-3: _registerMacroDefine regex gives NO MATCH on directive text containing \+\n | Fixed (commit a7a4948) |
| zhuxudong Bug: #if/#elif not pushed/replaced in _branchStack → stack mismatch | Fixed (commit ca950ef) |
| [P1] leftover console.log debug code (Lexer.ts:842) | Fixed (commit ca950ef) |
| zhuxudong Bug: #elif arm inherits the previous arm's branch tag → wrong polarity | Fixed (commit b435e83, sentinel degradation) |

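Summary

Incremental review (round 20, final). Reviewing commit b435e83c7 (fix: degrade #elif arm to sentinel branch).

b435e83c7: #elif sentinel degradation

Problem: ca950efca's #if sentinel push + #endif pop already fixed the stack depth, but #elif was missed. In #ifdef A / #define X1 / #elif B / #define X2 / #endif, X2 inherits the previous arm's [A=true] tag, which is exactly the opposite of the condition under which X2 is active. zhuxudong reported this bug precisely in a review comment and pointed out that #ifdef RENDERER_HAS_NORMAL / ... / #elif defined(HAS_DERIVATIVES) at FragmentPBR.glsl:188 is a typical in-engine repro case.

Comparison of fix strategies

| Strategy | Description | Scope | Verdict |
| --- | --- | --- | --- |
| Flip polarity (like #else) | [A=true] → [A=false] | Correct only for the first #elif; #ifdef A / #elif B / #elif C ping-pongs between [A=true] ↔ [A=false] | ✗ |
| Sentinel degradation | Replace the stack top with {name:"", defined:true} | Correct for every chain length; an empty name is transparent to isVisibleFrom | ✓ (chosen) |
| Full condition modeling | Model the expression of #elif expr | Precise, but requires a full constant expression evaluator | Too heavy, not worth it |

Sentinel degradation is the minimal correct approach. The cost is a loss of precision: the branch stamp of a #define under an #elif arm no longer carries a concrete constraint (the sentinel's name="" is ignored by isVisibleFrom), but it never produces a false negative (a reachable definition is never wrongly excluded). False positives (letting an unreachable definition through) do occur in the #ifdef A / #elif B scenario: when the callsite is under #ifdef A, X2 (branch=sentinel) is not excluded, although ideally it should be (since #elif B implies !A). This is only a precision loss, not a correctness problem: the #ifdef guards in the final GLSL still guarantee runtime correctness, and compile-time codegen may merely emit some extra code for unreachable arms, which does not affect the result.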

Implementation details

case Keyword.MACRO_ELIF:
  if (this._branchStack.length > 0) {
    this._branchStack[this._branchStack.length - 1] = Lexer._IF_SENTINEL;
  }
  break;
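  1. length > 0 guard: defends against an out-of-range index on malformed input (a bare #elif without a prior #if/#ifdef). Consistent with the if (top) guard in the #else branch. ✅

  2. Safety of the shared _IF_SENTINEL reference: #elif writes the sentinel reference into the array slot. A later #else in the same chain (if #elif is followed by #else) reads the sentinel and replaces the slot with a new object { name: top.name, defined: !top.defined }, without modifying the sentinel itself. _branchStack.slice() at L194 copies references rather than deep-copying, but since the sentinel is a frozen-by-convention shared object, copying the reference is safe. ✅

  3. Ordering: the #elif token's own branch stamp is taken at L194 (before the switch), so it carries the previous arm's signature. Tokens after #elif see the sentinel. This mirrors #ifdef (the #ifdef token does not carry the constraint it opens) and is semantically correct. ✅

  4. The #ifdef A / #elif B / #else / #endif scenario:

    • #ifdef A arm: [A=true]
    • #elif B arm: sentinel (after degradation), conservatively correct
    • #else arm: flips the sentinel {name:"", defined:true} → {name:"", defined:false}; both polarities are ignored by isVisibleFrom, so it is transparent ✅
    • #endif: pop, stack depth correct ✅
  5. JSDoc updates: L196-200 trims the wrong claim in the old comment that "#elif doesn't change depth, no case is needed". The new comment at L212-220 explains the ping-pong problem and the rationale for sentinel degradation precisely. L227-229 updates the #else comment to cover #elif chains. Clear. ✅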
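
Point 2 above (reference safety of the shared sentinel) can be illustrated with a small model. The names `IF_SENTINEL`, `onElif` and `onElse` are hypothetical stand-ins for the corresponding switch cases; the explicit `Object.freeze` is an addition made here for illustration, whereas the review describes the real sentinel as frozen by convention only.

```typescript
type Constraint = { readonly name: string; readonly defined: boolean };

// Shared placeholder; frozen here to make the "never mutated" assumption explicit.
const IF_SENTINEL: Constraint = Object.freeze({ name: "", defined: true });

// #elif: overwrite the top slot with the shared sentinel (no-op on an empty stack).
function onElif(stack: Constraint[]): void {
  if (stack.length > 0) stack[stack.length - 1] = IF_SENTINEL;
}

// #else: replace the top slot with a fresh object; the shared sentinel is never written to.
function onElse(stack: Constraint[]): void {
  const top = stack[stack.length - 1];
  if (top) stack[stack.length - 1] = { name: top.name, defined: !top.defined };
}
```

A snapshot taken with `slice()` between the two directives keeps pointing at the unchanged sentinel after `onElse` runs, which is why copying references instead of deep-copying is safe in this model.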

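Test coverage

Added define-elif-polarity.shader (43 lines). It tests a three-arm chain #ifdef USE_BR_A / #elif defined(USE_BR_B) / #elif defined(USE_BR_C), with one #define in each arm. The structure covers the ping-pong scenario (chain length > 2). Verified in both shaderLabRelease and shaderLabVerbose modes. 39/39 tests pass. ✅

Commit message quality: this is the best commit message in the whole PR. It precisely describes the bug's root cause (inheriting instead of overriding), why a polarity flip is not enough (ping-pong), and why sentinel degradation was chosen, and it goes back to point out the wrong claim in ca950efca's commit message. Very valuable for future maintainers.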

Nit

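  • Nit: extra blank lines at ShaderLab.test.ts:475-477

    There are 3 extra blank lines between define-elif-polarity and frag-return-vec4 (there should be 1). This nit has existed since Round 19 and grew from 2 to 3 lines once Round 20 registered the new test. No functional impact; cleanup suggested.

P2 (restated, unchanged)

  • P2: _chunkOutputCache static Map grows without bound: Preprocessor.ts:36. A concern in editor scenarios; safe at runtime.
  • P2: _preRegisterGlobalMacroRefs double codegen: GLESVisitor.ts. The number of macros is small, so the absolute cost is acceptable.

P3 (restated, unchanged)

  • P3: per-token allocation from _branchStack.slice(): Lexer.ts:194. Copy-on-write could optimize it, but real nesting depth is very shallow.
  • P3: _defineHasValue misleading name: _defineNeedsAstPath() suggested.
  • P3: _scanMacroDefineParams buffer push + join: contiguous substrings could use slice instead.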
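
As an illustration of the copy-on-write idea mentioned for `_branchStack.slice()`, the snapshot can be cached and invalidated only when a directive changes the stack. The `BranchStack` class below is a hypothetical sketch, not a proposal for the actual Lexer fields.

```typescript
type Constraint = { name: string; defined: boolean };

class BranchStack {
  private items: Constraint[] = [];
  private cached: readonly Constraint[] | null = null;

  push(c: Constraint): void {
    this.items.push(c);
    this.cached = null; // stack changed: the next snapshot must be rebuilt
  }

  pop(): void {
    this.items.pop();
    this.cached = null;
  }

  // Tokens between two directives share one frozen snapshot instead of one slice() each.
  snapshot(): readonly Constraint[] {
    if (this.cached === null) this.cached = Object.freeze(this.items.slice());
    return this.cached;
  }
}
```

Since directives are far rarer than ordinary tokens, most calls would return the cached array; whether that is worth the extra state is the judgment call the P3 item leaves open.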

Overall

APPROVE

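b435e83c7 closes the last semantic bug reported by zhuxudong: an #elif arm inheriting the previous arm's branch tag. Of the three options, sentinel degradation is the only one that is correct for every chain length without excessive cost (flipping ping-pongs; full expression modeling is too heavy). The implementation is very lean: 3 lines of core code + 15 lines of comments.

The whole PR has 21 commits, ~4200 lines of diff, and 39 tests, all passing. No P0/P1. The remaining P2/P3 items are non-blocking.

Across 20 review rounds, 31 issues were found and closed (including 7 P1). The PR evolved from the initial dual-parser architecture to the current design: the Lexer as single source of truth + positive . routing + precise BranchSignature filtering + sentinel stack balance, with each step converging toward something more correct and simpler. The last two commits (ca950efca + b435e83c7) make #if/#elif/#endif stack management minimal-correct: expressions are not modeled; only the depth is kept right and polarity is never inherited.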

Ship it. 🚀

zhuxudong added a commit to zhuxudong/engine that referenced this pull request Apr 28, 2026
Bring in commit a6f0504 (PR galacean#2974) promoting `#define` values into
the AST pipeline so macro-as-type-alias and struct-member references
stop poisoning downstream type inference.

Conflict resolution:
- Map all changes from `packages/shader-lab/` → `packages/shader-compiler/`
  (and `tests/src/shader-lab/` → `tests/src/shader-compiler/`) per our
  earlier rename (commit ee6e8c0).
- Adopt dev/2.0's simplified `Preprocessor` (#include expansion only;
  `#define`/`#ifdef` move into the Lexer's tokenizing pass) on top of
  our prior `basePath` removal — `Preprocessor.parse(source)` is now
  single-arg and looks up include keys verbatim.
- Take dev/2.0's `MacroDefineInfo` shape (with `valueAst` + `branch` +
  `referenceName`) — replaces the legacy `MacroValueType` machinery.
- Take dev/2.0's `_referenceProp` helper consolidation in VisitorContext.
- Take dev/2.0's `canElide` optimization in lalr/Utils.
- Take dev/2.0's `_skipBlockComment` / `_skipLineComment` split in
  BaseLexer; drop the now-dead `ShaderCompilerUtils.skipComment` import.
- Apply the rename consistently to merged content: `ShaderLab*` →
  `ShaderCompiler*` for class/utils references; `shaderLab*` →
  `shaderCompiler*` for test-local instance variable names.
- Drop the trailing `""` basePath argument from new `_parseShaderPass`
  call sites — our signature is 4-arg now.

Existing FXAA_11 workaround (commit a69041d) still on top — to be
reverted in the next commit now that the proper macro AST fix lands.
3 participants