
Update CANN Open Software License to Version 2.0 #5

Merged: Hzfengsy merged 1 commit into main from fix_license, Jan 5, 2026

Conversation

@Hzfengsy (Member) commented Jan 5, 2026

Update the license from CANN Open Software License Agreement Version 1.0 to Version 2.0. Key changes include expanding processor compatibility from Ascend processors to Huawei AI Processors (including Ascend, Kirin, and Yueying branded chipsets), clarifying usage restrictions and distribution terms, and strengthening the termination provisions with enhanced legal remedies.

Copilot AI review requested due to automatic review settings January 5, 2026 06:31
@gemini-code-assist (Contributor)

Summary of Changes

Hello @Hzfengsy, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the CANN Open Software License Agreement from Version 1.0 to Version 2.0. The primary goal is to broaden the scope of compatible hardware to include a wider range of Huawei AI Processors, beyond just Ascend, and to refine the legal terms regarding usage, distribution, and termination. This ensures the license accurately reflects the current product ecosystem and strengthens Huawei's intellectual property protections.

Highlights

  • License Version Update: The CANN Open Software License Agreement has been updated from Version 1.0 to Version 2.0.
  • Expanded Processor Compatibility: The definition of compatible processors has been broadened from solely 'Ascend processors' to 'Huawei AI Processors,' which now includes Ascend, Kirin, and Yueying branded chipsets, as well as other Huawei-controlled or manufactured AI chipsets.
  • Clarified Usage and Distribution: Usage restrictions and distribution terms have been refined to align with the expanded processor compatibility, explicitly reserving all non-granted rights to Huawei.
  • Strengthened Termination Provisions: The termination clause now includes enhanced legal remedies for Huawei in case of breach or violation, stating that all granted rights become void ab initio upon certain termination events.


@gemini-code-assist (Bot) left a comment

Code Review

This pull request updates the CANN Open Software License from version 1.0 to 2.0. The changes expand the scope of compatible processors to include more Huawei AI Processors, clarify usage and distribution terms, and strengthen termination clauses. My review identifies a couple of minor opportunities to improve the clarity and formatting of the license text. Specifically, I've suggested a correction for inconsistent formatting in a list of processor brands and a grammatical improvement for better readability.

Comment thread LICENSE
1.1 Software means the APIs, source code files, binaries, and related documents of Compute Architecture for Neural Networks("CANN") that are licensable by Huawei, and provided and licensed under this Agreement.

1.2 Ascend processors means the chipsets branded with "Ascend" that are manufactured and supplied by Huawei.
1.2 Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin"," Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including have manufactured), supplied (including have supplied) or designed (including have designed) by Huawei.
Severity: medium

There appears to be a minor formatting inconsistency in the list of brands. The part "Kirin"," Yueying" has inconsistent spacing. For clarity and consistency, I suggest changing it to "Kirin", "Yueying".

1.2   Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin", "Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including have manufactured), supplied (including have supplied) or designed (including have designed) by Huawei.

Comment thread LICENSE

3. Restrictions
3.1 You are not authorized to, and shall not use, modify, or distribute this Software or its derivative works for any purpose except those expressly permitted by this Agreement. You shall not make any use of the Software or its derivative works to develop or distribute software for use in systems with processors other than Ascend processors.
3.1 You are not authorized to, and shall not use, modify, or distribute this Software or its derivative works for any other purposes than those expressly permitted by this Agreement. You shall not make any use of the Software or its derivative works to develop or distribute any software for use in systems with processors other than Huawei AI Processors. All rights not expressly granted herein are expressly reserved by Huawei.
Severity: medium

The phrasing "for any other purposes than" is grammatically a bit awkward. For improved readability, consider rephrasing it to "for any purposes other than".

3.1 You are not authorized to, and shall not use, modify, or distribute this Software or its derivative works for any purposes other than those expressly permitted by this Agreement. You shall not make any use of the Software or its derivative works to develop or distribute any software for use in systems with processors other than Huawei AI Processors. All rights not expressly granted herein are expressly reserved by Huawei.

Copilot AI left a comment

Pull request overview

This PR updates the CANN Open Software License from Version 1.0 to Version 2.0. The primary changes expand the license scope from Ascend processors to broader Huawei AI Processors (including Ascend, Kirin, and Yueying brands), refine distribution restrictions, and strengthen termination provisions with enhanced legal remedies.

Key Changes

  • Expanded processor definition from "Ascend processors" to "Huawei AI Processors" including multiple Huawei-branded chipsets
  • Added numbering to Section 2 (2.1) and clarified grant language to include systems with Software
  • Enhanced termination provisions with "void ab initio" clause and survivability specifications


Comment thread LICENSE
1.1 Software means the APIs, source code files, binaries, and related documents of Compute Architecture for Neural Networks("CANN") that are licensable by Huawei, and provided and licensed under this Agreement.

1.2 Ascend processors means the chipsets branded with "Ascend" that are manufactured and supplied by Huawei.
1.2 Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin"," Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including have manufactured), supplied (including have supplied) or designed (including have designed) by Huawei.
Copilot AI Jan 5, 2026

There is an inconsistent spacing issue with the quotation marks around brand names. The term "Kirin"," has a comma inside the closing quote followed by another quote, when it should be "Kirin", with the comma outside. This creates incorrect punctuation.

Suggested change
1.2 Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin"," Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including have manufactured), supplied (including have supplied) or designed (including have designed) by Huawei.
1.2 Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin", "Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including have manufactured), supplied (including have supplied) or designed (including have designed) by Huawei.

Copilot uses AI. Check for mistakes.
Comment thread LICENSE
1.1 Software means the APIs, source code files, binaries, and related documents of Compute Architecture for Neural Networks("CANN") that are licensable by Huawei, and provided and licensed under this Agreement.

1.2 Ascend processors means the chipsets branded with "Ascend" that are manufactured and supplied by Huawei.
1.2 Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin"," Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including have manufactured), supplied (including have supplied) or designed (including have designed) by Huawei.
Copilot AI Jan 5, 2026

There's a missing space between "Kirin"," and "Yueying". After fixing the quote issue, there should be a space before "Yueying" to properly separate the listed brand names.

Suggested change
1.2 Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin"," Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including have manufactured), supplied (including have supplied) or designed (including have designed) by Huawei.
1.2 Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin", "Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including have manufactured), supplied (including have supplied) or designed (including have designed) by Huawei.

Comment thread LICENSE
1.1 Software means the APIs, source code files, binaries, and related documents of Compute Architecture for Neural Networks("CANN") that are licensable by Huawei, and provided and licensed under this Agreement.

1.2 Ascend processors means the chipsets branded with "Ascend" that are manufactured and supplied by Huawei.
1.2 Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin"," Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including have manufactured), supplied (including have supplied) or designed (including have designed) by Huawei.
Copilot AI Jan 5, 2026

The phrase "including have manufactured" appears grammatically awkward. The correct phrasing should be "including having manufactured" to maintain parallel structure with the gerund form.

Suggested change
1.2 Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin"," Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including have manufactured), supplied (including have supplied) or designed (including have designed) by Huawei.
1.2 Huawei AI Processors mean AI chipsets (i) branded with "Ascend", "Kirin"," Yueying" or other brands owned or controlled by Huawei; or (ii) manufactured (including having manufactured), supplied (including having supplied) or designed (including having designed) by Huawei.

@Hzfengsy Hzfengsy merged commit e56fe1a into main Jan 5, 2026
6 checks passed
@Hzfengsy Hzfengsy deleted the fix_license branch January 5, 2026 06:34
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Feb 22, 2026
- Fix conftest docstring to say BEFORE_AND_AFTER (#1, hw-native-sys#5, hw-native-sys#8)
- Align InitMemRef pass property table and docs with code (hw-native-sys#2)
- Add INTERNAL_CHECK for null pass result in Pass::operator() (hw-native-sys#3)
- Switch SSA assignment tracking to pointer identity (hw-native-sys#4)
- Add null instrument checks in RunBeforePass/RunAfterPass (hw-native-sys#6)
- Add strict=True to xfail with descriptive reason (hw-native-sys#7)
- Fix pass_manager.md verification mode to BEFORE_AND_AFTER (hw-native-sys#9)
- Add ExitContext stack invariant check (hw-native-sys#10)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Feb 22, 2026
- Use memory_order_relaxed for atomic ID counter (comment hw-native-sys#3)
- Fix UniqueId doc: "stable for the lifetime of the process" (comment hw-native-sys#2)
- Clarify ir.pyi docstring: variable identity is part of hash (comment #1)
- Fix Type overload docstring: note enable_auto_mapping scope (comment hw-native-sys#4)
- Remove redundant static_cast in hash_var_identity (comment hw-native-sys#5)
- Optimize triple As<T> to single GetKind() check (comment hw-native-sys#6)
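The "variable identity is part of hash" note above is a small but easy-to-miss IR invariant: two variables with the same name are still distinct bindings and must not collide in hash containers. A minimal Python sketch of the idea (this `Var` class is illustrative only, not pypto's actual `ir.Var`):

```python
class Var:
    """Toy IR variable: identity, not the name, defines equality."""

    def __init__(self, name: str):
        self.name = name

    def __eq__(self, other: object) -> bool:
        # Structural comparison would conflate same-name bindings;
        # identity comparison keeps distinct definitions distinct.
        return self is other

    def __hash__(self) -> int:
        # Identity is part of the hash, so two Vars named "x"
        # occupy separate entries in dicts and sets.
        return hash(id(self))


x1 = Var("x")
x2 = Var("x")
env = {x1: 0, x2: 1}  # both survive as separate keys
```

Under value-based hashing, `env` would silently collapse to one entry and later lookups would return the wrong binding; identity hashing makes that class of bug impossible.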
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Feb 26, 2026
- Add tuple arity validation in Function binding (comments hw-native-sys#2/hw-native-sys#4)
- Add strict validation in param_directions deserialization (comments hw-native-sys#3/hw-native-sys#5)
syfeng-bot pushed a commit that referenced this pull request Mar 1, 2026
- Fix closure variable merging order to follow lexical scoping (comment #1)
- Fix error.span isinstance check - span is ir.Span not dict (comment #6)
- Use forward-compatible AST field copying with ast.copy_location (comment #3)
- Add return handling in inline mode with early termination after return (comment #4)
- Prevent leaking inline function variables to caller scope (comments #2, #5, #7)
werewb pushed a commit to werewb/pypto that referenced this pull request Mar 2, 2026
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Mar 2, 2026
- Reorder __all__ to match import order (comment #1)
- Document all 5 system ops in en/zh-cn syntax docs (comment hw-native-sys#3)
- Fix frame_offset=2 in helper functions for accurate span capture (comment hw-native-sys#4)
- Exclude 'system' from unified dispatch to prevent misleading errors (comment hw-native-sys#5)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Mar 5, 2026
- Add dtype guard to is_const_int for round-trip fidelity (hw-native-sys#3)
- Document 1-3 arg concise forms in syntax blocks (hw-native-sys#5-7)
- Note concise forms are sugar; IR stores start/stop/step (#1-2)
- Fix range(0,10,1) comment to pl.range(10) (hw-native-sys#8-9)
- Fix inaccurate test comment about Var start printing (hw-native-sys#10)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Mar 12, 2026
- Derive repo owner/name dynamically via gh repo view (hw-native-sys#2)
- Use comments(last: 1) to fetch latest comment per thread (hw-native-sys#4)
- Add pagination with pageInfo/endCursor for >100 threads (hw-native-sys#6)
- Extract run ID from check link field for gh run view (hw-native-sys#5, hw-native-sys#7)
- Document external check fallback (open link URL directly)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Mar 13, 2026
- Expanded GraphQL query for readability (gemini hw-native-sys#2)
- Added gh pr checks command to define $LINK variable (gemini #1, copilot hw-native-sys#7)
- Replaced grep -oP with portable sed for run ID extraction (gemini #1, copilot hw-native-sys#7)
- Fixed grep pattern to handle whitespace in JSON (copilot hw-native-sys#6)
- Replaced non-standard BRE with grep -E (copilot hw-native-sys#5)
- Added local reproduction fallback for CI logs (gemini hw-native-sys#3)
- Removed ellipsis placeholder, save output inline (copilot hw-native-sys#4)
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request Mar 13, 2026
…e-sys#501

- Fix bugprone-unchecked-optional-access in op_registry.h fluent methods
  by extracting local ref after EnsureMemorySpec() with NOLINT
- Add missing #include <cstddef> in op_registry.h
- Add missing #include "pypto/ir/memory_space.h" in 7 tile op files
- Fix bugprone-parent-virtual-call in infer_tile_memory_space_pass.cpp:
  use IRMutator::VisitExpr/VisitStmt instead of ExprFunctor/StmtFunctor
- Remove unused #include "pypto/core/any_cast.h" from init_memref.cpp
- Return empty list instead of None for unconstrained input constraints
  in get_op_memory_spec binding (review comment hw-native-sys#5)
luohuan19 added a commit to luohuan19/pypto that referenced this pull request Mar 29, 2026
- Replace global _simpler_stamp_value with @functools.lru_cache(maxsize=1)
- Replace bare bool globals with list[bool] containers to avoid PLW0603
- Move all inline imports (ctypes, subprocess, torch, importlib.util, os,
  sys, logging) to top-level to fix PLC0415 violations
- Use importlib.import_module() for optional Simpler deps (code_runner,
  kernel_compiler, runtime_builder) to fix PLC0415 and pyright errors
- Write cached binaries atomically via temp file + os.replace() (hw-native-sys#5)
- Add _init_simpler_root_if_needed() to fix SIMPLER_ROOT timing in
  pytest_collection_finish (runs before session fixtures) (hw-native-sys#7)
- Log pre-build task failures instead of silently swallowing them (hw-native-sys#2)
- Extract _collect_test_case_from_item() helper to fix PLR0912 in
  pytest_collection_finish
- Fix E501 long print lines in conftest.py
- Move module-level logger _log to module scope in test_runner.py
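Two idioms from the conftest changes above are worth a quick sketch: memoizing a computed value with `functools.lru_cache(maxsize=1)` instead of a mutable module-level global, and writing cached binaries atomically via a temp file plus `os.replace()`. The names `stamp_value` and `write_cached` below are illustrative, not the project's actual helpers:

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=1)
def stamp_value() -> str:
    # Computed once on first call and memoized thereafter; avoids a
    # module-level global mutated in place (the PLW0603 pattern).
    return "stamp-" + "v1"


def write_cached(path: str, data: bytes) -> None:
    # Write to a temp file in the destination directory, then swap it
    # into place with os.replace(), which is atomic on POSIX and
    # Windows. Concurrent readers never see a half-written file.
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

The temp file must live in the same directory as the target: `os.replace()` across filesystems would raise, and the same-directory rename is what makes the swap atomic.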
Hzfengsy added a commit to Hzfengsy/pypto that referenced this pull request May 6, 2026
- inline pass: extend DefVarCollector + VarSubstituteMutator to cover
  WhileStmt return_vars and ForStmt return_vars (Copilot hw-native-sys#5).
- inline pass: reject inline bodies with non-trailing ReturnStmts via
  NestedReturnCounter — silent miscompile if hand-built IR ever does
  early-return inside an If/For branch (Gemini #1).
- inline pass: drop the stale "Inline function as program entry: detected
  before splicing; raises" doc-comment line; the actual behaviour is
  silent removal in the cleanup phase (Copilot hw-native-sys#6).
- verifier: also flag *dangling* Calls — Calls whose callee GlobalVar
  isn't in program->functions_ — with error_code 2. The previous code
  short-circuited when no Inline functions survived, missing the case
  where the InlineFunctions pass dropped the function but left a Call
  behind (Copilot hw-native-sys#4).
- JIT _resolve_int: handle ast.Pow (only with non-negative int exponents)
  and ast.UAdd (Gemini hw-native-sys#2).
- docs/zh-cn/dev/passes/22-lower_pipeline_loops.md: narrow
  CanonicalizeIOOrder scope wording from "全程序每一个 SeqStmts" to
  "ForKind::Pipeline 作用域内的 SeqStmts" (CodeRabbit hw-native-sys#9).
- examples/models/qwen3_jit/kernels/rmsnorm.py: cast resid chunks to FP32
  in post_rmsnorm to mirror input_rmsnorm and avoid BF16 accumulation
  precision loss (CodeRabbit hw-native-sys#11).
lyfne123 added a commit to lyfne123/pypto that referenced this pull request May 11, 2026
CI fix (root cause of pypto-lib-model + system-tests failures):

- P6's call-site injector previously inlined `tensor.as_layout(arg, DN)` into
  the kernel-call args directly, which the orchestration codegen rejects with
  `Call to '<callee>' arg N is neither a variable nor a recognized constant
  literal`. Refactor `CallSiteAsLayoutInjector` to operate at the statement
  level: for every AssignStmt / EvalStmt / ReturnStmt whose RHS targets a
  promoted callee, emit one `bridged_<param> = tensor.as_layout(arg, DN)`
  AssignStmt immediately before the call statement and replace the inline
  Call arg with the bound Var. Net IR is SSA-form and matches what
  orchestration codegen consumes per arg slot (Var | const-literal).

Review comments addressed:

- gemini #1: codegen `tensor.as_layout` now special-cases the identity flip
  (target layout == source layout) and emits a plain `Tensor result = input;`
  alias instead of a spurious `.transpose()`. Simplify still folds these
  before codegen in the default pipeline, but the codegen is now robust
  against ad-hoc compile paths that skip Simplify.
- coderabbitai hw-native-sys#2 / hw-native-sys#3: drop the "next default pass" wording in en/zh-cn doc
  17 — `MaterializeTensorStrides` runs later in the pipeline (after
  `CanonicalizeIOOrder`), not immediately after. The zh-cn doc's "17th pass"
  text is also clarified — the 17 is the docs/passes/ slot, not a literal
  pipeline call-count.
- coderabbitai hw-native-sys#4: `DeduceTensorAsLayoutType` now preserves the source
  TensorView's `valid_shape` (with trailing-pair swap on cross-layout flips)
  and `pad` through `tensor.as_layout`. Previously these fields were dropped,
  making the reinterpret silently lossy for sliced or fill-padded inputs.
- coderabbitai hw-native-sys#6: `MaterializeTensorStrides` direct-ctor rebuild path now
  forwards `op->attrs_`. The previous version preserved type and kwargs but
  dropped attrs, which would have silently discarded call metadata
  (arg_directions, manual_dep_edges) attached by earlier passes.
- coderabbitai hw-native-sys#7: update the stale comment block above `VisitExpr_` in
  `simplify_pass.cpp` — it still described the dropped shape-bearing
  `as_layout(x, shape, layout)` form and the never-implemented chain
  folding. New comment accurately describes the single identity-elimination
  rule and explains why chain folding is deferred.
- coderabbitai hw-native-sys#8 / hw-native-sys#9: the unit-test pattern that previously inspected an
  inline `tensor.as_layout` Call as a kernel-call arg no longer applies
  after the SSA refactor above. Tests now look up the bridge via
  `_find_assign_rhs(orch, var)` and guard `op is not None` before reading
  `op.name` (matching the defensive pattern already used in the B^T test).

Skipped (with reason):

- coderabbitai hw-native-sys#5: "Handle 3-arg `tile.load`". `tile.load` registers four
  mandatory args (tensor, offsets, shapes, valid_shapes) and the Python
  builder always materializes `valid_shapes` (defaults to `shapes` when the
  caller omits it). Once IR is constructed, every `tile.load` is 4-arg —
  the 3-arg form only exists at the DSL surface. The internal check stays
  as-is.