
feat(distributed): pld.alloc_window_buffer / pld.window ops; remodel WindowBuffer as a Var (#1346)

Merged
lyfne123 merged 2 commits into hw-native-sys:main from YunjiQin:feat/distributed-op on May 13, 2026
Conversation

@YunjiQin YunjiQin commented May 12, 2026

Summary

  • Remodel WindowBuffer as a Var subclass (mirroring MemRef) typed by a new singleton WindowBufferType. The buffer's runtime-unique identifier flows through the inherited name_hint; per-rank size is stored in bytes, making the buffer dtype-agnostic and aligning ChipBufferSpec.nbytes 1:1 with the manifest.
  • Add DistributedTensorType.window_buffer — an optional back-reference to the source WindowBuffer — so two same-shape/dtype slices of distinct allocations stay structurally distinct. None for user-declared parameter annotations.
  • Add two distributed ops at the C++, binding, parser, and DSL layers:
    • pld.alloc_window_buffer(size_bytes) — pure address-space alloc returning a Ptr. The parser intercepts the assignment to derive a program-global unique name from the LHS and rejects tuple unpacking / user-supplied kwargs.
    • pld.window(buf, [shape], dtype=...) — materialise a Ptr handle as a DistributedTensorType view.
  • Update serialization, structural_equal, structural_hash, and the python printer for the new type and optional back-reference.
  • Drop dtype / bits-per-element from the comm-manifest schema: slots now carry only name + nbytes. The runtime passes an opaque dtype placeholder to ChipBufferSpec, whose dtype field is not consumed.
  • Update docs/en/dev/ir/02-types.md and the zh-cn mirror for the new window_buffer back-reference and the redesigned WindowBuffer schema.
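The back-reference semantics in the first two bullets can be illustrated with a short sketch. These are plain-Python stand-ins whose field names mirror the PR description, not the real pypto API:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Illustrative stand-ins for the pypto IR classes described above.

@dataclass(frozen=True)
class WindowBuffer:
    """Var-like handle: identity via name_hint, per-rank size in raw bytes."""
    name_hint: str
    nbytes: int  # dtype-agnostic: no element type, just bytes

@dataclass(frozen=True)
class DistributedTensorType:
    shape: Tuple[int, ...]
    dtype: str
    # Optional back-reference to the source allocation; None for
    # user-declared parameter annotations.
    window_buffer: Optional[WindowBuffer] = None

    def structural_equal(self, other: "DistributedTensorType") -> bool:
        # Same shape/dtype is not enough: slices of distinct allocations
        # stay structurally distinct because the back-references differ.
        return (self.shape == other.shape
                and self.dtype == other.dtype
                and self.window_buffer is other.window_buffer)

buf_a = WindowBuffer("buf_a", 4096)
buf_b = WindowBuffer("buf_b", 4096)
t1 = DistributedTensorType((32, 32), "fp32", buf_a)
t2 = DistributedTensorType((32, 32), "fp32", buf_a)
t3 = DistributedTensorType((32, 32), "fp32", buf_b)
assert t1.structural_equal(t2)      # same allocation: equal
assert not t1.structural_equal(t3)  # different allocation: distinct
```

Two views of one allocation compare equal, while same-shape/dtype views of different allocations do not, which is the distinction the structural_equal and structural_hash updates preserve.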

Related Issue

#1127 Communication Interface impl N2

Test plan

  • All existing unit tests pass (1 unrelated pre-existing failure in test_split_vector_kernel.py regex, untouched by this PR)
  • New tests cover pld.alloc_window_buffer parser behaviour (tests/ut/ir/parser/test_alloc_window_buffer.py), pld.window parser behaviour (tests/ut/ir/parser/test_window_op.py), distributed-op type round-trip (tests/ut/ir/test_distributed_ops.py), structural equality with shared/independent slot Vars (tests/ut/ir/core/test_comm_group_schema.py), and the manifest pipeline end-to-end (tests/ut/runtime/test_chip_bootstrap_configs.py)
  • clang-format / clang-tidy / ruff / pyright pre-commit hooks all pass

…emodel WindowBuffer as a Var

WindowBuffer is now a Var subclass (mirroring MemRef) typed by the new
singleton WindowBufferType. The buffer's runtime-unique identifier flows
through the inherited name_hint; per-rank size is stored in bytes,
making it dtype-agnostic and aligning ChipBufferSpec.nbytes 1:1 with
the manifest. DistributedTensorType gains an optional window_buffer
back-reference so two same-shape/dtype slices of distinct allocations
stay structurally distinct.

Adds pld.alloc_window_buffer (pure address-space alloc returning a Ptr)
and pld.window (materialise a Ptr as a DistributedTensorType view) at
the C++, binding, parser, and DSL layers. The parser intercepts the
alloc assignment to derive a program-global unique name from the LHS
and to reject tuple unpacking / user-supplied kwargs.
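The parser interception described above can be sketched with the standard ast module. This is a simplified, hypothetical model (the real ast_parser.py logic is more involved), but it shows the three enforced rules: simple assignment only, no user kwargs, and program-global name uniqueness derived from the LHS:

```python
import ast

def collect_alloc_names(source: str) -> list:
    """Walk a module and validate every pld.alloc_window_buffer assignment."""
    seen = set()
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        call = node.value
        if not (isinstance(call, ast.Call)
                and isinstance(call.func, ast.Attribute)
                and call.func.attr == "alloc_window_buffer"):
            continue
        # Reject tuple unpacking: the LHS must be a single plain name.
        if len(node.targets) != 1 or not isinstance(node.targets[0], ast.Name):
            raise SyntaxError("pld.alloc_window_buffer requires a simple "
                              "'buf = pld.alloc_window_buffer(N)' assignment")
        # Reject user-supplied kwargs.
        if call.keywords:
            raise SyntaxError("pld.alloc_window_buffer takes no keyword arguments")
        # Derive the program-global unique name from the LHS.
        name = node.targets[0].id
        if name in seen:
            raise SyntaxError(f"duplicate window buffer name: {name!r}")
        seen.add(name)
        names.append(name)
    return names

assert collect_alloc_names(
    "a = pld.alloc_window_buffer(4096)\n"
    "b = pld.alloc_window_buffer(8192)\n") == ["a", "b"]
```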

Updates serialization, structural_equal, structural_hash, and the
python_printer to handle the new type and the optional back-reference.
The comm-manifest schema drops dtype/bits-per-element and carries only
name + nbytes; the runtime passes an opaque dtype placeholder to
ChipBufferSpec, which does not consume that field.
coderabbitai Bot commented May 12, 2026

📝 Walkthrough

This PR introduces first-class distributed window-buffer allocation and view-materialization operations (pld.alloc_window_buffer and pld.window) into the PyPTO IR. It refactors WindowBuffer from an IRNode to a Var subclass, adds a WindowBufferType singleton marker type, and integrates program-wide parser support with dtype-agnostic manifest/runtime handling for CommGroup window buffers.

Changes

Distributed Window-Buffer Support

  • IR type system (include/pypto/ir/core.h, include/pypto/ir/kind_traits.h, include/pypto/ir/program.h, include/pypto/ir/type.h): WindowBufferType singleton marker type added to the IR; WindowBuffer refactored from IRNode to a Var subclass carrying base (allocation token) and size (per-rank bytes); DistributedTensorType extended with an optional window_buffer_ field to preserve allocation provenance for identical shape/dtype combinations.
  • Op registration (src/ir/op/distributed/memory.cpp, CMakeLists.txt): two distributed-memory ops registered with full validation. pld.alloc_window_buffer(size, *, name) returns PtrType; pld.window(buf, shape, *, dtype) validates the Ptr input and MakeTuple shape and returns DistributedTensorType; new source added to the build.
  • Serialization and deserialization (src/ir/serialization/serializer.cpp, src/ir/serialization/deserializer.cpp, src/ir/serialization/type_deserializers.cpp): the serializer emits an optional window_buffer node reference for DistributedTensorType; WindowBufferType serializes as a singleton marker; the deserializer reconstructs the optional memref/tensor_view/window_buffer fields and rehydrates WindowBuffer from its base Var plus size and staging flags.
  • Structural equality, hashing, and printing (src/ir/transforms/structural_equal.cpp, src/ir/transforms/structural_hash.cpp, src/ir/transforms/python_printer.cpp): equality compares window_buffer_ presence and underlying Var identity; hashing mixes in window_buffer presence and the referenced node so identical shape/dtype from different buffers hash differently; WindowBufferType is treated as a singleton; the printer emits pld.WindowBufferType.
  • Python DSL sentinels and type bindings (python/pypto/language/distributed/alloc.py, python/pypto/language/distributed/__init__.py, python/bindings/modules/ir.cpp, python/pypto/pypto_core/ir.pyi): sentinels alloc_window_buffer(size) and window(buf, shape, *, dtype) added and exported; ir.WindowBufferType binding added; the WindowBuffer Python binding switched to a Var subclass accepting base/size/staging flags; .pyi stubs updated with the window_buffer field and new constructors.
  • AST parser (python/pypto/language/parser/ast_parser.py, python/pypto/language/parser/decorator.py): the parser recognizes pld.* calls, intercepts buf = pld.alloc_window_buffer(...) to inject the LHS-derived name and enforce a simple-assignment shape, and implements pld.window parsing with Ptr/MakeTuple/dtype validation plus program-wide uniqueness tracking via alloc_window_buffer_names.
  • Manifest schema and runtime (python/pypto/ir/comm_manifest.py, python/pypto/runtime/distributed_runner.py): manifest v2 slots now carry only name (name_hint), nbytes, and host flags; the runtime uses an opaque dtype placeholder and consumes nbytes to build ChipBufferSpec counts.
  • Tests for IR ops, parser, and CommGroup schema (tests/ut/ir/core/test_comm_group_schema.py, tests/ut/ir/parser/test_alloc_window_buffer.py, tests/ut/ir/parser/test_window_op.py, tests/ut/ir/test_distributed_ops.py): new and updated tests cover WindowBufferType singleton behavior, alloc_window_buffer parsing (LHS shape, PtrType, name injection, uniqueness), pld.window parsing and validation, the WindowBuffer Var structure, and DistributedTensorType provenance with structural equality/hash distinctions.
  • Tests for runtime manifest and bootstrap configs (tests/ut/runtime/test_chip_bootstrap_configs.py): updated tests assert that manifest slot serialization uses nbytes and host flags, that the runtime propagates an opaque dtype, that comm.window_size is computed from summed nbytes, and AOT/roundtrip behavior with Var-based WindowBuffer slots.
  • Documentation (docs/en/dev/ir/02-types.md, docs/zh-cn/dev/ir/02-types.md): clarifies that allocation metadata resides on ir.WindowBuffer (a Var subclass), that pld.window slices may carry an optional window_buffer back-reference to the source allocation, and that user-declared annotations leave this field None.
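The dtype-agnostic manifest row above can be sketched in a few lines. Field names here are assumptions modelled on the PR description, not the real comm_manifest.py schema: slots carry only a name and a raw byte count, and the per-group window size is the sum over slots:

```python
from dataclasses import dataclass

@dataclass
class SlotSpec:
    name: str                    # WindowBuffer name_hint, runtime-unique
    nbytes: int                  # per-rank size in raw bytes; no dtype field
    load_from_host: bool = False
    store_to_host: bool = False

def window_size(slots) -> int:
    """comm.window_size: summed nbytes over all slots in the group."""
    return sum(slot.nbytes for slot in slots)

slots = [
    SlotSpec("kv_window", 1 << 20),
    SlotSpec("scratch", 4096, load_from_host=True),
]
assert window_size(slots) == (1 << 20) + 4096
```

Because a slot no longer records dtype or bits-per-element, a slot's nbytes aligns 1:1 with ChipBufferSpec.nbytes and the runtime's dtype placeholder never needs to be meaningful.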

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • hw-native-sys/pypto#1297: Earlier PR on WindowBuffer/DistributedTensor/window-buffer typing, serialization, and Python bindings related to the same distributed allocation feature.

Suggested reviewers

  • lyfne123
  • Hzfengsy

Poem

🐇 I nibbled bytes where windows grow,
Buffers remember where they go,
Parsers bind names and types align,
Hashes whisper "not the same" in kind,
Hooray — the distributed views now know!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 49.64%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (4 passed)

  • Title check ✅: the title clearly and concisely summarizes the primary changes: remodeling WindowBuffer as a Var subclass and adding two new distributed ops (pld.alloc_window_buffer and pld.window).
  • Linked Issues check ✅: skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅: skipped because no linked issues were found for this pull request.
  • Description check ✅: the description clearly relates to the changeset, detailing the remodeling of WindowBuffer as a Var subclass, the DistributedTensorType.window_buffer back-reference, the new distributed ops, serialization updates, and manifest schema changes.


@gemini-code-assist Bot left a comment

Code Review

This pull request redesigns the WindowBuffer and DistributedTensorType infrastructure to mirror the MemRef pattern, making WindowBuffer a Var subclass that is dtype-agnostic and carries size in bytes. Key changes include the introduction of pld.alloc_window_buffer and pld.window ops, updated Python AST parsing to enforce global name uniqueness, and the addition of a window_buffer back-reference in DistributedTensorType to maintain structural identity. Feedback identifies a missing field in the DistributedTensorType deserialization logic, corrects an incorrect hint in a parser error message, and recommends using INTERNAL_CHECK for defensive null checks in C++ type deduction logic.

Comment threads:

  • src/ir/serialization/deserializer.cpp (outdated)
  • python/pypto/language/parser/ast_parser.py (outdated)
  • src/ir/op/distributed/memory.cpp
@coderabbitai Bot left a comment
Actionable comments posted: 1

🧹 Nitpick comments (2)
python/bindings/modules/ir.cpp (1)

1518-1523: 💤 Low value

Consider clarifying the base parameter requirement in the docstring.

The constructor accepts a VarPtr base but doesn't validate that it's actually typed as PtrType. While this mirrors the MemRef pattern (which similarly doesn't validate the base), the docstring could explicitly state the expected type constraint to guide users.

📝 Suggested docstring clarification
                           nb::arg("span") = Span::unknown(),
-                          "Create a WindowBuffer wrapping the given Ptr Var. The buffer's "
+                          "Create a WindowBuffer wrapping the given Var (which must be of type "
+                          "PtrType, produced by pld.alloc_window_buffer). The buffer's "
                           "runtime-unique identifier flows through the inherited "
                           "Var.name_hint (taken from base.name_hint).");
include/pypto/ir/program.h (1)

55-61: ⚡ Quick win

Consider adding a null-check for the base parameter.

The constructor dereferences base->name_hint_ without validating that base is non-null. While call sites likely ensure a valid base, adding a defensive check or an assertion would prevent undefined behavior if misused.

🛡️ Suggested defensive check
   WindowBuffer(VarPtr base, ExprPtr size, bool load_from_host = false, bool store_to_host = false,
                Span span = Span::unknown())
-      : Var(base->name_hint_, GetWindowBufferType(), std::move(span)),
+      : Var(base ? base->name_hint_ : "", GetWindowBufferType(), std::move(span)),
         base_(std::move(base)),
         size_(std::move(size)),
         load_from_host_(load_from_host),
-        store_to_host_(store_to_host) {}
+        store_to_host_(store_to_host) {
+    if (!base_) {
+      throw pypto::ValueError("WindowBuffer requires a non-null base Ptr Var");
+    }
+  }

Alternatively, if null base is genuinely impossible (guaranteed by parser), a debug assertion (PYPTO_CHECK(base != nullptr)) would document the precondition without runtime cost in release builds.

Actionable inline comment (python/pypto/language/parser/ast_parser.py, around lines 4451-4455): the diagnostic message for pld.alloc_window_buffer gives a hint that includes a dtype keyword, but the parser path rejects all kwargs for pld.alloc_window_buffer. Update the hint string in the error construction (the block that emits "pld.alloc_window_buffer must appear as the RHS...") to show the correct positional-only call signature, so the hint matches the enforced signature and does not suggest kwargs.

📥 Commits

Reviewing files that changed from the base of the PR and between daa6f9f and b0fb63e.

📒 Files selected for processing (27)
  • CMakeLists.txt
  • docs/en/dev/ir/02-types.md
  • docs/zh-cn/dev/ir/02-types.md
  • include/pypto/ir/core.h
  • include/pypto/ir/kind_traits.h
  • include/pypto/ir/program.h
  • include/pypto/ir/type.h
  • python/bindings/modules/ir.cpp
  • python/pypto/ir/comm_manifest.py
  • python/pypto/language/distributed/__init__.py
  • python/pypto/language/distributed/alloc.py
  • python/pypto/language/parser/ast_parser.py
  • python/pypto/language/parser/decorator.py
  • python/pypto/pypto_core/ir.pyi
  • python/pypto/runtime/distributed_runner.py
  • src/ir/op/distributed/memory.cpp
  • src/ir/serialization/deserializer.cpp
  • src/ir/serialization/serializer.cpp
  • src/ir/serialization/type_deserializers.cpp
  • src/ir/transforms/python_printer.cpp
  • src/ir/transforms/structural_equal.cpp
  • src/ir/transforms/structural_hash.cpp
  • tests/ut/ir/core/test_comm_group_schema.py
  • tests/ut/ir/parser/test_alloc_window_buffer.py
  • tests/ut/ir/parser/test_window_op.py
  • tests/ut/ir/test_distributed_ops.py
  • tests/ut/runtime/test_chip_bootstrap_configs.py

YunjiQin added a commit to YunjiQin/pypto that referenced this pull request May 12, 2026
- Round-trip the DistributedTensorType.window_buffer back-reference
  through serialization. The serializer now writes window_buffer as a
  shared node reference (WindowBuffer is an IRNode, so the ref-table
  handles identity); the deserializer reads it and uses a new full-
  fields DistributedTensorType constructor to populate every optional
  field (memref, tensor_view, window_buffer) in one shot.
- Correct the pld.alloc_window_buffer error hint: the op rejects all
  user kwargs, so the previous "dtype=pl.FP32" suggestion was
  contradictory.
@YunjiQin YunjiQin force-pushed the feat/distributed-op branch from 24a730d to 3b42458 Compare May 12, 2026 11:27
@lyfne123 lyfne123 merged commit 7da3b83 into hw-native-sys:main May 13, 2026
9 checks passed