Skip to content

Use typed RaggedDim sentinel for ragged dimensions#320

Merged
khatchad merged 4 commits into
masterfrom
ml-544-ragged-dim
May 22, 2026
Merged

Use typed RaggedDim sentinel for ragged dimensions#320
khatchad merged 4 commits into
masterfrom
ml-544-ragged-dim

Conversation

@khatchad

Copy link
Copy Markdown
Member

Summary

  • Add RaggedDim extends Dimension<Void> + DimensionType.Ragged in TensorType.java. Replaces raw null per-element entries that previously encoded ragged dimensions and violated the implicit non-null-element contract of Iterable<Dimension<?>>.
  • Migrate the 7 emit sites in 5 ragged generators (RaggedConstant, RaggedFromNestedRowLengths, RaggedFromNestedValueRowIds, RaggedRange, RaggedTensorFromValues) from shape.add(null) to shape.add(new RaggedDim()).
  • Update TensorShapeUtil.areBroadcastable and TensorShapeUtil.getBroadcastedShapes to recognize RaggedDim as broadcast-compatible and propagate it through the result. Without this, ragged-on-ragged broadcasts (testAdd66 through testAdd99, testGradient, testGradient2) throw NonBroadcastableShapesException.
  • Rename ragged-test fixtures from TENSOR_*_NONE_* to TENSOR_*_RAGGED_* to match the new representation. The genuinely-shared TENSOR_2_NONE_INT32 (used by testDataset10/testDataset10a, whose output comes from DatasetFromGeneratorGenerator, still null-emitting) is retained as-is.

Out of scope

Dynamic-batch and placeholder null sites in Input.java:137, Input.java:154, FlowFromDirectoryGenerator.java:75, and TensorGenerator.java:2683 still encode raw null. Tracked separately at wala#545.

Test plan

  • mvn clean install -DskipTests from root passes.
  • mvn test from root: 777 tests, 0 failures, 0 errors, 3 skipped.
  • Spotless clean on commit.

Closes wala#544.

🤖 Generated with Claude Code

Ragged dims were previously encoded as raw `null` entries in
`TensorType.dims`, violating the implicit non-null-element contract of
`Iterable<Dimension<?>>` and erasing the structural raggedness signal for
downstream consumers (Hybridize's input-signature inference collapsed `null`
to `SymbolicDim("?")`). Replace with a typed `RaggedDim extends
Dimension<Void>` plus `DimensionType.Ragged`, migrate the seven emit sites in
the five ragged generators, and update `TensorShapeUtil.areBroadcastable` and
`TensorShapeUtil.getBroadcastedShapes` to treat `RaggedDim` as
broadcast-compatible and propagate it through the result (preserving the
legacy null-as-compatible behavior the ragged-broadcast tests depend on).
Ragged-test fixtures renamed from `TENSOR_*_NONE_*` to `TENSOR_*_RAGGED_*` to
reflect the new representation; genuinely-shared `TENSOR_2_NONE_INT32`
(testDataset10's `DatasetFromGeneratorGenerator` output) retained as-is.

The dynamic-batch / placeholder `null` sites in `Input.java`,
`FlowFromDirectoryGenerator.java`, and `TensorGenerator.java` are out of
scope here; tracked separately at wala#545.

Closes wala#544.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 22, 2026 00:18
Per review on #320: the explanation of
why `super(null)` is mechanically necessary is an implementation detail, not
contract-level documentation. Tag it `@implNote` so the description line
stays focused on what the constructor does.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a typed sentinel (TensorType.RaggedDim) to represent ragged tensor dimensions (previously encoded as raw null entries), and updates ragged tensor generators, broadcasting utilities, and test fixtures to use and recognize this representation.

Changes:

  • Add TensorType.RaggedDim and DimensionType.Ragged to represent ragged dimensions explicitly.
  • Update ragged tensor shape emitters to append new RaggedDim() instead of null.
  • Teach TensorShapeUtil broadcast logic to treat RaggedDim as broadcast-compatible and to propagate raggedness into the broadcast result; update tests accordingly.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
com.ibm.wala.cast.python.ml/source/com/ibm/wala/cast/python/ml/util/TensorShapeUtil.java Extend broadcast compatibility/propagation rules to recognize RaggedDim.
com.ibm.wala.cast.python.ml/source/com/ibm/wala/cast/python/ml/types/TensorType.java Add RaggedDim sentinel + enum tag for typed ragged dimensions.
com.ibm.wala.cast.python.ml/source/com/ibm/wala/cast/python/ml/client/RaggedTensorFromValues.java Emit ragged shapes using RaggedDim instead of raw null.
com.ibm.wala.cast.python.ml/source/com/ibm/wala/cast/python/ml/client/RaggedRange.java Emit ragged axis using RaggedDim.
com.ibm.wala.cast.python.ml/source/com/ibm/wala/cast/python/ml/client/RaggedFromNestedValueRowIds.java Emit K ragged dimensions using RaggedDim.
com.ibm.wala.cast.python.ml/source/com/ibm/wala/cast/python/ml/client/RaggedFromNestedRowLengths.java Emit K ragged dimensions using RaggedDim.
com.ibm.wala.cast.python.ml/source/com/ibm/wala/cast/python/ml/client/RaggedConstant.java Emit ragged dimensions using RaggedDim.
com.ibm.wala.cast.python.ml.test/source/com/ibm/wala/cast/python/ml/test/TestTensorflow2Model.java Rename/update ragged fixtures to use RaggedDim-based shapes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@khatchad khatchad enabled auto-merge May 22, 2026 00:22
khatchad and others added 2 commits May 21, 2026 20:23
Per CodeQL / github-code-quality bot on #320:
the `Long` boxed counter at `RaggedConstant.java:432` is never null and never
needs to be. Switching to primitive `long` removes misleading nullability and
the per-iteration unboxing of `R` from `long`-vs-`Long` comparison.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per Copilot review on #320: the
constant name suggests a rank-4 shape with three unknown dims, but the value
was constructed with only two dimensions (`NumericDim(2), null`). The
constant carried `@SuppressWarnings("unused")` and had no readers anywhere
in the suite. Deleting clears the inconsistency without behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented May 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.93088% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.82%. Comparing base (9b5f424) to head (7cfbe39).

Files with missing lines Patch % Lines
.../com/ibm/wala/cast/python/ml/types/TensorType.java 45.45% 6 Missing ⚠️
.../ibm/wala/cast/python/ml/util/TensorShapeUtil.java 40.00% 1 Missing and 2 partials ⚠️
...wala/cast/python/ml/test/TestTensorflow2Model.java 99.48% 1 Missing ⚠️
.../cast/python/ml/client/RaggedTensorFromValues.java 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #320      +/-   ##
============================================
+ Coverage     70.80%   70.82%   +0.01%     
- Complexity     2606     2608       +2     
============================================
  Files           266      266              
  Lines         19815    19852      +37     
  Branches       3195     3198       +3     
============================================
+ Hits          14031    14061      +30     
- Misses         4504     4510       +6     
- Partials       1280     1281       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@khatchad khatchad added this pull request to the merge queue May 22, 2026
Merged via the queue into master with commit 13cc11e May 22, 2026
13 checks passed
@khatchad khatchad deleted the ml-544-ragged-dim branch May 22, 2026 00:39
khatchad added a commit to ponder-lab/Hybridize-Functions-Refactoring that referenced this pull request May 27, 2026
…typed dim sentinels

Mechanical update of 42 `testHasLikelyTensorParameter*` expected `TensorType`s
to match the new typed-sentinel encoding Ariadne 0.45.0 emits per wala/ML#545
(`DynamicDim`) and ponder-lab/ML#320 (`RaggedDim`):

- Tests 100-104, 124, 125 (`keras.Input`): leading batch dim `null`
  → `DynamicDim.INSTANCE`.
- Tests 59-66 (`RaggedTensor.from_nested_row_splits`), 71-82
  (`RaggedTensor.from_row_*` / `from_value_rowids`): interior ragged dims
  `null` → `RaggedDim.INSTANCE`.
- Tests 67-70 (`from_nested_value_rowids`), 118-122 (`from_nested_row_lengths`):
  mixed shape `(4, None, None, None)` → ragged dims at positions 1-2
  (`RaggedDim.INSTANCE`) plus a dynamic flat-values dim at position 3
  (`DynamicDim.INSTANCE`).
- Tests 112-117 (`ragged.range`): both parameter and signature dims tighten
  from `null` / `SymbolicDim("?")` to `NumericDim(5)` per ponder-lab/ML#332's
  `RaggedRange` precision improvement.

Also adds the two new imports (`DynamicDim`, `RaggedDim`) and drops the
now-resolved typed-sentinel TODO from `testHasLikelyTensorParameter59()`'s
docstring; the Hybridize#524 `RaggedTensorSpec` emission TODO stays.
khatchad added a commit to ponder-lab/Hybridize-Functions-Refactoring that referenced this pull request May 27, 2026
The per-position consensus loop in `inferSpec` collapsed every non-`NumericDim`
to a `SymbolicDim("?")` wildcard via a fall-through `else`. With Ariadne 0.45.0
shipping typed dim sentinels (`DynamicDim` per wala/ML#545; `RaggedDim` per
ponder-lab/ML#320), the consumer can name those cases explicitly.

Splits the fan-out into independent branches:

- Disagreement across contexts → wildcard.
- `instanceof NumericDim` → keep the concrete value.
- `instanceof DynamicDim` → wildcard (`tf.keras.Input`-style batch axes encode
  as `SymbolicDim("?")` in `TensorSpec`; same surface behavior as before).
- `instanceof RaggedDim` → wildcard, carrying a `TODO(#524)` marker so the
  `RaggedTensorSpec` emission work lands on a clearly-anchored branch.
- Otherwise → wildcard.

Surface behavior is unchanged; the existing `testHasLikelyTensorParameter*`
suite (492 tests, all green) covers the regression surface. Drops the stale
`wala/ML#544` reference from the inline comment—the typed-sentinel half of
that TODO shipped in Ariadne 0.45.0.

Closes #561.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use a typed RaggedDim sentinel for ragged dimensions instead of raw null in TensorType.dims

3 participants