test(#1346): canary for engine fix #362 reaching fused-Adam path + Skip'd regression for Tensors#396 by ooples · Pull Request #1386 · ooples/AiDotNet

ooples · 2026-05-19T13:42:56Z

Summary

Adds focused integration test coverage tied to AiDotNet#1346 (now closed) and the follow-up tracked at AiDotNet.Tensors#396.

Two tests in one new file:

FlashAttentionLayer_TrainViaFusedCompiledAdam_EngagesFusedPath (active, passes) — canary that locks in AiDotNet.Tensors PR [Test Coverage] Implement Tests for Basic Wavelets #362's reach into the AiDotNet fused-Adam training path. A Transformer<float> with FlashAttentionLayer in its explicit layer stack must engage CompiledTapeTrainingStep<float>.TryStepWithFusedOptimizer on the first Train() call. Pre-fix this canary would be invisible (the fused step ran but FA gradients were silently zeroed). Post-fix the engagement counter is the proxy for "the engine fix reached this path".
DenseIdentity_CCE_OnFusedAdam_DoesNotSilentlyZeroNaN (Skip'd until Tensors#396 ships) — future-fix regression for the consumer-side gap surfaced during the flashattentionlayer produces degenerate output + 3.76x slowdown vs documented 2-4x speedup #1346 investigation. When a multi-parameter model routes raw logits (IdentityActivation Dense) through CategoricalCrossEntropyLoss on the fused-Adam path, the CCE log(negative_logit + eps) chain produces NaN that the Tensors loss-readout silently zeroes — GetLastLoss() returns literal 0.0, the consumer thinks training converged, and the model never moves off random init. The Skip message includes the unblock condition (Tensors#396 fix + NuGet bump).

Why this scope

The bug investigation for #1346 found that the original engine-side gradient drop was correctly fixed by AiDotNet.Tensors PR #362 (in NuGet 0.81.3). The remaining "consumer still sees degenerate output" symptom turned out to be a much larger, not-FA-specific bug in the Tensors loss readout — diagnosed via 5 controlled bisection experiments and filed in detail at AiDotNet.Tensors#396. That fix lives in AiDotNet.Tensors/Engines/Compilation/CompiledTrainingPlan.cs, not in AiDotNet. AiDotNet#1346 is closed.

Test plan

dotnet build tests/AiDotNet.Tests/AiDotNetTests.csproj --framework net10.0 -c Release — 0 errors
dotnet test ... --filter "FullyQualifiedName~FlashAttentionFusedCompiledTrainingIssue1346Tests" --no-build — 1 passed, 1 skipped

Local run output:

Passed!  - Failed: 0, Passed: 1, Skipped: 1, Total: 2

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Added integration tests verifying FlashAttention engagement on the fused-Adam compiled training path.
- Includes a canary test that ensures fused-step tracking is triggered during training.
- Adds a skipped regression test (blocked externally) that validates loss behavior to prevent silent-zero failures.
- Tests emit diagnostic values for debugging and verification.

…ip'd regression for Tensors#396 Adds a focused integration test file with two tests: 1. FlashAttentionLayer_TrainViaFusedCompiledAdam_EngagesFusedPath (active) Locks in that AiDotNet.Tensors PR #362 (engine-side Engine.FlashAttention GraphMode.IsActive recording branch, shipped in NuGet 0.81.3) actually reaches the AiDotNet fused-Adam training path. A Transformer<float> with FlashAttentionLayer in its layer stack must engage CompiledTapeTrainingStep<float>.TryStepWithFusedOptimizer on the first Train() call. A future regression that breaks GraphMode trace through FA (or any future compile-side gate that rejects FA-containing graphs) would flip this red. 2. DenseIdentity_CCE_OnFusedAdam_DoesNotSilentlyZeroNaN (Skip'd) Future-fix regression for the consumer-side gap the #1346 investigation surfaced, now tracked at ooples/AiDotNet.Tensors#396. When a model with multiple trainable parameters routes raw logits (IdentityActivation Dense) through CategoricalCrossEntropyLoss on the fused-Adam path, the CCE chain produces NaN that the Tensors loss readout silently zeroes — consumer sees GetLastLoss=0 ('converged') while gradients are corrupted. Unskip once Tensors#396 fix ships and the consuming NuGet bumps. Closes #1346 — the engine-side fix is in master via #362; the remaining consumer-side fix lives in AiDotNet.Tensors. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vercel · 2026-05-19T13:43:04Z

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments

Project	Deployment	Actions	Updated (UTC)
aidotnet_website	Ignored	Preview	May 19, 2026 5:30pm
aidotnet-playground-api	Ignored	Preview	May 19, 2026 5:30pm

coderabbitai · 2026-05-19T13:43:20Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a9d2e34d-39ee-4ab0-8264-0c3dc1cdf582

📥 Commits

Reviewing files that changed from the base of the PR and between 44acd93 and 5778c50.

📒 Files selected for processing (1)

tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs

Walkthrough

Adds a new integration test file that verifies FlashAttentionLayer engages the compiled fused-Adam path and includes a skipped regression test ensuring last-loss is not a literal zero silent-failure.

Changes

FlashAttention Fused-Compiled Training Regression Tests

Layer / File(s)	Summary
Test class setup and shared constants `tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs`	Test class `FlashAttentionFusedCompiledTrainingIssue1346Tests` with `ITestOutputHelper`, collection attribute, shared constants, and class closure.
Transformer builder and wiring `tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs`	`BuildFlashAttentionTransformer(...)` constructs a `Transformer<float>` including `FlashAttentionLayer<float>`, normalization/slice, identity output Dense, wires `CategoricalCrossEntropyLoss<float>` and `AdamOptimizer<float>`.
Deterministic input/target builders `tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs`	`BuildFingerprintInput(...)` builds a deterministic `[1, SeqLen, EmbedDim]` tensor; `BuildOneHotTarget(...)` builds a `[1, NumClasses]` one-hot target.
Canary test for fused path engagement `tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs`	`FlashAttentionLayer_TrainViaFusedCompiledAdam_EngagesFusedPath()` resets/invalidate fused-step tracking, runs `Train()`, logs fused-step count, and asserts `fusedSteps > 0`.
Regression test for loss correctness (skipped) `tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs`	`DenseIdentity_CCE_OnFusedAdam_DoesNotSilentlyZeroNaN()` (skipped pending AiDotNet.Tensors#396) trains an identity-activated Dense→(layernorm/slice)→Dense model once on fused path, logs fused-step and `GetLastLoss()` diagnostics, asserts fused engagement, and verifies loss != literal `0f` (NaN/Infinity allowed).

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly Related Issues

Fused-compiled training plan drops gradients for 26/29 trainable params in standard Transformer (not just EmbeddingLayer) #1331: Tests exercise CompiledTapeTrainingStep fused-step tracking and relate to compiled fused-optimizer backward/update behavior targeted by this issue.
flashattentionlayer produces degenerate output + 3.76x slowdown vs documented 2-4x speedup #1346: Directly targeted — tests exercise FlashAttentionLayer on the compiled fused-Adam path implicated in this issue.
fused-Adam path silently zeroes NaN loss readout when trace lambda has many params + CCE-style chain AiDotNet.Tensors#396: The skipped regression test reproduces the silent-zero last-loss behavior described in this issue and is blocked on its resolution.

Possibly Related PRs

ooples/AiDotNet#1144: Related fused optimizer execution and plan invalidation logic that these tests validate.
ooples/AiDotNet#1107: Related changes to CompiledTapeTrainingStep<T> auto-compilation/invalidation exercised by these tests.
ooples/AiDotNet#1347: Another PR adding integration tests around FlashAttentionLayer<T> training correctness; closely related at test level.

Poem

Flash layers hum, fused steps ignite,
Tests pin the path in a single bright light,
One train call to prove the fused road runs,
A skipped guard waits till the tensor fix comes,
Small tests, big peace — CI sleeps tonight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: adding a canary test for `#1346` verifying fused-Adam path engagement plus a skipped regression test for Tensors#396, with direct issue references.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/1346-flashattention-compile-backward

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs`:
- Around line 37-38: Annotate the test class
FlashAttentionFusedCompiledTrainingIssue1346Tests with an xUnit collection
attribute and create a matching CollectionDefinition with DisableParallelization
= true to ensure the global CompiledTapeTrainingStep<float> fused-step counter
is not shared between parallel test threads; specifically, add
[Collection("FlashAttentionFusedTests")] to
FlashAttentionFusedCompiledTrainingIssue1346Tests and add a
CollectionDefinition("FlashAttentionFusedTests", DisableParallelization = true)
class to the test project so the tests run in isolation and the fused-step
counter reads/resets won't interleave.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 05d27118-9dac-4662-9383-f12e5e67f430

📥 Commits

Reviewing files that changed from the base of the PR and between 030a4d0 and 44acd93.

📒 Files selected for processing (1)

tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds integration test coverage to lock in behavior around AiDotNet#1346 and the follow-up AiDotNet.Tensors#396, specifically focusing on the compiled fused-Adam training path and loss readout behavior.

Changes:

Add a canary integration test ensuring a Transformer with FlashAttentionLayer engages the fused compiled-Adam training step.
Add a skipped regression test documenting and guarding against the fused-plan loss readout silently returning 0 for NaN/Inf cases (blocked on Tensors#396).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ooples · 2026-05-19T17:29:45Z

PR #1386 review C8Cn3 thread on Task.Yield/async: keeping the async Task + await Task.Yield() pair because [Fact(Timeout = 60000)] is xUnit-2-only-supports-async in this codebase. Verified empirically this session — when I tried [Fact(Timeout = N)] public void M() on a regression-pin test, xUnit rejected it at execution with "Tests marked with Timeout are only supported for async tests". Task.Yield() is the minimal-overhead trick to satisfy the async Task signature requirement (no real async work in the test body, just need the compiler-emitted state machine). Dropping the timeout would lose the CI safety net we want for tests that exercise full Transformer training — fused-Adam path bugs can spin/hang.

…sedOptimizerGlobalState collection

Copilot AI review requested due to automatic review settings May 19, 2026 13:42

vercel Bot deployed to Preview – aidotnet_website May 19, 2026 13:43 View deployment

coderabbitai Bot requested changes May 19, 2026

View reviewed changes

Comment thread ...t.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs

Copilot AI reviewed May 19, 2026

View reviewed changes

Comment thread ...t.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs

Comment thread ...t.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs

coderabbitai Bot approved these changes May 19, 2026

View reviewed changes

fix(PR #1386 review): serialize CompiledTapeTrainingStep tests via Fu…

5778c50

…sedOptimizerGlobalState collection

ooples merged commit 429e98c into master May 20, 2026
32 of 46 checks passed

ooples deleted the fix/1346-flashattention-compile-backward branch May 20, 2026 00:31

This was referenced May 20, 2026

fix(#1406): pinn train silently no-op when reusing fused-plan thread cache #1411

Merged

fix(#1380): set training mode to false for validation/test forward passes in Optimize loop #1412

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test(#1346): canary for engine fix #362 reaching fused-Adam path + Skip'd regression for Tensors#396#1386

test(#1346): canary for engine fix #362 reaching fused-Adam path + Skip'd regression for Tensors#396#1386
ooples merged 2 commits into
masterfrom
fix/1346-flashattention-compile-backward

ooples commented May 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

vercel Bot commented May 19, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 19, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

ooples commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

ooples commented May 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this scope

Test plan

Summary by CodeRabbit

Uh oh!

vercel Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated Code Review Effort

Possibly Related Issues

Possibly Related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

ooples commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ooples commented May 19, 2026 •

edited by coderabbitai Bot

Loading

vercel Bot commented May 19, 2026 •

edited

Loading

coderabbitai Bot commented May 19, 2026 •

edited

Loading