Skip to content

test(#1346): canary for engine fix #362 reaching fused-Adam path + Skip'd regression for Tensors#396#1386

Merged
ooples merged 2 commits into
masterfrom
fix/1346-flashattention-compile-backward
May 20, 2026
Merged

test(#1346): canary for engine fix #362 reaching fused-Adam path + Skip'd regression for Tensors#396#1386
ooples merged 2 commits into
masterfrom
fix/1346-flashattention-compile-backward

Conversation

@ooples
Copy link
Copy Markdown
Owner

@ooples ooples commented May 19, 2026

Summary

Adds focused integration test coverage tied to AiDotNet#1346 (now closed) and the follow-up tracked at AiDotNet.Tensors#396.

Two tests in one new file:

  1. FlashAttentionLayer_TrainViaFusedCompiledAdam_EngagesFusedPath (active, passes) — canary that locks in AiDotNet.Tensors PR [Test Coverage] Implement Tests for Basic Wavelets #362's reach into the AiDotNet fused-Adam training path. A Transformer<float> with FlashAttentionLayer in its explicit layer stack must engage CompiledTapeTrainingStep<float>.TryStepWithFusedOptimizer on the first Train() call. Pre-fix this canary would be invisible (the fused step ran but FA gradients were silently zeroed). Post-fix the engagement counter is the proxy for "the engine fix reached this path".

  2. DenseIdentity_CCE_OnFusedAdam_DoesNotSilentlyZeroNaN (Skip'd until Tensors#396 ships) — future-fix regression for the consumer-side gap surfaced during the flashattentionlayer produces degenerate output + 3.76x slowdown vs documented 2-4x speedup #1346 investigation. When a multi-parameter model routes raw logits (IdentityActivation Dense) through CategoricalCrossEntropyLoss on the fused-Adam path, the CCE log(negative_logit + eps) chain produces NaN that the Tensors loss-readout silently zeroes — GetLastLoss() returns literal 0.0, the consumer thinks training converged, and the model never moves off random init. The Skip message includes the unblock condition (Tensors#396 fix + NuGet bump).

Why this scope

The bug investigation for #1346 found that the original engine-side gradient drop was correctly fixed by AiDotNet.Tensors PR #362 (in NuGet 0.81.3). The remaining "consumer still sees degenerate output" symptom turned out to be a much larger, not-FA-specific bug in the Tensors loss readout — diagnosed via 5 controlled bisection experiments and filed in detail at AiDotNet.Tensors#396. That fix lives in AiDotNet.Tensors/Engines/Compilation/CompiledTrainingPlan.cs, not in AiDotNet. AiDotNet#1346 is closed.

Test plan

  • dotnet build tests/AiDotNet.Tests/AiDotNetTests.csproj --framework net10.0 -c Release — 0 errors
  • dotnet test ... --filter "FullyQualifiedName~FlashAttentionFusedCompiledTrainingIssue1346Tests" --no-build — 1 passed, 1 skipped

Local run output:

Passed!  - Failed: 0, Passed: 1, Skipped: 1, Total: 2

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Added integration tests verifying FlashAttention engagement on the fused-Adam compiled training path.
    • Includes a canary test that ensures fused-step tracking is triggered during training.
    • Adds a skipped regression test (blocked externally) that validates loss behavior to prevent silent-zero failures.
    • Tests emit diagnostic values for debugging and verification.

Review Change Stack

…ip'd regression for Tensors#396

Adds a focused integration test file with two tests:

1. FlashAttentionLayer_TrainViaFusedCompiledAdam_EngagesFusedPath (active)
   Locks in that AiDotNet.Tensors PR #362 (engine-side Engine.FlashAttention
   GraphMode.IsActive recording branch, shipped in NuGet 0.81.3) actually
   reaches the AiDotNet fused-Adam training path. A Transformer<float> with
   FlashAttentionLayer in its layer stack must engage
   CompiledTapeTrainingStep<float>.TryStepWithFusedOptimizer on the first
   Train() call. A future regression that breaks GraphMode trace through FA
   (or any future compile-side gate that rejects FA-containing graphs) would
   flip this red.

2. DenseIdentity_CCE_OnFusedAdam_DoesNotSilentlyZeroNaN (Skip'd)
   Future-fix regression for the consumer-side gap the #1346 investigation
   surfaced, now tracked at ooples/AiDotNet.Tensors#396. When a model with
   multiple trainable parameters routes raw logits (IdentityActivation Dense)
   through CategoricalCrossEntropyLoss on the fused-Adam path, the CCE chain
   produces NaN that the Tensors loss readout silently zeroes — consumer
   sees GetLastLoss=0 ('converged') while gradients are corrupted. Unskip
   once Tensors#396 fix ships and the consuming NuGet bumps.

Closes #1346 — the engine-side fix is in master via #362; the remaining
consumer-side fix lives in AiDotNet.Tensors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 19, 2026 13:42
@vercel
Copy link
Copy Markdown

vercel Bot commented May 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
aidotnet_website Ignored Ignored Preview May 19, 2026 5:30pm
aidotnet-playground-api Ignored Ignored Preview May 19, 2026 5:30pm

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a9d2e34d-39ee-4ab0-8264-0c3dc1cdf582

📥 Commits

Reviewing files that changed from the base of the PR and between 44acd93 and 5778c50.

📒 Files selected for processing (1)
  • tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs

Walkthrough

Adds a new integration test file that verifies FlashAttentionLayer engages the compiled fused-Adam path and includes a skipped regression test ensuring last-loss is not a literal zero silent-failure.

Changes

FlashAttention Fused-Compiled Training Regression Tests

Layer / File(s) Summary
Test class setup and shared constants
tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs
Test class FlashAttentionFusedCompiledTrainingIssue1346Tests with ITestOutputHelper, collection attribute, shared constants, and class closure.
Transformer builder and wiring
tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs
BuildFlashAttentionTransformer(...) constructs a Transformer<float> including FlashAttentionLayer<float>, normalization/slice, identity output Dense, wires CategoricalCrossEntropyLoss<float> and AdamOptimizer<float>.
Deterministic input/target builders
tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs
BuildFingerprintInput(...) builds a deterministic [1, SeqLen, EmbedDim] tensor; BuildOneHotTarget(...) builds a [1, NumClasses] one-hot target.
Canary test for fused path engagement
tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs
FlashAttentionLayer_TrainViaFusedCompiledAdam_EngagesFusedPath() resets/invalidate fused-step tracking, runs Train(), logs fused-step count, and asserts fusedSteps > 0.
Regression test for loss correctness (skipped)
tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs
DenseIdentity_CCE_OnFusedAdam_DoesNotSilentlyZeroNaN() (skipped pending AiDotNet.Tensors#396) trains an identity-activated Dense→(layernorm/slice)→Dense model once on fused path, logs fused-step and GetLastLoss() diagnostics, asserts fused engagement, and verifies loss != literal 0f (NaN/Infinity allowed).

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly Related Issues

Possibly Related PRs

  • ooples/AiDotNet#1144: Related fused optimizer execution and plan invalidation logic that these tests validate.
  • ooples/AiDotNet#1107: Related changes to CompiledTapeTrainingStep<T> auto-compilation/invalidation exercised by these tests.
  • ooples/AiDotNet#1347: Another PR adding integration tests around FlashAttentionLayer<T> training correctness; closely related at test level.

Poem

Flash layers hum, fused steps ignite,
Tests pin the path in a single bright light,
One train call to prove the fused road runs,
A skipped guard waits till the tensor fix comes,
Small tests, big peace — CI sleeps tonight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: adding a canary test for #1346 verifying fused-Adam path engagement plus a skipped regression test for Tensors#396, with direct issue references.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/1346-flashattention-compile-backward

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs`:
- Around line 37-38: Annotate the test class
FlashAttentionFusedCompiledTrainingIssue1346Tests with an xUnit collection
attribute and create a matching CollectionDefinition with DisableParallelization
= true to ensure the global CompiledTapeTrainingStep<float> fused-step counter
is not shared between parallel test threads; specifically, add
[Collection("FlashAttentionFusedTests")] to
FlashAttentionFusedCompiledTrainingIssue1346Tests and add a
CollectionDefinition("FlashAttentionFusedTests", DisableParallelization = true)
class to the test project so the tests run in isolation and the fused-step
counter reads/resets won't interleave.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 05d27118-9dac-4662-9383-f12e5e67f430

📥 Commits

Reviewing files that changed from the base of the PR and between 030a4d0 and 44acd93.

📒 Files selected for processing (1)
  • tests/AiDotNet.Tests/IntegrationTests/NeuralNetworks/FlashAttentionFusedCompiledTrainingIssue1346Tests.cs

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds integration test coverage to lock in behavior around AiDotNet#1346 and the follow-up AiDotNet.Tensors#396, specifically focusing on the compiled fused-Adam training path and loss readout behavior.

Changes:

  • Add a canary integration test ensuring a Transformer with FlashAttentionLayer engages the fused compiled-Adam training step.
  • Add a skipped regression test documenting and guarding against the fused-plan loss readout silently returning 0 for NaN/Inf cases (blocked on Tensors#396).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@ooples
Copy link
Copy Markdown
Owner Author

ooples commented May 19, 2026

PR #1386 review C8Cn3 thread on Task.Yield/async: keeping the async Task + await Task.Yield() pair because [Fact(Timeout = 60000)] is xUnit-2-only-supports-async in this codebase. Verified empirically this session — when I tried [Fact(Timeout = N)] public void M() on a regression-pin test, xUnit rejected it at execution with "Tests marked with Timeout are only supported for async tests". Task.Yield() is the minimal-overhead trick to satisfy the async Task signature requirement (no real async work in the test body, just need the compiler-emitted state machine). Dropping the timeout would lose the CI safety net we want for tests that exercise full Transformer training — fused-Adam path bugs can spin/hang.

@ooples ooples merged commit 429e98c into master May 20, 2026
32 of 46 checks passed
@ooples ooples deleted the fix/1346-flashattention-compile-backward branch May 20, 2026 00:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants