Add sparse-data unit test for xvector-based linear regression by highperformancecoder · Pull Request #643 · highperformancecoder/minsky

highperformancecoder · 2026-04-09T07:12:03Z

Linear regression using a y tensor's x-vector for x-values was silently producing incorrect results when y contained non-finite entries (NaN/Inf), because sumx and sumxx were not masked to finite pairs.

Changes

Test coverage (test/testTensorOps.cc): Adds xvectorValueLinearRegressionSparse — a gtest case with y = {1, NaN, 3, 4, Inf, 6} over x-vector {0,1,2,3,4,5}. Verifies that only the four finite pairs are used and the regression recovers y = x + 1 across all output slots.

TensorVal y(hc); y=vector<double>{1, std::numeric_limits<double>::quiet_NaN(), 3, 4,
                                  std::numeric_limits<double>::infinity(), 6};
// finite pairs: (0,1),(2,3),(3,4),(5,6) => slope=1, intercept=1
vector<double> result={1,2,3,4,5,6};

This change is

Summary by CodeRabbit

Bug Fixes
- Linear regression now properly excludes non-finite values (NaN and Inf) from calculations, ensuring accurate regression results when working with sparse or incomplete data.

… sparse.

coderabbitai · 2026-04-09T07:12:22Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e2de9a7e-78e4-4b6e-a22c-c013ebbbdda2

📥 Commits

Reviewing files that changed from the base of the PR and between 971e761 and 1c7a046.

📒 Files selected for processing (1)

test/testTensorOps.cc

📝 Walkthrough

Walkthrough

LinearRegression::setArguments now checks y->rank() for the "no x provided" path, moves sumy.setArgument(y,args) later, and applies masking to ignore non-finite (x,y) pairs. Aggregates are rebuilt to operate on masked values so invalid observations are excluded.

Changes

Cohort / File(s)	Summary
Linear Regression Masking & Rank Guard `engine/minskyTensorOps.cc`	Changed "no x provided" rank check to `y->rank()`; moved `sumy.setArgument(y,args)` later; introduced `mask(isfinite(x)&&isfinite(y))`, selectors `fx`/`fxy`; built `maskedX`/`maskedY`; updated `sumx`, `sumy`, `sumxx`, `sumxy`, `sumyy`, and `count` to use masked values, excluding non-finite pairs.
Regression Behavior Test `test/testTensorOps.cc`	Added `TEST_F(CorrelationSuite, xvectorValueLinearRegressionSparse)` to exercise value-based xvector regression with interspersed `NaN`/`Inf` and assert outputs match expected `y = x + 1`, verifying non-finite observations are ignored.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Add sparse-data unit test for xvector-based linear regression #643 — Implements masking of non-finite (x,y) pairs and adds the sparse xvector linear-regression test; directly related.
Added a "bulk" Linear regression operator that returns a tensor of #634 — Modifies LinearRegression::setArguments and related regression wiring in engine/minskyTensorOps.cc; closely related refactor/extension.

Poem

🐇 I hop through tensors, nose to the map,

I mask out the NaNs and give Infs a tap,
I count only the true, not the gap,
Sums and slopes now nap in a nap,
Hooray — clean fits on my tiny lap.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main changes: adding a unit test for linear regression with sparse data and xvector-based variables. It matches the primary focus evident in the raw summary and PR objectives.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch bugfix/self-linear-regression-sparse-data

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@engine/minskyTensorOps.cc`:
- Around line 1197-1205: The sumy accumulation currently includes y values where
x is non-finite, breaking the paired-data constraint; create a maskedY BinOp
(analogous to fx/maskedX) that returns y only when both x and y are finite (use
the existing mask lambda or a variant), then replace the existing
sumy.setArgument(y,args) call with
sumy.setArgument(make_shared<BinOp>(maskedY,y,spreadX),args) so sumy, sumx,
sumxx, sumxy and count all use the same pairwise masking logic.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2bef1802-094b-4f0b-b3f4-cd506d8ca032

📥 Commits

Reviewing files that changed from the base of the PR and between 125ef6e and 5a1727b.

📒 Files selected for processing (1)

engine/minskyTensorOps.cc

engine/minskyTensorOps.cc

Copilot

Pull request overview

This PR fixes linear regression calculations when the independent variable is derived from a tensor’s x-vector and the dependent data contains sparse (non-finite) entries.

Changes:

Corrects the axis validity guard to use y->rank() instead of the current op’s rank().
Masks sumx and sumxx computations so they only include finite (x,y) pairs, aligning them with count/sumxy.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-09T07:17:47Z

engine/minskyTensorOps.cc

+      auto mask=[](double x, double y){return isfinite(x) && isfinite(y);};
+      auto fx=[](double x, double y){return isfinite(x) && isfinite(y)? x:0;};
      auto fxy=[](double x, double y){return isfinite(x) && isfinite(y)? x*y: 0;};
+      auto maskedX=make_shared<BinOp>(fx,spreadX,y);
+      sumx.setArgument(maskedX,args);
      sumyy.setArgument(make_shared<BinOp>(fxy,y,y),args);
-      sumxx.setArgument(make_shared<BinOp>(fxy,spreadX,spreadX),args);
+      sumxx.setArgument(make_shared<BinOp>(fxy,maskedX,spreadX),args);
      sumxy.setArgument(make_shared<BinOp>(fxy,y,spreadX),args);
-      count.setArgument
-        (make_shared<BinOp>([](double x,double y) {return isfinite(x)*isfinite(y);},y,spreadX),args);
+      count.setArgument(make_shared<BinOp>(mask,y,spreadX),args);


The new masking logic (maskedX / fxy / count) changes linearRegression behavior for sparse inputs, but there doesn’t appear to be a unit test covering missing/non-finite y values when x is derived from y’s x-vector (the scenario this fix targets). Please add a gtest case (likely in test/testTensorOps.cc near the existing *LinearRegression tests) that includes NaN/Inf entries in y and verifies the regression output matches the expected line using only finite (x,y) pairs.

@copilot apply changes based on this feedback

Added xvectorValueLinearRegressionSparse test in test/testTensorOps.cc (commit bff0c30). The test creates a y-vector with xvector values {0,1,2,3,4,5} and data {1, NaN, 3, 4, Inf, 6}, then verifies the regression correctly identifies y = x + 1 using only the four finite pairs (0,1),(2,3),(3,4),(5,6).

engine/minskyTensorOps.cc

coderabbitai

🧹 Nitpick comments (1)

engine/minskyTensorOps.cc (1)
1203-1203: Consider: sumyy doesn't apply the same pairwise masking.

Unlike the other aggregates, sumyy only checks if y is finite, not whether spreadX is also finite for each element. This is currently harmless since sumyy isn't used in computeScaleAndOffset(), but could cause subtle bugs if it's used in future extensions (e.g., R² calculation).
Optional fix for consistency
-      sumyy.setArgument(make_shared<BinOp>(fxy,y,y),args);
+      sumyy.setArgument(make_shared<BinOp>(fxy,maskedY,y),args);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@engine/minskyTensorOps.cc` at line 1203, sumyy currently only masks on y's
finiteness; update the argument passed to sumyy.setArgument so it applies the
same pairwise masking used elsewhere (i.e., ensure the BinOp combines y and
spreadX finiteness like the other aggregates). Locate the sumyy.setArgument call
and replace the single-input BinOp(make_shared<BinOp>(fxy,y,y),...) with a BinOp
that uses both y and spreadX (matching the pattern used by the other aggregates)
so computeScaleAndOffset and any future users get consistent pairwise masking.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@engine/minskyTensorOps.cc`:
- Line 1203: sumyy currently only masks on y's finiteness; update the argument
passed to sumyy.setArgument so it applies the same pairwise masking used
elsewhere (i.e., ensure the BinOp combines y and spreadX finiteness like the
other aggregates). Locate the sumyy.setArgument call and replace the
single-input BinOp(make_shared<BinOp>(fxy,y,y),...) with a BinOp that uses both
y and spreadX (matching the pattern used by the other aggregates) so
computeScaleAndOffset and any future users get consistent pairwise masking.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7c14124d-ac54-46c1-b630-9a929a930ceb

📥 Commits

Reviewing files that changed from the base of the PR and between 5a1727b and bbc5794.

📒 Files selected for processing (1)

engine/minskyTensorOps.cc

…hen x is derived from x-vector Agent-Logs-Url: https://github.com/highperformancecoder/minsky/sessions/af68bc6c-5736-4b28-b042-2bcf698ddc71 Co-authored-by: highperformancecoder <3075825+highperformancecoder@users.noreply.github.com>

…m:highperformancecoder/minsky into bugfix/self-linear-regression-sparse-data

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/testTensorOps.cc`:
- Around line 1865-1866: The test currently uses EXPECT_EQ(result.size(),
toVal.size()) followed by indexing result[_i], which can OOB if sizes differ;
change the non-fatal check to a fatal one by replacing EXPECT_EQ with ASSERT_EQ
for the size comparison (or alternatively iterate up to result.size() instead of
toVal.size()) so that the loop in the for (size_t _i=0; _i<toVal.size(); ++_i)
EXPECT_NEAR(result[_i], toVal[_i], 1e-4) cannot run when sizes mismatch; update
the expectation that compares result.size() and toVal.size() accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 47ced0ca-9577-42cb-b231-b7947f077d22

📥 Commits

Reviewing files that changed from the base of the PR and between bbc5794 and 971e761.

📒 Files selected for processing (2)

engine/minskyTensorOps.cc
test/testTensorOps.cc

✅ Files skipped from review due to trivial changes (1)

engine/minskyTensorOps.cc

test/testTensorOps.cc

Fix for linear regression calculation using the x-vector when data is…

5a1727b

… sparse.

Copilot AI review requested due to automatic review settings April 9, 2026 07:12

Copilot started reviewing on behalf of highperformancecoder April 9, 2026 07:12 View session

coderabbitai bot reviewed Apr 9, 2026

View reviewed changes

engine/minskyTensorOps.cc Show resolved Hide resolved

Copilot AI reviewed Apr 9, 2026

View reviewed changes

Address code review concerns

bbc5794

Copilot started work on behalf of highperformancecoder April 9, 2026 07:34 View session

coderabbitai bot reviewed Apr 9, 2026

View reviewed changes

Copilot AI and others added 3 commits April 9, 2026 07:41

Mask out sumyy as well, per Coderabbit suggestion

7868466

Merge branch 'bugfix/self-linear-regression-sparse-data' of github.co…

971e761

…m:highperformancecoder/minsky into bugfix/self-linear-regression-sparse-data

coderabbitai bot reviewed Apr 9, 2026

View reviewed changes

test/testTensorOps.cc Outdated Show resolved Hide resolved

Copilot AI changed the title ~~Fix for linear regression calculation using the x-vector when data is…~~ Add sparse-data unit test for xvector-based linear regression Apr 9, 2026

Copilot finished work on behalf of highperformancecoder April 9, 2026 07:47

Address code review comments

1c7a046

highperformancecoder merged commit 40d1d19 into master Apr 10, 2026
5 of 6 checks passed

Uh oh!

Conversation

highperformancecoder commented Apr 9, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

highperformancecoder Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

highperformancecoder commented Apr 9, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 9, 2026 •

edited

Loading