
fix: matrix_power exponent gradient — clamp subgradient + Python-scalar support#29

Merged
bruAristimunha merged 2 commits into
mainfrom
fix/matrix-power-exponent-grad-clamp
May 18, 2026


@bruAristimunha
Contributor

Context

Follow-up to #26, which added a gradient w.r.t. the exponent parameter in matrix_power.backward. Two issues remained after that merge.

Bug 1 — spurious huge exponent gradient at clamped eigenvalues

For numerical stability, eigenvalues below get_epsilon(dtype, "eigval_power") (~1e-13 for double) are clamped to that threshold. The X-gradient correctly picks subgradient 0 there:

```python
# core.py:412-413 (existing)
s_deriv = exponent * s_clamped.pow(exponent - 1.0)
s_deriv[s <= threshold] = 0   # subgradient 0 at clamped eigenvalues
```

But the exponent gradient added in #26 still flowed through s_modified * log(s_safe), which at a clamped eigenvalue equals threshold^e * log(threshold). For s = 1e-30 and e = -0.5, this produces grad_exponent ≈ -6.18e+07 instead of the expected ~-0.49.
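Assuming matrix_power returns U diag(s̃^e) Uᵀ with s̃_i = max(s_i, τ), where τ is the clamp threshold, and holding U and s fixed, the exponent gradient for an upstream gradient G works out to:

```math
\frac{\partial \mathcal{L}}{\partial e}
  = \sum_i \big(U^\top G\, U\big)_{ii}\, \tilde{s}_i^{\,e} \log \tilde{s}_i ,
\qquad \tilde{s}_i = \max(s_i, \tau).
```

Every clamped eigenvalue (s_i ≤ τ) contributes (UᵀGU)_ii τ^e log τ, a term that no longer depends on s_i and is large for negative e: with τ = 1e-13 and e = -0.5, τ^e log τ ≈ -9.5e7. Masking with the indicator 1[s_i > τ] applies the same subgradient-0 convention the X-gradient already uses.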

As a result, any training run that passes near-singular SPD matrices through matrix_power blows up.
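A minimal scalar sketch of the per-eigenvalue factors, assuming a clamp threshold of exactly 1e-13 (the PR only says "~1e-13 for double"). The helper eig_factors and its mask_exponent flag are hypothetical, for illustration only. With mask_exponent=False it reproduces the pre-fix leak; the exact magnitude depends on the threshold and on the upstream gradient, so it will not match the -6.18e+07 figure above.

```python
import math

# Assumption: stand-in for get_epsilon(torch.float64, "eigval_power"), "~1e-13".
THRESHOLD = 1e-13


def eig_factors(s, e, threshold=THRESHOLD, mask_exponent=True):
    """Hypothetical helper: per-eigenvalue factors for f(s) = max(s, threshold)**e.

    Returns (df/ds, df/de). df/ds always uses subgradient 0 at clamped
    eigenvalues, as core.py already does; df/de does so only when
    mask_exponent=True (the behaviour after this PR).
    """
    s_clamped = max(s, threshold)
    d_ds = e * s_clamped ** (e - 1.0)
    d_de = s_clamped ** e * math.log(s_clamped)
    if s <= threshold:
        d_ds = 0.0
        if mask_exponent:
            d_de = 0.0
    return d_ds, d_de


# Before the fix: the clamped eigenvalue leaks threshold**e * log(threshold).
print(eig_factors(1e-30, -0.5, mask_exponent=False))
# After the fix: both factors are 0 at the clamped eigenvalue.
print(eig_factors(1e-30, -0.5))  # (0.0, 0.0)
```

Eigenvalues above the threshold are unaffected by the mask, so the fix only changes the gradient at clamped eigenvalues.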

Bug 2 — Python-scalar exponent crashes backward

Several internal call sites pass a Python float:

```python
# modules/liebn.py:289
matrix_power.apply(X, self.theta)   # self.theta is a Python float
```

PR #26's backward calls .reshape_as(exponent), which crashes with TypeError: reshape_as(): argument 'other' must be Tensor, not float when the exponent is a Python float. Reproduced on main.
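A small standalone reproduction of the failure and of the normalization used in the fix. It does not call matrix_power itself: grad_exp here is a made-up one-element gradient standing in for the value backward computes.

```python
import torch

X = torch.eye(3, dtype=torch.float64)
exponent = 0.5  # Python float, as passed by matrix_power.apply(X, self.theta)

# Illustrative one-element gradient that backward wants to reshape to the exponent's shape.
grad_exp = torch.tensor([0.25], dtype=torch.float64)

crashed = False
try:
    grad_exp.reshape_as(exponent)  # reshape_as() only accepts a Tensor
except TypeError as err:
    crashed = True
    print(f"TypeError: {err}")

# The fix: normalize the exponent to a 0-d tensor on X's device/dtype in forward.
exponent_t = torch.as_tensor(exponent, dtype=X.dtype, device=X.device)
fixed = grad_exp.reshape_as(exponent_t)
print(exponent_t.shape)  # torch.Size([])
```

torch.as_tensor returns the input unchanged when it is already a tensor with the requested dtype and device, so tensor exponents keep their autograd connection.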

Fix

  1. forward: cast exponent to a 0-d tensor on X's device/dtype via torch.as_tensor so the .reshape_as call always sees a tensor.
  2. backward: apply the same subgradient-0 convention to exp_g via torch.where(s > threshold, exp_g, 0.0).
  3. Skip the exponent-gradient computation (including the n×n matmul U.mT @ grad_output @ U) when ctx.needs_input_grad[1] is False. This saves work on the common path where a Python scalar is passed.
  4. Docstring updated: exponent : float or torch.Tensor.
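The three code changes can be sketched together in one custom autograd Function. This is a minimal sketch, assuming matrix_power computes U diag(max(s, threshold)**e) Uᵀ from torch.linalg.eigh. The class name MatrixPowerSketch, the _threshold values, and the divided-difference (Daleckii-Krein) form of the X-gradient are illustrative assumptions, not spd_learn's actual implementation in core.py.

```python
import torch


def _threshold(dtype):
    # Assumption: hypothetical stand-in for get_epsilon(dtype, "eigval_power").
    return 1e-13 if dtype == torch.float64 else 1e-6


class MatrixPowerSketch(torch.autograd.Function):
    """Hypothetical X**exponent for SPD X via eigh; not the code in core.py."""

    @staticmethod
    def forward(ctx, X, exponent):
        # Fix 1: Python scalars become a 0-d tensor on X's device/dtype.
        exponent = torch.as_tensor(exponent, dtype=X.dtype, device=X.device)
        s, U = torch.linalg.eigh(X)
        threshold = _threshold(X.dtype)
        s_clamped = s.clamp(min=threshold)
        s_pow = s_clamped.pow(exponent)
        ctx.save_for_backward(s, U, s_clamped, s_pow, exponent)
        ctx.threshold = threshold
        return U @ torch.diag_embed(s_pow) @ U.mT

    @staticmethod
    def backward(ctx, grad_output):
        s, U, s_clamped, s_pow, exponent = ctx.saved_tensors
        valid = s > ctx.threshold  # False at clamped eigenvalues
        grad_X = grad_exp = None

        if ctx.needs_input_grad[0]:
            # Divided-difference (Daleckii-Krein) formula for a spectral function.
            g_sym = 0.5 * (grad_output + grad_output.mT)
            M = U.mT @ g_sym @ U
            s_deriv = exponent * s_clamped.pow(exponent - 1.0)
            s_deriv = torch.where(valid, s_deriv, torch.zeros_like(s_deriv))
            ds = s_clamped.unsqueeze(-1) - s_clamped.unsqueeze(-2)
            df = s_pow.unsqueeze(-1) - s_pow.unsqueeze(-2)
            close = ds.abs() < 1e-10  # equal (or nearly equal) eigenvalues
            safe_ds = torch.where(close, torch.ones_like(ds), ds)
            loewner = torch.where(close, s_deriv.unsqueeze(-1), df / safe_ds)
            grad_X = U @ (loewner * M) @ U.mT

        if ctx.needs_input_grad[1]:
            # Fix 3: this matmul only runs when the exponent needs a gradient.
            m_diag = torch.diagonal(U.mT @ grad_output @ U, dim1=-2, dim2=-1)
            exp_g = s_pow * torch.log(s_clamped)
            # Fix 2: subgradient 0 at clamped eigenvalues, as for grad_X.
            exp_g = torch.where(valid, exp_g, torch.zeros_like(exp_g))
            grad_exp = (m_diag * exp_g).sum().reshape_as(exponent)

        return grad_X, grad_exp
```

When the exponent is a Python float, autograd reports ctx.needs_input_grad[1] as False, so backward returns None for it without running the second branch.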

Net diff: +25 / −13 lines.

Tests

tests/test_functional.py::test_matrix_power now covers:

  • existing gradcheck with 0-d tensor exponent (unchanged)
  • backward with a Python-scalar exponent doesn't crash
  • exponent gradient stays bounded (< 1) on a near-singular input where the bug previously produced ~6e7

Full suite: 707 passed, 146 skipped (no regressions).

Authors

…alues, accept Python scalars

PR #26 added a gradient w.r.t. the exponent in matrix_power.backward, but
two issues remained:

1. With near-singular inputs, the X-gradient correctly picks subgradient 0
   at clamped eigenvalues (s_deriv[s <= threshold] = 0 in `derivative`),
   but the new exponent-gradient kept flowing through
   `s_modified * log(s_safe) = threshold^e * log(threshold)`, producing
   huge spurious values (e.g. ~6e7 for s=1e-30, e=-0.5). Apply the same
   subgradient-zero convention.

2. Passing a Python scalar (as several call sites do, e.g.
   `matrix_power.apply(X, self.theta)` in modules/liebn.py) crashed
   backward at `.reshape_as(exponent)`. Normalize the exponent to a 0-d
   tensor in forward and return None for non-Variable inputs.

The exponent-gradient block also runs only when needed (skips the n×n
matmul `U.mT @ grad_output @ U` on the common no-grad-on-exponent path).

Regression tests cover both cases.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 22fda2bbbb


Comment thread spd_learn/functional/core.py
@github-actions

📚 Documentation Preview

📦 Download Documentation Artifact

Download the documentation-html artifact from the workflow run to view the docs locally.

💡 To enable live previews, add a SURGE_TOKEN secret to this repository. See surge.sh for setup instructions.

@bruAristimunha bruAristimunha merged commit a40daf6 into main May 18, 2026
11 checks passed
@bruAristimunha bruAristimunha deleted the fix/matrix-power-exponent-grad-clamp branch May 18, 2026 15:41

1 participant