Fix three incorrect gradient implementations (Softmax, RMSNorm, SimpleRNN) by Copilot · Pull Request #7 · sourcepirate/neutro

Copilot · 2026-05-31T10:38:33Z

Fixes three gradient bugs that silently produced wrong backpropagation results in the manual (no-autograd) training path.

Softmax — wrong Jacobian approximation (`activations/softmax.py`)

`gradient()` returned `s * (1 - s)`, which is only the diagonal of the softmax Jacobian. The off-diagonal terms (`-s_i * s_j`) are required for a correct chain-rule result. `gradient()` now raises `NotImplementedError`; `gradient_fast()` is rewritten to use the closed-form vectorised formula:

# Before (wrong — diagonal only):
return self.last_output * (1 - self.last_output)

# After — full Jacobian contracted with upstream gradient:
dot = np.sum(s * grad_output, axis=-1, keepdims=True)
return s * (grad_output - dot)   # dL/dx_k = s_k * (g_k - Σ_i s_i·g_i)

This also replaces the O(n²) per-sample Jacobian loop with an O(n) broadcast.
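One way to sanity-check the closed form is to compare it against an explicitly built Jacobian, `diag(s) - s sᵀ`, for each sample. The sketch below is not taken from the repository: the function names `softmax`, `softmax_backward` and `softmax_backward_jacobian` are hypothetical. It also assumes the old code path multiplied `s * (1 - s)` element-wise by the upstream gradient, which the PR text does not show.

```python
import numpy as np

def softmax(x):
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def softmax_backward(s, grad_output):
    # Closed form: dL/dx_k = s_k * (g_k - sum_i s_i * g_i).
    dot = np.sum(s * grad_output, axis=-1, keepdims=True)
    return s * (grad_output - dot)

def softmax_backward_jacobian(s, grad_output):
    # Reference implementation: build each sample's full Jacobian explicitly.
    out = np.empty_like(s)
    for n in range(s.shape[0]):
        jac = np.diag(s[n]) - np.outer(s[n], s[n])
        # The softmax Jacobian is symmetric, so J^T g equals J g.
        out[n] = jac @ grad_output[n]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))   # (batch, classes)
g = rng.normal(size=(4, 5))   # upstream gradient dL/ds
s = softmax(x)

fast = softmax_backward(s, g)
reference = softmax_backward_jacobian(s, g)

# Assumption: the old diagonal-only result was multiplied by the upstream gradient.
diagonal_only = s * (1 - s) * g

max_err_fast = np.max(np.abs(fast - reference))
max_err_diag = np.max(np.abs(diagonal_only - reference))
print(f"closed form vs Jacobian:   {max_err_fast:.2e}")
print(f"diagonal only vs Jacobian: {max_err_diag:.2e}")
```

The closed form should agree with the explicit Jacobian to floating-point precision. The diagonal-only variant differs by `s_k * Σ_{i≠k} s_i·g_i`, which is nonzero for generic inputs.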

RMSNorm — hardcoded reduction axes (`layers/normalization/rmsnorm.py`)

The `grads['weight']` reduction used `axis=(0, 1)`, which is only correct for 3-D inputs `(batch, seq, dim)` and silently collapses 2-D `(batch, dim)` inputs to a scalar:

# Before:
self.grads['weight'] = np.sum(grad_output * self.x_norm, axis=(0, 1))

# After — works for any input rank:
feature_axes = tuple(range(len(grad_output.shape) - 1))
self.grads['weight'] = np.sum(grad_output * self.x_norm, axis=feature_axes)
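The shape behaviour can be reproduced in isolation. The following is a minimal sketch, assuming the weight has shape `(dim,)` and the feature axis is last. The helper `weight_grad` and the array sizes are illustrative and do not come from the repository.

```python
import numpy as np

def weight_grad(grad_output, x_norm):
    # Sum over every axis except the last (feature) axis.
    reduce_axes = tuple(range(grad_output.ndim - 1))
    return np.sum(grad_output * x_norm, axis=reduce_axes)

dim = 8
rng = np.random.default_rng(0)

# 2-D input: (batch, dim)
g2 = rng.normal(size=(4, dim))
x2 = rng.normal(size=(4, dim))

# 3-D input: (batch, seq, dim)
g3 = rng.normal(size=(4, 6, dim))
x3 = rng.normal(size=(4, 6, dim))

shape_2d = weight_grad(g2, x2).shape
shape_3d = weight_grad(g3, x3).shape

# The hardcoded axes reduce a 2-D input over both of its axes.
old_shape_2d = np.sum(g2 * x2, axis=(0, 1)).shape

print(shape_2d, shape_3d, old_shape_2d)
```

With the rank-derived axes, both inputs yield a gradient of shape `(dim,)`, matching the weight. With `axis=(0, 1)` the 2-D case reduces to shape `()`, which NumPy would then broadcast across the whole weight vector during an update instead of raising an error.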

SimpleRNN — tanh derivative applied unconditionally (`layers/recurrent/simple_rnn.py`)

`backward()` always computed `dz = dh * (1 - h²)` regardless of the configured activation. For `activation='linear'` the derivative is 1, so `dz = dh`:

if self.activation_name == 'tanh':
    dz = dh * (1 - self.h_states[:, t+1, :]**2)
else:
    dz = dh  # linear: d(z)/dz = 1
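A finite-difference check on a single recurrent step is one way to confirm both branches. This is a sketch under assumptions: `step_forward`, `step_backward_dz` and `check` are hypothetical names, the step is taken to be `h = f(x @ Wx + h_prev @ Wh + b)`, and the loss is a simple weighted sum of `h`. None of this is copied from `simple_rnn.py`.

```python
import numpy as np

def step_forward(x, h_prev, Wx, Wh, b, activation):
    # One recurrent step: pre-activation z, then the configured activation.
    z = x @ Wx + h_prev @ Wh + b
    return np.tanh(z) if activation == "tanh" else z

def step_backward_dz(dh, h, activation):
    # dz = dh * f'(z), written in terms of the cached output h.
    if activation == "tanh":
        return dh * (1 - h**2)
    if activation == "linear":
        return dh
    raise ValueError(f"no derivative defined for activation {activation!r}")

def check(activation, eps=1e-6):
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 4))
    h_prev = rng.normal(size=(3, 5))
    Wx = rng.normal(size=(4, 5)) * 0.5
    Wh = rng.normal(size=(5, 5)) * 0.5
    b = rng.normal(size=5) * 0.1
    dh = rng.normal(size=(3, 5))  # upstream gradient; loss is sum(h * dh)

    h = step_forward(x, h_prev, Wx, Wh, b, activation)
    analytic = step_backward_dz(dh, h, activation).sum(axis=0)  # dL/db

    # Central differences on each bias entry.
    numeric = np.zeros_like(b)
    for j in range(b.size):
        b_plus, b_minus = b.copy(), b.copy()
        b_plus[j] += eps
        b_minus[j] -= eps
        loss_plus = np.sum(step_forward(x, h_prev, Wx, Wh, b_plus, activation) * dh)
        loss_minus = np.sum(step_forward(x, h_prev, Wx, Wh, b_minus, activation) * dh)
        numeric[j] = (loss_plus - loss_minus) / (2 * eps)
    return np.max(np.abs(analytic - numeric))

err_tanh = check("tanh")
err_linear = check("linear")
print(f"tanh:   {err_tanh:.2e}")
print(f"linear: {err_linear:.2e}")
```

Unlike the `else` branch in the PR, this sketch raises on an unrecognised activation rather than treating it as linear. That is a design choice here, not something the PR does, and it only matters if the layer accepts activations other than `'tanh'` and `'linear'`.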

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

fix: correct gradient implementations for Softmax, RMSNorm, and SimpleRNN

24dcd39

Copilot AI assigned Copilot and sourcepirate May 31, 2026

Copilot created this pull request from a session on behalf of sourcepirate May 31, 2026 10:38 View session

sourcepirate marked this pull request as ready for review May 31, 2026 10:49

Copilot AI review requested due to automatic review settings May 31, 2026 10:49

sourcepirate merged commit 33d8cd8 into main May 31, 2026
3 checks passed

Copilot started reviewing on behalf of sourcepirate May 31, 2026 10:49 View session

Copilot AI reviewed May 31, 2026

sourcepirate merged 1 commit into main from copilot/verify-gradient-accuracy
