fix nn.GRU skipping bhn bias when hidden is None #3252

Merged
angeloskath merged 1 commit into ml-explore:main from mm65x:fix-gru-hidden-none-bias
Mar 16, 2026

@mm65x (Contributor) commented Mar 14, 2026

Proposed changes

#3249

When hidden=None (the default), the GRU new-gate computation skips the
hidden-side bias bhn entirely. The first timestep computes:

n = tanh(W_xn·x + b_n)

With an implicit zero hidden state, the W_hn·h term vanishes but the
bias does not, so it should compute:

n = tanh(W_xn·x + b_n + r ⊙ b_hn)

As a result, gru(x) produces different results from
gru(x, hidden=mx.zeros(...)) whenever bias=True.

The fix adds an elif branch that applies r * self.bhn when there is
no hidden state but the bias exists.
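
The branch structure described above can be illustrated with a scalar sketch in plain Python. This is not the MLX source: the function new_gate, its parameters, and the apply_fix flag are hypothetical names introduced only for illustration, the hidden size is assumed to be 1, and the reset gate r is treated as a given value.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def new_gate(x_n, r, hidden=None, w_hn=0.0, b_hn=None, apply_fix=True):
    """Scalar sketch of the GRU new gate (hypothetical helper, hidden size 1).

    x_n is the input-side term W_xn*x + b_n, assumed to be precomputed.
    """
    n = x_n
    if hidden is not None:
        # Explicit hidden state: project it and add the hidden-side bias.
        h_n = w_hn * hidden
        if b_hn is not None:
            h_n += b_hn
        n = n + r * h_n
    elif apply_fix and b_hn is not None:
        # No hidden state: the projection is zero, but the bias still applies.
        n = n + r * b_hn
    return math.tanh(n)


# Arbitrary illustrative values.
x_n, r, w_hn, b_hn = 0.3, sigmoid(0.1), 0.7, -0.4

with_zeros = new_gate(x_n, r, hidden=0.0, w_hn=w_hn, b_hn=b_hn)
fixed = new_gate(x_n, r, hidden=None, w_hn=w_hn, b_hn=b_hn)
buggy = new_gate(x_n, r, hidden=None, w_hn=w_hn, b_hn=b_hn, apply_fix=False)

print(math.isclose(fixed, with_zeros))  # the elif branch matches the zero-hidden path
print(math.isclose(buggy, with_zeros))  # without it, the bias term is dropped
```

With apply_fix=False the sketch reproduces the reported mismatch: the result reduces to tanh(x_n), while the explicit zero hidden state still contributes r * b_hn.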

Checklist

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the necessary documentation (if needed)

@angeloskath (Member) left a comment

Looks great, thanks!

@angeloskath angeloskath merged commit f226eee into ml-explore:main Mar 16, 2026
16 checks passed