NorMuon + Deeper U-Net with INT6 Fake Quantization #2016
- Swapped out Muon for NorMuon so neurons get more balanced updates instead of a few dominating the whole run (sketch below).
- Bumped the layer count from 9 to 12 for more depth and better U-Net skip connections without blowing the size budget (wiring sketched below).
- Added INT6 fake quantization on attention and MLP activations so the model trains closer to how it gets exported, instead of seeing clean fp32 the whole time and then getting hit with int8 (sketch below).
- Dropped the `q_gain` init from 1.5 to 1.0, since starting inflated after QK-norm was just causing early logit spikes (see below).
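For reviewers unfamiliar with NorMuon, here is a rough sketch of the idea: a standard Muon step (momentum followed by Newton-Schulz orthogonalization), then a per-neuron (row-wise) second-moment normalization so no single row of a weight matrix gets outsized updates. Class names, hyperparameters, and the final rescale below are assumptions for illustration, not the implementation in this diff.

```python
import torch

def newton_schulz(G, steps=5):
    # Quintic Newton-Schulz iteration that approximately orthogonalizes G.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.bfloat16()
    X = X / (X.norm() + 1e-7)
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    if transposed:
        X = X.T
    return X.to(G.dtype)

class NorMuonSketch(torch.optim.Optimizer):
    # Sketch only; like Muon, this applies to 2D weight matrices.
    def __init__(self, params, lr=0.02, momentum=0.95, beta2=0.95, eps=1e-8):
        super().__init__(params, dict(lr=lr, momentum=momentum, beta2=beta2, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None or p.ndim != 2:
                    continue
                st = self.state[p]
                if not st:
                    st["m"] = torch.zeros_like(p)
                    st["v"] = torch.zeros(p.size(0), device=p.device, dtype=p.dtype)
                # Momentum buffer, then orthogonalize as in plain Muon.
                st["m"].lerp_(p.grad, 1 - group["momentum"])
                u = newton_schulz(st["m"])
                # Row-wise (per output neuron) second moment of the update.
                st["v"].mul_(group["beta2"]).add_(u.pow(2).mean(dim=1), alpha=1 - group["beta2"])
                u = u / (st["v"].sqrt().unsqueeze(1) + group["eps"])
                # Rescale to roughly unit RMS overall (assumption about the exact scaling).
                u = u * (u.numel() ** 0.5 / (u.norm() + group["eps"]))
                p.add_(u, alpha=-group["lr"])
```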
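The skip-connection wiring for the deeper stack follows the usual U-Net mirror: the first half of the blocks save their outputs, the second half add them back with learnable weights. A sketch under assumed names (the real block structure is in the diff):

```python
import torch
import torch.nn as nn

class UNetBlocksSketch(nn.Module):
    # Hypothetical wiring for 12 blocks: the first 6 act as the "encoder"
    # and save their outputs; the last 6 add them back, mirrored.
    def __init__(self, blocks: nn.ModuleList):
        super().__init__()
        assert len(blocks) % 2 == 0
        self.blocks = blocks
        self.n_enc = len(blocks) // 2
        self.skip_w = nn.Parameter(torch.ones(self.n_enc))

    def forward(self, x):
        skips = []
        for i, blk in enumerate(self.blocks):
            if i < self.n_enc:
                x = blk(x)
                skips.append(x)  # save encoder output for the mirrored block
            else:
                s = skips.pop()  # last-in, first-out pairing (U-Net mirror)
                x = blk(x + self.skip_w[len(skips)] * s)
        return x
```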
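A minimal sketch of what the activation fake quantization looks like: symmetric absmax quantize-dequantize in the forward pass, straight-through gradient in the backward pass. The per-tensor absmax scale and the [-31, 31] range are assumptions here; the scheme in the diff may differ.

```python
import torch

class FakeQuantINT6(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        qmax = 31.0  # symmetric 6-bit range [-31, 31] (assumption)
        scale = x.abs().amax().clamp(min=1e-8) / qmax  # per-tensor absmax scale (assumption)
        # Quantize-dequantize: the forward pass sees 6-bit-rounded activations.
        return torch.clamp(torch.round(x / scale), -qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through estimator: treat round/clamp as identity in backward.
        return grad_out

def fq6(x):
    return FakeQuantINT6.apply(x)

# e.g. at the attention and MLP outputs (hypothetical call sites):
#   h = fq6(self.attn(norm(x)))
#   h = fq6(self.mlp(norm(x)))
```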
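And the q_gain change in context, sketched with hypothetical names: after QK-norm the queries are already RMS-normalized, so a gain initialized above 1.0 just inflates attention logits at step 0.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormedQueries(nn.Module):
    # Hypothetical module; q has shape (batch, n_heads, seq, head_dim).
    def __init__(self, n_heads):
        super().__init__()
        # Init at 1.0 (was 1.5): QK-norm already fixes the query scale,
        # so an inflated starting gain only amplifies early logit spikes.
        self.q_gain = nn.Parameter(torch.ones(n_heads))

    def forward(self, q):
        q = F.rms_norm(q, (q.size(-1),))
        return q * self.q_gain.view(1, -1, 1, 1)
```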