NorMuon + Deeper U-Net with INT6 Fake Quantization #2016

Open
sea-rod wants to merge 1 commit into openai:main from sea-rod:main

Conversation


@sea-rod sea-rod commented Apr 30, 2026

- Swapped out Muon for NorMuon so neurons get more balanced updates instead of a few dominating the whole run.
- Bumped layers from 9 to 12 for more depth and better skip connections without blowing the size budget.
- Added INT6 fake quantization on attention and MLP activations so the model trains closer to how it gets exported, rather than seeing clean fp32 the whole time and then getting hit with int8.
- Dropped q_gain init from 1.5 to 1.0, since starting inflated after QK-norm was just causing early spikes for no reason.
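For reviewers unfamiliar with NorMuon: the core idea is to take Muon's orthogonalized update and additionally normalize it per output neuron (per row) with a running second moment, so no single neuron's update dominates. The sketch below is a minimal NumPy illustration of that idea, not this PR's implementation — the function names, `beta2` value, and the final magnitude-matching rescale are all assumptions.

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=5):
    """Approximate orthogonalization of a gradient matrix, as in Muon
    (quintic Newton-Schulz iteration with its standard coefficients)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (np.linalg.norm(g) + 1e-7)
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x
    return x

def normuon_step(g, v, beta2=0.95, eps=1e-8):
    """One NorMuon-style step (hedged sketch): orthogonalize, then
    rescale each row by its running RMS so per-neuron update sizes
    stay balanced. `v` is the per-row second-moment state."""
    o = newton_schulz_orthogonalize(g)
    v = beta2 * v + (1 - beta2) * np.mean(o ** 2, axis=1)
    u = o / (np.sqrt(v)[:, None] + eps)
    # keep the overall update magnitude comparable to the raw
    # orthogonalized update (an assumed normalization choice)
    u *= np.linalg.norm(o) / (np.linalg.norm(u) + eps)
    return u, v
```

After this step, every row of `u` has (nearly) the same RMS, which is the "balanced updates" behavior the PR description refers to.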
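"Fake" quantization here means quantize-then-dequantize in the forward pass: activations get snapped to the INT6 grid but stay in float, so training sees the rounding noise the exported model will face. A minimal sketch of that round trip, assuming symmetric per-tensor scaling (the function name and scale choice are illustrative, not this PR's actual code):

```python
import numpy as np

def fake_quant_int6(x, eps=1e-8):
    """Symmetric per-tensor INT6 fake quantization (quantize-dequantize).
    Values are rounded onto the 64-level signed INT6 grid, then mapped
    back to float, so downstream math still runs in full precision."""
    qmax = 2 ** (6 - 1) - 1                 # 31 for signed INT6
    scale = np.max(np.abs(x)) / qmax + eps  # per-tensor step size
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)  # integer grid
    return q * scale                        # dequantize back to float
```

In actual training this would be paired with a straight-through estimator so gradients flow past the non-differentiable round.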

