NanoChat Distilled learning #410
dustinwloring1988
started this conversation in
General
Replies: 0 comments
Has anyone attempted distillation using the larger of the models that @karpathy released as the teacher, with a student model of depth 12, say, or a nice power of 2 like 8? Before someone says I'm expecting a ChatGPT clone from this: I'm not that naive, and I know the student can at best approach the capability of the teacher model. If someone has tried this or would like to work together on it, please let me know. I think the results would be worth sharing here for people to learn from either way.
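To make the idea concrete, here is a minimal sketch of the usual soft-target distillation loss for a single token position, in plain Python with no dependencies. It is not taken from the nanochat code; the function names, the `temperature` and `alpha` parameters, and the mix of KL and cross-entropy terms are all assumptions based on the standard knowledge-distillation recipe. A real run would compute this on tensors over whole batches.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, target,
                      temperature=2.0, alpha=0.5):
    """Hypothetical per-token loss: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE.

    student_logits / teacher_logits: vocabulary logits for one position.
    target: index of the ground-truth next token.
    temperature, alpha: assumed hyperparameters, not values from nanochat.
    """
    # Soft targets: both distributions are softened by the same temperature.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Hard target: ordinary cross-entropy against the true next token.
    ce = -log_softmax(student_logits)[target]

    # T^2 keeps the soft-target gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]
    student = [3.0, 2.0, 1.0]
    print(distillation_loss(student, teacher, target=2))
```

With `alpha=1.0` the student is trained only to match the teacher's distribution; lowering `alpha` mixes in the ordinary next-token loss on the training data.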