Model- and data-dependent hyperparameters #25
Comments
Hi, this is a great question! Looking at the GPT-J hparams is a good starting point. Other soft constraints like weight decay and KL divergence should also be tuned. A good rule of thumb is to start with non-constraining values (e.g., no weight decay, no KL loss, high clamp factor) and make sure the maximum-DOF update works. Then increase the constraints to eliminate bleedover effects. The ROME notebook (…)
If you have any model-specific questions, I'd be happy to take a look when I get a moment. lmk!
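The start-loose-then-tighten strategy described above can be sketched roughly as follows. The hparam names are borrowed from ROME's hparams files (`v_weight_decay`, `kl_factor`, `clamp_norm_factor`), but the `run_edit` and `bleedover` callbacks and the constraint schedule are hypothetical placeholders, not part of ROME's actual API:

```python
# Sketch of the tuning strategy: start with non-constraining values,
# verify the edit works, then tighten constraints to cut bleedover.
# `run_edit` applies an edit with the given soft-constraint values and
# reports success; `bleedover` measures unwanted changes to unrelated
# predictions. Both are hypothetical stand-ins for your own evaluation.

def tune_soft_constraints(run_edit, bleedover, max_bleedover=0.1):
    # Step 1: non-constraining values (no weight decay, no KL loss,
    # a very high clamp factor); confirm the maximum-DOF update works.
    unconstrained = {"v_weight_decay": 0.0, "kl_factor": 0.0,
                     "clamp_norm_factor": 1e9}
    if not run_edit(unconstrained):
        raise RuntimeError("Unconstrained edit failed; fix the optimization first.")

    # Step 2: gradually tighten the constraints until bleedover is
    # acceptable. This schedule is illustrative, not a recommendation.
    for wd, kl, clamp in [(1e-4, 0.01, 10.0), (5e-4, 0.0625, 4.0),
                          (1e-3, 0.1, 2.0)]:
        trial = {"v_weight_decay": wd, "kl_factor": kl,
                 "clamp_norm_factor": clamp}
        if run_edit(trial) and bleedover(trial) <= max_bleedover:
            return trial

    # Fall back to the loosest setting known to produce a working edit.
    return unconstrained
```

In practice `run_edit` and `bleedover` would wrap your edit-success and specificity metrics on held-out prompts.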
Thank you very much.
Hi!
Thank you very much for making your implementation publicly available.
I want to use ROME on LMs and datasets other than those you tried in the paper. I was wondering which hyperparameters are model- or data-dependent, and whether you have an intuition or strategy for finding good values for them.
Thanks!