Model- and data-dependent hyperparameters #25

Closed
negar-foroutan opened this issue Sep 19, 2022 · 3 comments

Comments

@negar-foroutan

Hi!
Thank you very much for making your implementation publicly available.
I want to use ROME on different LMs and datasets than those you tried in the paper. I was wondering which hyperparameters are model- or data-dependent and whether you have an intuition/strategy for finding values for them.
Thanks!

@kmeng01
Owner

kmeng01 commented Sep 19, 2022

Hi, this is a great question! Looking at the GPT-J hparams, `clamp_norm_factor` is perhaps the most important. It is a hard constraint on how large the norm of $v_*$ can be relative to the original hidden representation. If it is too high, the update is unnecessarily large and bleedover will be high; if it is too low, the update will not work.
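To make the hard constraint concrete, here is a minimal sketch of norm clamping. It assumes the clamp rescales the learned offset (the change applied to the original hidden representation) whenever its norm exceeds `clamp_norm_factor` times the norm of the original representation. The function names `l2_norm` and `clamp_update` are hypothetical and plain Python lists stand in for tensors; this is an illustration, not the repository's code.

```python
import math
from typing import List


def l2_norm(vec: List[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def clamp_update(
    delta: List[float], original: List[float], clamp_norm_factor: float
) -> List[float]:
    """Rescale `delta` so its norm is at most clamp_norm_factor * ||original||.

    Assumption: the clamp acts on the offset from the original hidden
    representation, not on the final vector itself.
    """
    max_norm = clamp_norm_factor * l2_norm(original)
    norm = l2_norm(delta)
    if norm > max_norm:
        scale = max_norm / norm
        return [x * scale for x in delta]
    # Already within the allowed radius: return an unchanged copy.
    return list(delta)
```

With an original representation of norm 5 and a factor of 4, any offset with norm above 20 is scaled back to norm 20, while smaller offsets pass through unchanged. A very high factor therefore never binds, which is why it can serve as the non-constraining starting point.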

Other soft constraints, such as weight decay and the KL divergence loss, should also be tuned. A good rule of thumb is to start with non-constraining values (e.g., no weight decay, no KL loss, a high clamp factor) and make sure the update with the most degrees of freedom works. Then tighten the constraints step by step to eliminate bleedover effects.
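The rule of thumb above can be sketched as a small search loop. Everything here is hypothetical scaffolding: `edit_works` and `bleedover_ok` are checks you would write yourself for your model and data, and the key names `v_weight_decay` and `kl_factor` in the usage example are assumptions that should be checked against the hparams file you are editing.

```python
from typing import Callable, Dict, List

Hparams = Dict[str, float]


def tighten_constraints(
    loosest: Hparams,
    steps: List[Hparams],
    edit_works: Callable[[Hparams], bool],
    bleedover_ok: Callable[[Hparams], bool],
) -> Hparams:
    """Start from the least-constrained setting and tighten one step at a time.

    Stops once bleedover is acceptable, or just before a step that would
    break the edit, and returns the last setting that still worked.
    """
    if not edit_works(loosest):
        raise ValueError("the least-constrained update fails; fix that first")
    best = dict(loosest)
    for step in steps:
        if bleedover_ok(best):
            break  # constraints are already tight enough
        candidate = {**best, **step}
        if not edit_works(candidate):
            break  # this step is too restrictive; keep the previous setting
        best = candidate
    return best


# Usage sketch with assumed key names and placeholder values:
# start = {"clamp_norm_factor": 100.0, "v_weight_decay": 0.0, "kl_factor": 0.0}
# steps = [{"clamp_norm_factor": 16.0}, {"kl_factor": 0.05}, {"v_weight_decay": 0.1}]
# tuned = tighten_constraints(start, steps, my_edit_check, my_bleedover_check)
```

Changing one constraint per step keeps it clear which hyperparameter caused the edit to stop working.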

The ROME notebook (notebooks/rome.ipynb) is an excellent place to experiment with these values. The hparams files are hot-reloaded on every run of the execution cell, so iteration speed is relatively fast.
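For readers working outside the notebook, the same fast iteration loop can be approximated by re-reading the hparams file at the start of every run rather than caching it. This is a minimal sketch that assumes the hparams are stored as a JSON file; `load_hparams` is a hypothetical helper, not a function from the repository.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def load_hparams(path: Union[str, Path]) -> Dict[str, Any]:
    """Read the hparams file from disk on every call.

    Nothing is cached, so edits saved to the file are picked up the
    next time this function runs.
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

Calling `load_hparams` at the top of each experiment run means a value such as `clamp_norm_factor` can be changed in a text editor and tested immediately, without restarting the Python process or reloading the model.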

@kmeng01
Owner

kmeng01 commented Sep 19, 2022

If you have any model-specific questions, I'd be happy to take a look when I get a moment. lmk!

@negar-foroutan
Author

Thank you very much.
