
Generic model? #20

Closed
forresti opened this issue Jan 18, 2024 · 3 comments
forresti commented Jan 18, 2024

Thanks for publishing this excellent work. If I understand correctly, you run the LASER intervention separately for each evaluation task.

Would it be possible to make one LASER model that is generic across all tasks? My goal is to compress LLaMA-v2-7B so that it runs faster on mobile devices.

Also, is it correct that you apply LASER to only one layer of the model? Did you try applying it to most of the layers?

dkmisra self-assigned this Jan 19, 2024

dkmisra commented Jan 19, 2024

That is correct. We pick LASER hyperparameters separately for each task, and this is important for achieving the large gains we report. There is an alternative method called LaserRMT (not from us) that provides a task-agnostic way to select hyperparameters. I haven't tried it myself, but its authors have reported some results.

The simplest way to try LASER across a range of tasks is to compute a meta-score on a task like AGIEval and use it to select the hyperparameters. I am optimistic that we would still see gains across a range of tasks, since we find that the gains typically come from intervening in the later MLP layers, so the optimal hyperparameters tend to follow a pattern. The gains might be more modest than when focusing on a single task, though.
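
If a concrete sketch helps, here is a minimal, hypothetical version of that selection loop. It assumes a Hugging Face LLaMA-2 checkpoint (the `mlp.down_proj` naming is Hugging Face's, not this repo's) and uses `evaluate_agieval` as a placeholder for whatever meta-score you compute; none of the helper names come from this codebase:

```python
import itertools

import torch
from transformers import AutoModelForCausalLM


def low_rank_approx(W: torch.Tensor, rho: float) -> torch.Tensor:
    """Best rank-k approximation of W via truncated SVD, with k = rho * min(W.shape)."""
    U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
    k = max(1, int(rho * S.numel()))
    return ((U[:, :k] * S[:k]) @ Vh[:k]).to(W.dtype)


model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Sweep (layer, rank fraction) once on the meta-task, then reuse the winner everywhere.
best = None
for layer_idx, rho in itertools.product(range(24, 32), [0.01, 0.05, 0.10, 0.25]):
    W = model.model.layers[layer_idx].mlp.down_proj.weight
    original = W.data.clone()              # snapshot so the edit can be undone
    W.data.copy_(low_rank_approx(W.data, rho))
    score = evaluate_agieval(model)        # placeholder: your AGIEval meta-score
    W.data.copy_(original)                 # restore before trying the next setting
    if best is None or score > best[0]:
        best = (score, layer_idx, rho)
```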

For most experiments in our paper, we apply LASER to a single layer, and in fact we apply a single LASER intervention, i.e., we edit only a single matrix. We have an experiment on GPT-J + CounterFact where we composed multiple LASER interventions. See the paragraph "Composing reductions across layers" in the paper. @pratyushasharma has released a script here with details for this experiment, and the upcoming refactoring will support composing LASER interventions in a properly generalizable way.
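
And a hedged sketch of composing reductions across several later layers, reusing `low_rank_approx` and the placeholder `evaluate_agieval` from the sketch above. The greedy accept-if-no-drop strategy here is an illustration, not the exact procedure from the paper:

```python
# Greedily compose LASER edits: reduce one later-layer matrix at a time and
# keep the edit only if the meta-score does not drop.
score = evaluate_agieval(model)                  # baseline meta-score
for layer_idx in range(31, 23, -1):              # later layers first; LLaMA-2-7B has 32
    W = model.model.layers[layer_idx].mlp.down_proj.weight
    original = W.data.clone()
    W.data.copy_(low_rank_approx(W.data, 0.05))  # rank fraction from the sweep, say
    new_score = evaluate_agieval(model)
    if new_score >= score:
        score = new_score                        # keep the reduction
    else:
        W.data.copy_(original)                   # revert a degrading edit
```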


dkmisra commented Jan 19, 2024

Related to #19

forresti commented

Thanks so much!!!
