
Generic model? #20

Closed
forresti opened this issue Jan 18, 2024 · 3 comments
forresti commented Jan 18, 2024

Thanks for publishing this excellent work. If I understand correctly, you run the LASER intervention separately for each evaluation task.

Would it be possible to make one LASER model that is generic across all tasks? My goal is to compress LLaMA-v2-7B so that it runs faster on mobile devices.

Also, is it correct that you apply LASER to only one layer of the model? Did you try applying it to most of the layers?

dkmisra self-assigned this Jan 19, 2024

dkmisra commented Jan 19, 2024

That is correct. We pick LASER hyperparameters separately for each task, and this is important for achieving the large gains we report. There is an alternative method called LaserRMT (not from us) that provides a task-agnostic way to select hyperparameters. I haven't tried it myself, but its authors have reported some results.

The simplest way to try LASER across a range of tasks is to compute a meta-score on a task like AGIEval and use it to select the hyperparameters. I am optimistic that we would still see gains across a range of tasks, since we find that the gains typically come from intervening in the later MLP layers, so the optimal hyperparameters tend to follow a pattern. The gains might be more modest than when focusing on a single task, though.
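
If a concrete sketch helps, here is a minimal, hypothetical version of that selection loop. It assumes a Hugging Face LLaMA-2 checkpoint (the `mlp.down_proj` naming is Hugging Face's, not this repo's) and uses `evaluate_agieval` as a placeholder for whatever meta-score you compute; none of the helper names come from this codebase:

```python
import itertools

import torch
from transformers import AutoModelForCausalLM


def low_rank_approx(W: torch.Tensor, rho: float) -> torch.Tensor:
    """Best rank-k approximation of W via truncated SVD, with k = rho * min(W.shape)."""
    U, S, Vh = torch.linalg.svd(W.float(), full_matrices=False)
    k = max(1, int(rho * S.numel()))
    return ((U[:, :k] * S[:k]) @ Vh[:k]).to(W.dtype)


model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Sweep (layer, rank fraction) once on the meta-task, then reuse the winner everywhere.
best = None
for layer_idx, rho in itertools.product(range(24, 32), [0.01, 0.05, 0.10, 0.25]):
    W = model.model.layers[layer_idx].mlp.down_proj.weight
    original = W.data.clone()              # snapshot so the edit can be undone
    W.data.copy_(low_rank_approx(W.data, rho))
    score = evaluate_agieval(model)        # placeholder: your AGIEval meta-score
    W.data.copy_(original)                 # restore before trying the next setting
    if best is None or score > best[0]:
        best = (score, layer_idx, rho)
```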

For most experiments in our paper, we apply LASER to a single layer, and in fact we apply a single LASER intervention, i.e., we edit only a single matrix. We have an experiment on GPT-J + CounterFact where we composed multiple LASER interventions. See the paragraph "Composing reductions across layers" in the paper. @pratyushasharma has released a script here with details for this experiment, and the upcoming refactoring will support composing LASER interventions in a properly generalizable way.
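
And a hedged sketch of composing reductions across several later layers, reusing `low_rank_approx` and the placeholder `evaluate_agieval` from the sketch above. The greedy accept-if-no-drop strategy here is an illustration, not the exact procedure from the paper:

```python
# Greedily compose LASER edits: reduce one later-layer matrix at a time and
# keep the edit only if the meta-score does not drop.
score = evaluate_agieval(model)                  # baseline meta-score
for layer_idx in range(31, 23, -1):              # later layers first; LLaMA-2-7B has 32
    W = model.model.layers[layer_idx].mlp.down_proj.weight
    original = W.data.clone()
    W.data.copy_(low_rank_approx(W.data, 0.05))  # rank fraction from the sweep, say
    new_score = evaluate_agieval(model)
    if new_score >= score:
        score = new_score                        # keep the reduction
    else:
        W.data.copy_(original)                   # revert a degrading edit
```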


dkmisra commented Jan 19, 2024

Related to #19

forresti commented

Thanks so much!!!
