The code in train/SeleKT/selekt.py computes the top-k separately for every parameter tensor, not globally across all parameters.
However, the paper (https://arxiv.org/pdf/2503.03656), Section 4 "Robust Model Adaptation", under "Proposed Solution", says:
"we first compute dense gradients by doing full finetuning of the model θ, and the compute the top-k non-zero
entries (by magnitude) on the (accumulated) gradient vector
or the “task vector” θ − θbase. This also ensures that the
parameter selection is global and not confined to specific layers or other heuristics employed in earlier robust finetuning strategies (Lee et al., 2023)."
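
To make the difference concrete, here is a minimal sketch of the two selection strategies. It is not the code from selekt.py: the function names, the `fraction` argument, and the representation of the task vector as a dict of flat Python lists are all assumptions made for illustration.

```python
import heapq


def per_tensor_topk_mask(task_vector, fraction):
    """Select the top `fraction` of entries by magnitude inside each tensor separately."""
    masks = {}
    for name, values in task_vector.items():
        k = max(1, int(fraction * len(values)))
        # Indices of the k largest-magnitude entries of this tensor only.
        keep = set(heapq.nlargest(k, range(len(values)), key=lambda i: abs(values[i])))
        masks[name] = [1 if i in keep else 0 for i in range(len(values))]
    return masks


def global_topk_mask(task_vector, fraction):
    """Select the top `fraction` of entries by magnitude across all tensors at once."""
    # Flatten every tensor into one list of (magnitude, tensor name, index) records.
    flat = [
        (abs(v), name, i)
        for name, values in task_vector.items()
        for i, v in enumerate(values)
    ]
    k = max(1, int(fraction * len(flat)))
    keep = {(name, i) for _, name, i in heapq.nlargest(k, flat)}
    return {
        name: [1 if (name, i) in keep else 0 for i in range(len(values))]
        for name, values in task_vector.items()
    }


# Hypothetical task vector (theta - theta_base): one tensor changed a lot, one barely.
task_vector = {
    "layer_a": [0.9, -0.8, 0.7, 0.6],
    "layer_b": [0.01, -0.02, 0.03, 0.04],
}

print(per_tensor_topk_mask(task_vector, 0.5))
# {'layer_a': [1, 1, 0, 0], 'layer_b': [0, 0, 1, 1]}
print(global_topk_mask(task_vector, 0.5))
# {'layer_a': [1, 1, 1, 1], 'layer_b': [0, 0, 0, 0]}
```

With per-tensor selection every tensor keeps the same fraction of its entries, so small updates in `layer_b` are kept while larger ones in `layer_a` are dropped. With global selection the budget goes to the largest entries wherever they are, which appears to be what the quoted passage describes.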