If you are submitting a bug report, please fill in the following details and use the tag [bug].
Describe the bug
When evaluating SAEs trained on individual attention head outputs, the reconstructed score should be calculated by zero-ablating only the single head the SAE was trained on.
Here, zero_abl_loss calls zero_ablate_hook, which zero-ablates all heads.
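To make the difference concrete, here is a minimal sketch of the two behaviors. It is not the repo's actual code: the function names, the `hook` argument, and the assumed activation layout `[batch, pos, n_heads, d_head]` are illustrative assumptions, and NumPy arrays stand in for the real tensors.

```python
import numpy as np


def zero_ablate_all_heads(z, hook=None):
    """Behavior described above: the output of every head is zeroed.

    `z` is assumed to have shape [batch, pos, n_heads, d_head].
    """
    return np.zeros_like(z)


def make_single_head_zero_ablate_hook(head_index):
    """Build a hook that zeroes only the head at `head_index` (hypothetical helper)."""

    def hook_fn(z, hook=None):
        # Copy so the caller's activations are left untouched.
        z = z.copy()
        # Zero one head; all other heads pass through unchanged.
        z[:, :, head_index, :] = 0.0
        return z

    return hook_fn


# Toy activations: 2 sequences, 3 positions, 4 heads, 5 dims per head.
acts = np.ones((2, 3, 4, 5))
ablate_head_1 = make_single_head_zero_ablate_hook(head_index=1)
single = ablate_head_1(acts)
everything = zero_ablate_all_heads(acts)
```

With the single-head hook, the zero-ablation baseline measures only the loss increase from removing the head the SAE reconstructs, which is presumably the comparison the score is meant to capture.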
Additional context
I noticed this while training SAEs on individual attention head outputs. The reconstructed score looked really good (~0.97) but the feature dashboards looked quite mediocre.
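A rough illustration of why the score can look inflated, assuming the score is the fraction of the zero-ablation loss gap that the SAE recovers, i.e. (zero_abl_loss - recons_loss) / (zero_abl_loss - clean_loss). Both that formula and every loss value below are assumptions made up for the example, not measurements from my runs.

```python
def reconstruction_score(clean_loss, recons_loss, zero_abl_loss):
    """Fraction of the zero-ablation loss gap recovered by the SAE (assumed definition)."""
    return (zero_abl_loss - recons_loss) / (zero_abl_loss - clean_loss)


# Hypothetical losses, chosen only to illustrate the effect.
clean_loss = 3.0          # model run normally
recons_loss = 3.1         # head output replaced by the SAE reconstruction
zero_abl_all_heads = 6.0  # every head in the layer zeroed
zero_abl_one_head = 3.3   # only the SAE's head zeroed

score_all_heads = reconstruction_score(clean_loss, recons_loss, zero_abl_all_heads)
score_one_head = reconstruction_score(clean_loss, recons_loss, zero_abl_one_head)

print(f"baseline = all heads ablated: {score_all_heads:.2f}")
print(f"baseline = one head ablated:  {score_one_head:.2f}")
```

Zeroing every head raises the baseline loss far more than zeroing one head does, so the same reconstruction error looks small relative to the gap and the score moves toward 1.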
Checklist
I have checked that there is no similar issue in the repo (required)