[Bug Report] evals.py ablates all heads when it needs to ablate only one #163

Closed · 1 task done
shehper opened this issue May 28, 2024 · 3 comments

shehper (Contributor) commented May 28, 2024

Describe the bug
When evaluating SAEs trained on the outputs of individual attention heads, the reconstructed score should be calculated by zero-ablating only that single head.

In evals.py, however, zero_abl_loss calls zero_ablate_hook, which zero-ablates all heads.
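For reference, here is a minimal sketch of the distinction, assuming the SAE is attached to a hook_z hook point whose activations have shape [batch, pos, n_heads, d_head] (TransformerLens convention); the single_head_zero_ablate_hook name and head_index argument below are illustrative, not existing code in the repo:

```python
from functools import partial

import torch


def zero_ablate_hook(activations, hook):
    # Current behaviour: the whole activation tensor is zeroed,
    # i.e. every head at this hook point is ablated.
    return torch.zeros_like(activations)


def single_head_zero_ablate_hook(activations, hook, head_index):
    # Sketch of the proposed behaviour: hook_z activations have shape
    # [batch, pos, n_heads, d_head], so only the slice belonging to the
    # head the SAE was trained on is zeroed.
    activations[:, :, head_index, :] = 0.0
    return activations


# Illustrative usage with a TransformerLens model:
# zero_abl_loss = model.run_with_hooks(
#     tokens,
#     return_type="loss",
#     fwd_hooks=[(hook_point, partial(single_head_zero_ablate_hook, head_index=head_index))],
# )
```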

Additional context
I noticed this while training SAEs on individual attention head outputs. The reconstructed score looked really good (~0.97), but the feature dashboards looked quite mediocre.

Checklist

  • I have checked that there is no similar issue in the repo (required)

shehper commented May 28, 2024

I will be happy to submit a PR for this.

jbloomAus (Owner) commented:

Hey Shehper,

Thanks for raising this. So obvious in hindsight. I'd love a PR for this.

shehper commented May 30, 2024

Sorry, I have just been a little swamped with the research sprint. I'll submit a PR over the weekend.
