Bug: EvalConfig preset resolver aliases shared list, mutation corrupts BENCHMARK_PRESETS #31

@sacredvoid

Description

In `eval.py:51-53`, the preset resolver assigns the `BENCHMARK_PRESETS` list directly:

```python
self.tasks = BENCHMARK_PRESETS[self.preset] # aliasing, not copying!
```

Because `self.tasks` and the `BENCHMARK_PRESETS` entry are then the same list object, any in-place mutation of `config.tasks` (e.g. `config.tasks.append("extra")`) also changes the preset itself, and every config created from that preset later in the same process inherits the change.
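The aliasing can be shown without `alignrl` at all. The sketch below uses a stand-in `PRESETS` dict (a hypothetical name, not the real `BENCHMARK_PRESETS`) to contrast direct assignment with a shallow copy made by `list(...)`:

```python
# Stand-in for BENCHMARK_PRESETS; the name PRESETS is illustrative only.
PRESETS = {"reasoning": ["gsm8k", "math", "arc_challenge"]}

aliased = PRESETS["reasoning"]        # same list object as the dict entry
copied = list(PRESETS["reasoning"])   # new list holding the same elements

aliased_is_shared = aliased is PRESETS["reasoning"]
copied_is_shared = copied is PRESETS["reasoning"]

copied.append("extra")                # only the copy changes
preset_after_copy_mutation = list(PRESETS["reasoning"])

aliased.append("extra")               # the dict entry changes too
preset_after_alias_mutation = list(PRESETS["reasoning"])
```

A shallow copy is enough here because the list elements are strings, which are immutable.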

Reproduction

```python
from alignrl.eval import BENCHMARK_PRESETS, EvalConfig
cfg = EvalConfig(preset="reasoning")
cfg.tasks.append("extra")
print(BENCHMARK_PRESETS["reasoning"]) # ['gsm8k', 'math', 'arc_challenge', 'extra'] - CORRUPTED
```

Impact

In notebooks where multiple `EvalConfig` instances are created (e.g. evaluating different stages), modifying one config's `tasks` silently corrupts every later resolution of the same preset.
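One possible fix is to copy the preset list when resolving it. The following is a minimal sketch, not the actual `alignrl` code: it assumes `EvalConfig` is a dataclass with `preset` and `tasks` fields that resolves the preset in `__post_init__`, and it uses a stand-in `BENCHMARK_PRESETS` containing only the `"reasoning"` entry shown in the reproduction.

```python
from dataclasses import dataclass, field
from typing import Optional

# Stand-in for alignrl.eval.BENCHMARK_PRESETS; only the "reasoning"
# entry is known from this issue.
BENCHMARK_PRESETS = {"reasoning": ["gsm8k", "math", "arc_challenge"]}


@dataclass
class EvalConfig:
    # Hypothetical field layout; the real class may differ.
    preset: Optional[str] = None
    tasks: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.preset is not None:
            # Copy so each config owns its task list instead of
            # sharing the list stored in BENCHMARK_PRESETS.
            self.tasks = list(BENCHMARK_PRESETS[self.preset])


# Mutating one config no longer leaks into the preset or later configs.
first = EvalConfig(preset="reasoning")
first.tasks.append("extra")
second = EvalConfig(preset="reasoning")
```

With the copy in place, `first.tasks` carries the extra task while `second.tasks` and `BENCHMARK_PRESETS["reasoning"]` keep the original three entries.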
