Description
In `eval.py:31`:
```python
benchmarks[task_name] = {k: v for k, v in metrics.items() if isinstance(v, (int, float))}
```
In Python, `bool` is a subclass of `int`, so `isinstance(True, (int, float))` returns `True`. If lm-eval returns boolean metadata in its results dict (e.g. config flags), those booleans leak through as metric values (`True` → 1, `False` → 0), polluting benchmark comparisons.
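A minimal repro of the leak (the dict contents here are illustrative, not actual lm-eval output):

```python
# Hypothetical results dict mixing real metrics with a boolean config flag.
metrics = {"acc": 0.82, "acc_stderr": 0.01, "bypass": True}

# The current filter: the bool passes, since bool is a subclass of int.
kept = {k: v for k, v in metrics.items() if isinstance(v, (int, float))}
print(kept)  # {'acc': 0.82, 'acc_stderr': 0.01, 'bypass': True}
```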
Fix
Add an explicit `not isinstance(v, bool)` guard to the comprehension.
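A sketch of the corrected line (task name and metric keys are placeholders):

```python
# Hypothetical results: two numeric metrics plus a boolean flag to exclude.
metrics = {"acc": 0.82, "num_fewshot": 5, "bypass": True}

benchmarks = {}
task_name = "example_task"
# Check bool first: isinstance(True, int) is True, so the extra guard is required.
benchmarks[task_name] = {
    k: v
    for k, v in metrics.items()
    if isinstance(v, (int, float)) and not isinstance(v, bool)
}
print(benchmarks[task_name])  # {'acc': 0.82, 'num_fewshot': 5}
```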