Open
Description
Describe the feature or improvement you're requesting
Does anyone have a solid method for evaluating models on code benchmarks like APPS?
Generated code arrives as a plain string, which can be very noisy and needs deliberate preprocessing before it can be executed and tested.
I don't see any eval class in this repo that can run code tests.
Any clue? 🤔
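To make the request concrete, here is a minimal sketch of the kind of preprocessing and test execution I have in mind. It assumes APPS-style stdin/stdout test cases and Python solutions. The helper names (`extract_code`, `run_candidate`, `passes_all`) are hypothetical and not part of this repo, and comparing output token by token is my own simplification, not necessarily how APPS scores answers.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# Matches the first Markdown code fence, with an optional python/py tag.
FENCE_RE = re.compile(r"```(?:python|py)?[ \t]*\n(.*?)```", re.DOTALL)


def extract_code(completion: str) -> str:
    """Return the first fenced block if there is one, else the raw text."""
    match = FENCE_RE.search(completion)
    return (match.group(1) if match else completion).strip() + "\n"


def run_candidate(code: str, stdin_text: str, timeout: float = 5.0) -> Optional[str]:
    """Run the code in a fresh interpreter and return its stdout.

    Returns None if the program times out or exits with a non-zero status.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return None
    return proc.stdout if proc.returncode == 0 else None


def passes_all(completion: str, tests: List[Tuple[str, str]]) -> bool:
    """Check a completion against (stdin, expected stdout) pairs.

    Assumption: outputs are compared as whitespace-separated tokens, so
    trailing newlines and extra spaces do not cause a failure.
    """
    code = extract_code(completion)
    for stdin_text, expected in tests:
        actual = run_candidate(code, stdin_text)
        if actual is None or actual.split() != expected.split():
            return False
    return True


if __name__ == "__main__":
    completion = (
        "Here is my solution:\n"
        "```python\n"
        "a, b = map(int, input().split())\n"
        "print(a + b)\n"
        "```\n"
    )
    print(passes_all(completion, [("1 2\n", "3\n")]))
```

Note that a subprocess with a timeout is not a sandbox: untrusted model output would still need real isolation (a container or similar) before running at scale.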
Additional context
No response