Popular repositories
RefuseBench (Public)
A benchmark for spec-gaming resistance in LLMs: do models honor embedded policy rules when a plausible user request creates pressure to bend them?
Python