Run models in parallel during benchmark #53

steven10a · 2025-11-18T15:03:51Z

Small change to the eval tool.

When benchmarking with multiple models, the models now run in parallel instead of sequentially.
Added new tests.

Copilot

Pull Request Overview

This PR updates the guardrail evaluation tool to run model benchmarks in parallel rather than sequentially, which should shorten total benchmark time when testing multiple models. It also introduces concurrency controls and optional sample chunking for better resource management during benchmarking.

Key Changes:

Models now run concurrently during benchmarking, controlled by a semaphore with configurable parallelism limits
Added optional sample chunking to break up large benchmark runs into smaller pieces
Enhanced structured logging throughout the benchmark process with timing information
Added comprehensive unit tests for the new parallel execution and chunking functionality

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
src/guardrails/evals/guardrail_evals.py	Implemented parallel model execution with asyncio.Semaphore, added chunk_samples method, enhanced logging with structured events and timing, added CLI arguments for controlling parallelism
tests/unit/evals/test_guardrail_evals.py	New test file with unit tests covering parallel model limit determination and sample chunking functionality
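
As a rough illustration of what sample chunking can look like, the sketch below splits a list of samples into fixed-size pieces. It is an assumption-laden example, not the `chunk_samples` method added in this PR: the function name `iter_chunks`, its signature, and its handling of a missing chunk size are all hypothetical.

```python
from typing import Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def iter_chunks(samples: Sequence[T], chunk_size: Optional[int] = None) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of samples with at most chunk_size items each.

    Hypothetical behavior: when chunk_size is None, the whole input is
    yielded as a single chunk, so chunking stays optional.
    """
    if chunk_size is None:
        yield samples
        return
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(samples), chunk_size):
        # The final slice may be shorter than chunk_size.
        yield samples[start:start + chunk_size]
```

A unit test for logic like this would typically check the uneven last chunk, the no-chunking default, and rejection of non-positive sizes.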

src/guardrails/evals/guardrail_evals.py

tests/unit/evals/test_guardrail_evals.py

Copilot

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

src/guardrails/evals/guardrail_evals.py

Copilot

Pull Request Overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

steven10a · 2025-11-19T14:51:38Z

@codex review

chatgpt-codex-connector · 2025-11-19T14:56:06Z

Codex Review: Didn't find any major issues. Breezy!

gabor-openai

TY

Run models in parallel during benchmark

e2a598a

Copilot AI review requested due to automatic review settings November 18, 2025 15:03

Copilot started reviewing on behalf of steven10a November 18, 2025 15:04

Copilot finished reviewing on behalf of steven10a November 18, 2025 15:13

Copilot AI reviewed Nov 18, 2025


steven10a added 3 commits November 18, 2025 10:59

Updating eval docs

d5991f9

Formatting

08fe205

Unused context

32a1ddc

steven10a requested a review from Copilot November 18, 2025 16:16

Copilot started reviewing on behalf of steven10a November 18, 2025 16:16

Copilot finished reviewing on behalf of steven10a November 18, 2025 16:17

Copilot AI reviewed Nov 18, 2025


src/guardrails/evals/guardrail_evals.py (outdated, resolved)

steven10a requested a review from gabor-openai November 18, 2025 16:29

Handling race conditions

272f2ec

steven10a requested a review from Copilot November 18, 2025 16:35

Copilot started reviewing on behalf of steven10a November 18, 2025 16:35

Copilot finished reviewing on behalf of steven10a November 18, 2025 16:38

Copilot AI reviewed Nov 18, 2025


gabor-openai approved these changes Nov 19, 2025


gabor-openai merged commit 967337c into main Nov 19, 2025
9 checks passed

gabor-openai deleted the dev/steven/parallel_eval branch November 19, 2025 18:07
