Create and run LLM benchmarks.
Just the library:

pip install flow-benchmark-tools==1.1.0

Library + example benchmarks (see below):

pip install "flow-benchmark-tools[examples]==1.1.0"
- Create an agent by inheriting BenchmarkAgent and implementing the run_benchmark_case method.
- Create a Benchmark by compiling a list of BenchmarkCases. These can be read from a JSONL file.
- Associate agent and benchmark in a BenchmarkRun.
- Use a BenchmarkRunner to run your BenchmarkRun (a minimal end-to-end sketch follows this list).
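
The sketch below strings these pieces together. The import path (flow_benchmark_tools), the BenchmarkCase fields, and the runner's method name are assumptions based on the class names above, not the library's documented API; compare with the examples in src/examples/ for the exact signatures.

```python
# Sketch only: the import path, constructor arguments and the runner's
# method name are assumptions -- check src/examples/ for the real API.
from flow_benchmark_tools import (
    Benchmark,
    BenchmarkAgent,
    BenchmarkCase,
    BenchmarkRun,
    BenchmarkRunner,
)


class EchoAgent(BenchmarkAgent):
    """Toy agent that echoes each case back instead of calling an LLM."""

    def run_benchmark_case(self, case: BenchmarkCase):
        # Call your LLM application here and return its answer.
        return f"echo: {case.input}"  # `case.input` is an assumed field name


benchmark = Benchmark(cases=[BenchmarkCase(input="What is RAG?")])
run = BenchmarkRun(agent=EchoAgent(), benchmark=benchmark)
BenchmarkRunner().run(run)
```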
Two end-to-end benchmark examples are provided in the examples folder: a LangChain RAG application and an OpenAI Assistant agent.
To run the LangChain RAG benchmark:
python src/examples/langchain_rag_agent.py
To run the OpenAI Assistant benchmark:
python src/examples/openai_assistant_agent.py
The benchmark cases are defined in data/rag_benchmark.jsonl.
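
Each line of that file is one JSON object describing a benchmark case. The standard-library snippet below shows how such a file can be inspected; the field names mentioned in the comment are illustrative, not the library's schema.

```python
import json
from pathlib import Path

# Peek at the benchmark cases. Field names such as "input" or
# "expected_output" are illustrative -- open data/rag_benchmark.jsonl
# to see the actual schema used by the examples.
cases = [
    json.loads(line)
    for line in Path("data/rag_benchmark.jsonl").read_text().splitlines()
    if line.strip()
]
print(f"Loaded {len(cases)} benchmark cases")
```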
The two examples follow the typical usage pattern of the library:

- define an agent by implementing the BenchmarkAgent interface and overriding the run_benchmark_case method (you can also override the before and after methods, if needed; see the hook sketch after this list),
- create a set of benchmark cases, typically as a JSONL file such as data/rag_benchmark.jsonl,
- use a BenchmarkRunner to run the benchmark.
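
For per-case setup and teardown, the hooks can be overridden alongside run_benchmark_case, as sketched below. The hook signatures (receiving the current case) are an assumption; the agents in src/examples/ show the actual interface.

```python
from flow_benchmark_tools import BenchmarkAgent, BenchmarkCase


class RAGAgent(BenchmarkAgent):
    """Sketch of an agent with per-case setup/teardown (signatures assumed)."""

    def before(self, case: BenchmarkCase):
        # e.g. reset conversation state or warm up the retriever
        self.history = []

    def run_benchmark_case(self, case: BenchmarkCase):
        # Replace with a call into your RAG pipeline or assistant.
        return "answer"

    def after(self, case: BenchmarkCase):
        # e.g. persist per-case logs or release resources
        self.history.clear()
```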