The BAT (Benchmark Agreement Testing) package is designed to facilitate agreement testing between NLP benchmarks. It allows users to compare how multiple benchmarks rank a shared set of models and to generate comprehensive reports on the agreement between them.
To install the BAT package, use pip:

```bash
pip install bat-package
```
Below is a step-by-step example of how to use the BAT package to perform agreement testing.
### Step 1: Configuration
First, set up the configuration for the tests:
```python
import pandas as pd

from bat import Tester, Config, Benchmark, Reporter
from bat.utils import get_holistic_benchmark

# Configure the experiment: which experiment to run, how many models to
# take, how to select them, and how many repetitions to perform.
cfg = Config(
    exp_to_run="example",
    n_models_taken_list=[0],
    model_select_strategy_list=["random"],
    n_exps=10,
)
```
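The list-valued parameters suggest that several settings can be swept in a single run. As a rough sketch (the specific values below are illustrative assumptions, not documented settings), a configuration comparing two sample sizes might look like:

```python
# Hypothetical sweep: compare agreement when taking 5 vs. 10 models
# per experiment (values are illustrative assumptions).
cfg_sweep = Config(
    exp_to_run="example_sweep",
    n_models_taken_list=[5, 10],
    model_select_strategy_list=["random"],
    n_exps=10,
)
```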
### Step 2: Fetch Model Names
Fetch the names of the reference models to be used for scoring:
```python
# Initialize the tester with the configuration above.
tester = Tester(cfg=cfg)

# Select 20 reference models from the holistic benchmark.
models_for_benchmark_scoring = tester.fetch_reference_models_names(
    reference_benchmark=get_holistic_benchmark(), n_models=20
)
print(models_for_benchmark_scoring)
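```

Assuming the returned names are plain strings, they can also be used outside BAT, for example to filter your own results table down to the same reference models with pandas (the file path and `model` column below are hypothetical):

```python
# Hypothetical: restrict an external results table to the reference models.
scores_df = pd.read_csv("my_model_scores.csv")  # placeholder path
scores_df = scores_df[scores_df["model"].isin(models_for_benchmark_scoring)]
```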
### Step 3: Load and Prepare Benchmark
Load a new benchmark and add an aggregate column:
```python
newbench_name = "fakebench"

# Wrap the raw CSV in a Benchmark object, tagging it with its source name.
newbench = Benchmark(
    pd.read_csv(f"src/bat/assets/{newbench_name}.csv"),
    data_source=newbench_name,
)

# Add an aggregate column over the benchmark's scenarios.
newbench.add_aggregate(new_col_name=f"{newbench_name}_mwr")
```
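If you want to experiment without a CSV on disk, the same structure can be built in memory. The sketch below assumes `Benchmark` accepts a long-format DataFrame with `model`, `scenario`, and `score` columns; the column names and values are illustrative assumptions, not a documented schema:

```python
# Hypothetical long-format benchmark: one row per (model, scenario) pair.
fake_df = pd.DataFrame(
    {
        "model": ["model_a", "model_a", "model_b", "model_b"],
        "scenario": ["qa", "summarization", "qa", "summarization"],
        "score": [0.71, 0.64, 0.58, 0.69],  # illustrative scores
    }
)
mybench = Benchmark(fake_df, data_source="mybench")
mybench.add_aggregate(new_col_name="mybench_mwr")
```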
### Step 4: Agreement Testing
Perform all-vs-all agreement testing on the new benchmark:
```python
# Run all-vs-all agreement testing on the new benchmark.
newbench_agreements = tester.all_vs_all_agreement_testing(newbench)

# Visualize the resulting agreements.
reporter = Reporter()
reporter.draw_agreements(newbench_agreements)
```
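If the returned agreements object is a pandas DataFrame (an assumption, based on the package's use of pandas elsewhere), it can be inspected directly before plotting:

```python
# Assumption: agreements are returned as a pandas DataFrame.
print(newbench_agreements.head())
print(newbench_agreements.describe())
```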
### Step 5: Extend and Clean Benchmark
Extend the new benchmark with holistic data and clear repeated scenarios:
```python
# Merge the new benchmark with the holistic one, keeping the new
# benchmark's copy of any scenario that appears in both sources.
allbench = newbench.extend(get_holistic_benchmark())
allbench.clear_repeated_scenarios(source_to_keep=newbench_name)
```
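To sanity-check the merge, you can look at how many rows each source contributes. This assumes the `Benchmark` object exposes its underlying DataFrame as a `df` attribute with a `source` column; both names are assumptions:

```python
# Assumption: Benchmark stores its data in a `df` attribute with a `source` column.
print(allbench.df["source"].value_counts())
```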
### Step 6: Comprehensive Agreement Testing
Perform comprehensive agreement testing and visualize:
```python
# Repeat the all-vs-all agreement testing on the combined benchmark
# and visualize the results.
all_agreements = tester.all_vs_all_agreement_testing(allbench)
reporter.draw_agreements(all_agreements)
```
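As in Step 4, and under the same assumption that a pandas DataFrame is returned, the comprehensive results can be persisted for later analysis:

```python
# Assumption: the agreements object is a pandas DataFrame.
all_agreements.to_csv("all_agreements.csv", index=False)
```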
Contributions to the BAT package are welcome! Please submit pull requests or open issues through our GitHub repository.
This package is released under the MIT License.