This repository allows you to do two main things:
- Run evaluation frameworks on different datasets
- Compute correlations (Spearman, Pearson, ...) of previously computed scores against human scores, at the dataset level or the system level
All currently used datasets are included in the repository inside the `datasets` folder.
To add a new dataset, implement the `DataCollector` class defined in `src\data_collector.py`.
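A minimal sketch of what such a collector might look like, assuming the abstract class is importable from `src\data_collector.py`; the method names used below (`load_data`, `get_model_responses`) are illustrative, so check the abstract implementation for the actual interface:

```python
# Hypothetical dataset collector; the abstract methods are defined in
# src\data_collector.py and may differ from the names used here.
import json

from src.data_collector import DataCollector  # assumed import path


class MyDatasetCollector(DataCollector):
    """Loads samples and model predictions for a custom dataset."""

    def __init__(self, path="datasets/my_dataset/data.json"):
        self.path = path
        self.data = []

    def load_data(self):
        # Assumed method name: load the raw dataset from disk.
        with open(self.path, "r", encoding="utf-8") as f:
            self.data = json.load(f)

    def get_model_responses(self):
        # Assumed method name: return the model predictions to be scored.
        return [sample["response"] for sample in self.data]
```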
To add a new evaluation framework, implement the `EvaluationFramework` class, in particular its `evaluate()` function. Refer to the abstract implementation for details on the expected format.
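As a hedged sketch of the idea (the exact signature of `evaluate()` is defined by the abstract class and may differ), a toy framework could look like this:

```python
# Hypothetical evaluation framework; the import path and the evaluate()
# signature are assumptions, see the abstract implementation for the real ones.
from src.eval_framework import EvaluationFramework  # assumed import path


class ResponseLengthEval(EvaluationFramework):
    """Toy framework that scores each response by its word count."""

    def evaluate(self, responses):
        # Return one score per response, in the format expected by the
        # abstract implementation.
        return [len(response.split()) for response in responses]
```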
If you want to compute correlations against human scores, you also need to implement a `HumanEvalCollector` class for the corresponding `DataCollector`.
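A sketch, assuming the collector simply has to return one human score per sample (the method name and file layout below are illustrative):

```python
# Hypothetical human evaluation collector; method names are assumptions.
import json

from src.human_eval_collector import HumanEvalCollector  # assumed import path


class MyDatasetHumanEvalCollector(HumanEvalCollector):
    """Provides the human ratings that automatic scores are correlated against."""

    def __init__(self, path="datasets/my_dataset/human_scores.json"):
        self.path = path

    def get_human_scores(self):
        # Assumed method name: one human score per sample, aligned with the
        # samples returned by the corresponding DataCollector.
        with open(self.path, "r", encoding="utf-8") as f:
            return json.load(f)
```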
A pipeline can be configured with a data collector (for model predictions), evaluation frameworks (for automatic system evaluation), and an eval collector (for human judgments): it first computes the necessary scores and then computes all correlations.
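Roughly, and purely as an illustration of the wiring (the class and module names below are the hypothetical ones from the sketches above; the repository's pipeline code handles the correlation step):

```python
# Hypothetical pipeline wiring; see pipelines/example for the real configuration.
from my_collectors import MyDatasetCollector, MyDatasetHumanEvalCollector  # assumed
from my_frameworks import ResponseLengthEval  # assumed

data_collector = MyDatasetCollector()
eval_frameworks = [ResponseLengthEval()]
human_collector = MyDatasetHumanEvalCollector()

# Compute one set of automatic scores per framework...
data_collector.load_data()
responses = data_collector.get_model_responses()
framework_scores = {fw.__class__.__name__: fw.evaluate(responses)
                    for fw in eval_frameworks}

# ...then correlate them (e.g. Spearman / Pearson) with the human scores.
human_scores = human_collector.get_human_scores()
```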
Refer to `pipelines/example` for an example on the TopicalChat dataset comparing some of the implemented evaluation frameworks.
Easy start: copy the example from `pipelines\example` and run it:

```
python -m pipelines.ex_pipeline
```

Outputs can be found in a new file inside the `outputs` folder.