VisEval: A NL2VIS Benchmark

VisEval is a benchmark designed to evaluate visualization generation methods. This repository provides both the benchmarking toolkit and the data used for the benchmarks.

What Can VisEval Evaluate

VisEval evaluates generated visualizations along three dimensions:

Whether the generated code can produce the visualization.
Whether the generated visualization meets the query.
Whether the generated visualization is easy to read.

Get Started

Install Benchmark Toolkit

pip install --upgrade vis-evaluator
# or `git clone https://github.com/microsoft/VisEval.git && cd VisEval && pip install --upgrade -e .`

Download Benchmark Dataset

To access the dataset, please follow these steps:

Download the dataset from this link.
Once the download is complete, unzip the file to extract the dataset contents.
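The unzip step can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `extract_dataset` is hypothetical, and the archive and folder names in the usage note are assumptions to adjust to wherever you saved the download.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive: str, target_dir: str) -> Path:
    """Extract a zip archive into target_dir and return the target path."""
    target = Path(target_dir)
    # Create the destination folder if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

For example, `extract_dataset("viseval_dataset.zip", "viseval_dataset")` would unpack the archive into a `viseval_dataset` folder next to it, assuming the download was saved under that name.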

For additional information about the dataset, please refer to the dataset documentation.

Usage & Examples

After installation, you can use VisEval by referring to examples/evaluate.py or by following the steps below:

Create your generation method by inheriting from the Agent class. You can find three examples in the examples/agent directory.

from typing import Optional, Tuple

from viseval.agent import Agent, ChartExecutionResult


class YourAgent(Agent):
    def __init__(self, llm):
        self.llm = llm

    def generate(
        self, nl_query: str, tables: list[str], config: dict
    ) -> Tuple[str, dict]:
        """Generate code for the given natural language query."""
        pass

    def execute(
        self, code: str, context: dict, log_name: Optional[str] = None
    ) -> ChartExecutionResult:
        """Execute the given code with context and return the result."""
        pass
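A `generate` implementation typically has to turn the query and tables into a prompt and then pull code out of the model's reply. The sketch below shows one way to do that with the standard library only. The helpers `build_prompt` and `extract_code` are hypothetical and not part of VisEval, and the sketch assumes that `tables` holds table paths or names and that the model returns its code inside a Markdown code fence.

```python
import re
from typing import List

# Markdown code fence, built programmatically so it does not clash with this listing.
FENCE = "`" * 3


def build_prompt(nl_query: str, tables: List[str], library: str) -> str:
    """Assemble a plain-text prompt from the query and the table entries."""
    table_lines = "\n".join(f"- {t}" for t in tables)
    return (
        f"Write Python code using {library} to answer the request below.\n"
        f"Request: {nl_query}\n"
        f"Available tables:\n{table_lines}\n"
    )


def extract_code(response: str) -> str:
    """Return the first fenced code block in response, or the raw text if none."""
    pattern = re.escape(FENCE) + r"(?:python)?\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response, flags=re.DOTALL)
    return (match.group(1) if match else response).strip()
```

Inside `generate`, you might call `build_prompt(nl_query, tables, config["library"])`, send the result to your LLM, and pass the reply through `extract_code` before returning it.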

Configure evaluator.

    evaluator = Evaluator(webdriver_path, vision_model)

(You can configure the Evaluator without a webdriver or vision model; in that case, the readability evaluation of the generated visualizations is skipped.)

Install webdriver.

# download
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
# install
apt install ./google-chrome-stable_current_amd64.deb
# verify
google-chrome --version
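The verification step can also be done from Python before constructing the evaluator. This is a minimal sketch; the helper name `chrome_version` is hypothetical, and it assumes the browser is exposed on your PATH as `google-chrome`.

```python
import shutil
import subprocess
from typing import Optional


def chrome_version(binary: str = "google-chrome") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if it is unavailable."""
    path = shutil.which(binary)
    if path is None:
        # The binary is not on PATH.
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # The binary exists but could not be run successfully.
        return None
    return result.stdout.strip()
```

A return value of `None` suggests that the installation did not succeed or that the binary has a different name on your system.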

Load a vision model (e.g., GPT-4V).

from langchain_openai import AzureChatOpenAI

import dotenv
# Copy .env.example to .env and put your API keys in the file.
dotenv.load_dotenv()

vision_model = AzureChatOpenAI(
    model_name="gpt-4-turbo-v",
    max_retries=999,
    temperature=0.0,
    request_timeout=20,
    max_tokens=4096,
)
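Because the model reads its credentials from the environment, it can help to check that they are set before creating the client. The sketch below is one way to do this. The helper `missing_env_vars` is hypothetical, and the variable names are assumptions based on what the Azure OpenAI integration commonly reads, so confirm them against .env.example.

```python
import os
from typing import Iterable, List, Mapping

# Assumed variable names; confirm them against .env.example in the repository.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "OPENAI_API_VERSION")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names that are unset or empty in env."""
    return [name for name in names if not env.get(name)]
```

Calling `missing_env_vars()` after `dotenv.load_dotenv()` returns an empty list when every assumed variable is present.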

Evaluate

from viseval import Dataset

# Configure the dataset with the benchmark dataset folder path (`folder`),
# the number of tables required to generate visualizations (`table_type`: all, single, or multiple),
# and whether to include irrelevant tables (`with_irrelevant_tables`).
dataset = Dataset(folder, table_type, with_irrelevant_tables)

config = {"library": args.library}
result = evaluator.evaluate(agent, dataset, config)
score = result.score()
print(f"Score: {score}")
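The snippet above reads `args.library` without showing where `args` comes from. A minimal sketch of a command-line parser that could supply it follows. All flag names and the `matplotlib` default are assumptions for illustration and may differ from examples/evaluate.py; only the three `table_type` values come from the comment above.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse hypothetical command-line options for an evaluation script."""
    parser = argparse.ArgumentParser(description="Run a VisEval evaluation.")
    # Flag names below are illustrative, not the toolkit's documented interface.
    parser.add_argument("--benchmark", required=True, help="Path to the dataset folder")
    parser.add_argument("--library", default="matplotlib", help="Plotting library to request")
    parser.add_argument("--table_type", choices=["all", "single", "multiple"], default="all")
    parser.add_argument("--with_irrelevant_tables", action="store_true")
    return parser.parse_args(argv)
```

With this in place, `args = parse_args()` yields `args.benchmark`, `args.table_type`, and `args.with_irrelevant_tables`, which can be passed to `Dataset`, while `args.library` fills the `config` dictionary.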

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Privacy Statement

This project has adopted the Microsoft Privacy Statement.

Citation

If you find that VisEval helps your research, please consider citing it:

@misc{chen2024viseval,
      title={VisEval: A Benchmark for Data Visualization in the Era of Large Language Models}, 
      author={Nan Chen and Yuge Zhang and Jiahang Xu and Kan Ren and Yuqing Yang},
      year={2024},
      eprint={2407.00981},
      archivePrefix={arXiv},
      primaryClass={cs.HC},
}
