This repository implements Struct-Bench, a novel evaluation framework and benchmark for measuring synthetic data quality relative to a real dataset, where the real dataset features complex inter-field structural relations and at least some fields contain natural language.

(Figure: dataset-level and sample-level views into Struct-Bench.)

Contents:
- Project Setup
- Forking & Pull Requests
- Adding and Evaluating Datasets
- Generating Synthetic Datasets
- Key Files & Directories
- Acknowledgement
- Clone the Repo

  ```bash
  git clone https://github.com/struct-bench/structpe.git
  cd structpe
  ```
- Install
  - Recommended: Create a fresh virtual environment (`conda` or `venv`).
  - Then install locally:

    ```bash
    pip install .
    ```

  - Or, for editable mode (if you intend to develop and push changes):

    ```bash
    pip install -e .
    ```
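For instance, creating and activating a fresh environment with `venv` (standard Python commands, shown here only for illustration) before installing:

```bash
# Create and activate an isolated environment, then install structpe into it
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install -e .
```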
Additions of new datasets to Struct-Bench are welcome! To propose a change or new feature, follow these steps:
- Fork the Repo on GitHub
  - Visit the structpe GitHub page, click “Fork”, and choose your GitHub account.
- Clone Your Fork

  ```bash
  git clone https://github.com/yourfork/structpe.git
  cd structpe
  ```

- Create a Branch

  ```bash
  git checkout -b my-new-dataset
  ```

- Make Your Changes
  - Add new files, fix bugs, or implement new features.
  - Update or add unit tests in `tests/`.
- Push & Open a Pull Request

  ```bash
  git commit -am "Add new dataset for XYZ"
  git push origin my-new-dataset
  ```

  - Then open a Pull Request on GitHub from your fork.
- Review & Merge
  - The maintainers will review your PR, offer feedback, and merge once approved.
Struct-Bench uses a registry pattern to easily integrate more datasets. Here’s how:
- Create a New File
  - In `structpe/dataset/`, for example: `my_new_dataset.py`.
  - Define your sample class (`MyNewSample`) and a container class (`MyNewDataset`).
  - Use existing atomic types from `_types.py` or define constraints as needed.
- Define the context-free grammar (CFG) of the data structure.
- Register the Dataset
  - At the end of that file:

    ```python
    from structpe.dataset.registry import register_dataset
    register_dataset("my_new_dataset", MyNewDataset)
    ```

  - (Optional) Provide any `dataset_metric(level=...)` functions to compute custom metrics.
  - (Optional) If lines in the grammar have fields that are logically comparable, define `compute_node_similarities = [("fieldA", "fieldB"), ...]`.
  - A minimal illustrative sketch of such a dataset file is shown after this list.
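For orientation, here is a minimal sketch of what such a dataset file might look like. The field names, constructor signatures, and class layout are illustrative assumptions rather than the exact structpe API; only `register_dataset` and the class/file names come from the steps above.

```python
# structpe/dataset/my_new_dataset.py -- illustrative sketch only; the fields and
# methods below are hypothetical, not the exact structpe API.
from structpe.dataset.registry import register_dataset

class MyNewSample:
    """One record of the dataset (placeholder fields)."""
    def __init__(self, query: str, rating: int):
        self.query = query    # free-text field
        self.rating = rating  # numeric field; an atomic type from _types.py could be used instead

class MyNewDataset:
    """Container class holding a list of MyNewSample objects."""
    def __init__(self):
        self.samples = []

    def add_sample(self, sample: MyNewSample) -> None:
        self.samples.append(sample)

# Make the dataset discoverable by name (see "Register the Dataset" above).
register_dataset("my_new_dataset", MyNewDataset)
```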
Then run:

```bash
structpe evaluate \
  --private-dataset-name=my_dataset \
  --private-dataset-json=data/my_dataset.json \
  --synthetic-data=data/synthetic_dataset.json \
  --savedir results_my_dataset
```
You’ll get a comprehensive JSON summarizing correctness, adjacency, grammar, KNN-based metrics, plus your custom dataset metrics.
Please refer to this link for more details on the dataset evaluation framework.
We adopt DP fine-tuning (Yu et al., 2021) and the Augmented Private Evolution (Aug-PE) algorithm (Xie et al., 2024) to generate synthetic datasets on graph-structured data (ShareGPT, ICLR reviews), tabular data (Water, Arena, Adult), and attribute-controllable data (Reviews, Grounding). We use the external libraries microsoft/dp-transformers for DP fine-tuning and microsoft/DPSDA for Aug-PE.
We also implement the Aug-PE algorithm in `structpe/generator` to generate synthetic attribute-controllable data.
Generate DP synthetic text with Aug-PE:
```python
from structpe.generator.generation import run_generation_pipeline

synthetic_texts = run_generation_pipeline(
    file_path="data/input.tsv",
    file_type="tsv",            # or "csv" or "json"
    dataset_name="my_dataset",
    concurrency=4,
    init_count=10,
    iterations=3,
    endpoint="https://myazureendpoint.openai.azure.com/",  # replace with your Azure OpenAI endpoint
    deployment="gpt-4"
)
```
- `file_path`: Path to your input file (JSON, CSV, or TSV).
- `file_type`: Must be `"json"`, `"csv"`, or `"tsv"`.
- `concurrency`: Number of threads to use for Azure OpenAI calls.
- `init_count`: Initial sample count.
- `iterations`: How many iteration cycles to run.
- `endpoint`: Your Azure OpenAI endpoint.
- `deployment`: Name of the model deployment (e.g., `"gpt-4"`).
Returns: a list of the final generated strings.
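For example, the returned list can be written to disk so it can later be passed to `structpe evaluate` via `--synthetic-data` (a minimal sketch; the file name and JSON layout expected by the evaluator are assumptions here):

```python
import json

# Persist the generated strings for later evaluation (path and layout are illustrative).
with open("data/synthetic_dataset.json", "w", encoding="utf-8") as f:
    json.dump(synthetic_texts, f, ensure_ascii=False, indent=2)
```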
Please refer to this link for more details on the synthetic data generation.
- `structpe/_types.py`
  Holds enumerations and atomic range classes (e.g., `AtomicRangeInt`) used by multiple datasets.
- `structpe/dataset/`
  Holds each dataset definition (`search_dataset.py`, `hotel_booking_dataset.py`, etc.) plus `registry.py` for dynamic dataset lookup.
- `structpe/descriptor/descriptor.py`
  Implements reflection-based serialization so that entire dataset objects can be stored as JSON and reconstructed.
- `structpe/evaluator/`
  Contains the `Evaluator` class (with JSON output) and supporting classes (`LLMJudge`, `Verifier`, etc.) for constraint checks, distribution stats, and more.
- `structpe/generator/generation.py`
  Demonstrates how to create synthetic samples from existing dataset descriptions (currently for `search_query`).
- `structpe/run.py`
  Houses the CLI. Subcommands:
  - `list datasets`: Show registered datasets.
  - `run --dataset-name=XYZ`: Instantiate and evaluate a dataset.
- `tests/`
  Contains unit tests such as `test_dataset.py` (checks correctness of dataset classes), `test_pipeline.py` (verifies pipeline logic), and `test_evaluator.py` (tests evaluation output).
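For example, the CLI subcommands listed above can be invoked like this (`XYZ` is a placeholder for a registered dataset name):

```bash
# Show all registered datasets
structpe list datasets

# Instantiate and evaluate one registered dataset
structpe run --dataset-name=XYZ
```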
Disclaimer: Please expect changes in the framework as we improve it further based on feedback from researchers and practitioners.