Generation of Realistic Tabular data with pretrained Transformer-based language models


Our GReaT framework utilizes the capabilities of pretrained large language Transformer models to synthesize realistic tabular data. New samples are generated with just a few lines of code, following an easy-to-use API. Please see our publication for more details.
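
Under the hood, GReaT serializes each table row into a short textual sentence of "column is value" clauses (with the feature order randomly permuted during training), fine-tunes a pretrained language model on these sentences, and parses generated sentences back into table rows. The following minimal sketch (not part of the library; the function name is hypothetical) illustrates such an encoding:

import random
import pandas as pd

def encode_row(row: pd.Series, shuffle: bool = True) -> str:
    # Turn one table row into a "column is value" sentence, e.g.
    # "HouseAge is 41.0, Latitude is 37.88, ..."
    pairs = [f"{col} is {val}" for col, val in row.items()]
    if shuffle:
        random.shuffle(pairs)  # random feature order, as described in the GReaT paper
    return ", ".join(pairs)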

GReaT Installation

The GReaT framework can be easily installed with pip and requires Python >= 3.9:

pip install be-great

GReaT Quickstart

In the example below, we show how the GReaT approach is used to generate synthetic tabular data for the California Housing dataset.

from be_great import GReaT
from sklearn.datasets import fetch_california_housing

# Load the California Housing dataset as a pandas DataFrame
data = fetch_california_housing(as_frame=True).frame

# Fine-tune a distilled GPT-2 model on the table and generate 100 new rows
model = GReaT(llm='distilgpt2', epochs=50)
model.fit(data)
synthetic_data = model.sample(n_samples=100)
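
As a quick plausibility check (not part of the GReaT API), one way to judge the sample is to train the same simple regressor once on the real data and once on the synthetic data and compare their scores on a held-out real test set. The sketch below uses scikit-learn only and assumes the data and synthetic_data variables from the quickstart above.

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hold out part of the real data for evaluation
train, test = train_test_split(data, test_size=0.2, random_state=0)
target = 'MedHouseVal'  # target column of the California Housing frame

# One regressor fitted on real rows, one on synthetic rows
real_model = RandomForestRegressor(random_state=0).fit(train.drop(columns=target), train[target])
synth_model = RandomForestRegressor(random_state=0).fit(synthetic_data.drop(columns=target), synthetic_data[target])

# Similar scores on the real test set suggest the synthetic data is realistic
print('real:     ', r2_score(test[target], real_model.predict(test.drop(columns=target))))
print('synthetic:', r2_score(test[target], synth_model.predict(test.drop(columns=target))))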

GReaT Citation

If you use GReaT, please link or cite our work:

@article{borisov2022language,
  title={Language Models are Realistic Tabular Data Generators},
  author={Borisov, Vadim and Se{\ss}ler, Kathrin and Leemann, Tobias and Pawelczyk, Martin and Kasneci, Gjergji},
  journal={arXiv preprint arXiv:2210.06280},
  year={2022}
}

GReaT Acknowledgements

We sincerely thank the HuggingFace 🤗 framework.
