CIA is a Python library for synthetic data augmentation using Stable Diffusion + ControlNet. Generate high-quality synthetic images from real seed images, evaluate their quality, and use them to improve downstream ML models.
- Synthetic image generation using Stable Diffusion controlled by Canny edges, OpenPose, Segmentation, or MediaPipe face features
- Quality metrics -- Fréchet Inception Distance (FID), Inception Score (IS), Mahalanobis distance
- Quality-based filtering -- keep only the best synthetic images via top-k, top-p, or threshold filtering
- Auto-captioning -- generate image captions using OpenAI or Ollama vision models
- Multiple interfaces -- Python API, CLI, and Hydra config
Run CIA in your browser with Google Colab: no installation required. Open the Quickstart notebook to generate, evaluate, and filter synthetic images in under 15 minutes.
```bash
pip install ciagen
```

With optional dependencies:

```bash
pip install "ciagen[captioning]"   # OpenAI/Ollama auto-captioning
pip install "ciagen[training]"     # YOLO/classifier training
pip install "ciagen[datasets]"     # COCO, Flickr30K, FER, MOCS datasets
pip install "ciagen[all]"          # Everything
```

Or install from source:

```bash
git clone https://github.com/fennecinspace/ciagen.git
cd ciagen
pip install -e ".[all]"
```

To work inside Docker instead:

```bash
./run_and_build_docker_file.sh nvidia
docker exec -it ciagen zsh
```

From the Python API:

```python
from ciagen import generate, evaluate, filter_generated

# Generate synthetic images
result = generate(
    source="data/real/train/images/",
    output="data/generated/",
    extractor="canny",
    sd_model="fennecinspace/sd-v15",
    cn_model="lllyasviel/sd-controlnet-canny",
    num_per_image=3,
    prompt="a person walking in a park",
    seed=42,
    device="cuda",
)
print(f"Generated {result['total_generated']} images")

# Evaluate quality
scores = evaluate(
    real="data/real/train/images/",
    generated="data/generated/",
    metrics=["fid", "mld"],
    feature_extractor="vit",
)
print(f"FID: {scores['dtd']['fid']}")

# Filter to keep the best images
kept = filter_generated(
    generated="data/generated/",
    method="top-k",
    value=100,
)
```

From the command line:

```bash
# Generate images
ciagen generate \
    --source data/real/train/images/ \
    --output data/generated/ \
    --extractor canny \
    --sd-model fennecinspace/sd-v15 \
    --cn-model lllyasviel/sd-controlnet-canny \
    --num 3 \
    --prompt "a person walking"

# Evaluate quality
ciagen evaluate \
    --real data/real/train/images/ \
    --generated data/generated/ \
    --metrics fid mld

# Filter generated images
ciagen filter \
    --generated data/generated/ \
    --method top-k \
    --value 100

# Auto-caption images
ciagen caption \
    --images data/real/train/images/ \
    --output data/real/train/captions/ \
    --engine ollama \
    --model llava
```

Via Hydra config:

```bash
python run.py task=gen model.cn_use=lllyasviel_canny prompt.base="a person"
python run.py task=dtd
python run.py task=ptd
python run.py task=filtering
python run.py task=mix
python run.py task=train
```

See ciagen/conf/config.yaml for all configuration options.
The recommended workflow:

```
real images ──► condition extraction ──► SD + ControlNet ──► synthetic images
                                                                  │
real images ───────────────────────────────────────────────► evaluate ──► filter ──► mix ──► train
```
- Generate -- Extract a control condition (edges, pose, segmentation) from each real image, then generate synthetic variations using Stable Diffusion + ControlNet
- Evaluate -- Compute distribution-level metrics (FID, IS) and per-image metrics (Mahalanobis distance)
- Filter -- Select the best synthetic images based on quality scores
- Mix -- Combine real and filtered synthetic data into a training dataset
- Train -- Train your downstream model (YOLOv8 for detection, InceptionV3 for classification)
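The filtering step above can be sketched in plain Python. This is an illustrative stand-in, not ciagen's implementation: the function names are ours, scores are treated as lower-is-better (as with Mahalanobis distance), and we read "top-p" as "keep the best fraction p of the images" (ciagen's exact definition may differ):

```python
def top_k(scores, k):
    """Keep the k images with the lowest (best) scores."""
    return sorted(scores, key=scores.get)[:k]

def threshold(scores, t):
    """Keep every image whose score is below the cutoff t."""
    return [name for name, s in scores.items() if s < t]

def top_p(scores, p):
    """Keep the best fraction p (0 < p <= 1) of the images."""
    k = max(1, int(len(scores) * p))
    return top_k(scores, k)

# Hypothetical per-image quality scores
scores = {"img_0.png": 1.2, "img_1.png": 0.4, "img_2.png": 3.1, "img_3.png": 0.9}
print(top_k(scores, 2))        # two lowest-scoring images
print(top_p(scores, 0.5))      # best half
print(threshold(scores, 1.0))  # everything scoring under 1.0
```

All three strategies reduce to ranking by the per-image score; they differ only in where the cutoff comes from (a count, a fraction, or an absolute value).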
| Extractor | Description | Use Case |
|---|---|---|
| `canny` | Canny edge detection | General purpose, preserves structure |
| `openpose` | Human pose estimation | People, actions, body pose |
| `segmentation` | YOLOv8 semantic segmentation | Object boundaries |
| `mediapipe_face` | MediaPipe face landmarks | Facial emotion, face generation |
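To make the "control condition" concrete: an edge extractor turns each real image into a structural map that ControlNet conditions on. Below is a rough NumPy-only stand-in using Sobel gradient magnitude -- not ciagen's `canny` extractor, which uses the full Canny pipeline (smoothing, non-maximum suppression, hysteresis):

```python
import numpy as np

def sobel_edges(img, thresh=0.25):
    """Binary edge map from gradient magnitude (img: 2-D float array in [0, 1])."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()  # horizontal gradient
            gy[i, j] = (patch * ky).sum()  # vertical gradient
    mag = np.hypot(gx, gy)
    return (mag / mag.max() > thresh).astype(np.uint8)

# A bright square on a dark background: edges fire along its border,
# and the flat interior stays empty -- structure without appearance.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0
edges = sobel_edges(img)
```

The resulting binary map is what gets fed to ControlNet, so the synthetic image inherits the layout of the real one while the diffusion model is free to change texture, color, and style.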
| Metric | Type | Description |
|---|---|---|
| `fid` | Distribution-to-Distribution | Fréchet Inception Distance -- lower is better |
| `inception_score` | Distribution-to-Distribution | Inception Score -- higher is better |
| `mld` | Point-to-Distribution | Mahalanobis distance -- per-image, lower is better |
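The per-image `mld` metric scores each synthetic image by how far its feature vector lies from the distribution of real-image features. A minimal NumPy illustration of the Mahalanobis distance follows -- the library computes it over deep features (e.g. from a ViT), not over the toy 2-D points used here:

```python
import numpy as np

def mahalanobis(x, feats):
    """Distance of point x from the distribution of the rows of feats."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False)
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 2))       # stand-in for real-image features
near = mahalanobis(np.zeros(2), real_feats)  # close to the distribution mean
far = mahalanobis(np.array([5.0, 5.0]), real_feats)  # an outlier
# lower is better: an in-distribution image scores far lower than an outlier
```

Because it accounts for the covariance of the real features, the metric penalizes deviation along directions where real images vary little, which makes it a sensible per-image ranking signal for the filtering step.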
```
data/
├── real/{dataset}/
│   ├── train/{images,labels,captions}/
│   ├── val/{images,labels,captions}/
│   └── test/{images,labels,captions}/
├── generated/{dataset}/{controlnet-model}/
│   ├── metadata.yaml
│   └── *.png
└── mixed/{dataset}/
```
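As a quick sanity check of the layout above, the snippet below builds the real-data side of the tree with `pathlib` and enumerates its splits; the dataset name `mydataset` is a placeholder, and this is only an illustration of the directory convention, not a ciagen utility:

```python
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp()) / "data"

# Create the real-data side of the layout for one dataset.
for split in ("train", "val", "test"):
    for sub in ("images", "labels", "captions"):
        (root / "real" / "mydataset" / split / sub).mkdir(parents=True)

# Enumerate the splits that have an images/ directory.
splits = sorted(
    p.parent.name for p in (root / "real" / "mydataset").glob("*/images")
)
print(splits)  # ['test', 'train', 'val']
```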
```bash
python run.py task=prepare_data data.base=coco       # COCO People
python run.py task=prepare_data data.base=flickr30k  # Flickr30K Entities
python run.py task=prepare_data data.base=fer        # Facial Emotion Recognition
python run.py task=prepare_data data.base=mocs       # Construction Sites
```

Full documentation is available in the docs/ directory and can be built with MkDocs:

```bash
pip install mkdocs-material "mkdocstrings[python]"
mkdocs serve
```

See CONTRIBUTING.md for development setup, code style, and PR guidelines.
This project is licensed under the GNU Affero General Public License v3.

Copyright (c) 2026 Université de Mons, Multitel, Université Libre de Bruxelles, Université Catholique de Louvain.