
DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination



Abstract: Recent text-to-image (T2I) generative models allow for high-quality synthesis following either text instructions or visual examples. Despite their capabilities, these models face limitations in creating new, detailed creatures within specific categories (e.g., virtual dog or bird species), which are valuable in digital asset creation and biodiversity analysis. To bridge this gap, we introduce a novel task, Virtual Creatures Generation: Given a set of unlabeled images of the target concepts (e.g., 200 bird species), we aim to train a T2I model capable of creating new, hybrid concepts within diverse backgrounds and contexts. We propose a new method called DreamCreature, which identifies and extracts the underlying sub-concepts (e.g., body parts of a specific species) in an unsupervised manner. The T2I model thus adapts to generate novel concepts (e.g., new bird species) with faithful structures and photorealistic appearance by seamlessly and flexibly composing learned sub-concepts. To enhance sub-concept fidelity and disentanglement, we extend the textual inversion technique by incorporating an additional projector and tailored attention loss regularization. Extensive experiments on two fine-grained image benchmarks demonstrate the superiority of DreamCreature over prior alternatives in both qualitative and quantitative evaluation. Ultimately, the learned sub-concepts facilitate diverse creative applications, including innovative consumer product designs and nuanced property modifications.

Methodology


Overview of our DreamCreature. (Left) Discovering sub-concepts within a semantic hierarchy involves partitioning each image into distinct parts and forming semantic clusters across unlabeled training data. (Right) These clusters are organized into a dictionary, and their semantic embeddings are learned through a textual inversion approach. For instance, a text description like "a photo of a [Head,42] [Wing,87] ..." guides the optimization of the corresponding textual embedding by reconstructing the associated image. To promote disentanglement among learned concepts, we minimize a specially designed attention loss, denoted as $\mathcal{L}_{attn}$.
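To make the attention regularization concrete, here is a minimal sketch of an $\mathcal{L}_{attn}$-style loss. It is not the repository's actual implementation; the tensor names and shapes are assumptions. The idea is to push each part token's cross-attention map toward that part's segmentation mask:

```python
import torch
import torch.nn.functional as F

def attention_loss(attn_maps: torch.Tensor, part_masks: torch.Tensor) -> torch.Tensor:
    """Hypothetical L_attn sketch.

    attn_maps:  (B, P, H, W) cross-attention map of each part token.
    part_masks: (B, P, H, W) binary part masks from the k-means segmentation.
    """
    # Normalize both over the spatial dimensions so they become comparable
    # distributions: "where the token looks" vs. "where the part actually is".
    attn = attn_maps.flatten(2)
    attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)
    mask = part_masks.flatten(2).float()
    mask = mask / (mask.sum(dim=-1, keepdim=True) + 1e-8)
    # Penalize attention mass that falls outside the part's own region,
    # which encourages the learned sub-concepts to stay disentangled.
    return F.mse_loss(attn, mask)
```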

Mixing sub-concepts


Integrating a specific sub-concept (e.g., body, head, or even background) of a source concept B into the target concept A.
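As an illustration of this token-level mixing, the sketch below composes a prompt from concept A's sub-concept tokens and swaps one of them for concept B's. The [Part,id] token format follows the caption above; the helper itself and the cluster ids are hypothetical, not the repo's API:

```python
def mix_prompt(target_parts: dict, source_parts: dict, swap: str) -> str:
    # Start from target concept A's sub-concept tokens ...
    parts = dict(target_parts)
    # ... and replace one part (e.g., "Head") with source concept B's token.
    parts[swap] = source_parts[swap]
    tokens = " ".join(f"[{name},{cid}]" for name, cid in parts.items())
    return f"a photo of a {tokens}"

concept_a = {"Head": 42, "Wing": 87, "Body": 5}   # cluster ids are made up
concept_b = {"Head": 13, "Wing": 61, "Body": 29}
print(mix_prompt(concept_a, concept_b, swap="Head"))
# a photo of a [Head,13] [Wing,87] [Body,5]
```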

Our results

Mixing 4 different species:


More examples:


Creative generation:


Usage

  1. A demo is available on the kamwoh/dreamcreature Hugging Face Space (very slow, as it runs on CPU only).
  2. You can also run the demo on Google Colab.
  3. You can run the Gradio demo locally by executing python app.py, gradio_demo_cub200.py, or gradio_demo_dog.py in the src folder.

Training

  1. Check out train_kmeans_segmentation.ipynb to build a DINO-based k-means segmentation that discovers the "parts"/"sub-concepts"; this produces the attention masks used during training (see the sketch after this list).
  2. If no labels are available, the k-means cluster assignments can serve as supervision; otherwise, supervised labels (such as the ground-truth class) yield higher-quality reconstruction.
  3. Check out run_sd_sup.sh or run_sd_unsup.sh for training. The hyperparameters in these scripts are the ones used in the paper.
  4. An SDXL version is also available (see run_sdxl_sup.sh), but due to resource limitations we could not train it efficiently, so no pre-trained SDXL model is provided.
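For step 1, a rough sketch of the idea, assuming timm's DINO ViT-S/16 and scikit-learn's KMeans; the notebook's actual pipeline may differ (e.g., in feature choice, or by fitting k-means over the whole dataset rather than per batch):

```python
import timm
import torch
from sklearn.cluster import KMeans

# DINO-pretrained ViT-S/16; forward_features returns (B, 1 + 196, 384).
model = timm.create_model("vit_small_patch16_224.dino", pretrained=True).eval()

@torch.no_grad()
def part_labels(images: torch.Tensor, k: int = 8) -> torch.Tensor:
    """images: (B, 3, 224, 224), already normalized. Returns (B, 14, 14)
    cluster ids that act as coarse part / sub-concept masks."""
    feats = model.forward_features(images)[:, 1:, :]   # drop the CLS token
    flat = feats.reshape(-1, feats.shape[-1]).numpy()
    # Cluster patch features; each cluster plays the role of one "part".
    ids = KMeans(n_clusters=k, n_init=10).fit_predict(flat)
    return torch.as_tensor(ids).reshape(images.shape[0], 14, 14)
```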

Citation

@misc{ng2023dreamcreature,
      title={DreamCreature: Crafting Photorealistic Virtual Creatures from Imagination},
      author={Kam Woh Ng and Xiatian Zhu and Yi-Zhe Song and Tao Xiang},
      year={2023},
      eprint={2311.15477},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}