Generate visually similar synthetic images using image captions as prompts to an image generation model.
Some possible use-cases are:
- Generate synthetic images/datasets of objects or concepts whose names are unknown (i.e. you don't know what prompt to write)
- Could be a form of art therapy: as an artist you give an image and get to see more images of the same kind
Given an input image, a caption describing it is generated with an Image-To-Text model. This caption is then used as the prompt to a Text-To-Image generative model, which synthesizes new images (hopefully similar to the input!). See the flow chart below.
```mermaid
flowchart TD
    A(Input Image) --> B[Image-To-Text Model];
    B -->|text as prompt| C[Text-To-Image Model];
    C --> D(Synthetic Image);
```
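A minimal sketch of this two-step pipeline, assuming the `nlpconnect/vit-gpt2-image-captioning` checkpoint linked below for captioning and the `CompVis/stable-diffusion-v1-4` checkpoint from the linked notebook for generation; `input.jpg` is a placeholder path and a CUDA GPU is assumed:

```python
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

# Step 1: Image-To-Text — caption the input image
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")
caption = captioner("input.jpg")[0]["generated_text"]  # "input.jpg" is a placeholder
print("Caption:", caption)

# Step 2: Text-To-Image — use the caption as the prompt
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
image = pipe(caption).images[0]
image.save("synthetic.png")
```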
Some things I might try:
- Prompt blending (see the sketch after this list)
- Morph between multiple image inputs
- Stable Diffusion notebook: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb
- Image captioning model: https://huggingface.co/nlpconnect/vit-gpt2-image-captioning
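A minimal sketch of prompt blending, assuming a diffusers version whose pipeline accepts a `prompt_embeds` argument and reusing the `pipe` loaded above; the two prompts and the blend weight `alpha` are illustrative placeholders:

```python
import torch

def encode_prompt(pipe, prompt):
    # Tokenize and run the prompt through the pipeline's CLIP text encoder
    tokens = pipe.tokenizer(
        prompt,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        return pipe.text_encoder(tokens.input_ids.to(pipe.device))[0]

# Linearly interpolate between two prompt embeddings (alpha in [0, 1]);
# the prompts here are illustrative placeholders
emb_a = encode_prompt(pipe, "a watercolor painting of a fox")
emb_b = encode_prompt(pipe, "a photograph of a fox")
alpha = 0.5
blended = (1 - alpha) * emb_a + alpha * emb_b

image = pipe(prompt_embeds=blended).images[0]
image.save("blended.png")
```

Feeding the captions of two different input images into the same interpolation, and sweeping `alpha` from 0 to 1, would be one way to morph between multiple image inputs.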
This project inherits whatever licenses the pretrained models have.