This project uses the VQGAN+CLIP image generation pipeline, complemented by ESRGAN upscaling, to create high-quality images. VQGAN+CLIP combines two separately trained models. VQGAN (Vector Quantized Generative Adversarial Network) generates images by decoding latent codes drawn from a learned, discrete codebook (the vector quantization step). CLIP (Contrastive Language-Image Pre-training) embeds images and text in a shared space, so it can score how well a generated image matches the input prompt; that score is used to iteratively steer VQGAN's latent codes toward the text description.
Together, the two models complement each other: VQGAN supplies the image prior and CLIP supplies the text guidance, producing detailed images that closely follow the prompt. ESRGAN then upscales the final output, preserving fine detail while increasing the resolution.
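To make the interplay concrete, here is a minimal sketch of the CLIP-guidance loop. It is a deliberate simplification: a raw pixel tensor stands in for VQGAN's latent codes, whereas in the real pipeline the optimized variable is the VQGAN latent and the scored image is the decoder's output. Only PyTorch and the `clip` package (installed from the CLIP repository cloned below) are assumed; the prompt is a placeholder.

```python
# Simplified sketch of the CLIP-guidance loop behind VQGAN+CLIP.
# A raw pixel tensor stands in for VQGAN's latents; in the real pipeline
# the optimizer would update a VQGAN latent z and score vqgan.decode(z).
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # avoid fp16/fp32 mismatches when backpropagating

text = clip.tokenize(["a watercolor painting of a lighthouse"]).to(device)
text_features = model.encode_text(text).detach()
text_features = text_features / text_features.norm(dim=-1, keepdim=True)

# The variable being optimized: 224x224 to match CLIP's input resolution.
image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    image_features = model.encode_image(image.clamp(0, 1))
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    loss = -(image_features * text_features).sum()  # maximize cosine similarity
    loss.backward()
    optimizer.step()
```

Practical implementations, including Katherine Crowson's original one credited below, typically also score several random crops ("cutouts") of the image per step to stabilize the optimization.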
To install the necessary dependencies, follow these steps:
- Create the conda environment from the `environment.yml` file with Anaconda (e.g. `conda env create -f environment.yml`).
- Clone the `Real-ESRGAN`, `CLIP`, and `taming-transformers` git repositories by running the following commands:

  ```bash
  git clone https://github.com/sberbank-ai/Real-ESRGAN
  git clone https://github.com/openai/CLIP.git
  git clone https://github.com/CompVis/taming-transformers.git
  ```

- Download the VQGAN model by going into the `taming-transformers` directory and running the following commands:

  ```bash
  mkdir -p checkpoints
  wget 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fckpts%2Flast.ckpt&dl=1' -O 'checkpoints/vqgan_imagenet_f16_16384.ckpt'
  wget 'https://heibox.uni-heidelberg.de/d/a7530b09fed84f80a887/files/?p=%2Fconfigs%2Fmodel.yaml&dl=1' -O 'checkpoints/vqgan_imagenet_f16_16384.yaml'
  ```

  A sketch of loading this checkpoint follows the list.

- Download the RealESRGAN model by going into the `Real-ESRGAN` directory and running the following command:

  ```bash
  gdown https://drive.google.com/uc?id=1SGHdZAln4en65_NQeQY9UjchtkEF9f5F -O weights/RealESRGAN_x4.pth &> /dev/null
  ```

  A sketch of upscaling with these weights follows the list.

- Run the `generate_image.py` file or the `gen.ipynb` notebook to generate images based on your input.
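For reference, below is a minimal sketch of how the downloaded VQGAN checkpoint and config can be loaded with `taming-transformers`. The paths match the `wget` commands above; the exact loading code used by this project may differ.

```python
# Sketch: loading the downloaded VQGAN checkpoint with taming-transformers.
# Paths match the wget commands above; run from the taming-transformers directory.
import torch
from omegaconf import OmegaConf
from taming.models.vqgan import VQModel

config = OmegaConf.load("checkpoints/vqgan_imagenet_f16_16384.yaml")
model = VQModel(**config.model.params)
model.init_from_ckpt("checkpoints/vqgan_imagenet_f16_16384.ckpt")
model.eval().requires_grad_(False)  # inference only; guidance updates the latents, not the weights

# decode() maps a latent tensor back to image space
z = torch.randn(1, 256, 16, 16)  # latent shape of the f16 model for a 256x256 output
with torch.no_grad():
    image = model.decode(z)
print(image.shape)  # -> torch.Size([1, 3, 256, 256])
```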
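And a sketch of 4x upscaling with the downloaded weights, based on the `RealESRGAN` class from the `sberbank-ai/Real-ESRGAN` repository cloned above (API assumed from that repository's documentation; the input and output file names are placeholders):

```python
# Sketch: 4x upscaling a generated image with the weights downloaded above.
# Assumes the sberbank-ai/Real-ESRGAN package is importable; file names are placeholders.
import torch
from PIL import Image
from RealESRGAN import RealESRGAN

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = RealESRGAN(device, scale=4)
model.load_weights("weights/RealESRGAN_x4.pth")

lr_image = Image.open("generated.png").convert("RGB")  # e.g. a VQGAN+CLIP output
sr_image = model.predict(lr_image)
sr_image.save("generated_4x.png")
```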
This project was inspired by Katherine Crowson's original implementation of the VQGAN+CLIP model.
This project is licensed under the MIT License - see the LICENSE.md file for details.