Mumoda is an inference and finetuning library for multi-modality learning. It already contains several SOTA contrastive learning models that combine multiple modalities, such as CLIP, DeCLIP, and CoCa.
The main goal of mumoda is to apply these models to downstream tasks, for instance text-image retrieval, image captioning, or even object detection, and to show how much potential these models have when transferred to new tasks.
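To give a taste of what downstream transfer looks like, here is a minimal zero-shot classification sketch. It uses the upstream openai/clip package rather than mumoda's own wrappers, whose exact API may differ; the image path and label set are placeholders.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder inputs: swap in your own image and candidate labels.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
labels = ["a dog", "a cat", "a train"]
text = clip.tokenize([f"a photo of {label}" for label in labels]).to(device)

with torch.no_grad():
    # Softmax over image-text similarities gives zero-shot class probabilities.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print(dict(zip(labels, probs[0].tolist())))
```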
Features mumoda currently supports:
- CLIP as API (for quickly encoding text and images into a latent space; see the sketch after this list);
- DeCLIP as API;
- Guided Diffusion model support;
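Since mumoda's wrapper API isn't documented above, here is what the raw encoding step looks like through the upstream openai/clip package as a reference; the model name and image path are illustrative.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode an image and a caption into the shared latent space.
image = preprocess(Image.open("dog.jpg")).unsqueeze(0).to(device)  # illustrative path
text = clip.tokenize(["a lonely dog standing in the rain"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)  # shape (1, 512) for ViT-B/32
    text_features = model.encode_text(text)     # shape (1, 512)

# Cosine similarity between the two embeddings measures text-image agreement.
similarity = torch.cosine_similarity(image_features, text_features)
```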
More features are coming; just star and fork mumoda!
mumoda is clean and easy to use. It is not an iPython notebook for newbies; it is an elegant and powerful library for researchers. But if you are not interested in anything else and just want to see the magic, run:
pip install -r requirements.txt
python demo.py
Note: if you run CLIP with Guided Diffusion at the default settings, you need at least 12 GB of GPU memory.
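A quick way to check whether your card clears that bar (a minimal sketch using PyTorch's CUDA introspection; the 12 GB threshold comes from the note above):

```python
import torch

if torch.cuda.is_available():
    # Total memory of GPU 0, converted from bytes to GiB.
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0 has {total_gb:.1f} GB of memory")
    if total_gb < 12:
        print("Warning: CLIP + Guided Diffusion defaults may run out of memory")
else:
    print("No CUDA device found")
```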
These pictures were generated with CLIP and Guided Diffusion:
A lonely dog standing in the rain with a train alongside:
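At a high level, CLIP guides the diffusion sampler by nudging each denoising step toward images that CLIP scores as similar to the text prompt. Below is a minimal sketch of that guidance gradient in the style of openai/guided-diffusion's `cond_fn` hook; the resizing shortcut, `guidance_scale` value, and prompt are illustrative assumptions, not mumoda's actual code.

```python
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)

prompt = "a lonely dog standing in the rain with a train alongside"
with torch.no_grad():
    text_feat = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

def cond_fn(x, t, guidance_scale=1000.0):
    """Hypothetical guidance hook: gradient of CLIP similarity w.r.t. x.

    guided-diffusion samplers accept a cond_fn(x, t) whose output is added
    to the model's predicted mean at every denoising step.
    """
    with torch.enable_grad():
        x = x.detach().requires_grad_(True)
        # Shortcut: resize the noisy sample to CLIP's input size. A real
        # pipeline would also denoise and normalize it before scoring.
        x_in = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
        img_feat = clip_model.encode_image(x_in)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        similarity = (img_feat * text_feat).sum()
        return torch.autograd.grad(similarity, x)[0] * guidance_scale
```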
All models are downloaded from third-party repos. Training support might be added in the future.
All rights belong to Lucas Jin, 2022.