Magic Mix

Implementation of the [MagicMix: Semantic Mixing with Diffusion Models](https://arxiv.org/abs/2210.16056) paper. This is a Diffusion Pipeline for semantic mixing of an image and a text prompt to create a new concept while preserving the spatial layout and geometry of the subject in the image. The pipeline takes an image that provides the layout semantics and a prompt that provides the content semantics for the mixing process.

There are 3 parameters for the method:

`mix_factor`: The interpolation constant used in the layout generation phase. The higher the value of `mix_factor`, the greater the influence of the prompt on the layout generation process.

`kmax` and `kmin`: These determine the range for the layout and content generation process. A higher value of `kmax` results in the loss of more information about the layout of the original image, and a higher value of `kmin` results in more steps for the content generation process.

This script was contributed by [Partho Das](https://github.com/daspartho) and the notebook by [Parag Ekbote](https://github.com/ParagEkbote).
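As a rough illustration of how the three parameters interact, the sketch below labels each denoising step as skipped, layout, or content, and shows the linear interpolation that `mix_factor` controls. It is a minimal pure-Python sketch, not the pipeline's code: the helper names (`phase_for_step`, `count_phases`, `mix_latents`) are hypothetical, and the index arithmetic is an assumption based on one reading of the community pipeline, so it may differ between versions.

```python
def phase_for_step(i, kmin, kmax, steps=50):
    """Return the phase for scheduler step index i (0 is the noisiest step).

    Assumption: kmin/kmax are fractions of the step count, converted to
    step indices as shown here.
    """
    tmin = steps - int(kmin * steps)  # index where content generation starts
    tmax = steps - int(kmax * steps)  # index where the noised image enters
    if i <= tmax:
        return "skipped"  # noisier than the starting point, never run
    if i < tmin:
        return "layout"   # latents are mixed with the noised input image
    return "content"      # plain prompt-conditioned denoising


def count_phases(kmin, kmax, steps=50):
    """Count how many steps fall into each phase."""
    counts = {"skipped": 0, "layout": 0, "content": 0}
    for i in range(steps):
        counts[phase_for_step(i, kmin, kmax, steps)] += 1
    return counts


def mix_latents(denoised, noised_layout, mix_factor):
    """Interpolate element-wise between the prompt-driven latents and the
    noised layout latents; a larger mix_factor favors the prompt."""
    return [
        mix_factor * d + (1 - mix_factor) * n
        for d, n in zip(denoised, noised_layout)
    ]


# Values used in this notebook: kmin=0.3, kmax=0.5, 50 steps
print(count_phases(kmin=0.3, kmax=0.5))
# Toy two-element "latents" to show the weighting
print(mix_latents([1.0, 0.0], [0.0, 1.0], mix_factor=0.25))
```

Under these assumptions, raising `kmax` moves the starting point to a noisier step, so less of the original layout survives, while raising `kmin` hands more of the remaining steps to content generation at the expense of layout steps.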

For additional examples, check out this [demo notebook](https://github.com/daspartho/MagicMix/blob/main/demo.ipynb).

In [1]:
pip install diffusers pillow transformers accelerate

Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Note: you may need to restart the kernel to use updated packages.


In [1]:
import requests
from diffusers import DiffusionPipeline, DDIMScheduler
from PIL import Image
from io import BytesIO

pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="magic_mix",
    scheduler=DDIMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler"),
).to('cuda')

# Download the layout image and decode it with PIL
url = "https://user-images.githubusercontent.com/59410571/209578593-141467c7-d831-4792-8b9a-b17dc5e47816.jpg"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail early on a bad HTTP response
image = Image.open(BytesIO(response.content)).convert("RGB")  # Force 3-channel RGB input

# Mix the layout of the image with the content of the prompt
mix_img = pipe(
    image,
    prompt='bed',
    kmin=0.3,
    kmax=0.5,
    mix_factor=0.5,
)
mix_img.save('phone_bed_mix.jpg')
