
flux pipeline inference with controlnet, inpainting, plus ip-adapter #11046

@john09282922

Description

Describe the bug

Hi, I would like to use the Flux pipeline, but for now I have GPU constraints that keep me from running the original Flux pipeline.
If I want to use the NF4 version, how should I set up the inference script for ControlNet, inpainting, and IP-Adapter?
Should I use FluxControl (Depth or Canny) with a mask and an IP-Adapter model, or FluxControl, FluxFill, and IP-Adapter?
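
Here is roughly what I am imagining for quantizing to NF4 on the fly instead of using a prequantized checkpoint (my assumption, based on the bitsandbytes integration in diffusers/transformers; it needs bitsandbytes installed and uses BitsAndBytesConfig from both libraries):

import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from diffusers.models.transformers import FluxTransformer2DModel
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig, T5EncoderModel

# NF4 4-bit config for the Flux transformer (diffusers-side BitsAndBytesConfig)
transformer_nf4_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    subfolder="transformer",
    quantization_config=transformer_nf4_config,
    torch_dtype=torch.bfloat16,
)

# NF4 4-bit config for the T5 text encoder (transformers-side BitsAndBytesConfig)
text_encoder_nf4_config = TransformersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    subfolder="text_encoder_2",
    quantization_config=text_encoder_nf4_config,
    torch_dtype=torch.bfloat16,
)

These two modules could then be swapped into the pipeline the same way the prequantized sayakpaul/FLUX.1-Depth-dev-nf4 ones are in the reproduction below.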

Thanks,

@hlky, @sayakpaul

Reproduction

import torch
from diffusers import FluxControlInpaintPipeline
from diffusers.models.transformers import FluxTransformer2DModel
from transformers import T5EncoderModel
from diffusers.utils import load_image, make_image_grid
from image_gen_aux import DepthPreprocessor # https://github.com/huggingface/image_gen_aux
from PIL import Image
import numpy as np

access_token = ""
pipe = FluxControlInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    torch_dtype=torch.bfloat16,
    token=access_token,
)

# Use the following lines instead if you have GPU constraints
# ---------------------------------------------------------------

transformer = FluxTransformer2DModel.from_pretrained(
    "sayakpaul/FLUX.1-Depth-dev-nf4", subfolder="transformer", torch_dtype=torch.bfloat16
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "sayakpaul/FLUX.1-Depth-dev-nf4", subfolder="text_encoder_2", torch_dtype=torch.bfloat16
)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2

pipe.enable_model_cpu_offload()

# ---------------------------------------------------------------

pipe.to("cuda")

prompt = "a blue robot sad expressions"
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

head_mask = np.zeros_like(image)
head_mask[65:580,300:642] = 255
mask_image = Image.fromarray(head_mask)

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(image)[0].convert("RGB")

output = pipe(
    prompt=prompt,
    image=image,
    control_image=control_image,
    mask_image=mask_image,
    num_inference_steps=30,
    strength=1,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
make_image_grid([image, control_image, mask_image, output.resize(image.size)], rows=1, cols=4).save("output.png")

How would I change Depth to Canny, and add an IP-Adapter?
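
For what it's worth, here is a rough sketch of what I imagine the Canny variant plus IP-Adapter would look like (my assumption, pieced together from the FluxControlPipeline and Flux IP-Adapter docs; I am not sure FluxControlInpaintPipeline actually exposes load_ip_adapter, which is part of my question):

import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlInpaintPipeline
from diffusers.utils import load_image

# Swap the Depth checkpoint for the Canny one (assumption)
pipe = FluxControlInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Build a Canny edge map as the control image instead of a depth map
processor = CannyDetector()
control_image = processor(
    image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024
)

# IP-Adapter loading as shown for FluxPipeline; unclear whether this works on
# FluxControlInpaintPipeline
pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14",
)
pipe.set_ip_adapter_scale(1.0)

The rest of the call would then presumably pass ip_adapter_image alongside image, control_image, and mask_image, as in the depth example above.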

Logs

System Info

.

Who can help?

No response
