
flux pipeline inference with controlnet, inpainting, plus ip-adapter #11046

@john09282922

Description

Describe the bug

Hi, I would like to use the Flux pipeline, but for now I have GPU constraints that keep me from running the original Flux pipeline.
If I want to use the NF4 version, how should I set up the inference script for ControlNet, inpainting, and IP-Adapter?
Should I use FluxControl (Depth or Canny) with a mask and an IP-Adapter model, or FluxControl, FluxFill, and IP-Adapter?
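
Here is roughly what I am imagining for quantizing to NF4 on the fly instead of using a prequantized checkpoint (my assumption, based on the bitsandbytes integration in diffusers/transformers; it needs bitsandbytes installed and uses BitsAndBytesConfig from both libraries):

import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from diffusers.models.transformers import FluxTransformer2DModel
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig, T5EncoderModel

# NF4 4-bit config for the Flux transformer (diffusers-side BitsAndBytesConfig)
transformer_nf4_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    subfolder="transformer",
    quantization_config=transformer_nf4_config,
    torch_dtype=torch.bfloat16,
)

# NF4 4-bit config for the T5 text encoder (transformers-side BitsAndBytesConfig)
text_encoder_nf4_config = TransformersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    subfolder="text_encoder_2",
    quantization_config=text_encoder_nf4_config,
    torch_dtype=torch.bfloat16,
)

These two modules could then be swapped into the pipeline the same way the prequantized sayakpaul/FLUX.1-Depth-dev-nf4 ones are in the reproduction below.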

Thanks,

@hlky, @sayakpaul

Reproduction

import torch
from diffusers import FluxControlInpaintPipeline
from diffusers.models.transformers import FluxTransformer2DModel
from transformers import T5EncoderModel
from diffusers.utils import load_image, make_image_grid
from image_gen_aux import DepthPreprocessor # https://github.com/huggingface/image_gen_aux
from PIL import Image
import numpy as np

access_token = ""
pipe = FluxControlInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Depth-dev",
    torch_dtype=torch.bfloat16,
    token=access_token,
)

# Use the following lines instead if you have GPU constraints
# ---------------------------------------------------------------

transformer = FluxTransformer2DModel.from_pretrained(
    "sayakpaul/FLUX.1-Depth-dev-nf4", subfolder="transformer", torch_dtype=torch.bfloat16
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "sayakpaul/FLUX.1-Depth-dev-nf4", subfolder="text_encoder_2", torch_dtype=torch.bfloat16
)
pipe.transformer = transformer
pipe.text_encoder_2 = text_encoder_2

pipe.enable_model_cpu_offload()

# ---------------------------------------------------------------

pipe.to("cuda")

prompt = "a blue robot sad expressions"
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

head_mask = np.zeros_like(image)
head_mask[65:580,300:642] = 255
mask_image = Image.fromarray(head_mask)

processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(image)[0].convert("RGB")

output = pipe(
    prompt=prompt,
    image=image,
    control_image=control_image,
    mask_image=mask_image,
    num_inference_steps=30,
    strength=1,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
make_image_grid([image, control_image, mask_image, output.resize(image.size)], rows=1, cols=4).save("output.png")

How would I change Depth to Canny, and add an IP-Adapter?
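
For what it's worth, here is a rough sketch of what I imagine the Canny variant plus IP-Adapter would look like (my assumption, pieced together from the FluxControlPipeline and Flux IP-Adapter docs; I am not sure FluxControlInpaintPipeline actually exposes load_ip_adapter, which is part of my question):

import torch
from controlnet_aux import CannyDetector
from diffusers import FluxControlInpaintPipeline
from diffusers.utils import load_image

# Swap the Depth checkpoint for the Canny one (assumption)
pipe = FluxControlInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")

# Build a Canny edge map as the control image instead of a depth map
processor = CannyDetector()
control_image = processor(
    image, low_threshold=50, high_threshold=200, detect_resolution=1024, image_resolution=1024
)

# IP-Adapter loading as shown for FluxPipeline; unclear whether this works on
# FluxControlInpaintPipeline
pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14",
)
pipe.set_ip_adapter_scale(1.0)

The rest of the call would then presumably pass ip_adapter_image alongside image, control_image, and mask_image, as in the depth example above.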

Logs

System Info

.

Who can help?

No response
