Add Ernie-Image modular pipeline#13498

Open
akshan-main wants to merge 1 commit into huggingface:main from akshan-main:modular-ernie

Conversation

@akshan-main
Contributor

What does this PR do?

Adds the modular pipeline for ErnieImage (ErnieImageAutoBlocks + ErnieImageModularPipeline).

Parity verified on A100, bf16, 50 steps, 1024x1024 with baidu/ERNIE-Image:

  • MAD (AutoBlocks vs standard): 0.000033
  • Max absolute diff: 0.58 (out of 255)
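For reference, a minimal sketch of how such a parity check can be computed with NumPy (the helper name and the sample arrays are illustrative, not from the PR):

```python
import numpy as np

# Hypothetical parity check: compare uint8 image outputs of two pipelines
# (e.g. AutoBlocks vs the standard pipeline) pixel by pixel.
def parity_metrics(img_a: np.ndarray, img_b: np.ndarray):
    diff = np.abs(img_a.astype(np.float64) - img_b.astype(np.float64))
    mad = diff.mean()       # mean absolute difference across all pixels
    max_diff = diff.max()   # worst-case per-pixel difference (out of 255)
    return mad, max_diff

# Tiny illustrative inputs
a = np.array([[10, 20], [30, 40]], dtype=np.uint8)
b = np.array([[10, 21], [30, 40]], dtype=np.uint8)
mad, mx = parity_metrics(a, b)
print(mad, mx)  # 0.25 1.0
```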

Colab Notebook: https://colab.research.google.com/gist/akshan-main/f25801763d573209464d6bfd685d708e/modular-ernie-image.ipynb

Addresses #13389 (comment).

Who can review?

@yiyixuxu @sayakpaul

@sayakpaul
Member

Maybe it could live as a custom pipeline on the Hub like https://huggingface.co/krea/krea-realtime-video/tree/main?

@akshan-main
Contributor Author

@sayakpaul @yiyixuxu I can do the Krea pattern by keeping a minimal ErnieImageModularPipeline subclass plus a MODULAR_PIPELINE_MAPPING entry in diffusers. Want me to restructure the PR that way?

@sayakpaul
Member

Doing it the Krea way wouldn't require any changes to core Diffusers, no?

@akshan-main
Contributor Author

You were right, the fully hub-only pattern works. I moved everything to https://huggingface.co/akshan-main/ernie-image-modular and it loads end-to-end with zero diffusers changes:

```python
from diffusers.modular_pipelines import ModularPipeline

pipe = ModularPipeline.from_pretrained("akshan-main/ernie-image-modular", trust_remote_code=True)
pipe.load_components(trust_remote_code=True)
```

No custom ErnieImageModularPipeline class needed.

I inlined the pipeline properties (vae_scale_factor, num_channels_latents, text_in_dim) into the blocks via direct components.vae.config.* / components.transformer.config.* reads.
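To illustrate the "inline the properties" idea: instead of exposing a pipeline-level property, a block can derive a value like vae_scale_factor straight from the component's config. A minimal sketch, assuming the usual diffusers convention of one 2x downsample per VAE block after the first; the mock config values are illustrative, not taken from ERNIE-Image:

```python
from types import SimpleNamespace

# Hypothetical helper: derive the scale factor from components.vae.config
# rather than from a pipeline property.
def vae_scale_factor_from(vae) -> int:
    # diffusers convention: 2 ** (number of VAE blocks - 1)
    return 2 ** (len(vae.config.block_out_channels) - 1)

# Mock VAE standing in for components.vae (illustrative channel counts)
mock_vae = SimpleNamespace(
    config=SimpleNamespace(block_out_channels=[128, 256, 512, 512])
)
print(vae_scale_factor_from(mock_vae))  # 8
```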

Should I close this PR, @sayakpaul?

@akshan-main
Contributor Author

Hey @sayakpaul, over the last few weeks I've profiled QwenImage and QwenImageEdit to identify cudaStreamSynchronize calls causing per-step latency, which led to the QwenImage RoPE sync fix (#13406, merged), plus modular pipelines for LTX (#13378, merged) and HunyuanVideo 1.5 (#13389, merged), and an HV1.5 I2V bug fix (#13439, merged). I've also published modular upscale hub blocks for SDXL, Flux1, and Z-Image, and just pushed an Ernie-Image modular hub repo. Would this qualify me for MVP recognition, or is there more I should do to get there? Happy to keep contributing either way, just wanted to check.

@yiyixuxu
Collaborator

@sayakpaul @akshan-main

ooh but I'd really love to have official modular support for ERNIE-Image — it's a really good model, trained from scratch, and the Baidu team is committed to releasing more checkpoints and building their own community around it

I think we can release official blocks for text-to-image, image-to-image, and edit pipelines (not yet released) and encourage the community to build more custom stuff on hub using our official blocks
