ovis_image: fix guidance_scale / max_sequence_length / batched CFG / precomputed embeds + add pipeline test by HaozheZhang6 · Pull Request #13944 · huggingface/diffusers

HaozheZhang6 · 2026-06-13T19:32:49Z

What does this PR do?

Addresses items 3, 4, 5, 6 and 7 from the ovis_image review in #13630.

Item 3 — OvisImagePipeline.guidance_scale read an uninitialized self._guidance_scale; it's now set in __call__.
Item 6 — max_sequence_length was validated but never reached the tokenizer; it's now threaded through encode_prompt → _get_ovis_prompt_embeds.
Item 5 — precomputed prompt_embeds were used as-is; they're now moved to the right device/dtype and repeated for num_images_per_prompt.
Item 4 — a single default negative_prompt with batched prompts under CFG raised a shape error; the negative prompt is now expanded to the batch size. Also guarded _get_ovis_prompt_embeds against text_encoder=None.
Item 7 — adds tests/pipelines/ovis_image/test_ovis_image.py (the model had no pipeline test) covering inference, the guidance_scale property, max_sequence_length, num_images_per_prompt, and batched CFG.

Item 2 (joint_attention_kwargs) and notes on the full PipelineTesterMixin are in a comment below.

Before submitting

Did you use an AI agent (Claude Code, Codex, Cursor, etc.) to help with this PR? If so:
- Pointed it at .ai/ — read AGENTS.md and review-rules.md.
- Self-reviewed the diff against .ai/review-rules.md.
Did you read the contributor guideline?
Was this discussed via a GitHub issue? Yes — ovis_image model/pipeline review #13630.
Did you write any new tests? Yes.

Who can review?

@hlky

hlky · 2026-06-13T20:54:36Z

This looks good so far @HaozheZhang6. As this is a new test, I would look at some of the recent test refactors and adopt those newer patterns to avoid needing a refactor later.

For the ovis_image review, item 1 is already covered by the pattern level PR you made. I would say items 2 and 6 are other easy fixes, while 4 and 5 are also fairly straight forward. If you feel confident I'd suggest attempting those too, then you'll completed the full review.

HaozheZhang6 · 2026-06-13T21:01:52Z

Thanks @hlky! Makes sense — I'll switch the test to the full PipelineTesterMixin using the newer pattern (a realistic tiny Qwen3 text encoder like the DreamLite / NucleusMoE-Image tests, so the standard save/load and dtype/device tests pass out of the box), and take on items 2/4/5/6 so the full review is covered. Will push the updates here.

…precomputed embeds + add pipeline test Addresses items 3/4/5/6/7 of huggingface#13630. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

HaozheZhang6 · 2026-06-13T21:15:47Z

Pushed items 3/4/5/6 + the test. Two things I'd value your steer on:

Full PipelineTesterMixin — I started toward it but ovis hits two infra issues: the tiny model produces NaN in the offload / save-load comparison tests (DreamLite avoids this with AutoencoderTiny + a realistic Qwen3-VL), and the tokenizer's chat_template is lost on save/load because the pipeline uses a bare tokenizer (no processor component like DreamLite). Happy to mirror the DreamLite fixture exactly if that's the intended approach.
Item 2 (joint_attention_kwargs) — the transformer forward doesn't currently accept attention_kwargs; the blocks already do, so it looks like a matter of threading it through forward → blocks and passing self.joint_attention_kwargs from the pipeline. I'll do that next unless you had a different shape in mind.

- thread joint_attention_kwargs through the transformer forward + blocks (item 2) and pass it from the pipeline's transformer calls. - encode_prompt now returns both positive and negative embeds (the z_image / PixArt convention) so precomputed embeds work end-to-end and the prompt is encoded in a single call. - switch the pipeline test to the full PipelineTesterMixin. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

HaozheZhang6 · 2026-06-13T21:42:58Z

Done — items 2/4/5/6 are in, and the test is now on the full PipelineTesterMixin (passes locally, plus a couple of ovis-specific checks).

Heads up on one design choice: to get test_encode_prompt_works_in_isolation passing I changed encode_prompt to encode and return both the positive and negative embeds in a single call (the z_image / PixArt convention) instead of calling it twice. The negative path already accepted precomputed embeds — this just returns them so the standard test can pick them up. If you'd rather keep encode_prompt single-prompt, I'll revert that and skip the one test instead.

item 2 is threaded through forward with @apply_lora_scale, matching flux.

hlky · 2026-06-13T22:04:00Z

@HaozheZhang6 LGTM, nice work.

For test_encode_prompt_works_in_isolation/encode_prompt I think returning both is the right choice, it's established in more popular model/pipelines and required for the test itself. The change for item 2 is also the correct approach and @apply_lora_scale is a thoughtful extra touch.

As a general rule I would say if a pattern is present in a more popular model/pipeline it's usually a good indication that it's the preferred approach.

At this point I think you are ready to request a review from a maintainer for approval, I would expect the tests may have some requested changes because of the recent refactors we mentioned, I haven't followed those changes exactly so I can't be more specific, my suggestion would be to await the review before making any further changes.

HaozheZhang6 · 2026-06-13T22:06:36Z

Thanks @hlky — appreciate you walking me through this one. I'll request a maintainer review and hold off on any further changes until then.

HaozheZhang6 · 2026-06-13T22:07:33Z

@DN6 — per @hlky's suggestion above, would you have a chance to review this for approval? You did the recent Ovis transformer refactor (#13347), so you've probably got the most context here. CI is green and the full PipelineTesterMixin passes.

github-actions · 2026-06-14T08:45:28Z

Hi @HaozheZhang6, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. Fixes #1234) to the PR description so the issue is linked. See the contribution guide for more details. If this PR intentionally does not fix a tracked issue, a maintainer can add the no-issue-needed label to silence this reminder.

HaozheZhang6 · 2026-06-23T03:47:33Z

@DN6 friendly re-ping -- hlky LGTM'd this and pointed to you for the final review given your Ovis transformer refactor (#13347). CI is green and the full PipelineTesterMixin passes. No rush.

HaozheZhang6 · 2026-06-30T03:21:11Z

Thanks @hlky — incorporated the review feedback. @sayakpaul would you have a moment to take a look for approval? Happy to update the tests for the recent refactors once a maintainer flags the specifics.

HuggingFaceDocBuilderDev · 2026-07-01T05:08:42Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

dg845 · 2026-07-01T06:23:16Z

+                max_sequence_length=max_sequence_length,
+            )
+        else:
+            dtype = self.text_encoder.dtype if self.text_encoder is not None else self.transformer.dtype


nit: since we always need dtype for text_ids below, maybe we can define it once at the top of _prepare_prompt_embeds?

dg845

Thanks for the PR! Left some small comments.

The pipeline computes max_length inline from max_sequence_length, so the attribute is dead. Addresses review feedback.

github-actions Bot added size/S PR with diff < 50 LOC pipelines labels Jun 13, 2026

HaozheZhang6 changed the title ~~Initialize _guidance_scale in OvisImagePipeline.__call__~~ Add ovis_image pipeline test + initialize guidance_scale Jun 13, 2026

HaozheZhang6 force-pushed the ovis-pipeline-guidance-scale branch from ce56ae4 to a3f91f1 Compare June 13, 2026 20:29

github-actions Bot added tests size/M PR with diff < 200 LOC and removed size/S PR with diff < 50 LOC labels Jun 13, 2026

ovis_image: fix guidance_scale / max_sequence_length / batched-CFG / …

46b0de9

…precomputed embeds + add pipeline test Addresses items 3/4/5/6/7 of huggingface#13630. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

HaozheZhang6 changed the title ~~Add ovis_image pipeline test + initialize guidance_scale~~ ovis_image: fix guidance_scale / max_sequence_length / batched CFG / precomputed embeds + add pipeline test Jun 13, 2026

HaozheZhang6 force-pushed the ovis-pipeline-guidance-scale branch from a3f91f1 to 46b0de9 Compare June 13, 2026 21:15

github-actions Bot added models size/L PR with diff > 200 LOC and removed size/M PR with diff < 200 LOC labels Jun 13, 2026

sayakpaul requested a review from dg845 June 30, 2026 17:34

Merge branch 'main' into ovis-pipeline-guidance-scale

1b91c36

dg845 reviewed Jul 1, 2026

View reviewed changes

Comment thread src/diffusers/pipelines/ovis_image/pipeline_ovis_image.py

dg845 reviewed Jul 1, 2026

View reviewed changes

dg845 approved these changes Jul 1, 2026

View reviewed changes

HaozheZhang6 and others added 2 commits July 1, 2026 01:43

ovis_image: remove now-unused tokenizer_max_length

4d67006

The pipeline computes max_length inline from max_sequence_length, so the attribute is dead. Addresses review feedback.

Merge branch 'main' into ovis-pipeline-guidance-scale

e042b78

dg845 merged commit 1ffa423 into huggingface:main Jul 1, 2026
15 checks passed

Uh oh!

Conversation

HaozheZhang6 commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Before submitting

Who can review?

Uh oh!

hlky commented Jun 13, 2026

Uh oh!

HaozheZhang6 commented Jun 13, 2026

Uh oh!

HaozheZhang6 commented Jun 13, 2026

Uh oh!

HaozheZhang6 commented Jun 13, 2026

Uh oh!

hlky commented Jun 13, 2026

Uh oh!

HaozheZhang6 commented Jun 13, 2026

Uh oh!

HaozheZhang6 commented Jun 13, 2026

Uh oh!

github-actions Bot commented Jun 14, 2026

Uh oh!

HaozheZhang6 commented Jun 23, 2026

Uh oh!

HaozheZhang6 commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

HuggingFaceDocBuilderDev commented Jul 1, 2026

Uh oh!

Uh oh!

dg845 Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

dg845 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

HaozheZhang6 commented Jun 13, 2026 •

edited

Loading

HaozheZhang6 commented Jun 30, 2026 •

edited

Loading