Skip to content

ovis_image: fix guidance_scale / max_sequence_length / batched CFG / precomputed embeds + add pipeline test#13944

Merged
dg845 merged 5 commits into
huggingface:mainfrom
HaozheZhang6:ovis-pipeline-guidance-scale
Jul 1, 2026
Merged

ovis_image: fix guidance_scale / max_sequence_length / batched CFG / precomputed embeds + add pipeline test#13944
dg845 merged 5 commits into
huggingface:mainfrom
HaozheZhang6:ovis-pipeline-guidance-scale

Conversation

@HaozheZhang6

@HaozheZhang6 HaozheZhang6 commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Addresses items 3, 4, 5, 6 and 7 from the ovis_image review in #13630.

  • Item 3OvisImagePipeline.guidance_scale read an uninitialized self._guidance_scale; it's now set in __call__.
  • Item 6max_sequence_length was validated but never reached the tokenizer; it's now threaded through encode_prompt_get_ovis_prompt_embeds.
  • Item 5 — precomputed prompt_embeds were used as-is; they're now moved to the right device/dtype and repeated for num_images_per_prompt.
  • Item 4 — a single default negative_prompt with batched prompts under CFG raised a shape error; the negative prompt is now expanded to the batch size. Also guarded _get_ovis_prompt_embeds against text_encoder=None.
  • Item 7 — adds tests/pipelines/ovis_image/test_ovis_image.py (the model had no pipeline test) covering inference, the guidance_scale property, max_sequence_length, num_images_per_prompt, and batched CFG.

Item 2 (joint_attention_kwargs) and notes on the full PipelineTesterMixin are in a comment below.

Before submitting

  • Did you use an AI agent (Claude Code, Codex, Cursor, etc.) to help with this PR? If so:
    • Pointed it at .ai/ — read AGENTS.md and review-rules.md.
    • Self-reviewed the diff against .ai/review-rules.md.
  • Did you read the contributor guideline?
  • Was this discussed via a GitHub issue? Yes — ovis_image model/pipeline review #13630.
  • Did you write any new tests? Yes.

Who can review?

@hlky

@github-actions github-actions Bot added size/S PR with diff < 50 LOC pipelines labels Jun 13, 2026
@HaozheZhang6 HaozheZhang6 changed the title Initialize _guidance_scale in OvisImagePipeline.__call__ Add ovis_image pipeline test + initialize guidance_scale Jun 13, 2026
@HaozheZhang6 HaozheZhang6 force-pushed the ovis-pipeline-guidance-scale branch from ce56ae4 to a3f91f1 Compare June 13, 2026 20:29
@github-actions github-actions Bot added tests size/M PR with diff < 200 LOC and removed size/S PR with diff < 50 LOC labels Jun 13, 2026
@hlky

hlky commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

This looks good so far @HaozheZhang6. As this is a new test, I would look at some of the recent test refactors and adopt those newer patterns to avoid needing a refactor later.

For the ovis_image review, item 1 is already covered by the pattern level PR you made. I would say items 2 and 6 are other easy fixes, while 4 and 5 are also fairly straight forward. If you feel confident I'd suggest attempting those too, then you'll completed the full review.

@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

Thanks @hlky! Makes sense — I'll switch the test to the full PipelineTesterMixin using the newer pattern (a realistic tiny Qwen3 text encoder like the DreamLite / NucleusMoE-Image tests, so the standard save/load and dtype/device tests pass out of the box), and take on items 2/4/5/6 so the full review is covered. Will push the updates here.

…precomputed embeds + add pipeline test

Addresses items 3/4/5/6/7 of huggingface#13630.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@HaozheZhang6 HaozheZhang6 changed the title Add ovis_image pipeline test + initialize guidance_scale ovis_image: fix guidance_scale / max_sequence_length / batched CFG / precomputed embeds + add pipeline test Jun 13, 2026
@HaozheZhang6 HaozheZhang6 force-pushed the ovis-pipeline-guidance-scale branch from a3f91f1 to 46b0de9 Compare June 13, 2026 21:15
@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

Pushed items 3/4/5/6 + the test. Two things I'd value your steer on:

  1. Full PipelineTesterMixin — I started toward it but ovis hits two infra issues: the tiny model produces NaN in the offload / save-load comparison tests (DreamLite avoids this with AutoencoderTiny + a realistic Qwen3-VL), and the tokenizer's chat_template is lost on save/load because the pipeline uses a bare tokenizer (no processor component like DreamLite). Happy to mirror the DreamLite fixture exactly if that's the intended approach.
  2. Item 2 (joint_attention_kwargs) — the transformer forward doesn't currently accept attention_kwargs; the blocks already do, so it looks like a matter of threading it through forward → blocks and passing self.joint_attention_kwargs from the pipeline. I'll do that next unless you had a different shape in mind.

- thread joint_attention_kwargs through the transformer forward + blocks
  (item 2) and pass it from the pipeline's transformer calls.
- encode_prompt now returns both positive and negative embeds (the z_image /
  PixArt convention) so precomputed embeds work end-to-end and the prompt is
  encoded in a single call.
- switch the pipeline test to the full PipelineTesterMixin.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added models size/L PR with diff > 200 LOC and removed size/M PR with diff < 200 LOC labels Jun 13, 2026
@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

Done — items 2/4/5/6 are in, and the test is now on the full PipelineTesterMixin (passes locally, plus a couple of ovis-specific checks).

Heads up on one design choice: to get test_encode_prompt_works_in_isolation passing I changed encode_prompt to encode and return both the positive and negative embeds in a single call (the z_image / PixArt convention) instead of calling it twice. The negative path already accepted precomputed embeds — this just returns them so the standard test can pick them up. If you'd rather keep encode_prompt single-prompt, I'll revert that and skip the one test instead.

item 2 is threaded through forward with @apply_lora_scale, matching flux.

@hlky

hlky commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

@HaozheZhang6 LGTM, nice work.

For test_encode_prompt_works_in_isolation/encode_prompt I think returning both is the right choice, it's established in more popular model/pipelines and required for the test itself. The change for item 2 is also the correct approach and @apply_lora_scale is a thoughtful extra touch.

As a general rule I would say if a pattern is present in a more popular model/pipeline it's usually a good indication that it's the preferred approach.

At this point I think you are ready to request a review from a maintainer for approval, I would expect the tests may have some requested changes because of the recent refactors we mentioned, I haven't followed those changes exactly so I can't be more specific, my suggestion would be to await the review before making any further changes.

@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

Thanks @hlky — appreciate you walking me through this one. I'll request a maintainer review and hold off on any further changes until then.

@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

@DN6 — per @hlky's suggestion above, would you have a chance to review this for approval? You did the recent Ovis transformer refactor (#13347), so you've probably got the most context here. CI is green and the full PipelineTesterMixin passes.

@github-actions

Copy link
Copy Markdown
Contributor

Hi @HaozheZhang6, thanks for the PR! It does not appear to link an issue it fixes. If this PR addresses an existing issue, please add a closing keyword (e.g. Fixes #1234) to the PR description so the issue is linked. See the contribution guide for more details. If this PR intentionally does not fix a tracked issue, a maintainer can add the no-issue-needed label to silence this reminder.

@HaozheZhang6

Copy link
Copy Markdown
Contributor Author

@DN6 friendly re-ping -- hlky LGTM'd this and pointed to you for the final review given your Ovis transformer refactor (#13347). CI is green and the full PipelineTesterMixin passes. No rush.

@HaozheZhang6

HaozheZhang6 commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @hlky — incorporated the review feedback. @sayakpaul would you have a moment to take a look for approval? Happy to update the tests for the recent refactors once a maintainer flags the specifics.

@sayakpaul sayakpaul requested a review from dg845 June 30, 2026 17:34
@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread src/diffusers/pipelines/ovis_image/pipeline_ovis_image.py
max_sequence_length=max_sequence_length,
)
else:
dtype = self.text_encoder.dtype if self.text_encoder is not None else self.transformer.dtype

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since we always need dtype for text_ids below, maybe we can define it once at the top of _prepare_prompt_embeds?

@dg845 dg845 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Left some small comments.

HaozheZhang6 and others added 2 commits July 1, 2026 01:43
The pipeline computes max_length inline from max_sequence_length, so the attribute is dead. Addresses review feedback.
@dg845 dg845 merged commit 1ffa423 into huggingface:main Jul 1, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants