support LongCat-Video by Artiprocher · Pull Request #1018 · modelscope/DiffSynth-Studio

Artiprocher · 2025-10-30T05:42:23Z

No description provided.

gemini-code-assist · 2025-10-30T05:42:41Z

Summary of Changes

Hello @Artiprocher, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds support for the LongCat-Video model to the video generation framework, enabling text-to-video, image-to-video, and video continuation. The changes add the model's core architecture, integrate it into the existing pipeline, and provide examples for both inference and training.

Highlights

New Model Support: Introduced comprehensive support for the meituan-longcat/LongCat-Video model, expanding the framework's capabilities to include text-to-video, image-to-video, and video continuation.
Core Model Implementation: Added the LongCatVideoTransformer3DModel with its associated components, such as RMSNorm_FP32, RotaryPositionalEmbedding, and custom attention mechanisms, forming the backbone of the new video generation capabilities.
Pipeline Integration: Integrated the LongCat-Video model into the video pipeline, including a new processing unit (WanVideoUnit_LongCatVideo) and a dedicated model function (model_fn_longcat_video) to handle its specific inference logic.
VRAM Management Enhancements: Updated the VRAM management system to properly handle the new LayerNorm_FP32 and RMSNorm_FP32 layers introduced by the LongCat-Video model, ensuring efficient memory usage.
Documentation and Examples: Updated README.md and README_zh.md to reflect the new model support and provided new example scripts for LongCat-Video inference, full training, LoRA training, and validation.


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request adds support for the LongCat-Video model, including its architecture, pipeline integration, and example scripts for inference and training. The changes are extensive and well-structured. I've identified a few critical issues in the pipeline integration that could lead to incorrect behavior, particularly with batching and the model's output interpretation. There's also a medium-severity issue regarding unsafe tensor initialization in the model's implementation. After addressing these points, the PR should be in good shape.

gemini-code-assist · 2025-10-30T05:45:11Z

+        num_cond_latents = longcat_latents.shape[2]
+    else:
+        num_cond_latents = 0
+    context = context.unsqueeze(0)


The use of context.unsqueeze(0) appears to be incorrect and will likely cause issues with batch sizes greater than 1 (e.g., when using classifier-free guidance with cfg_merge=True). It reshapes the context tensor to [1, B, N, C], which will cause a dimension mismatch later in the model's forward pass. It should be context.unsqueeze(1) to correctly shape it to [B, 1, N, C] for batch processing.

Suggested change

-    context = context.unsqueeze(0)
+    context = context.unsqueeze(1)
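
The shape difference the reviewer describes can be checked in isolation. The following is a minimal sketch, not code from the PR: the sizes `B`, `N`, and `C` are hypothetical, and whether the model's forward pass actually expects `[B, 1, N, C]` is the reviewer's claim and is not verified here.

```python
import torch

# Hypothetical sizes: batch, number of text tokens, channel width.
B, N, C = 2, 4, 8
context = torch.randn(B, N, C)

front = context.unsqueeze(0)   # new axis inserted before the batch axis
second = context.unsqueeze(1)  # new axis inserted after the batch axis

print(tuple(front.shape))   # (1, 2, 4, 8)
print(tuple(second.shape))  # (2, 1, 4, 8)

# With a single sample both calls give the same shape, so a difference
# between them would only surface once the batch size exceeds 1.
single = torch.randn(1, N, C)
same_shape = single.unsqueeze(0).shape == single.unsqueeze(1).shape
print(same_shape)  # True
```

This also suggests why such an issue could pass a quick test: with batch size 1 the two variants are indistinguishable by shape.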

gemini-code-assist · 2025-10-30T05:45:11Z

+        use_gradient_checkpointing=use_gradient_checkpointing,
+        use_gradient_checkpointing_offload=use_gradient_checkpointing_offload,
+    )
+    output = -output


Negating the model's output with output = -output is highly unusual and potentially a critical bug. Standard diffusion models are trained to predict noise or velocity, and negating the output would reverse the diffusion process, leading to incorrect results. If this model was specifically trained to predict the negative of the target, this should be clearly documented with a comment. Otherwise, this line should be removed.

Suggested change

-    output = -output
+    # output = -output
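
Whether the negation is a bug depends on the sign convention the model was trained with, which the PR does not state. The sketch below is purely illustrative and uses scalar toy values: it assumes a flow-matching style interpolation `x_t = (1 - t) * x0 + t * noise` and a hypothetical model that predicts `x0 - noise` while the sampler expects `noise - x0`. None of these names or conventions are taken from the LongCat-Video code.

```python
def euler_step(x_t, velocity, dt):
    """One explicit Euler step along the given velocity."""
    return x_t + velocity * dt

# Toy scalar "sample" and "noise" (hypothetical values).
x0, noise = 1.0, 3.0

# Start at t = 1, where x_t is pure noise under the assumed interpolation.
t = 1.0
x_t = (1 - t) * x0 + t * noise

# Velocity the sampler expects: d x_t / d t = noise - x0.
expected_velocity = noise - x0

# Integrating from t = 1 down to t = 0 (dt = -1) recovers x0.
denoised = euler_step(x_t, expected_velocity, -1.0)

# A model trained with the opposite sign predicts x0 - noise instead.
model_output = x0 - noise
wrong = euler_step(x_t, model_output, -1.0)   # moves away from x0
fixed = euler_step(x_t, -model_output, -1.0)  # negation restores the step

print(denoised, wrong, fixed)  # 1.0 5.0 1.0
```

Under these assumptions the sign flip is required rather than harmful, which is why the reviewer's request for an explanatory comment next to the line is reasonable either way.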

gemini-code-assist · 2025-10-30T05:45:11Z

+        if num_cond_latents is not None and num_cond_latents > 0:
+            k_full = torch.cat([k_cache, k], dim=2).contiguous()
+            v_full = torch.cat([v_cache, v], dim=2).contiguous()
+            q_padding = torch.cat([torch.empty_like(k_cache), q], dim=2).contiguous()


Using torch.empty_like is unsafe as it creates a tensor with uninitialized memory, which can lead to non-deterministic behavior or NaNs. Please use torch.zeros_like instead for safe and deterministic initialization.

Suggested change

-    q_padding = torch.cat([torch.empty_like(k_cache), q], dim=2).contiguous()
+    q_padding = torch.cat([torch.zeros_like(k_cache), q], dim=2).contiguous()
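
A small check of what the suggested padding produces is sketched below. The tensor layout `[batch, heads, sequence, head_dim]` and all sizes are assumptions made for illustration; only the `torch.cat` line mirrors the suggestion above.

```python
import torch

# Hypothetical sizes: batch, heads, cached tokens, new tokens, head dim.
B, H, CACHED, NEW, D = 1, 2, 3, 5, 4
k_cache = torch.randn(B, H, CACHED, D)
q = torch.randn(B, H, NEW, D)

# Pad the query along the sequence axis with a deterministic zero prefix.
q_padding = torch.cat([torch.zeros_like(k_cache), q], dim=2).contiguous()

print(tuple(q_padding.shape))  # (1, 2, 8, 4)

# The prefix is all zeros and the original query is left untouched.
prefix_is_zero = bool((q_padding[:, :, :CACHED] == 0).all())
tail_matches = torch.equal(q_padding[:, :, CACHED:], q)
print(prefix_is_zero, tail_matches)  # True True
```

With `torch.empty_like` the shape would be identical, but the prefix values would be whatever happened to be in the allocated memory. The snippet under review does not show whether the outputs for those padded positions are discarded later, so the practical impact cannot be judged from it alone.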

support LongCat-Video

b73e713

Artiprocher merged commit a30ed90 into main Oct 30, 2025

gemini-code-assist Bot reviewed Oct 30, 2025

View reviewed changes

Artiprocher deleted the longcat branch November 4, 2025 03:01

LePao1 pushed a commit to LePao1/DiffSynth-Studio that referenced this pull request Feb 22, 2026

Merge pull request modelscope#1018 from modelscope/longcat

4551913

support LongCat-Video

Artiprocher merged 1 commit into main from longcat
