update LTX-2.3 doc #1364
```diff
@@ -24,88 +24,36 @@ from diffsynth.pipelines.ltx2_audio_video import LTX2AudioVideoPipeline, ModelConfig
 from diffsynth.utils.data.media_io_ltx2 import write_video_audio_ltx2
 
 vram_config = {
-    "offload_dtype": torch.float8_e5m2,
+    "offload_dtype": torch.bfloat16,
     "offload_device": "cpu",
-    "onload_dtype": torch.float8_e5m2,
-    "onload_device": "cpu",
-    "preparing_dtype": torch.float8_e5m2,
+    "onload_dtype": torch.bfloat16,
+    "onload_device": "cuda",
+    "preparing_dtype": torch.bfloat16,
     "preparing_device": "cuda",
     "computation_dtype": torch.bfloat16,
     "computation_device": "cuda",
 }
-"""
-Offical model repo: https://www.modelscope.cn/models/Lightricks/LTX-2
-Repackaged model repo: https://www.modelscope.cn/models/DiffSynth-Studio/LTX-2-Repackage
-For base models of LTX-2, offical checkpoint (with model config ModelConfig(model_id="Lightricks/LTX-2", origin_file_pattern="ltx-2-19b-dev.safetensors"))
-and repackaged checkpoints (with model config ModelConfig(model_id="DiffSynth-Studio/LTX-2-Repackage", origin_file_pattern="*.safetensors")) are both supported.
-We have repackeged the official checkpoints in DiffSynth-Studio/LTX-2-Repackage repo to support separate loading of different submodules,
-and avoid redundant memory usage when users only want to use part of the model.
-"""
-# use the repackaged modelconfig from "DiffSynth-Studio/LTX-2-Repackage" to avoid redundant model loading
 pipe = LTX2AudioVideoPipeline.from_pretrained(
     torch_dtype=torch.bfloat16,
     device="cuda",
     model_configs=[
         ModelConfig(model_id="google/gemma-3-12b-it-qat-q4_0-unquantized", origin_file_pattern="model-*.safetensors", **vram_config),
-        ModelConfig(model_id="DiffSynth-Studio/LTX-2-Repackage", origin_file_pattern="transformer.safetensors", **vram_config),
-        ModelConfig(model_id="DiffSynth-Studio/LTX-2-Repackage", origin_file_pattern="text_encoder_post_modules.safetensors", **vram_config),
-        ModelConfig(model_id="DiffSynth-Studio/LTX-2-Repackage", origin_file_pattern="video_vae_decoder.safetensors", **vram_config),
-        ModelConfig(model_id="DiffSynth-Studio/LTX-2-Repackage", origin_file_pattern="audio_vae_decoder.safetensors", **vram_config),
-        ModelConfig(model_id="DiffSynth-Studio/LTX-2-Repackage", origin_file_pattern="audio_vocoder.safetensors", **vram_config),
-        ModelConfig(model_id="DiffSynth-Studio/LTX-2-Repackage", origin_file_pattern="video_vae_encoder.safetensors", **vram_config),
-        ModelConfig(model_id="Lightricks/LTX-2", origin_file_pattern="ltx-2-spatial-upscaler-x2-1.0.safetensors", **vram_config),
+        ModelConfig(model_id="Lightricks/LTX-2.3", origin_file_pattern="ltx-2.3-22b-dev.safetensors", **vram_config),
+        ModelConfig(model_id="Lightricks/LTX-2.3", origin_file_pattern="ltx-2.3-spatial-upscaler-x2-1.0.safetensors", **vram_config),
     ],
     tokenizer_config=ModelConfig(model_id="google/gemma-3-12b-it-qat-q4_0-unquantized"),
-    stage2_lora_config=ModelConfig(model_id="Lightricks/LTX-2", origin_file_pattern="ltx-2-19b-distilled-lora-384.safetensors"),
-    vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
+    stage2_lora_config=ModelConfig(model_id="Lightricks/LTX-2.3", origin_file_pattern="ltx-2.3-22b-distilled-lora-384.safetensors"),
```
````diff
 )
 
-# use the following modelconfig if you want to initialize model from offical checkpoints from "Lightricks/LTX-2"
-# pipe = LTX2AudioVideoPipeline.from_pretrained(
-#     torch_dtype=torch.bfloat16,
-#     device="cuda",
-#     model_configs=[
-#         ModelConfig(model_id="google/gemma-3-12b-it-qat-q4_0-unquantized", origin_file_pattern="model-*.safetensors", **vram_config),
-#         ModelConfig(model_id="Lightricks/LTX-2", origin_file_pattern="ltx-2-19b-dev.safetensors", **vram_config),
-#         ModelConfig(model_id="Lightricks/LTX-2", origin_file_pattern="ltx-2-spatial-upscaler-x2-1.0.safetensors", **vram_config),
-#     ],
-#     tokenizer_config=ModelConfig(model_id="google/gemma-3-12b-it-qat-q4_0-unquantized"),
-#     stage2_lora_config=ModelConfig(model_id="Lightricks/LTX-2", origin_file_pattern="ltx-2-19b-distilled-lora-384.safetensors"),
-#     vram_limit=torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5,
-# )
-
-prompt = "A girl is very happy, she is speaking: \"I enjoy working with Diffsynth-Studio, it's a perfect framework.\""
-negative_prompt = (
-    "blurry, out of focus, overexposed, underexposed, low contrast, washed out colors, excessive noise, "
-    "grainy texture, poor lighting, flickering, motion blur, distorted proportions, unnatural skin tones, "
-    "deformed facial features, asymmetrical face, missing facial features, extra limbs, disfigured hands, "
-    "wrong hand count, artifacts around text, inconsistent perspective, camera shake, incorrect depth of "
-    "field, background too sharp, background clutter, distracting reflections, harsh shadows, inconsistent "
-    "lighting direction, color banding, cartoonish rendering, 3D CGI look, unrealistic materials, uncanny "
-    "valley effect, incorrect ethnicity, wrong gender, exaggerated expressions, wrong gaze direction, "
-    "mismatched lip sync, silent or muted audio, distorted voice, robotic voice, echo, background noise, "
-    "off-sync audio, incorrect dialogue, added dialogue, repetitive speech, jittery movement, awkward "
-    "pauses, incorrect timing, unnatural transitions, inconsistent framing, tilted camera, flat lighting, "
-    "inconsistent tone, cinematic oversaturation, stylized filters, or AI artifacts."
-)
-height, width, num_frames = 512 * 2, 768 * 2, 121
+prompt = "Two cute orange cats, wearing boxing gloves, stand in a boxing ring and fight each other. They are punching each other fast and yelling: 'I will win!'"
+negative_prompt = pipe.default_negative_prompt["LTX-2.3"]
 video, audio = pipe(
     prompt=prompt,
     negative_prompt=negative_prompt,
     seed=43,
-    height=height,
-    width=width,
-    num_frames=num_frames,
-    tiled=True,
-    use_two_stage_pipeline=True,
-)
-write_video_audio_ltx2(
-    video=video,
-    audio=audio,
-    output_path='ltx2_twostage.mp4',
-    fps=24,
-    audio_sample_rate=24000,
+    height=1024, width=1536, num_frames=121,
+    tiled=True, use_two_stage_pipeline=True,
 )
+write_video_audio_ltx2(video=video, audio=audio, output_path='video.mp4', fps=24, audio_sample_rate=pipe.audio_vocoder.output_sampling_rate)
 ```
 
 ## 模型总览
````
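The `vram_config` dict in this diff describes a dtype/device staging scheme: weights are parked on `offload_device` in `offload_dtype`, moved to `onload_device` in `onload_dtype` before use, and finally cast to `computation_dtype` for the forward pass. A minimal sketch of that staging follows; the `stage_weight` helper is hypothetical (DiffSynth's actual offload machinery is more involved), and CPU-only devices are used so it runs without a GPU, whereas the doc's config onloads to `"cuda"`:

```python
import torch

# Hypothetical helper illustrating the staging that vram_config's keys
# describe; not DiffSynth's real API.
def stage_weight(weight: torch.Tensor, cfg: dict) -> torch.Tensor:
    # Parked form: stored on the offload device in the offload dtype.
    parked = weight.to(device=cfg["offload_device"], dtype=cfg["offload_dtype"])
    # Onloaded form: moved toward the compute device before the module runs.
    onloaded = parked.to(device=cfg["onload_device"], dtype=cfg["onload_dtype"])
    # Final cast to the dtype actually used in the forward pass.
    return onloaded.to(dtype=cfg["computation_dtype"])

# CPU-only stand-in for the doc's config (which uses "cuda" for onload
# and computation devices).
cfg = {
    "offload_dtype": torch.bfloat16, "offload_device": "cpu",
    "onload_dtype": torch.bfloat16, "onload_device": "cpu",
    "computation_dtype": torch.bfloat16,
}
w = torch.randn(4, 4)
print(stage_weight(w, cfg).dtype)  # torch.bfloat16
```

The change from `torch.float8_e5m2` to `torch.bfloat16` for the offload/onload/preparing dtypes trades lower memory for higher weight precision; only the computation dtype was bfloat16 in both versions.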
Contributor

The documentation states that "VRAM management has been enabled", but the `vram_limit` parameter is missing from the `LTX2AudioVideoPipeline.from_pretrained` call. Without this parameter, VRAM management will be disabled, which could lead to out-of-memory errors for users with limited VRAM. Please add the `vram_limit` parameter to enable automatic VRAM management as described.
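The line this comment refers to, removed by the diff, sized the budget as `torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5`, i.e. total device VRAM in GiB minus 0.5 GiB of headroom (`mem_get_info` returns a `(free_bytes, total_bytes)` pair). That arithmetic can be sketched without a GPU; the `vram_limit_gb` helper below is hypothetical, introduced only to illustrate the calculation:

```python
def vram_limit_gb(total_vram_bytes: int, headroom_gb: float = 0.5) -> float:
    """Convert a total-VRAM byte count into a GiB budget with headroom.

    Mirrors the arithmetic of the removed line:
        torch.cuda.mem_get_info("cuda")[1] / (1024 ** 3) - 0.5
    """
    return total_vram_bytes / (1024 ** 3) - headroom_gb

# Example: a 24 GiB card yields a 23.5 GiB budget.
print(vram_limit_gb(24 * 1024 ** 3))  # → 23.5
```

Passing such a value as `vram_limit` to `from_pretrained`, as the earlier version of the doc did, would restore the automatic VRAM management the reviewer asks for.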