Cosmos Predict2 #11695

a-r-r-o-w · 2025-06-11T14:29:17Z

The cosmos is within us. We are made of star-stuff. We are a way for the universe to know itself.

cc @pjannaty @chenhsuanlin @fitsumreda @asfiyab-nvidia @amolfasale

a-r-r-o-w · 2025-06-11T14:33:36Z

@yiyixuxu This PR contains the version that works with `FlowMatchEulerDiscreteScheduler`. The version that works with `EDMEulerScheduler` is in this branch.

IIUC from our discussion, we will probably stick with this PR, so I'll update the scheduler configs once you confirm. Currently only the text2image pipeline has been updated; I'll update video2world soon too.

PRs:
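For context on the two scheduler setups discussed above: EDM-style schedulers write a noisy sample as x = x0 + sigma * eps, while flow matching writes x_t = (1 - t) * x0 + t * eps. Dividing the flow-matching form by (1 - t) shows that both describe the same noisy samples up to a scale factor, with sigma = t / (1 - t) and t = sigma / (1 + sigma). The sketch below is a standalone illustration of that correspondence. It is not the diffusers implementation of `use_flow_sigmas`, and the Karras schedule parameters (sigma_min = 0.002, sigma_max = 80, rho = 7) are illustrative defaults, not the Cosmos Predict2 configuration.

```python
from typing import List


def karras_sigmas(sigma_min: float, sigma_max: float, num_steps: int, rho: float = 7.0) -> List[float]:
    """EDM (Karras et al.) noise levels, decreasing from sigma_max to sigma_min."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    inv_max = sigma_max ** (1.0 / rho)
    inv_min = sigma_min ** (1.0 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then map back with ** rho
    return [
        (inv_max + i / (num_steps - 1) * (inv_min - inv_max)) ** rho
        for i in range(num_steps)
    ]


def sigma_to_flow_t(sigma: float) -> float:
    """Map an EDM noise level to flow-matching time, where x_t = (1 - t) * x0 + t * eps."""
    return sigma / (1.0 + sigma)


def flow_t_to_sigma(t: float) -> float:
    """Inverse map: sigma = t / (1 - t), defined for 0 <= t < 1."""
    if not 0.0 <= t < 1.0:
        raise ValueError("t must be in [0, 1)")
    return t / (1.0 - t)


if __name__ == "__main__":
    sigmas = karras_sigmas(sigma_min=0.002, sigma_max=80.0, num_steps=5)
    for s in sigmas:
        print(f"sigma={s:9.4f}  flow t={sigma_to_flow_t(s):.4f}")
```

Every EDM sigma maps to a flow time strictly between 0 and 1, so a schedule defined in one parameterization can be expressed in the other.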

HuggingFaceDocBuilderDev · 2025-06-11T14:36:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

pjannaty

LGTM! Thank you, Aryan and the HuggingFace team!

yiyixuxu

looks good!

a-r-r-o-w · 2025-06-13T15:43:03Z

We have new weight PRs now since the original weights were updated at the time of release:

Arslan-Mehmood1 · 2025-06-19T11:02:55Z

Python 3.10

Code:

```python
import time

import torch
from diffusers import Cosmos2TextToImagePipeline

# Available checkpoints: nvidia/Cosmos-Predict2-2B-Text2Image, nvidia/Cosmos-Predict2-14B-Text2Image
model_id = "nvidia/Cosmos-Predict2-14B-Text2Image"
pipe = Cosmos2TextToImagePipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A close-up shot captures a vibrant yellow scrubber vigorously working on a grimy plate, its bristles moving in circular motions to lift stubborn grease and food residue. The dish, once covered in remnants of a hearty meal, gradually reveals its original glossy surface. Suds form and bubble around the scrubber, creating a satisfying visual of cleanliness in progress. The sound of scrubbing fills the air, accompanied by the gentle clinking of the dish against the sink. As the scrubber continues its task, the dish transforms, gleaming under the bright kitchen lights, symbolizing the triumph of cleanliness over mess."
negative_prompt = "The video captures a series of frames showing ugly scenes, static with no motion, motion blur, over-saturation, shaky footage, low resolution, grainy texture, pixelated images, poorly lit areas, underexposed and overexposed scenes, poor color balance, washed out colors, choppy sequences, jerky movements, low frame rate, artifacting, color banding, unnatural transitions, outdated special effects, fake elements, unconvincing visuals, poorly edited content, jump cuts, visual noise, and flickering. Overall, the video is of poor quality."

output_path = "output_cosmos14b.png"

start_time = time.time()
print("Generating image...")

# A fixed seed makes the result reproducible across runs
output = pipe(
    prompt=prompt, negative_prompt=negative_prompt, generator=torch.Generator().manual_seed(1)
).images[0]

output.save(output_path)

time_taken = time.time() - start_time
print(f"Image generated and saved as '{output_path}'. Time taken: {time_taken:.2f} seconds")
```

Installed packages:

```
better-profanity==0.7.0
boto3==1.38.31
decord==0.6.0
diffusers==0.33.1
ftfy==6.3.1
fvcore==0.1.5.post20221221
h11==0.16.0
huggingface-hub==0.32.4
hydra-core==1.3.2
imageio[pyav,ffmpeg]==2.37.0
iopath==0.1.10
ipdb==0.13.13
loguru==0.7.3
mediapy==1.2.4
megatron-core==0.12.1
modelscope==1.26.0
nltk==3.9.1
omegaconf==2.3.0
opencv-python==4.11.0.86
peft==0.15.2
qwen-vl-utils[decord]==0.0.11
retinaface-py==0.0.2
scikit-image==0.25.2
sentencepiece==0.2.0
termcolor==3.1.0
transformers==4.51.3
webdataset==0.2.111
```

ERROR:

```
Traceback (most recent call last):
  File "/home/paperspace/Ahmer/cosmos/test.py", line 3, in <module>
    from diffusers import Cosmos2TextToImagePipeline
ImportError: cannot import name 'Cosmos2TextToImagePipeline' from 'diffusers' (/home/paperspace/.virtualenvs/cosmos/lib/python3.10/site-packages/diffusers/__init__.py)
```

a-r-r-o-w · 2025-06-19T11:07:20Z

Since there hasn't been a diffusers release that includes Cosmos Predict2 yet, you need to install diffusers from the main branch. A release will be happening soon; for the time being, please try: `pip install git+https://github.com/huggingface/diffusers`
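The `ImportError` above comes from an installed diffusers version that predates the Cosmos2 pipelines. A script can check for the class up front and print the install hint instead of failing at import time. Below is a minimal sketch using only the standard library; the helper name `has_symbol` is hypothetical and not part of diffusers.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` can be imported and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers ModuleNotFoundError when the package is not installed at all
        return False
    try:
        return hasattr(module, symbol)
    except Exception:
        # Lazily-loaded packages may raise other errors while resolving the attribute
        return False


if __name__ == "__main__":
    if not has_symbol("diffusers", "Cosmos2TextToImagePipeline"):
        print(
            "This diffusers install does not provide Cosmos2TextToImagePipeline. "
            "Try: pip install git+https://github.com/huggingface/diffusers"
        )
```

`hasattr` is used rather than a direct `from diffusers import ...` so that a missing class becomes a boolean the script can act on, instead of an exception at the top of the file.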

matabear-wyx · 2025-07-24T04:04:20Z

Why is `num_channels_latents = self.transformer.config.in_channels - 1`?

a-r-r-o-w · 2025-07-24T04:08:20Z

Video2World models have an additional input channel for a concatenated conditioning mask, which indicates which frames serve as the conditioning signal when extending a video. The actual number of latent channels is therefore one less than the transformer's `in_channels`.
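To make the channel arithmetic concrete, here is a toy sketch in plain Python. It assumes latents laid out as [channels][frames][height][width] and a single binary mask channel (1.0 for frames supplied as conditioning, 0.0 for frames to generate) appended after the latent channels. The layout, the latent channel count of 16, and the helper names are illustrative assumptions, not the pipeline's actual code.

```python
from typing import List

# Toy tensor layout (assumption): [channels][frames][height][width]
Tensor4D = List[List[List[List[float]]]]


def zeros(channels: int, frames: int, height: int, width: int) -> Tensor4D:
    return [
        [[[0.0] * width for _ in range(height)] for _ in range(frames)]
        for _ in range(channels)
    ]


def condition_mask(frames: int, num_cond_frames: int, height: int, width: int) -> Tensor4D:
    """One-channel mask: 1.0 on conditioning frames, 0.0 on frames to be generated."""
    return [
        [
            [[1.0 if f < num_cond_frames else 0.0] * width for _ in range(height)]
            for f in range(frames)
        ]
    ]


def concat_channels(a: Tensor4D, b: Tensor4D) -> Tensor4D:
    """Concatenate along the channel axis (the outermost list)."""
    return a + b


if __name__ == "__main__":
    num_channels_latents = 16  # illustrative latent channel count
    latents = zeros(num_channels_latents, frames=4, height=2, width=2)
    mask = condition_mask(frames=4, num_cond_frames=1, height=2, width=2)
    model_input = concat_channels(latents, mask)
    in_channels = len(model_input)  # channel count the transformer would receive
    print(in_channels, in_channels - 1)  # prints: 17 16
```

Reading the config in reverse, a pipeline that knows `in_channels` recovers the latent channel count by subtracting the single mask channel.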

a-r-r-o-w added 15 commits June 7, 2025 16:45

- support text-to-image (4d90851)
- update example (a0617d5)
- make fix-copies (c2ab6c8)
- support use_flow_sigmas in EDM scheduler instead of maintain cosmos-specific scheduler (0d56c0c)
- support video-to-world (3e019f2)
- update (9059a52)
- Merge branch 'integrations/cosmos-predict2' into nvidia/integrations/cosmos-predict2 (b99b000)
- rename text2image pipeline (829545d)
- make fix-copies (4f8c133)
- add t2i test (714f89d)
- add test for v2w pipeline (06e852d)
- support edm dpmsolver multistep (2d01740)
- Merge branch 'integrations/cosmos-predict2' into nvidia/integrations/cosmos-predict2 (cdb3aa5)
- update (178f9b6)
- update (f046889)

a-r-r-o-w requested a review from yiyixuxu June 11, 2025 14:29

pjannaty approved these changes Jun 11, 2025

yiyixuxu approved these changes Jun 11, 2025

a-r-r-o-w added 6 commits June 11, 2025 20:57

- update (6226c8d)
- update tests (f3b427d)
- fix tests (8499008)
- safety checker (5ba504c)
- Merge branch 'main' into integrations/cosmos-predict2-fm-scheduler (aadbada)
- make conversion script work without guardrail (2c9e946)

a-r-r-o-w merged commit 9f91305 into main Jun 13, 2025
16 checks passed

a-r-r-o-w deleted the integrations/cosmos-predict2-fm-scheduler branch June 13, 2025 20:21

dylanholmes mentioned this pull request Jun 18, 2025 in Make requirements even more flexible (yiyixuxu/cosmos-guardrail#2, merged)
