![Cosmos-Transfer1-7B](cosmos-transfer1_banner.png)

**Cosmos-Transfer1** is a multimodal world-to-world (W2W) generation model from the Cosmos WFM series. It enables controllable visual generation from inputs such as segmentation, depth, Canny edge, and blur, with adaptive spatiotemporal control. This notebook walks through two examples that use Cosmos-Transfer1-7B for flexible and coherent visual transformations.

- Tested Spec:
    - GPU: Crusoe L40S
    - VRAM: 48GiB
    - GPU Driver: 535.183.06 (CUDA 12.2)
- Prerequisites
    - A valid Hugging Face access token, which you can create at https://huggingface.co/settings/tokens
    - Access to the NGC **nvidian** org. You can apply for access in the Slack channel *#swngc-help*

The following steps are based on [GitHub: Cosmos-Transfer1-7B](https://github.com/nvidia-cosmos/cosmos-transfer1/blob/main/examples/inference_cosmos_transfer1_7b.md).
- Tested Commit: ed9ab808fb1c4fab04a14ecd7fbccb3e757bd92e

### Setup Environment and Dependencies
---
Execute the following commands in a terminal. To open a terminal: Launcher tab -> Other -> Terminal

```bash
# Go to home directory
cd ~

# Install uv package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

# Log in to your Hugging Face account
uv tool install -U "huggingface_hub[cli]"
hf auth login

# Set up the Docker config. REPLACE <your key> with your NGC key
export NGC_API_KEY="<your key>"
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin

# Create a python virtual environment and install dependencies
uv venv
source .venv/bin/activate
uv pip install loguru
uv pip install torch
uv pip install huggingface_hub
uv pip install ipykernel
uv pip install ipywidgets

# Create a Python kernel for the notebook
python -m ipykernel install --user --name=transfer1 --display-name "Python (.venv) Transfer1"
```

### Switch to the Custom Python Kernel
---
1. Go back to the notebook: *cosmos-transfer1.ipynb*
2. Click **Python 3 (ipykernel)** in the upper-right corner.
3. Pick **Python (.venv) Transfer1** in the *Start python Kernel* section, then click the Select button. (If you don't see the option, try restarting the notebook.)
4. The kernel button in the upper-right corner should now show *Python (.venv) Transfer1*.
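
To confirm that the switch took effect, a cell like the following can be run in the notebook. It is a minimal sketch: the helper name `running_in_virtualenv` is made up for this example, and the check relies only on the standard `sys.prefix` / `sys.base_prefix` comparison, which differs inside a virtual environment.

```python
import sys


def running_in_virtualenv():
    """Return True when the interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


# The interpreter path should point into the .venv directory created earlier.
print("Interpreter:", sys.executable)
print("Virtual environment active:", running_in_virtualenv())
```

If the second line prints `False`, the notebook is probably still attached to the default kernel.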

### Create Workspace
---
Make sure you have at least **360 GB** of free disk space to store data. The following code targets a Crusoe instance: it stores data under /ephemeral/workspace and links it to ~/workspace. If you're not using Crusoe, you can simply create a ~/workspace directory instead.

In [None]:
%%bash

# Ensure the target directory exists
mkdir -p /ephemeral/workspace
# Create the symlink only if it doesn't already exist
[ -L ~/workspace ] || ln -s /ephemeral/workspace ~/workspace
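
The free-space requirement can be checked from Python before starting the download. The sketch below uses `shutil.disk_usage` from the standard library; the helper names and the fallback to the home directory are assumptions made for this example, and the 360 GB threshold is the figure quoted in this section, interpreted as decimal GB.

```python
import os
import shutil

REQUIRED_FREE_GB = 360  # figure quoted in this section (assumed decimal GB)


def free_space_gb(path):
    """Free space on the filesystem that contains `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path, required_gb=REQUIRED_FREE_GB):
    """Return True when at least `required_gb` GB are free at `path`."""
    return free_space_gb(path) >= required_gb


# Check the workspace if it exists, otherwise fall back to the home directory.
workspace = os.path.expanduser("~/workspace")
target = workspace if os.path.exists(workspace) else os.path.expanduser("~")
print(f"{target}: {free_space_gb(target):.0f} GB free, "
      f"enough: {has_enough_space(target)}")
```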

### Clone GitHub Repository Which Contains Sample Scripts and Dataset
---
Change the working directory to **~/workspace** for all subsequent notebook cells.

In [None]:
%cd ~/workspace

In [None]:
%%bash

# Clone the repository
git clone https://github.com/nvidia-cosmos/cosmos-transfer1.git

# Switch to the tested commit
cd cosmos-transfer1
git fetch
git checkout ed9ab808fb1c4fab04a14ecd7fbccb3e757bd92e
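
A quick way to confirm that the checkout landed on the tested commit is to compare `git rev-parse HEAD` with the hash listed at the top of this notebook. This is a minimal sketch; the helper names are invented for the example, and it assumes the repository was cloned into `cosmos-transfer1` under the current working directory.

```python
import subprocess

TESTED_COMMIT = "ed9ab808fb1c4fab04a14ecd7fbccb3e757bd92e"


def current_commit(repo_dir):
    """Return the full hash of HEAD in repo_dir, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not a git repository.
        return None
    return result.stdout.strip()


def is_tested_commit(commit):
    """Return True when `commit` equals the commit this notebook was tested on."""
    return commit == TESTED_COMMIT


commit = current_commit("cosmos-transfer1")
print("HEAD:", commit, "| tested commit:", is_tested_commit(commit))
```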

### Download Model Weights
---
Downloading 360+ GB of data from Hugging Face takes a while. You should see this message at the end of the log:
```
Successfully downloaded google-t5/t5-11b
```

In [None]:
# Pull model weights from Huggingface
import os
import sys

project_root = os.path.abspath("cosmos-transfer1")
download_script = "cosmos-transfer1/scripts/download_checkpoints.py"
checkpoint_dir = "checkpoints/"

!PYTHONPATH={project_root} {sys.executable} {download_script} --output_dir {checkpoint_dir}
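
To see how much has been downloaded so far, the size of the checkpoint directory can be summed from Python. The sketch below is an illustration only: `directory_size_gb` is a hypothetical helper, it skips symbolic links (so it may undercount if the checkpoints are linked from a cache directory), and it reports decimal GB.

```python
import os


def directory_size_gb(path):
    """Total size of regular files under `path` in decimal GB (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total / 1e9


# A missing directory simply reports 0.0 GB.
print(f"checkpoints/: {directory_size_gb('checkpoints'):.1f} GB")
```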

In [None]:
# /ephemeral should show about 363 GB of used disk space
!df -h

### Pull Docker Image
---
Download the Docker image from the internal registry. A publicly accessible image should be available in the future. Make sure you have access to the **nvidian** org.
- NGC registry URL: https://registry.ngc.nvidia.com/orgs/nvidian/containers/cosmos-transfer1/tags

You can also build your own Docker image, which takes about 80 minutes.
```bash
docker build -f Dockerfile . -t nvcr.io/$USER/cosmos-transfer1:latest
```

In [None]:
%%bash
docker pull nvcr.io/nvidian/cosmos-transfer1:pytorch-25-04_v2

### Spin Up the Cosmos-Transfer1 Container
---

In [None]:
# Ensure the container stays alive using "tail -f /dev/null", so we can interact with it later
!docker run --gpus all -d --name cosmos-transfer1 \
    -v ./cosmos-transfer1:/workspace \
    -v ./cosmos-transfer1/assets:/workspace/datasets \
    -v ./checkpoints:/workspace/checkpoints \
    nvcr.io/nvidian/cosmos-transfer1:pytorch-25-04_v2 tail -f /dev/null
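
The bind mounts in the cell above are relative, so they resolve against the current working directory, and some Docker versions may not accept relative host paths at all. As an alternative, the same command can be assembled with absolute paths. This is a sketch under stated assumptions: `build_docker_run` is a hypothetical helper, and it only prints the command rather than running it.

```python
import os
import shlex

IMAGE = "nvcr.io/nvidian/cosmos-transfer1:pytorch-25-04_v2"


def build_docker_run(workspace, name="cosmos-transfer1", image=IMAGE):
    """Build the `docker run` argument list with absolute host paths."""
    root = os.path.abspath(workspace)
    repo = os.path.join(root, "cosmos-transfer1")
    assets = os.path.join(repo, "assets")
    checkpoints = os.path.join(root, "checkpoints")
    return [
        "docker", "run", "--gpus", "all", "-d", "--name", name,
        "-v", f"{repo}:/workspace",
        "-v", f"{assets}:/workspace/datasets",
        "-v", f"{checkpoints}:/workspace/checkpoints",
        # Keep the container alive so later cells can `docker exec` into it.
        image, "tail", "-f", "/dev/null",
    ]


# Print the command for review; pass the list to subprocess.run to execute it.
print(shlex.join(build_docker_run(".")))
```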

In [None]:
# Sanity check if the environment setup is successful
!docker exec cosmos-transfer1 python scripts/test_environment.py

### Use Case #1: Single Control (Edge)
---
- VRAM Used: ~30 GB
- Inference Time:
    - ~20 minutes (A100 x 1)
    - ~18 minutes (L40S x 1)
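
Before launching a run, the available GPU memory can be compared with the VRAM figure above. The sketch below queries `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which reports MiB per GPU. The helper names are invented for this example, and treating "~30 GB" as 30 GiB is an assumption chosen to keep the check conservative.

```python
import shutil
import subprocess

REQUIRED_MIB = 30 * 1024  # "~30 GB" from the list above, assumed to mean 30 GiB


def parse_memory_totals(csv_text):
    """Parse one integer (MiB) per line of nvidia-smi CSV output."""
    totals = []
    for line in csv_text.splitlines():
        value = line.strip()
        if value.isdigit():  # skip blank or non-numeric lines such as "[N/A]"
            totals.append(int(value))
    return totals


def query_gpu_memory_mib():
    """Return total memory per GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_memory_totals(result.stdout)


for index, total in enumerate(query_gpu_memory_mib()):
    status = "OK" if total >= REQUIRED_MIB else "may be too small"
    print(f"GPU {index}: {total} MiB ({status})")
```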

Use the default configurations below, or modify the input video path and prompt to update the ControlNet specs.

In [None]:
import json

# The input file must be inside ~/workspace/cosmos-transfer1/assets so the container can access it.
# The path is relative to the repository root, ~/workspace/cosmos-transfer1 (mounted as /workspace in the container).
input_path = "assets/example1_input_video.mp4"

# prompt copied from cosmos-transfer1/assets/inference_cosmos_transfer1_single_control_edge.json
prompt = """The video is set in a modern, well-lit office environment with a sleek, minimalist design.
The background features several people working at desks, indicating a busy workplace atmosphere.
The main focus is on a robotic interaction at a counter.
Two robotic arms, equipped with black gloves, are seen handling a red and white patterned coffee cup with a black lid.
The arms are positioned in front of a woman who is standing on the opposite side of the counter.
She is wearing a dark vest over a gray long-sleeve shirt and has long dark hair.
The robotic arms are articulated and move with precision, suggesting advanced technology.

At the beginning, the robotic arms hold the coffee cup securely.
As the video progresses, the woman reaches out with her right hand to take the cup.
The interaction is smooth, with the robotic arms adjusting their grip to facilitate the handover.
The woman's hand approaches the cup, and she grasps it confidently, lifting it from the robotic grip.
The camera remains static throughout, focusing on the exchange between the robotic arms and the woman.
The setting includes a white countertop with a container holding stir sticks and a potted plant, adding to the modern aesthetic.
The video highlights the seamless integration of robotics in everyday tasks, emphasizing efficiency and precision in a contemporary office setting.
"""

spec_content = {
    "prompt": prompt,
    "input_video_path": input_path,
    "edge": {
        "control_weight": 1.0
    }
}

json_path = "cosmos-transfer1/assets/custom_spec.json"
with open(json_path, 'w') as fd:
    json.dump(spec_content, fd, sort_keys=False, indent=4)

print(f"Controlnet spec JSON created: {json_path}")
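
If you change the input video path or prompt, a small consistency check on the generated spec can catch mistakes before a long inference run. The following is a minimal sketch, not part of the Cosmos-Transfer1 tooling: `validate_spec` is a hypothetical helper that only checks the three fields written by the cell above.

```python
import json
import os


def validate_spec(spec_path, repo_root):
    """Return a list of problems found in a ControlNet spec file (empty if none)."""
    with open(spec_path) as fd:
        spec = json.load(fd)
    problems = []
    if not spec.get("prompt", "").strip():
        problems.append("prompt is empty")
    # input_video_path is interpreted relative to the repository root.
    video = spec.get("input_video_path", "")
    if not os.path.isfile(os.path.join(repo_root, video)):
        problems.append(f"input video not found: {video}")
    weight = spec.get("edge", {}).get("control_weight")
    if not isinstance(weight, (int, float)):
        problems.append("edge.control_weight is missing or not a number")
    return problems


spec_file = "cosmos-transfer1/assets/custom_spec.json"
if os.path.exists(spec_file):
    print(validate_spec(spec_file, "cosmos-transfer1") or "spec looks consistent")
```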

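Run inference. At the end of the log you should see messages like the following, with the output folder matching the `--video_save_folder` argument (`outputs/custom_spec` in the cell below):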
```bash
[08-05 12:47:18|INFO|cosmos_transfer1/diffusion/inference/transfer.py:396:demo] Saved video to outputs/example1_single_control_edge/output.mp4
[08-05 12:47:18|INFO|cosmos_transfer1/diffusion/inference/transfer.py:397:demo] Saved prompt to outputs/example1_single_control_edge/output.txt
```

In [None]:
%%bash

docker exec cosmos-transfer1 bash -c "
export CUDA_VISIBLE_DEVICES=\${CUDA_VISIBLE_DEVICES:=0}
export CHECKPOINT_DIR=\${CHECKPOINT_DIR:=./checkpoints}
export NUM_GPU=\${NUM_GPU:=1}
export PYTHONPATH=./
torchrun --nproc_per_node=\$NUM_GPU --nnodes=1 --node_rank=0 cosmos_transfer1/diffusion/inference/transfer.py \
    --checkpoint_dir \$CHECKPOINT_DIR \
    --video_save_folder outputs/custom_spec \
    --controlnet_specs assets/custom_spec.json \
    --offload_text_encoder_model \
    --offload_guardrail_models \
    --num_gpus \$NUM_GPU
"
# You can review the output files in ~/workspace/cosmos-transfer1/outputs/custom_spec.
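
Once the run finishes, the result files can be checked from the notebook. The sketch below assumes the file names `output.mp4` and `output.txt` shown in the sample log lines, and a host-side output folder under the cloned repository; `missing_outputs` is a hypothetical helper.

```python
import os

EXPECTED_FILES = ("output.mp4", "output.txt")  # names taken from the sample log


def missing_outputs(output_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent or empty in output_dir."""
    missing = []
    for name in expected:
        path = os.path.join(output_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


# An empty list means both files were written.
print("Missing:", missing_outputs("cosmos-transfer1/outputs/custom_spec"))
```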

### Use Case #2: Prompt Upsampler + Single-Step Inference
---
- VRAM Used: ~30 GB
- Inference Time:
    - ~5 minutes (L40S x 1)

Cosmos-Transfer1 supports a variety of configurations. The ControlNet spec is passed as a JSON file, and additional behavior is switched on through command-line arguments.
Here are two examples:

**--use_distilled**: It allows for single-step inference, reducing the compute resources used.

**--upsample_prompt**: It converts your short prompt into a longer, more detailed prompt for video generation.
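
To make the effect of the two flags explicit, the `torchrun` command used in this notebook can be assembled programmatically. This is a sketch only: `build_transfer_command` is a hypothetical helper, it uses just the arguments that appear in this notebook, and it pairs `--upsample_prompt` with `--offload_prompt_upsampler` because the notebook uses them together.

```python
import shlex


def build_transfer_command(spec, save_folder, num_gpu=1,
                           use_distilled=False, upsample_prompt=False):
    """Assemble the torchrun command from this notebook as an argument list."""
    cmd = [
        "torchrun", f"--nproc_per_node={num_gpu}", "--nnodes=1", "--node_rank=0",
        "cosmos_transfer1/diffusion/inference/transfer.py",
        "--checkpoint_dir", "./checkpoints",
        "--video_save_folder", save_folder,
        "--controlnet_specs", spec,
        "--offload_text_encoder_model",
        "--offload_guardrail_models",
        "--num_gpus", str(num_gpu),
    ]
    if upsample_prompt:
        # Expand the short prompt, and offload the upsampler as the notebook does.
        cmd += ["--upsample_prompt", "--offload_prompt_upsampler"]
    if use_distilled:
        # Single-step inference.
        cmd.append("--use_distilled")
    return cmd


print(shlex.join(build_transfer_command(
    "assets/custom_spec_short_prompt.json",
    "outputs/custom_spec_short_prompt",
    use_distilled=True,
    upsample_prompt=True,
)))
```

The printed command is meant to run inside the container from /workspace, for example through `docker exec`.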

In [None]:
import json

# The input file must be inside ~/workspace/cosmos-transfer1/assets so the container can access it.
# The path is relative to the repository root, ~/workspace/cosmos-transfer1 (mounted as /workspace in the container).
input_path = "assets/example1_input_video.mp4"

# prompt copied from cosmos-transfer1/assets/inference_cosmos_transfer1_single_control_edge_short_prompt.json
prompt = """a robotic arm hand over a coffee cup to a woman in a modern office.
"""

spec_content = {
    "prompt": prompt,
    "input_video_path": input_path,
    "edge": {
        "control_weight": 1.0
    }
}

json_path = "cosmos-transfer1/assets/custom_spec_short_prompt.json"
with open(json_path, 'w') as fd:
    json.dump(spec_content, fd, sort_keys=False, indent=4)

print(f"Controlnet spec JSON created: {json_path}")

Run inference.

In [None]:
%%bash

docker exec cosmos-transfer1 bash -c "
export CUDA_VISIBLE_DEVICES=\${CUDA_VISIBLE_DEVICES:=0}
export CHECKPOINT_DIR=\${CHECKPOINT_DIR:=./checkpoints}
export NUM_GPU=\${NUM_GPU:=1}
export PYTHONPATH=/workspace
torchrun --nproc_per_node=\$NUM_GPU --nnodes=1 --node_rank=0 /workspace/cosmos_transfer1/diffusion/inference/transfer.py \
    --checkpoint_dir \$CHECKPOINT_DIR \
    --video_save_folder outputs/custom_spec_short_prompt \
    --controlnet_specs assets/custom_spec_short_prompt.json \
    --offload_text_encoder_model \
    --upsample_prompt \
    --offload_prompt_upsampler \
    --offload_guardrail_models \
    --num_gpus \$NUM_GPU \
    --use_distilled
"
# You can review the output files in ~/workspace/cosmos-transfer1/outputs/custom_spec_short_prompt.

For more detailed information and samples, please refer to [Cosmos-Transfer1: World Generation with Adaptive Multimodal Control](https://github.com/nvidia-cosmos/cosmos-transfer1/blob/main/examples/inference_cosmos_transfer1_7b.md).

### Stop the Container
---

In [None]:
%%bash
docker stop cosmos-transfer1
docker rm cosmos-transfer1