![Cosmos-Reason1-7B](cosmos-reason1_banner.png)
[GitHub: Cosmos-Reason1-7B](https://github.com/nvidia-cosmos/cosmos-reason1/tree/main)

### Set Up Your New Instance in a Terminal
To open a terminal: Launcher tab -> Other -> Terminal

```bash
# Install uv/just
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv tool install rust-just

# Clone the repository
git clone https://github.com/nvidia-cosmos/cosmos-reason1.git
cd cosmos-reason1

# Install the package using venv
just install
source .venv/bin/activate

# Make sure pip is installed in the .venv
python -m ensurepip --upgrade

# Restart the venv
deactivate
source .venv/bin/activate

# Set up a custom kernel for Jupyter Notebook
pip3 install ipykernel
python -m ipykernel install --user --name=reason1 --display-name "Python (.venv) Reason1"
pip3 install -U ipywidgets

# Log in to your Hugging Face account so checkpoints can be downloaded later
# Get your token here: https://huggingface.co/settings/tokens
pip3 install huggingface_hub
huggingface-cli login
```
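Before moving on, it can help to confirm that the packages the notebook imports are visible to the active environment. The following is a minimal sketch, not part of the repository: `missing_packages` is a hypothetical helper, and the `REQUIRED` list assumes that `just install` provided `vllm`, `transformers`, and `qwen_vl_utils`, which the steps above do not state explicitly. Run it with `python` while `.venv` is active.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: these are the top-level modules imported later in this notebook.
REQUIRED = ["huggingface_hub", "transformers", "vllm", "qwen_vl_utils"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("missing:", missing if missing else "none")
```

If anything is reported as missing, the most likely cause is that the commands were run outside the activated `.venv`.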

### Switch to the Custom Python Kernel
1. Go back to `reason1.ipynb`.
2. Click **Python 3 (ipykernel)** in the upper-right corner.
3. Pick **Python (.venv) Reason1**, then click the Select button. (If you don't see the option, try restarting the notebook.)
4. The kernel button in the upper-right corner should now read **Python (.venv) Reason1**.
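To double-check the switch from inside the notebook, a cell like the one below can report which interpreter the kernel is running. This is a small sketch using only the standard library; `kernel_info` is a hypothetical helper name, and the expectation that the path ends in `.venv/bin/python` is an assumption based on the setup steps above.

```python
import sys


def kernel_info():
    """Describe the interpreter behind the current kernel."""
    return {
        "executable": sys.executable,
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
    }


info = kernel_info()
print(info["executable"])
print("virtual environment:", info["in_venv"])
```

If the second line prints `False`, the notebook is probably still attached to the default kernel.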

### Download Models from Hugging Face

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="nvidia/Cosmos-Reason1-7B",
    local_dir="nvidia/Cosmos-Reason1-7B",
)
```

Fetching 13 files:   0%|          | 0/13 [00:00<?, ?it/s]

'/home/ubuntu/nvidia/Cosmos-Reason1-7B'
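Once the download returns a path, a quick look at the folder can confirm that the weights arrived. The sketch below is illustrative only: `summarize_checkpoint` is a hypothetical helper, and it assumes the folder follows the usual Hugging Face layout with a `config.json` and one or more `*.safetensors` shards.

```python
from pathlib import Path


def summarize_checkpoint(model_dir):
    """Report whether a config file and safetensors shards exist in model_dir."""
    root = Path(model_dir)
    if not root.is_dir():
        return {"has_config": False, "shards": []}
    shards = sorted(p.name for p in root.glob("*.safetensors"))
    return {"has_config": (root / "config.json").is_file(), "shards": shards}


if __name__ == "__main__":
    # Same relative path as the local_dir passed to snapshot_download.
    print(summarize_checkpoint("nvidia/Cosmos-Reason1-7B"))
```

An empty `shards` list suggests the download was interrupted or the path is relative to a different working directory.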

## Test Inference
It takes a few seconds for output to start printing. You should see `</answer>` at the end of the log.

```python
from transformers import AutoProcessor
from vllm import LLM, SamplingParams
from qwen_vl_utils import process_vision_info

# MODEL_PATH can also point to the local safetensors folder downloaded above
MODEL_PATH = "nvidia/Cosmos-Reason1-7B"
VIDEO_PATH = "cosmos-reason1/assets/sample.mp4"

print(MODEL_PATH)

llm = LLM(
    model=MODEL_PATH,
    limit_mm_per_prompt={"image": 10, "video": 10},
)

sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.05,
    max_tokens=4096,
)

video_messages = [
    {"role": "system", "content": "You are a helpful assistant. Answer the question in the following format: <think>\nyour reasoning\n</think>\n\n<answer>\nyour answer\n</answer>."},
    {"role": "user", "content": [
            {"type": "text", "text": (
                    "Is it safe to turn right?"
                )
            },
            {
                "type": "video",
                "video": VIDEO_PATH,
                "fps": 4,
            }
        ]
    },
]

# Here we use video messages as a demonstration
messages = video_messages

processor = AutoProcessor.from_pretrained(MODEL_PATH)
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
image_inputs, video_inputs, video_kwargs = process_vision_info(messages, return_video_kwargs=True)

mm_data = {}
if image_inputs is not None:
    mm_data["image"] = image_inputs
if video_inputs is not None:
    mm_data["video"] = video_inputs

llm_inputs = {
    "prompt": prompt,
    "multi_modal_data": mm_data,

    # FPS will be returned in video_kwargs
    "mm_processor_kwargs": video_kwargs,
}

outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
generated_text = outputs[0].outputs[0].text

print(generated_text)
```
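Because the system prompt asks for a `<think>` block followed by an `<answer>` block, the final answer can be separated from the reasoning with a small parser. The helper below is a sketch rather than part of the Cosmos-Reason1 tooling; `split_reasoning` is a hypothetical name, and it returns `None` for any block that is missing, for example when generation stops at `max_tokens` before `</answer>` is written.

```python
import re


def split_reasoning(text):
    """Return (reasoning, answer) extracted from <think> and <answer> tags."""
    think = re.search(r"<think>\s*(.*?)\s*</think>", text, re.DOTALL)
    answer = re.search(r"<answer>\s*(.*?)\s*</answer>", text, re.DOTALL)
    return (
        think.group(1) if think else None,
        answer.group(1) if answer else None,
    )


# Example with a hand-written string in the requested format.
sample = "<think>\nCheck traffic first.\n</think>\n\n<answer>\nYes\n</answer>"
reasoning, answer = split_reasoning(sample)
print(answer)
```

In a new cell run after inference finishes, `split_reasoning(generated_text)` would apply the same parsing to the model output.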

INFO 07-29 09:18:17 [__init__.py:244] Automatically detected platform cuda.
nvidia/Cosmos-Reason1-7B
INFO 07-29 09:18:25 [config.py:841] This model supports multiple tasks: {'classify', 'embed', 'generate', 'reward'}. Defaulting to 'generate'.
INFO 07-29 09:18:25 [config.py:1472] Using max model len 128000
INFO 07-29 09:18:26 [config.py:2285] Chunked prefill is enabled with max_num_batched_tokens=16384.
INFO 07-29 09:18:30 [__init__.py:244] Automatically detected platform cuda.
INFO 07-29 09:18:32 [core.py:526] Waiting for init message from front-end.
INFO 07-29 09:18:32 [core.py:69] Initializing a V1 LLM engine (v0.9.2) with config: model='nvidia/Cosmos-Reason1-7B', speculative_config=None, tokenizer='nvidia/Cosmos-Reason1-7B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=128000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipelin

Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:00,  3.43it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:01<00:01,  1.53it/s]
Loading safetensors checkpoint shards:  75% Completed | 3/4 [00:02<00:00,  1.25it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.15it/s]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:03<00:00,  1.27it/s]



INFO 07-29 09:18:38 [default_loader.py:272] Loading weights took 3.21 seconds
INFO 07-29 09:18:38 [gpu_model_runner.py:1801] Model loading took 15.6271 GiB and 3.542802 seconds
INFO 07-29 09:18:38 [gpu_model_runner.py:2238] Encoder cache will be initialized with a budget of 98304 tokens, and profiled with 1 video items of the maximum feature size.


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.


INFO 07-29 09:18:48 [backends.py:508] Using cache directory: /home/ubuntu/.cache/vllm/torch_compile_cache/7c2ad702c1/rank_0_0/backbone for vLLM's torch.compile
INFO 07-29 09:18:48 [backends.py:519] Dynamo bytecode transform time: 4.88 s
INFO 07-29 09:18:52 [backends.py:155] Directly load the compiled graph(s) for shape None from the cache, took 3.210 s
INFO 07-29 09:18:52 [monitor.py:34] torch.compile takes 4.88 s in total
INFO 07-29 09:18:53 [gpu_worker.py:232] Available KV cache memory: 49.57 GiB
INFO 07-29 09:18:53 [kv_cache_utils.py:716] GPU KV cache size: 928,176 tokens
INFO 07-29 09:18:53 [kv_cache_utils.py:720] Maximum concurrency for 128,000 tokens per request: 7.25x


Capturing CUDA graph shapes: 100%|██████████| 67/67 [00:15<00:00,  4.39it/s]


INFO 07-29 09:19:09 [gpu_model_runner.py:2326] Graph capturing finished in 15 secs, took 0.66 GiB
INFO 07-29 09:19:09 [core.py:172] init engine (profile, create kv cache, warmup model) took 30.48 seconds


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
You have video processor config saved in `preprocessor.json` file which is deprecated. Video processor configs should be saved in their own `video_preprocessor.json` file. You can rename the file or load and save the processor back which renames it automatically. Loading from `preprocessor.json` will be removed in v5.0.
qwen-vl-utils using torchvision to read video.


Adding requests:   0%|          | 0/1 [00:00<?, ?it/s]

Processed prompts:   0%|          | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]

<think>
Okay, let's see. The user is asking if it's safe to turn right based on the video provided. First, I need to parse the details given. The scenario is a residential street with parked cars and driveways. The driver has stopped at an intersection, checked for pedestrians and traffic, and now wants to turn right.

The key points here are the parked cars and driveways. When turning right, especially in residential areas, visibility can be an issue. Parked cars might block the view of oncoming traffic or pedestrians stepping out from between them. Also, there could be vehicles exiting driveways without checking properly. The driver already checked both directions, but after that, when actually making the turn, they need to be cautious again.

In the video, there's mention of multiple parked cars along the curb and driveways. That setup increases the risk of hidden hazards. Even though the driver stopped and looked before proceeding, during the turn itself, they should slow down, che