The Jupyter Notebook example uses the **"Cosmos-0.1-Tokenizer-CI16x16"** pretrained model, a Continuous Image (CI) tokenizer. It maps images to continuous latent representations rather than discrete tokens. As the name implies, it applies **16x16** spatial compression: the latent grid is smaller than the input image by a factor of 16 in both height and width.
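To make the compression factor concrete, the following minimal sketch computes the latent grid size for a few input resolutions. The helper name `latent_spatial_shape` is hypothetical, and rounding up for sizes that are not multiples of 16 is an assumption of this sketch, not something stated by the notebook.

```python
def latent_spatial_shape(height: int, width: int, factor: int = 16) -> tuple:
    """Return (latent_height, latent_width) for an image of the given pixel size.

    Assumption for this sketch: sizes that are not divisible by `factor`
    are rounded up, as if the image were padded to the next multiple.
    """
    if height <= 0 or width <= 0 or factor <= 0:
        raise ValueError("height, width, and factor must be positive")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-height // factor), -(-width // factor)


for h, w in [(512, 512), (1024, 768), (1000, 600)]:
    print((h, w), "->", latent_spatial_shape(h, w))
```

For a 512x512 image this gives a 32x32 latent grid, i.e. 256 times fewer spatial positions than the input.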

Within the notebook, the `ImageTokenizer` class from the `cosmos_tokenizer.image_lib` module wraps the encoder and decoder components of this model. The encoder compresses the input image into a compact latent representation, and the decoder reconstructs the image from that representation.
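The encoder and decoder can also be used separately. The sketch below follows the usage pattern shown in the Cosmos Tokenizer repository README, as best understood here; treat the exact call signatures as an assumption and check them against the installed version. The function name `encode_then_decode` and its parameters are hypothetical, and the imports are placed inside the function so that defining it does not require a GPU or the package itself.

```python
def encode_then_decode(model_dir: str, height: int = 512, width: int = 512):
    """Encode a random image batch to latents, then decode it back.

    Running this requires a CUDA GPU, the cosmos_tokenizer package, and the
    downloaded encoder.jit / decoder.jit checkpoints inside `model_dir`.
    """
    # Lazy imports: only needed when the function is actually called.
    import torch
    from cosmos_tokenizer.image_lib import ImageTokenizer

    # Random stand-in for a real image batch, laid out as [B, C, H, W].
    input_tensor = torch.randn(1, 3, height, width).to("cuda").to(torch.bfloat16)

    # Encoder only: image tensor -> continuous latent.
    encoder = ImageTokenizer(checkpoint_enc=f"{model_dir}/encoder.jit")
    (latent,) = encoder.encode(input_tensor)

    # Decoder only: latent -> reconstructed image tensor.
    decoder = ImageTokenizer(checkpoint_dec=f"{model_dir}/decoder.jit")
    reconstructed = decoder.decode(latent)

    return tuple(latent.shape), tuple(reconstructed.shape)
```

With the CI16x16 checkpoints and a 512x512 input, the latent's spatial size should be 32x32, and the reconstructed tensor should have the same shape as the input.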

This example demonstrates the Cosmos Tokenizer's autoencoding capability: it compresses an image into a much smaller latent space and then reconstructs an approximation of the original from it. The reconstruction shows how much spatial compression continuous image tokenization can apply while still recovering the image.
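One common way to quantify how close a reconstruction is to its input is the peak signal-to-noise ratio (PSNR). The notebook itself does not compute any metric, so the sketch below is an optional addition; the function name `psnr` and the synthetic test image are illustrative only.

```python
import numpy as np


def psnr(original: np.ndarray, reconstructed: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    if original.shape != reconstructed.shape:
        raise ValueError("images must have the same shape")
    # Compute in float64 so uint8 inputs do not overflow when subtracted.
    diff = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Synthetic stand-in for an (original, reconstruction) pair.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
noise = rng.integers(-5, 6, size=image.shape)
noisy = np.clip(image.astype(np.int64) + noise, 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(image, noisy):.2f} dB")
```

Applied to the notebook's arrays, the same function could compare the RGB input with the reconstructed RGB output, provided both have the same height and width.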


This tutorial follows a step-by-step approach, so each stage is easy to understand and adapt.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/aicam/Cosmos-Tokenizer/blob/main/notebook/Image_Tokenization.ipynb)

### Step 1: Clone the Cosmos Tokenizer Repository

In [1]:
!git clone https://github.com/NVIDIA/Cosmos-Tokenizer.git

Cloning into 'Cosmos-Tokenizer'...
remote: Enumerating objects: 149, done.
remote: Counting objects: 100% (32/32), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 149 (delta 15), reused 12 (delta 12), pack-reused 117 (from 1)
Receiving objects: 100% (149/149), 2.95 MiB | 18.01 MiB/s, done.
Resolving deltas: 100% (76/76), done.


### Step 2: Change the Working Directory to the Cloned Folder and Install Required Dependencies

In [2]:
# Step 2: Automatically change the working directory to the cloned folder
import os
os.chdir("Cosmos-Tokenizer")  # Change to the cloned repo directory

In [3]:
%pip install opencv-python torch torchvision mediapy loguru # you may also need to install other dependencies

Collecting mediapy
  Downloading mediapy-1.2.2-py3-none-any.whl.metadata (4.8 kB)
Collecting loguru
  Downloading loguru-0.7.3-py3-none-any.whl.metadata (22 kB)
Collecting jedi>=0.16 (from ipython->mediapy)
  Downloading jedi-0.19.2-py2.py3-none-any.whl.metadata (22 kB)
Downloading mediapy-1.2.2-py3-none-any.whl (26 kB)
Downloading loguru-0.7.3-py3-none-any.whl (61 kB)
Downloading jedi-0.19.2-py2.py3-none-any.whl (1.6 MB)
Installing collected packages: loguru, jedi, mediapy
Successfully installed jedi-0.19.2 loguru-0.7.3 mediapy-1.2.2


### Step 3: Set Up Hugging Face API Token and Download Pretrained Models

In this step, you'll configure the Hugging Face API token and download the pretrained model weights required for the **Cosmos Tokenizer**.

1. **Ensure You Have a Hugging Face Account**  
   If you do not already have a Hugging Face account, follow these steps to create one and generate an API token:
   - Go to the [Hugging Face website](https://huggingface.co/) and sign up for a free account.
   - After logging in, navigate to your [Settings → Access Tokens](https://huggingface.co/settings/tokens).
   - Click on "New Token" to generate an API token with the required permissions.

2. **Set the Hugging Face Token**  
   Check if the Hugging Face token is already set in the environment variables. If not, you will be prompted to enter it manually. The token is essential to authenticate and access the Hugging Face models.
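The check described above can be sketched as a small helper that reads the token from the environment and only prompts when it is missing. This is a variant under stated assumptions: the function name `resolve_hf_token` is hypothetical, and it uses `getpass` so that the token is not echoed into the notebook output, which the plain `input` call would do.

```python
import os
from getpass import getpass
from typing import Callable, Mapping


def resolve_hf_token(
    env: Mapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
    var_name: str = "HUGGINGFACE_TOKEN",
) -> str:
    """Return the token from `env`, or ask for it once if it is not set."""
    token = env.get(var_name, "").strip()
    if not token:
        # getpass hides the typed value, unlike input().
        token = prompt("Please enter your Hugging Face API token: ").strip()
    if not token:
        raise ValueError("No Hugging Face token provided")
    return token


# Example with an explicit mapping and a placeholder value (not a real token).
print(resolve_hf_token(env={"HUGGINGFACE_TOKEN": "hf_placeholder"}))
```

Passing `env` and `prompt` as parameters keeps the helper easy to exercise without touching the real environment or blocking on user input.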



In [5]:
# Check if the token is already set
if "HUGGINGFACE_TOKEN" not in os.environ:
    os.environ["HUGGINGFACE_TOKEN"] = input("Please enter your Hugging Face API token: ")



In [6]:
from huggingface_hub import login, snapshot_download
import os
HUGGINGFACE_TOKEN = os.environ.get("HUGGINGFACE_TOKEN")
login(token=HUGGINGFACE_TOKEN, add_to_git_credential=True)
model_names = [
        "Cosmos-0.1-Tokenizer-CI16x16",
]
for model_name in model_names:
    hf_repo = "nvidia/" + model_name
    local_dir = "../pretrained_ckpts/" + model_name
    os.makedirs(local_dir, exist_ok=True)
    print(f"downloading {model_name}...")
    snapshot_download(repo_id=hf_repo, local_dir=local_dir)

downloading Cosmos-0.1-Tokenizer-CI16x16...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Fetching 7 files:   0%|          | 0/7 [00:00<?, ?it/s]

README.md:   0%|          | 0.00/21.7k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/54.0 [00:00<?, ?B/s]

autoencoder.jit:   0%|          | 0.00/163M [00:00<?, ?B/s]

model_config.yaml:   0%|          | 0.00/92.0 [00:00<?, ?B/s]

encoder.jit:   0%|          | 0.00/67.2M [00:00<?, ?B/s]

decoder.jit:   0%|          | 0.00/96.2M [00:00<?, ?B/s]

.gitattributes:   0%|          | 0.00/1.67k [00:00<?, ?B/s]
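Before loading the model, it can help to confirm that the checkpoint files used later actually landed in the download directory. The following is a minimal sketch; the helper name `missing_checkpoint_files` is hypothetical, and the file names and directory are taken from the download output and paths used in this notebook.

```python
from pathlib import Path
from typing import Iterable, List


def missing_checkpoint_files(
    local_dir,
    required: Iterable[str] = ("encoder.jit", "decoder.jit"),
) -> List[str]:
    """Return the names in `required` that are not regular files in `local_dir`."""
    root = Path(local_dir)
    return [name for name in required if not (root / name).is_file()]


model_dir = Path("../pretrained_ckpts") / "Cosmos-0.1-Tokenizer-CI16x16"
missing = missing_checkpoint_files(model_dir)
if missing:
    print(f"Missing in {model_dir}: {', '.join(missing)}")
else:
    print("All checkpoint files found.")
```

If anything is reported missing, rerunning the download cell is usually the simplest fix.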

### Step 4: Install and Use the Cosmos Tokenizer for Image Reconstruction

In this step, you'll install the **Cosmos Tokenizer** (if not installed), load the required checkpoints, and perform image reconstruction.

1. **Install the Cosmos Tokenizer** (if not already installed)  
   Before proceeding, ensure you have the **Cosmos Tokenizer** installed. If you cloned the repository in Step 1, use the following command to install it in editable mode:

   ```bash
   %pip install -e /path/to/Cosmos-Tokenizer
   ```


In [7]:
import cv2
import numpy as np
import torch

import importlib
import cosmos_tokenizer.image_lib

importlib.reload(cosmos_tokenizer.image_lib)
from cosmos_tokenizer.image_lib import ImageTokenizer

# 1) Specify the model name, and the paths to the encoder/decoder checkpoints.
model_name = "Cosmos-0.1-Tokenizer-CI16x16"
encoder_ckpt = f"../pretrained_ckpts/{model_name}/encoder.jit"
decoder_ckpt = f"../pretrained_ckpts/{model_name}/decoder.jit"

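# 2) Load or provide the image filename you want to tokenize & reconstruct.
#    The path is relative to the repository root, which Step 2 set as the
#    working directory (in Colab this resolves to /content/Cosmos-Tokenizer).
input_image_path = "test_data/image.png"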

# 3) Read the image from disk (shape = H x W x 3 in BGR). Then convert to RGB.
original_bgr = cv2.imread(input_image_path)
if original_bgr is None:
    raise FileNotFoundError(f"Could not read image file: {input_image_path}")

original_rgb = cv2.cvtColor(original_bgr, cv2.COLOR_BGR2RGB)

# 4) Expand dimensions to B x H x W x C, since the ImageTokenizer expects a batch dimension
#    in the input. (Batch size = 1 in this example.)
input_image = np.expand_dims(original_rgb, axis=0)

# 5) Create the ImageTokenizer instance with the encoder & decoder.
#    - device="cuda" uses the GPU
#    - dtype="bfloat16" expects Ampere or newer GPU (A100, RTX 30xx, etc.)
tokenizer = ImageTokenizer(
    checkpoint_enc=encoder_ckpt,
    checkpoint_dec=decoder_ckpt,
    device="cuda",
    dtype="bfloat16",
)

# 6) Use the tokenizer to autoencode (encode & decode) the image.
#    The output is a NumPy array with shape = B x H x W x C, range [0..255].
reconstructed_image = tokenizer.forward(input_image)

# 7) Extract the single image from the batch (index 0), convert to uint8.
reconstructed_image = reconstructed_image[0].astype(np.uint8)

# 8) Convert from RGB back to BGR (if you want to save using OpenCV).
reconstructed_bgr = cv2.cvtColor(reconstructed_image, cv2.COLOR_RGB2BGR)

# 9) Save the reconstructed image to disk.
output_image_path = "my_image_reconstructed.jpg"
cv2.imwrite(output_image_path, reconstructed_bgr)

print("Reconstruction saved to:", output_image_path)


Reconstruction saved to: my_image_reconstructed.jpg
