GigaWorld-1: A Roadmap to World Models for Robot Policy Evaluation

🚀 Open-source training, inference, data processing, checkpoint conversion, and LoRA merge workflows for robot world models.

📑 Table of Contents

📰 Latest Updates
📊 Open-Source Progress
1. 📦 Environment Setup
- 1.1 🖥️ Hardware & OS
- 1.2 🧰 Install Dependencies
2. 🗃️ Data Preparation
3. 🧩 Model Preparation
4. 🚂 Training
5. 🎬 Inference
6. 🔄 Model Merge & Checkpoint Conversion
7. 📁 Repository Layout
🙏 Acknowledgements
📖 Citation

📰 Latest Updates

Date	Update
📄 2026-07	We uploaded the PDF version of the technical report; the arXiv version is coming soon.
🧑‍💻 2026-07	Partial training, inference, data processing, and model utility code was open-sourced.
📦 2026-07	Partial model weights, toy data, and download tools were released.
📖 2026-07	The GigaWorld-1 technical report was released.
🏆 2026-04	We released the CVPR 2026 World Model Track leaderboard. See CVPR-2026-WorldModel-Track-LeaderBoard.
🏆 2026-03	We hosted the CVPR 2026 World Model Challenge. See CVPR-2026-Workshop-WM-Track.

💡 Subscribe to releases — click Watch ▾ → Custom → Releases on the GitHub repo to be notified when new weights, datasets, or the WMBench benchmark drop.

📊 Open-Source Progress

🟢 Released · 🟡 Beta · 🔴 Coming Soon — last updated 2026-07

Status	Component	Description
🟢	Stage-1 weights (Nano / Pro)	Released on open-gigaai/Giga-World-1 and ModelScope
🟢	Training code	Stage-1: `train_gigaworld_functrl_uni_stage1.py` for Nano (1.3B) and Pro (5B), DeepSpeed ZeRO-2/3 ready — see §4.1; Stage-2: `train_gigaworld_functrl_uni_stage2_dmd.py` for DMD2 distillation (4–6 steps) — see §4.2
🟢	Inference code (i2v / t2v)	Nano + Pro one-click scripts, 10 FPS, 33 s rollouts — see §5
🟡	Data preprocessing pipeline & toy data	LeRobot-style → GigaWorld format with Qwen3-VL captions + Depth Anything V2 — see §2; toy data: open-gigaai/Giga-World-1-Toydata
🟢	Tools	LoRA merge / checkpoint conversion, visualization, and offline latent utilities — see §2, §6
🟡	📊 WMBench benchmark	Partially open-sourced — 15 fine-grained metrics, leaderboard + VLM judging
🔴	Stage-2 distilled weights	Distilled Nano / Pro checkpoints — coming soon
🔴	RL post-training	3D RL post-training scripts for stronger 3D scene modeling — coming soon
🔴	Other-domain weights and training code	Additional domain checkpoints and corresponding training recipes — coming soon
🔴	Acceleration framework	Optimized distributed inference / training acceleration stack — coming soon

🌐 Release Channels

Channel	Purpose	Where
🐙 GitHub Releases	Tagged source snapshots with changelogs
🤗 Hugging Face Model	Giga-World-1 model weights
🤗 Hugging Face Dataset	Giga-World-1 toy data
🤗 Hugging Face Dataset (CVPR)	CVPR 2026 WorldModel Track dataset
🔷 ModelScope Model	ModelScope mirror for model weights
🔷 ModelScope Dataset	ModelScope mirror for toy data
📄 arXiv	Paper PDF, BibTeX
🌐 Project Page	Videos, leaderboard, demos	🌐 Project Page
📊 WMBench	Public benchmark leaderboard
🆘 Support	Issues and discussions

🛠️ Want a component to ship sooner? Please file an issue or open a PR.

1. 📦 Environment Setup

1.1 🖥️ Hardware & OS

Item	Requirement / Recommendation
Production setup	Single node with 8 × H20 or 8 × A100 GPUs
Inference	Supports both Nano (1.3B) and Pro (5B); consumer-grade GPUs can be used with memory-saving settings
Training	Production experiments are run on 8-GPU nodes; consumer-grade GPU training is possible with ZeRO, offloading, gradient checkpointing, and reduced batch / resolution settings
OS	Linux, verified on Ubuntu 20.04 / 22.04
CUDA	CUDA 12.x recommended, matching the local PyTorch installation

Note: We use a single-node 8× H20 or 8× A100 setup for production training. With appropriate memory optimization techniques, the released code can also run training and inference experiments on consumer-grade GPUs.

1.2 🧰 Install Dependencies

Install PyTorch first according to your CUDA version, then install the remaining dependencies with install.sh.

cd <PROJECT_ROOT>

# Run only the one pip command below that matches your CUDA version.
# CUDA 12.6
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu126

# CUDA 12.8
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128

# CUDA 13.0
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu130

# Install dependencies
bash install.sh

install.sh installs packages from requirements.txt.

Main dependencies:

accelerate>=1.1.0        # accelerate launch / DDP / DeepSpeed
av>=12.0.0               # video read / write
decord>=0.6.0            # fast video decoding
diffusers>=0.35.0        # custom-modified version (see thirdparty/)
einops>=0.8.0
imageio>=2.36.0
imageio-ffmpeg>=0.5.1
numpy>=1.24,<3
omegaconf>=2.3.0
opencv-python>=4.9.0
packaging>=24.0
pandas>=2.1.0
peft>=0.12.0             # LoRA implementation
Pillow>=10.0.0
pyyaml>=6.0.0
safetensors>=0.4.5
torchdata>=0.8.0
tqdm>=4.66.0
transformers>=4.45.0
wandb>=0.18.0            # offline by default
xformers>=0.0.28.post3   # memory-efficient attention
deepspeed==0.19.2
ftfy==6.3.1
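
After `bash install.sh` finishes, a quick version check can catch a stale environment before a long training run. The sketch below is not part of the repository: `MINIMUMS`, `parse_version`, `is_older`, and `find_problems` are hypothetical names, only a few packages from the list above are checked, and `requirements.txt` remains the source of truth.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above (a subset only).
MINIMUMS = {
    "accelerate": "1.1.0",
    "diffusers": "0.35.0",
    "peft": "0.12.0",
    "transformers": "4.45.0",
}


def parse_version(text):
    """Turn '0.35.0' or '0.0.28.post3' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric piece such as 'post3'
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def find_problems(minimums, lookup=metadata.version):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if is_older(installed, minimum):
            problems.append(f"{name}: {installed} < {minimum}")
    return problems


if __name__ == "__main__":
    for line in find_problems(MINIMUMS) or ["checked packages meet their minimums"]:
        print(line)
```

The parser deliberately ignores pre-release and local suffixes, so treat a pass as a sanity check rather than a full compatibility guarantee.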

Optional: install FlashAttention 2 or 3

FlashAttention 2 or 3 can be used depending on your CUDA / PyTorch environment. Install one compatible version if you need FlashAttention acceleration.

# FlashAttention 2
pip install flash-attn --no-build-isolation

# Or build FlashAttention 3 manually
cd thirdparty/flash-attention-3
# Follow its README

2. 🗃️ Data Preparation

This release provides a small toy data package for verifying inference, data loading, visualization, and training workflows. The toy data is available from Hugging Face and ModelScope.

Use the one-click downloader:

bash tools/download_tool/download_giga_world.sh \
  --platform hf \
  --target toydata \
  --output-dir ./downloads

For ModelScope, replace --platform hf with --platform modelscope. See tools/download_tool/README.md for all options.

After downloading, place or symlink the toy data under example/:

mkdir -p example
cp -r ./downloads/Giga-World-1-Toydata/* ./example/

Expected project structure:

giga-world-release/
└── example/
    ├── toy_datapipeline_dataset/    # toy preprocessing output (gt / depth / plucker / sketch / labels)
    │   ├── gt/
    │   ├── depth/
    │   ├── plucker/
    │   ├── sketch/
    │   └── labels/
    └── toy_train_dataset/           # model training data
        ├── nano/
        │   ├── dataset_cache.pkl
        │   └── episode_*.pt
        └── pro/
            ├── dataset_cache.pkl
            └── episode_*.pt

toy_train_dataset/ is already in the format used by the training configs: stage_1_post_functrl_wan21.yaml, stage_1_post_functrl_wan22_5b.yaml, and stage_2_dmd_functrl_wan21.yaml.
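
Before launching training, it can be worth confirming that the copy produced the expected structure. The following sketch walks the tree shown above and reports anything missing; `EXPECTED_DIRS`, `EXPECTED_FILES`, and `missing_entries` are hypothetical names, and the check only covers the entries listed in this README.

```python
from pathlib import Path

# Directories and files taken from the expected project structure above.
EXPECTED_DIRS = [
    "toy_datapipeline_dataset/gt",
    "toy_datapipeline_dataset/depth",
    "toy_datapipeline_dataset/plucker",
    "toy_datapipeline_dataset/sketch",
    "toy_datapipeline_dataset/labels",
    "toy_train_dataset/nano",
    "toy_train_dataset/pro",
]
EXPECTED_FILES = [
    "toy_train_dataset/nano/dataset_cache.pkl",
    "toy_train_dataset/pro/dataset_cache.pkl",
]


def missing_entries(example_root):
    """Return the relative paths (or patterns) that are absent under example/."""
    root = Path(example_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    for variant in ("nano", "pro"):
        episodes = (root / "toy_train_dataset" / variant).glob("episode_*.pt")
        if not any(episodes):
            missing.append(f"toy_train_dataset/{variant}/episode_*.pt")
    return missing


if __name__ == "__main__":
    problems = missing_entries("example")
    print("layout looks complete" if not problems else "\n".join(problems))
```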

For raw data visualization, run the web tool and open http://127.0.0.1:8090/ or http://127.0.0.1:8090/calib:

cd tools/data_vis_tools
python app.py --host 0.0.0.0 --port 8090

For offline latent pre-computation, use get_short-latents-giga-ctrl.py or get_short-latents-giga-ctrl-wan22-5b.py. The input data should contain helios_giga_ctrl.jsonl, videos/, and control_videos/; outputs are .pt samples containing precomputed vae_latent, control_latent, prompt_embed, and related metadata. See tools/offload_data/data_format.md for the data format.
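
A pre-flight check on the input directory can save a failed latent pre-computation run. This sketch only verifies the three inputs named above and counts JSONL records; it makes no assumption about the fields inside each record (see tools/offload_data/data_format.md for those), and `summarize_latent_input` is a hypothetical helper, not a repository function.

```python
import json
from pathlib import Path


def summarize_latent_input(data_root):
    """Report whether the expected inputs exist and how many JSONL records parse."""
    root = Path(data_root)
    index_file = root / "helios_giga_ctrl.jsonl"
    report = {
        "has_jsonl": index_file.is_file(),
        "has_videos": (root / "videos").is_dir(),
        "has_control_videos": (root / "control_videos").is_dir(),
        "num_records": 0,
    }
    if report["has_jsonl"]:
        with index_file.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    json.loads(line)  # raises ValueError on a malformed record
                    report["num_records"] += 1
    return report


if __name__ == "__main__":
    import sys

    print(summarize_latent_input(sys.argv[1] if len(sys.argv) > 1 else "."))
```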

3. 🧩 Model Preparation

Released model weights are available from:

Platform	Repository
🤗 Hugging Face	open-gigaai/Giga-World-1
🔷 ModelScope	GigaAI/Giga-World-1

Use the download helper:

Script: download_giga_world.sh
Full usage: tools/download_tool/README.md

Download model weights from Hugging Face:

bash tools/download_tool/download_giga_world.sh \
  --platform hf \
  --target model \
  --output-dir ./downloads

Download model weights from ModelScope:

bash tools/download_tool/download_giga_world.sh \
  --platform modelscope \
  --target model \
  --output-dir ./downloads

After downloading, place or symlink the model files under model/:

giga-world-release/
├── model/
│   ├── before_stage1/
│   │   ├── Wan2p1_1p3B-FunContro-GigaRobo-alpha-diffusers/
│   │   ├── Wan2p1_1p3B-FunControl-diffusers/
│   │   └── Wan2p2_5B-FunControl-diffusers/
│   ├── stage1/
│   │   ├── nano/
│   │   └── pro/
│   └── stage2_distill/          # coming soon
└── tools/
    └── download_tool/

If the downloader saves weights under ./downloads/Giga-World-1/, copy or symlink them into the repository:

mkdir -p model
cp -r ./downloads/Giga-World-1/* ./model/
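
The command above copies the weights. If disk space is tight, symlinking is the alternative mentioned in the text; one way to do it is sketched below. `link_children` is a hypothetical helper, and the paths in the usage lines assume the downloader wrote to ./downloads/Giga-World-1/ as described above.

```python
import os
from pathlib import Path


def link_children(source_dir, target_dir):
    """Symlink every entry of source_dir into target_dir without overwriting."""
    source = Path(source_dir).resolve()
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for child in sorted(source.iterdir()):
        link = target / child.name
        if link.exists() or link.is_symlink():
            continue  # leave anything that is already present untouched
        os.symlink(child, link, target_is_directory=child.is_dir())
        created.append(child.name)
    return created


if __name__ == "__main__":
    downloads = Path("./downloads/Giga-World-1")
    if downloads.is_dir():
        print("linked:", link_children(downloads, "./model"))
    else:
        print("nothing to link:", downloads, "does not exist")
```

Because the links point at absolute paths, moving or deleting ./downloads afterwards will break them.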

4. 🚂 Training

flowchart LR
    %% Base Models
    A1["WAN 2.1<br/>1.3B FunControl"]
    A2["WAN 2.2<br/>5B FunControl"]

    %% Before Stage1
    B1["GigaRobo Alpha<br/>Diffusers"]
    B2["WAN2.1 1.3B<br/>Diffusers"]
    B3["WAN2.2 5B<br/>Diffusers"]

    %% Stage1
    C1["Nano"]
    C2["Pro"]

    %% Stage2
    D1["Nano Distill"]
    D2["Pro Distill"]

    %% Connections
    A1 -- "Pretrain on Giga Dataset<br/>Convert to Diffusers" --> B1
    A1 -- "Convert to Diffusers" --> B2

    B1 --> C1
    B2 --> C1
    C1 --> D1

    A2 -- "Convert to Diffusers" --> B3
    B3 --> C2
    C2 --> D2

    subgraph S0["Before Stage 1"]
        subgraph S0A["Interchangeable"]
            B1
            B2
        end
        B3
    end

    subgraph S1["Stage 1"]
        C1
        C2
    end

    subgraph S2["Stage 2"]
        D1
        D2
    end

    classDef base    fill:#4F8EF7,color:#fff,stroke:#2D5FD2,stroke-width:3px;
    classDef diff    fill:#7C4DFF,color:#fff,stroke:#5B2DCC,stroke-width:3px;
    classDef stage   fill:#00C853,color:#fff,stroke:#009624,stroke-width:3px;
    classDef distill fill:#FF6D00,color:#fff,stroke:#DD2C00,stroke-width:3px;

    style S0A fill:none,stroke:#FF0000,stroke-width:3px,stroke-dasharray:8 4,color:#FF0000;

    class A1,A2 base;
    class B1,B2,B3 diff;
    class C1,C2 stage;
    class D1,D2 distill;

Training entrypoints and launcher scripts are paired (each pair = one accelerate launch command + one YAML).

4.1 Stage-1 Training (Controllable Pre-training)

Entrypoint / Config	Launcher	Note
Trainer: train_gigaworld_functrl_uni_stage1.py		Unified trainer handling both Nano and Pro
Config: stage_1_post_functrl_wan21.yaml	train_deepspeed_stage1_functrl_wan21.sh	Nano (1.3B)
Config: stage_1_post_functrl_wan22_5b.yaml	train_deepspeed_stage1_functrl_wan22_5b.sh	Pro (5B)

Launch Nano:

bash scripts/training/stage1/train_deepspeed_stage1_functrl_wan21.sh

Launch Pro:

bash scripts/training/stage1/train_deepspeed_stage1_functrl_wan22_5b.sh
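
When Stage-1 runs are started from a Python job script rather than by hand, the two launchers above can be wrapped in a small lookup. This is a sketch only: `LAUNCHERS`, `launch_command`, and `launch` are hypothetical names, and the wrapper does nothing beyond calling the shell scripts listed in the table.

```python
import subprocess

# Launcher paths exactly as shown in the commands above.
LAUNCHERS = {
    "nano": "scripts/training/stage1/train_deepspeed_stage1_functrl_wan21.sh",
    "pro": "scripts/training/stage1/train_deepspeed_stage1_functrl_wan22_5b.sh",
}


def launch_command(variant):
    """Return the argv list for the Stage-1 launcher of the given variant."""
    try:
        script = LAUNCHERS[variant.lower()]
    except KeyError:
        known = ", ".join(sorted(LAUNCHERS))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}") from None
    return ["bash", script]


def launch(variant):
    """Run the launcher from the repository root and fail loudly on errors."""
    return subprocess.run(launch_command(variant), check=True)
```

Calling `launch("nano")` from the repository root is then equivalent to the first command above.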

Default output layout:

output/
├── exp/
│   ├── Giga-world-Nano-Train-Stage-1/
│   └── Giga-world-Pro-Train-Stage-1/
└── logs/

4.2 Stage-2 DMD Training (Acceleration Distillation)

Entrypoint / Config	Launcher
Trainer: train_gigaworld_functrl_uni_stage2_dmd.py
Config: stage_2_dmd_functrl_wan21.yaml	train_deepspeed_stage2_functrl_wan21.sh
Config: stage_2_dmd_functrl_wan22_5b.yaml	train_deepspeed_stage2_functrl_wan22_5b.sh

DMD2 compresses the denoising loop from 20 steps to 4–6 steps (the Stage-2 config sets num_inference_steps: 6) and aligns with a frozen real score model via a critic_lora.
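
As a rough guide to what the step reduction means, the arithmetic below compares denoiser evaluations for a 20-step schedule against 4 to 6 steps. It counts denoising steps only and ignores VAE decoding, text encoding, and any guidance-related extra passes, so real wall-clock speedups will differ; `denoising_speedup` is a hypothetical helper.

```python
def denoising_speedup(teacher_steps, student_steps):
    """Ratio of denoising steps before and after distillation."""
    if teacher_steps <= 0 or student_steps <= 0:
        raise ValueError("step counts must be positive")
    return teacher_steps / student_steps


if __name__ == "__main__":
    for steps in (4, 5, 6):
        ratio = denoising_speedup(20, steps)
        print(f"{steps} steps: {ratio:.2f}x fewer denoising steps than 20")
```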

Launch Nano DMD:

bash scripts/training/stage2/train_deepspeed_stage2_functrl_wan21.sh

Example output:

output/exp/Giga-world-Nano-Train-DMD/

4.3 Training on Other Domains

To adapt GigaWorld-1 to a new domain, no code changes are required. In practice, you only need to construct training data in the same format as the provided GigaWorld data, then launch the existing Stage-1 training pipeline with the corresponding config.

For most new domains, training for a few thousand steps on 8 GPUs is typically sufficient and can usually finish within one day.

5. 🎬 Inference

Script	Mode	Model	Link
`run_infer_nano_i2v.sh`	i2v	Nano 1.3B	script
`run_infer_nano_t2v.sh`	t2v	Nano 1.3B	script
`run_infer_pro_i2v.sh`	i2v	Pro 5B	script
`run_infer_pro_t2v.sh`	t2v	Pro 5B	script

Usage:

# Nano i2v: first frame + control video + text prompt
bash scripts/infer/run_infer_nano_i2v.sh

# Pro t2v: text prompt only (omit --image_path → t2v mode)
bash scripts/infer/run_infer_pro_t2v.sh

The underlying entrypoint infer_giga_world.py exposes the following arguments:

Argument	Required	Default	Description
`--config`	✅	—	Training / inference YAML config (drives model type and hyperparams)
`--base_model_path`	✅	—	Base diffusers model directory (VAE / T5 / Transformer)
`--transformer_model_name_or_path`		None	Path to the merged transformer; falls back to `--base_model_path` if None
`--checkpoint_path`		None	Optional LoRA / partial checkpoint path
`--image_path`		None	First frame for i2v; omit to enter t2v mode
`--prompt`	✅	—	Text prompt
`--control_video_path`		None	Control video (Plücker / Ray Map), optional
`--output_dir`	✅	—	Output root directory
`--sample_name`		sample	Output video name prefix
`--seed`		42	Random seed
`--fps`		10	Output video FPS
`--num_frames`		99	Total frames (default 99 ≈ 9.9 s @ 10 FPS; 330 ≈ 33 s @ 10 FPS)
`--height`		480	Output height
`--width`		1920	Output width (typically 640×3 = 1920 for three views)
`--num_inference_steps`		20	20 steps for Stage-1; 4–6 for Stage-2 / DMD
`--guidance_scale`		5.0	Classifier-free guidance strength
`--enable_tiling`		False	VAE tiling for memory savings

Inference output example:

First Frame	Control Video	Generated Rollout
	🎬 control_video.mp4	🎬 i2v_sample.mp4
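
The default width of 1920 is described above as 640×3 for three views. Assuming the views are tiled side by side with equal widths (the README does not state the layout or the view order explicitly), the column range of each view in a generated frame can be computed as follows; `view_column_ranges` is a hypothetical helper.

```python
def view_column_ranges(width=1920, num_views=3):
    """Return (start, end) pixel columns for each equally wide view."""
    if num_views <= 0 or width % num_views != 0:
        raise ValueError("width must divide evenly into a positive number of views")
    view_width = width // num_views
    return [(i * view_width, (i + 1) * view_width) for i in range(num_views)]


if __name__ == "__main__":
    for index, (start, end) in enumerate(view_column_ranges()):
        print(f"view {index}: columns {start}..{end - 1}")
```

With a frame stored as a height × width × channels array, `frame[:, start:end]` would then crop one view.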

6. 🔄 Model Merge & Checkpoint Conversion

Use uni_merge_lora_for_giga_world_1.py to merge LoRA or partial checkpoints into a standalone transformer for deployment.

The tool supports both wan2.1 and wan2.2_5b, automatically resolves checkpoint files from the specified checkpoint directory, and saves the merged transformer to --save_dir.

python tools/ckpt_tools/uni_merge_lora_for_giga_world_1.py \
  --base_model <PATH_TO_BASE_NANO> \
  --save_dir   <PATH_TO_STAGE1_MERGED_NANO> \
  --ckpt_dir   /path/to/checkpoint-XXXX \
  --model_type wan2.1

For Pro 5B:

python tools/ckpt_tools/uni_merge_lora_for_giga_world_1.py \
  --base_model <PATH_TO_BASE_PRO> \
  --save_dir   <PATH_TO_STAGE1_MERGED_PRO> \
  --ckpt_dir   /path/to/checkpoint-XXXX \
  --model_type wan2.2_5b
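
To fill in `--ckpt_dir` without typing the step number by hand, a helper can pick the newest checkpoint directory. This sketch assumes the trainer writes `checkpoint-<step>` directories inside an experiment folder, which the commands above suggest but do not confirm; `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path

_CHECKPOINT = re.compile(r"checkpoint-(\d+)$")


def latest_checkpoint(exp_dir):
    """Return the checkpoint-<step> directory with the highest step, or None."""
    best_step, best_path = -1, None
    for child in Path(exp_dir).iterdir():
        match = _CHECKPOINT.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


if __name__ == "__main__":
    exp_dir = Path("output/exp/Giga-world-Nano-Train-Stage-1")
    if exp_dir.is_dir():
        print(latest_checkpoint(exp_dir))
    else:
        print("experiment directory not found:", exp_dir)
```

Steps are compared numerically, so `checkpoint-1000` ranks above `checkpoint-500` even though it sorts first alphabetically.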

7. 📁 Repository Layout

.
├── gigaworld/                         # Core model, pipeline, data loader, scheduler, and utils
│   ├── dataset/                       #   Stage-1 / Stage-2 / DMD data loaders
│   ├── modules/                       #   Transformer + custom Triton / Flash kernels
│   │   ├── gigaworld_kernels/         #     fp32_rmsnorm, tiled_linear, triton_norm, triton_rope
│   │   ├── transformer_gigaworld.py
│   │   └── transformer_functrl_gigaworld.py
│   ├── pipelines/                     #   i2v / t2v main pipelines
│   ├── scheduler/                     #   custom schedulers
│   ├── utils/                         #   TrainConfig, EMA, recycle batch, etc.
│   └── videoalign/                    #   reward / VLM training & inference
├── infer/
│   └── infer_giga_world.py            # Python inference entrypoint
├── scripts/
│   ├── accelerate_configs/            # DeepSpeed ZeRO-2 / ZeRO-3 configs
│   ├── infer/                         #   i2v / t2v × Nano / Pro one-click scripts
│   └── training/
│       ├── configs/                   #   Stage-1 / Stage-2 YAMLs
│       ├── stage1/                    #   Nano / Pro Stage-1 launchers
│       └── stage2/                    #   Nano / Pro Stage-2 DMD launchers
├── tools/
│   ├── ckpt_tools/                    #   LoRA merge and checkpoint utilities
│   ├── datapipeline/                  #   LeRobot-style data preprocessing
│   ├── data_vis_tools/                #   Web URDF + camera calibration viewer
│   ├── download_tool/                 #   one-click HF / ModelScope downloader
│   ├── offload_data/                  #   offline latent pre-computation / format conversion
│   └── others/                        #   misc conversion tools
├── assets/                            #   README figures and demo media
│   ├── main_page.png                  #   main teaser image
│   ├── data_vis.gif                   #   data visualization demo
│   ├── input_image.png                #   inference first frame
│   ├── control_video.mp4              #   inference control video
│   └── i2v_sample.mp4                 #   generated rollout sample
├── example/
│   ├── toy_train_dataset/             #   Nano / Pro toy training datasets
│   └── toy_datapipeline_dataset/      #   toy preprocessing output (gt / depth / plucker / sketch)
├── model/
│   ├── before_stage1/                 #   Diffusers-converted base checkpoints
│   ├── stage1/                        #   Nano / Pro Stage-1 checkpoints
│   └── stage2_distill/                #   distilled checkpoints (coming soon)
├── train_gigaworld_functrl_uni_stage1.py
├── train_gigaworld_functrl_uni_stage2_dmd.py
├── requirements.txt
└── install.sh

🙏 Acknowledgements

GigaWorld-1 stands on the shoulders of a vibrant open-source ecosystem. We are deeply grateful to the following communities and projects that made this work possible:

🤗 Foundation Models & Architectures

Wan (Alibaba) — the wan2.1 and wan2.2_5b backbones that power GigaWorld-1 Nano and Pro
Diffusers — the modular diffusion framework we extend with custom attention processors and pipelines
Helios — a video generation model that achieves minute-scale, high-quality video synthesis
Genesis — a generative universal physics engine and robotics/embodied AI simulation platform
Hugging Face 🤗 — hosting, transformers, accelerate, and the entire model & dataset ecosystem
open-gigaai/Giga-World-1 and open-gigaai/Giga-World-1-Toydata — the public Hugging Face model and toy-data repositories

🛠️ Training & Acceleration

PyTorch & DeepSpeed — the foundation of our distributed training stack
Accelerate — the launcher that ties everything together
PEFT — the LoRA implementation behind lora_rank=128 and critic_lora
xFormers — memory-efficient attention
Flash-Attention — fast and memory-efficient exact attention (FA2 / FA3)
Triton — custom kernels for RMSNorm, RoPE, and tiled linears (gigaworld/modules/gigaworld_kernels/)

🗃️ Data & Annotation

Qwen3-VL — dense long-caption generation from cam_high videos
Depth Anything V2 — monocular depth estimation for all three camera views
LeRobot — the LeRobot-style raw data layout our preprocessing pipeline consumes
Open X-Embodiment & AgiBot — large-scale robot demonstration datasets

🦿 Robotics, Visualization & Tooling

Three.js — the WebGL renderer behind the URDF Viewer and camera-calibration tool
Three.js + URDFLoader — URDF/STL loading and forward kinematics
WandB — experiment tracking (offline-by-default in this repo)
Pandas, NumPy, Pillow, OpenCV — the daily workhorses

🎬 Predecessors & Inspiration

VideoCrafter, CogVideoX, Open-Sora, LTX-Video — for showing us what open-source video generation can look like
Wan-Video / Fun-1.1-1.3B-InP — base model artifacts
DMD / DMD2 — the diffusion distillation theory behind our Stage-2 acceleration

🌟 Community

A heartfelt thank you to everyone who has filed an issue, opened a PR, shared a workflow, or simply starred the project. Open source is a relay race — we are proud to hand the baton forward.

If you find GigaWorld-1 useful, please consider ⭐ starring the repo and citing the paper (see below).

📖 Citation

@article{gigaworld2025,
  title   = {GigaWorld-1: A Roadmap to World Models for Robot Policy Evaluation},
  author  = {{GigaAI}},
  journal = {arXiv preprint},
  year    = {2025},
  eprint  = {2511.19861},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

Built with ❤️ by the GigaWorld Team, GigaAI · CVPR 2026

Released under the Apache 2.0 License.
