🚀 Open-source training, inference, data processing, checkpoint conversion, and LoRA merge workflows for robot world models.
| Date | Update |
|---|---|
| 📄 2026-07 | We uploaded the PDF version of the technical report, arXiv coming soon. |
| 🧑‍💻 2026-07 | Partial training, inference, data processing, and model utility code was open-sourced. |
| 📦 2026-07 | Partial model weights, toy data, and download tools were released. |
| 📖 2026-07 | The GigaWorld-1 technical report was released. |
| 🏆 2026-04 | We released the CVPR 2026 World Model Track leaderboard. See CVPR-2026-WorldModel-Track-LeaderBoard. |
| 🏆 2026-03 | We hosted the CVPR 2026 World Model Challenge. See CVPR-2026-Workshop-WM-Track. |
💡 Subscribe to releases — click Watch ▾ → Custom → Releases on the GitHub repo to be notified when new weights, datasets, or the WMBench benchmark drop.
🟢 Released · 🟡 Beta · 🔴 Coming Soon — last updated 2026-07
| Status | Component | Description |
|---|---|---|
| 🟢 | Stage-1 weights (Nano / Pro) | Released on open-gigaai/Giga-World-1 and ModelScope |
| 🟢 | Training code | Stage-1: train_gigaworld_functrl_uni_stage1.py for Nano (1.3B) and Pro (5B), DeepSpeed ZeRO-2/3 ready — see §4.1; Stage-2: train_gigaworld_functrl_uni_stage2_dmd.py for DMD2 distillation (4–6 steps) — see §4.2 |
| 🟢 | Inference code (i2v / t2v) | Nano + Pro one-click scripts, 10 FPS, 33 s rollouts — see §5 |
| 🟡 | Data preprocessing pipeline & toy data | LeRobot-style → GigaWorld format with Qwen3-VL captions + Depth Anything V2 — see §2; toy data: open-gigaai/Giga-World-1-Toydata |
| 🟢 | Tools | LoRA merge / checkpoint conversion, visualization, and offline latent utilities — see §2, §6 |
| 🟡 | 📊 WMBench benchmark | Partially open-sourced — 15 fine-grained metrics, leaderboard + VLM judging |
| 🔴 | Stage-2 distilled weights | Distilled Nano / Pro checkpoints — coming soon |
| 🔴 | RL post-training | 3D RL post-training scripts for stronger 3D scene modeling — coming soon |
| 🔴 | Other-domain weights and training code | Additional domain checkpoints and corresponding training recipes — coming soon |
| 🔴 | Acceleration framework | Optimized distributed inference / training acceleration stack — coming soon |
| Channel | Purpose | Where |
|---|---|---|
| 🐙 GitHub Releases | Tagged source snapshots with changelogs | |
| 🤗 Hugging Face Model | Giga-World-1 model weights | |
| 🤗 Hugging Face Dataset | Giga-World-1 toy data | |
| 🤗 Hugging Face Dataset (CVPR) | CVPR 2026 WorldModel Track dataset | |
| 🔷 ModelScope Model | ModelScope mirror for model weights | |
| 🔷 ModelScope Dataset | ModelScope mirror for toy data | |
| 📄 arXiv | Paper PDF, BibTeX | |
| 🌐 Project Page | Videos, leaderboard, demos | 🌐 Project Page |
| 📊 WMBench | Public benchmark leaderboard | |
| 🆘 Support | Issues and discussions | |
🛠️ Want a component to ship sooner? Please file an issue or open a PR.
| Item | Requirement / Recommendation |
|---|---|
| Production setup | Single node with 8 × H20 or 8 × A100 GPUs |
| Inference | Supports both Nano (1.3B) and Pro (5B); consumer-grade GPUs can be used with memory-saving settings |
| Training | Production experiments are run on 8-GPU nodes; consumer-grade GPU training is possible with ZeRO, offloading, gradient checkpointing, and reduced batch / resolution settings |
| OS | Linux, verified on Ubuntu 20.04 / 22.04 |
| CUDA | CUDA 12.x recommended, matching the local PyTorch installation |
Note: We use a single-node 8× H20 or 8× A100 setup for production training. With appropriate memory optimization techniques, the released code can also run training and inference experiments on consumer-grade GPUs.
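As a rough guide to why memory-saving settings matter on smaller GPUs, the sketch below estimates the memory needed to hold the transformer weights alone for Nano (1.3B) and Pro (5B). The 2-bytes-per-parameter default assumes 16-bit (bf16/fp16) storage, which is an assumption rather than a setting confirmed by this README. Activations, optimizer state, the VAE, and the text encoder are not counted, so real usage will be higher.

```python
# Weight-only memory estimate; a back-of-the-envelope sketch, not a measurement.
PARAM_COUNTS = {"nano": 1.3e9, "pro": 5e9}  # parameter counts quoted in this README


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return decimal gigabytes needed to store the weights alone.

    bytes_per_param=2 assumes 16-bit storage (an assumption); use 4 for fp32.
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, count in PARAM_COUNTS.items():
        print(
            f"{name}: ~{weight_memory_gb(count):.1f} GB in 16-bit, "
            f"~{weight_memory_gb(count, 4):.1f} GB in fp32"
        )
```

Training needs several times this figure once gradients and optimizer state are added, which is why ZeRO sharding, offloading, and gradient checkpointing are listed for consumer-grade GPUs.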
Install PyTorch first according to your CUDA version, then install the remaining dependencies with install.sh.
cd <PROJECT_ROOT>
# CUDA 12.6
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu126
# CUDA 12.8
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128
# CUDA 13.0
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu130
# Install dependencies
bash install.sh
install.sh installs packages from requirements.txt.
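The three pip commands above differ only in the wheel index suffix. The sketch below makes that mapping explicit for scripted setups. The helper names are hypothetical and not part of the repository; the pinned versions and the URL pattern are copied from the commands above.

```python
# Hypothetical helpers: they only reproduce the pip commands listed above.
SUPPORTED_CUDA = ("12.6", "12.8", "13.0")  # CUDA versions listed in this README
PINNED = ("torch==2.10.0", "torchvision==0.25.0", "torchaudio==2.10.0")


def pytorch_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as '12.6' to the matching wheel index URL."""
    if cuda_version not in SUPPORTED_CUDA:
        raise ValueError(
            f"CUDA {cuda_version} is not listed; expected one of {SUPPORTED_CUDA}"
        )
    tag = "cu" + cuda_version.replace(".", "")  # '12.6' -> 'cu126'
    return f"https://download.pytorch.org/whl/{tag}"


def pip_install_command(cuda_version: str) -> list[str]:
    """Build the argv list for the PyTorch install step."""
    return ["pip", "install", *PINNED, "--index-url", pytorch_index_url(cuda_version)]


if __name__ == "__main__":
    print(" ".join(pip_install_command("12.8")))
```

The returned list can be passed to `subprocess.run(..., check=True)` if the install is driven from Python.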
Main dependencies:
accelerate>=1.1.0 # accelerate launch / DDP / DeepSpeed
av>=12.0.0 # video read / write
decord>=0.6.0 # fast video decoding
diffusers>=0.35.0 # custom-modified version (see thirdparty/)
einops>=0.8.0
imageio>=2.36.0
imageio-ffmpeg>=0.5.1
numpy>=1.24,<3
omegaconf>=2.3.0
opencv-python>=4.9.0
packaging>=24.0
pandas>=2.1.0
peft>=0.12.0 # LoRA implementation
Pillow>=10.0.0
pyyaml>=6.0.0
safetensors>=0.4.5
torchdata>=0.8.0
tqdm>=4.66.0
transformers>=4.45.0
wandb>=0.18.0 # offline by default
xformers>=0.0.28.post3 # memory-efficient attention
deepspeed==0.19.2
ftfy==6.3.1
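After running install.sh, a quick way to confirm that the main packages are present is to query the installed distribution metadata. This is a minimal sketch using only the standard library; the package subset and the function name are illustrative choices. It reports presence and version only and does not compare against the minimum versions listed above.

```python
from importlib import metadata

# A subset of the distribution names from the dependency list above.
CORE_PACKAGES = [
    "accelerate",
    "diffusers",
    "peft",
    "transformers",
    "deepspeed",
    "safetensors",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(CORE_PACKAGES).items():
        print(f"{name:14s} {version or 'MISSING'}")
```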
Optional: install FlashAttention 2 or 3
FlashAttention 2 or 3 can be used depending on your CUDA / PyTorch environment. Install one compatible version if you need FlashAttention acceleration.
# FlashAttention 2
pip install flash-attn --no-build-isolation
# Or build FlashAttention 3 manually
cd thirdparty/flash-attention-3
# Follow its README
This release provides a small toy data package for verifying inference, data loading, visualization, and training workflows. The toy data is available from Hugging Face and ModelScope.
Use the one-click downloader:
bash tools/download_tool/download_giga_world.sh \
--platform hf \
--target toydata \
--output-dir ./downloads
For ModelScope, replace --platform hf with --platform modelscope. See tools/download_tool/README.md for all options.
After downloading, place or symlink the toy data under example/:
mkdir -p example
cp -r ./downloads/Giga-World-1-Toydata/* ./example/
Expected project structure:
giga-world-release/
└── example/
├── toy_datapipeline_dataset/ # raw LeRobot-format toy dataset
│ ├── gt/
│ ├── depth/
│ ├── plucker/
│ ├── sketch/
│ └── labels/
└── toy_train_dataset/ # model training data
├── nano/
│ ├── dataset_cache.pkl
│ └── episode_*.pt
└── pro/
├── dataset_cache.pkl
└── episode_*.pt
toy_train_dataset/ is already in the format used by the training configs: stage_1_post_functrl_wan21.yaml, stage_1_post_functrl_wan22_5b.yaml, and stage_2_dmd_functrl_wan21.yaml.
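Before launching training on the toy data, it can help to confirm that the copy or symlink step produced the layout shown above. The following is a minimal sketch of such a check. The function name is hypothetical, and the expected entries are taken from the tree above, not from any validation logic in the repository.

```python
from pathlib import Path

# Entries copied from the expected project structure shown above.
EXPECTED_DIRS = [
    "toy_datapipeline_dataset/gt",
    "toy_datapipeline_dataset/depth",
    "toy_datapipeline_dataset/plucker",
    "toy_datapipeline_dataset/sketch",
    "toy_datapipeline_dataset/labels",
    "toy_train_dataset/nano",
    "toy_train_dataset/pro",
]
EXPECTED_FILES = [
    "toy_train_dataset/nano/dataset_cache.pkl",
    "toy_train_dataset/pro/dataset_cache.pkl",
]


def missing_toy_data_entries(example_root):
    """Return the expected entries that are absent under example_root."""
    root = Path(example_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    # Each training variant should contain at least one episode_*.pt sample.
    for variant in ("nano", "pro"):
        variant_dir = root / "toy_train_dataset" / variant
        if variant_dir.is_dir() and not any(variant_dir.glob("episode_*.pt")):
            missing.append(f"toy_train_dataset/{variant}/episode_*.pt")
    return missing


if __name__ == "__main__":
    problems = missing_toy_data_entries("example")
    print("layout OK" if not problems else "missing: " + ", ".join(problems))
```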
For raw data visualization, run the web tool and open http://127.0.0.1:8090/ or http://127.0.0.1:8090/calib:
cd tools/data_vis_tools
python app.py --host 0.0.0.0 --port 8090
For offline latent pre-computation, use get_short-latents-giga-ctrl.py or get_short-latents-giga-ctrl-wan22-5b.py. The input data should contain helios_giga_ctrl.jsonl, videos/, and control_videos/; outputs are .pt samples containing precomputed vae_latent, control_latent, prompt_embed, and related metadata. See tools/offload_data/data_format.md for the data format.
Released model weights are available from:
| Platform | Repository |
|---|---|
| 🤗 Hugging Face | open-gigaai/Giga-World-1 |
| 🔷 ModelScope | GigaAI/Giga-World-1 |
Use the download helper:
- Script: download_giga_world.sh
- Full usage: tools/download_tool/README.md
Download model weights from Hugging Face:
bash tools/download_tool/download_giga_world.sh \
--platform hf \
--target model \
--output-dir ./downloads
Download model weights from ModelScope:
bash tools/download_tool/download_giga_world.sh \
--platform modelscope \
--target model \
--output-dir ./downloads
After downloading, place or symlink the model files under model/:
giga-world-release/
├── model/
│ ├── before_stage1/
│ │ ├── Wan2p1_1p3B-FunContro-GigaRobo-alpha-diffusers/
│ │ ├── Wan2p1_1p3B-FunControl-diffusers/
│ │ └── Wan2p2_5B-FunControl-diffusers/
│ ├── stage1/
│ │ ├── nano/
│ │ └── pro/
│ └── stage2_distill/ # coming soon
└── tools/
└── download_tool/
If the downloader saves weights under ./downloads/Giga-World-1/, copy or symlink them into the repository:
mkdir -p model
cp -r ./downloads/Giga-World-1/* ./model/
flowchart LR
%% Base Models
A1["WAN 2.1<br/>1.3B FunControl"]
A2["WAN 2.2<br/>5B FunControl"]
%% Before Stage1
B1["GigaRobo Alpha<br/>Diffusers"]
B2["WAN2.1 1.3B<br/>Diffusers"]
B3["WAN2.2 5B<br/>Diffusers"]
%% Stage1
C1["Nano"]
C2["Pro"]
%% Stage2
D1["Nano Distill"]
D2["Pro Distill"]
%% Connections
A1 -- "Pretrain on Giga Dataset<br/>Convert to Diffusers" --> B1
A1 -- "Convert to Diffusers" --> B2
B1 --> C1
B2 --> C1
C1 --> D1
A2 -- "Convert to Diffusers" --> B3
B3 --> C2
C2 --> D2
subgraph S0["Before Stage 1"]
subgraph S0A["Interchangeable"]
B1
B2
end
B3
end
subgraph S1["Stage 1"]
C1
C2
end
subgraph S2["Stage 2"]
D1
D2
end
classDef base fill:#4F8EF7,color:#fff,stroke:#2D5FD2,stroke-width:3px;
classDef diff fill:#7C4DFF,color:#fff,stroke:#5B2DCC,stroke-width:3px;
classDef stage fill:#00C853,color:#fff,stroke:#009624,stroke-width:3px;
classDef distill fill:#FF6D00,color:#fff,stroke:#DD2C00,stroke-width:3px;
style S0A fill:none,stroke:#FF0000,stroke-width:3px,stroke-dasharray:8 4,color:#FF0000;
class A1,A2 base;
class B1,B2,B3 diff;
class C1,C2 stage;
class D1,D2 distill;
Training entrypoints and launcher scripts are paired (each pair = one accelerate launch command + one YAML).
| Entrypoint / Config | Launcher | Note |
|---|---|---|
| Trainer: train_gigaworld_functrl_uni_stage1.py | | Unified trainer handling both Nano and Pro |
| Config: stage_1_post_functrl_wan21.yaml | train_deepspeed_stage1_functrl_wan21.sh | Nano (1.3B) |
| Config: stage_1_post_functrl_wan22_5b.yaml | train_deepspeed_stage1_functrl_wan22_5b.sh | Pro (5B) |
Launch Nano:
bash scripts/training/stage1/train_deepspeed_stage1_functrl_wan21.sh
Launch Pro:
bash scripts/training/stage1/train_deepspeed_stage1_functrl_wan22_5b.sh
Default output layout:
output/
├── exp/
│ ├── Giga-world-Nano-Train-Stage-1/
│ └── Giga-world-Pro-Train-Stage-1/
└── logs/
| Entrypoint / Config | Launcher |
|---|---|
| Trainer: train_gigaworld_functrl_uni_stage2_dmd.py | |
| Config: stage_2_dmd_functrl_wan21.yaml | train_deepspeed_stage2_functrl_wan21.sh |
| Config: stage_2_dmd_functrl_wan22_5b.yaml | train_deepspeed_stage2_functrl_wan22_5b.sh |
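The config and launcher pairings from the Stage-1 and Stage-2 tables can be kept in one lookup when training is driven from a script. This is a sketch only: the function name and the (stage, variant) keys are illustrative, and the launcher directories are assumed from the scripts/training/stage1/ and scripts/training/stage2/ layout used elsewhere in this README.

```python
# (stage, variant) -> (config YAML, launcher script), copied from the tables above.
TRAINING_PAIRS = {
    ("stage1", "nano"): (
        "stage_1_post_functrl_wan21.yaml",
        "train_deepspeed_stage1_functrl_wan21.sh",
    ),
    ("stage1", "pro"): (
        "stage_1_post_functrl_wan22_5b.yaml",
        "train_deepspeed_stage1_functrl_wan22_5b.sh",
    ),
    ("stage2", "nano"): (
        "stage_2_dmd_functrl_wan21.yaml",
        "train_deepspeed_stage2_functrl_wan21.sh",
    ),
    ("stage2", "pro"): (
        "stage_2_dmd_functrl_wan22_5b.yaml",
        "train_deepspeed_stage2_functrl_wan22_5b.sh",
    ),
}


def launch_command(stage: str, variant: str) -> list[str]:
    """Return the argv list for the launcher paired with a stage and variant."""
    try:
        _config, launcher = TRAINING_PAIRS[(stage, variant)]
    except KeyError:
        raise ValueError(f"unknown stage/variant: {stage}/{variant}") from None
    # Assumed location: scripts/training/<stage>/<launcher>
    return ["bash", f"scripts/training/{stage}/{launcher}"]


if __name__ == "__main__":
    print(" ".join(launch_command("stage1", "nano")))
```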
DMD2 compresses the denoising loop from 20 steps to 4–6 steps (the Stage-2 config sets num_inference_steps: 6) and aligns with a frozen real score model via a critic_lora.
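To put the step reduction in perspective, the sketch below computes the nominal reduction in denoising passes. It assumes every denoising step costs the same and ignores fixed costs such as text encoding and VAE decoding, as well as any change in classifier-free guidance cost, so it is an upper-level estimate and not a benchmark result.

```python
TEACHER_STEPS = 20  # Stage-1 sampling steps quoted above


def denoising_speedup(student_steps: int, teacher_steps: int = TEACHER_STEPS) -> float:
    """Ratio of denoising passes before and after distillation."""
    if student_steps <= 0:
        raise ValueError("student_steps must be positive")
    return teacher_steps / student_steps


if __name__ == "__main__":
    for steps in (4, 6):
        print(f"{TEACHER_STEPS} -> {steps} steps: {denoising_speedup(steps):.2f}x fewer passes")
```

With the configured 6 steps the nominal reduction is about 3.3x, and with 4 steps it is 5x.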
Launch Nano DMD:
bash scripts/training/stage2/train_deepspeed_stage2_functrl_wan21.sh
Example output:
output/exp/Giga-world-Nano-Train-DMD/
To adapt GigaWorld-1 to a new domain, no code changes are required. In practice, you only need to construct training data in the same format as the provided GigaWorld data, then launch the existing Stage-1 training pipeline with the corresponding config.
For most new domains, training for a few thousand steps on 8 GPUs is typically sufficient and can usually finish within one day.
| Script | Mode | Model | Link |
|---|---|---|---|
| run_infer_nano_i2v.sh | i2v | Nano 1.3B | script |
| run_infer_nano_t2v.sh | t2v | Nano 1.3B | script |
| run_infer_pro_i2v.sh | i2v | Pro 5B | script |
| run_infer_pro_t2v.sh | t2v | Pro 5B | script |
Usage:
# Nano i2v: first frame + control video + text prompt
bash scripts/infer/run_infer_nano_i2v.sh
# Pro t2v: text prompt only (omit --image_path → t2v mode)
bash scripts/infer/run_infer_pro_t2v.sh
The underlying entrypoint infer_giga_world.py exposes the following arguments:
| Argument | Required | Default | Description |
|---|---|---|---|
| --config | ✅ | — | Training / inference YAML config (drives model type and hyperparams) |
| --base_model_path | ✅ | — | Base diffusers model directory (VAE / T5 / Transformer) |
| --transformer_model_name_or_path | | None | Path to the merged transformer; falls back to --base_model_path if None |
| --checkpoint_path | | None | Optional LoRA / partial checkpoint path |
| --image_path | | None | First frame for i2v; omit to enter t2v mode |
| --prompt | ✅ | — | Text prompt |
| --control_video_path | | None | Control video (Plücker / Ray Map), optional |
| --output_dir | ✅ | — | Output root directory |
| --sample_name | | sample | Output video name prefix |
| --seed | | 42 | Random seed |
| --fps | | 10 | Output video FPS |
| --num_frames | | 99 | Total frames (330 ≈ 33 s @ 10 FPS) |
| --height | | 480 | Output height |
| --width | | 1920 | Output width (typically 640×3 = 1920 for three views) |
| --num_inference_steps | | 20 | 20 steps for Stage-1; 4–6 for Stage-2 / DMD |
| --guidance_scale | | 5.0 | Classifier-free guidance strength |
| --enable_tiling | | False | VAE tiling for memory savings |
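The argument table above can be turned into a command line programmatically. The sketch below builds an argv list for infer_giga_world.py using only flags from the table; the function names are hypothetical, and invoking the script as infer/infer_giga_world.py from the repository root is an assumption based on the project structure. Leaving image_path unset omits --image_path, which the table describes as t2v mode.

```python
def build_infer_command(
    config,
    base_model_path,
    prompt,
    output_dir,
    image_path=None,          # omit for t2v mode
    control_video_path=None,  # optional control video
    num_frames=99,
    fps=10,
    num_inference_steps=20,
):
    """Assemble the inference command from the documented arguments."""
    cmd = [
        "python", "infer/infer_giga_world.py",  # assumed path relative to repo root
        "--config", config,
        "--base_model_path", base_model_path,
        "--prompt", prompt,
        "--output_dir", output_dir,
        "--num_frames", str(num_frames),
        "--fps", str(fps),
        "--num_inference_steps", str(num_inference_steps),
    ]
    if image_path is not None:
        cmd += ["--image_path", image_path]
    if control_video_path is not None:
        cmd += ["--control_video_path", control_video_path]
    return cmd


def rollout_seconds(num_frames, fps=10):
    """Duration of the generated video in seconds."""
    return num_frames / fps


if __name__ == "__main__":
    print(rollout_seconds(330), rollout_seconds(99))
```

At the default 10 FPS, the default 99 frames give a 9.9 s clip, while 330 frames give the 33 s rollout mentioned in the table.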
Inference output example:
| First Frame | Control Video | Generated Rollout |
|---|---|---|
| 🖼️ input_image.png | 🎬 control_video.mp4 | 🎬 i2v_sample.mp4 |
Use uni_merge_lora_for_giga_world_1.py to merge LoRA or partial checkpoints into a standalone transformer for deployment.
The tool supports both wan2.1 and wan2.2_5b, automatically resolves checkpoint files from the specified checkpoint directory, and saves the merged transformer to --save_dir.
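For readers unfamiliar with what merging a LoRA means, the sketch below shows the standard LoRA merge arithmetic on a single small weight matrix: the low-rank update B·A, scaled by alpha / rank, is added to the base weight so that no adapter is needed at inference time. This illustrates the general technique only; whether uni_merge_lora_for_giga_world_1.py uses exactly this scaling is not confirmed by this README, and the alpha parameter and function names are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W' = W + (alpha / rank) * (B @ A).

    Shapes: weight is out x in, lora_b is out x rank, lora_a is rank x in.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update with the same shape as weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    a = [[3.0, 4.0]]    # rank x in  (rank = 1)
    b = [[1.0], [2.0]]  # out x rank
    print(merge_lora(base, a, b, alpha=2.0, rank=1))
```

After merging, the adapter matrices can be discarded and the result saved as a regular transformer weight, which is the role of --save_dir in the commands that follow.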
python tools/ckpt_tools/uni_merge_lora_for_giga_world_1.py \
--base_model <PATH_TO_BASE_NANO> \
--save_dir <PATH_TO_STAGE1_MERGED_NANO> \
--ckpt_dir /path/to/checkpoint-XXXX \
--model_type wan2.1
For Pro 5B:
python tools/ckpt_tools/uni_merge_lora_for_giga_world_1.py \
--base_model <PATH_TO_BASE_PRO> \
--save_dir <PATH_TO_STAGE1_MERGED_PRO> \
--ckpt_dir /path/to/checkpoint-XXXX \
--model_type wan2.2_5b
.
├── gigaworld/ # Core model, pipeline, data loader, scheduler, and utils
│ ├── dataset/ # Stage-1 / Stage-2 / DMD data loaders
│ ├── modules/ # Transformer + custom Triton / Flash kernels
│ │ ├── gigaworld_kernels/ # fp32_rmsnorm, tiled_linear, triton_norm, triton_rope
│ │ ├── transformer_gigaworld.py
│ │ └── transformer_functrl_gigaworld.py
│ ├── pipelines/ # i2v / t2v main pipelines
│ ├── scheduler/ # custom schedulers
│ ├── utils/ # TrainConfig, EMA, recycle batch, etc.
│ └── videoalign/ # reward / VLM training & inference
├── infer/
│ └── infer_giga_world.py # Python inference entrypoint
├── scripts/
│ ├── accelerate_configs/ # DeepSpeed ZeRO-2 / ZeRO-3 configs
│ ├── infer/ # i2v / t2v × Nano / Pro one-click scripts
│ ├── training/
│ │ ├── configs/ # Stage-1 / Stage-2 YAMLs
│ │ ├── stage1/ # Nano / Pro Stage-1 launchers
│ │ └── stage2/ # Nano / Pro Stage-2 DMD launchers
├── tools/
│ ├── ckpt_tools/ # LoRA merge and checkpoint utilities
│ ├── datapipeline/ # LeRobot-style data preprocessing
│ ├── data_vis_tools/ # Web URDF + camera calibration viewer
│ ├── download_tool/ # one-click HF / ModelScope downloader
│ ├── offload_data/ # offline latent pre-computation / format conversion
│ └── others/ # misc conversion tools
├── assets/ # README figures and demo media
│ ├── main_page.png # main teaser image
│ ├── data_vis.gif # data visualization demo
│ ├── input_image.png # inference first frame
│ ├── control_video.mp4 # inference control video
│ └── i2v_sample.mp4 # generated rollout sample
├── example/
│ ├── toy_train_dataset/ # Nano / Pro toy training datasets
│ └── toy_datapipeline_dataset/ # toy preprocessing output (gt / depth / plucker / sketch)
├── model/
│ ├── before_stage1/ # Diffusers-converted base checkpoints
│ ├── stage1/ # Nano / Pro Stage-1 checkpoints
│ └── stage2_distill/ # distilled checkpoints (coming soon)
├── train_gigaworld_functrl_uni_stage1.py
├── train_gigaworld_functrl_uni_stage2_dmd.py
├── requirements.txt
└── install.sh
GigaWorld-1 stands on the shoulders of a vibrant open-source ecosystem. We are deeply grateful to the following communities and projects that made this work possible:
- Wan (Alibaba) — the wan2.1 and wan2.2_5b backbones that power GigaWorld-1 Nano and Pro
- Diffusers — the modular diffusion framework we extend with custom attention processors and pipelines
- Helios — a video generation model that achieves minute-scale, high-quality video synthesis
- Genesis — a generative universal physics engine and robotics/embodied AI simulation platform
- Hugging Face 🤗 — hosting, transformers, accelerate, and the entire model & dataset ecosystem
- open-gigaai/Giga-World-1 and open-gigaai/Giga-World-1-Toydata — the public Hugging Face model and toy-data repositories
- PyTorch & DeepSpeed — the foundation of our distributed training stack
- Accelerate — the launcher that ties everything together
- PEFT — the LoRA implementation behind lora_rank=128 and critic_lora
- xFormers — memory-efficient attention
- Flash-Attention — fast and memory-efficient exact attention (FA2 / FA3)
- Triton — custom kernels for RMSNorm, RoPE, and tiled linears (gigaworld/modules/gigaworld_kernels/)
- Qwen3-VL — dense long-caption generation from cam_high videos
- Depth Anything V2 — monocular depth estimation for all three camera views
- LeRobot — the LeRobot-style raw data layout our preprocessing pipeline consumes
- Open X-Embodiment & AgiBot — large-scale robot demonstration datasets
- Three.js — the WebGL renderer behind the URDF Viewer and camera-calibration tool
- Three.js + URDFLoader — URDF/STL loading and forward kinematics
- WandB — experiment tracking (offline-by-default in this repo)
- Pandas, NumPy, Pillow, OpenCV — the daily workhorses
- VideoCrafter, CogVideoX, Open-Sora, LTX-Video — for showing us what open-source video generation can look like
- Wan-Video / Fun-1.1-1.3B-InP — base model artifacts
- DMD / DMD2 — the diffusion distillation theory behind our Stage-2 acceleration
A heartfelt thank you to everyone who has filed an issue, opened a PR, shared a workflow, or simply starred the project. Open source is a relay race — we are proud to hand the baton forward.
If you find GigaWorld-1 useful, please consider ⭐ starring the repo and citing the paper (see below).
@article{gigaworld2025,
title = {GigaWorld-1: A Roadmap to World Models for Robot Policy Evaluation},
author = {{GigaAI}},
journal = {arXiv preprint},
year = {2025},
eprint = {2511.19861},
archivePrefix = {arXiv},
primaryClass = {cs.CV}
}
Built with ❤️ by the GigaWorld Team, GigaAI · CVPR 2026
Released under the Apache 2.0 License.

