open-gigaai/giga-world-1

GigaWorld-1: A Roadmap to World Models for Robot Policy Evaluation

🚀 Open-source training, inference, data processing, checkpoint conversion, and LoRA merge workflows for robot world models.





📰 Latest Updates

| Date | Update |
|---|---|
| 📄 2026-07 | We uploaded the PDF version of the technical report; the arXiv version is coming soon. |
| 🧑‍💻 2026-07 | Partial training, inference, data processing, and model utility code was open-sourced. |
| 📦 2026-07 | Partial model weights, toy data, and download tools were released. |
| 📖 2026-07 | The GigaWorld-1 technical report was released. |
| 🏆 2026-04 | We released the CVPR 2026 World Model Track leaderboard. See CVPR-2026-WorldModel-Track-LeaderBoard. |
| 🏆 2026-03 | We hosted the CVPR 2026 World Model Challenge. See CVPR-2026-Workshop-WM-Track. |

💡 Subscribe to releases — click Watch ▾ → Custom → Releases on the GitHub repo to be notified when new weights, datasets, or the WMBench benchmark drop.


📊 Open-Source Progress

🟢 Released · 🟡 Beta · 🔴 Coming Soon — last updated 2026-07

| Status | Component | Description |
|---|---|---|
| 🟢 | Stage-1 weights (Nano / Pro) | Released on open-gigaai/Giga-World-1 and ModelScope |
| 🟢 | Training code | Stage-1: train_gigaworld_functrl_uni_stage1.py for Nano (1.3B) and Pro (5B), DeepSpeed ZeRO-2/3 ready (see §4.1). Stage-2: train_gigaworld_functrl_uni_stage2_dmd.py for DMD2 distillation to 4–6 steps (see §4.2) |
| 🟢 | Inference code (i2v / t2v) | One-click Nano and Pro scripts, 10 FPS, 33 s rollouts (see §5) |
| 🟡 | Data preprocessing pipeline & toy data | LeRobot-style → GigaWorld format with Qwen3-VL captions and Depth Anything V2 depth (see §2); toy data: open-gigaai/Giga-World-1-Toydata |
| 🟢 | Tools | LoRA merge / checkpoint conversion, visualization, and offline latent utilities (see §2, §6) |
| 🟡 | 📊 WMBench benchmark | Partially open-sourced: 15 fine-grained metrics, leaderboard, and VLM judging |
| 🔴 | Stage-2 distilled weights | Distilled Nano / Pro checkpoints |
| 🔴 | RL post-training | 3D RL post-training scripts for stronger 3D scene modeling |
| 🔴 | Other-domain weights and training code | Additional domain checkpoints and corresponding training recipes |
| 🔴 | Acceleration framework | Optimized distributed inference / training acceleration stack |

🌐 Release Channels

| Channel | Purpose | Where |
|---|---|---|
| 🐙 GitHub Releases | Tagged source snapshots with changelogs | GitHub Repository |
| 🤗 Hugging Face Model | Giga-World-1 model weights | Hugging Face Model |
| 🤗 Hugging Face Dataset | Giga-World-1 toy data | Hugging Face Dataset |
| 🤗 Hugging Face Dataset (CVPR) | CVPR 2026 WorldModel Track dataset | Hugging Face Dataset CVPR |
| 🔷 ModelScope Model | ModelScope mirror for model weights | ModelScope Model |
| 🔷 ModelScope Dataset | ModelScope mirror for toy data | ModelScope Dataset |
| 📄 arXiv | Paper PDF, BibTeX | arXiv Paper |
| 🌐 Project Page | Videos, leaderboard, demos | 🌐 Project Page |
| 📊 WMBench | Public benchmark leaderboard | WMBench Leaderboard |
| 🆘 Support | Issues and discussions | GitHub Issues |

🛠️ Want a component to ship sooner? Please file an issue or open a PR.


1. 📦 Environment Setup

1.1 🖥️ Hardware & OS

| Item | Requirement / Recommendation |
|---|---|
| Production setup | Single node with 8 × H20 or 8 × A100 GPUs |
| Inference | Supports both Nano (1.3B) and Pro (5B); consumer-grade GPUs can be used with memory-saving settings |
| Training | Production experiments run on 8-GPU nodes; consumer-grade GPU training is possible with ZeRO, offloading, gradient checkpointing, and reduced batch size / resolution |
| OS | Linux, verified on Ubuntu 20.04 / 22.04 |
| CUDA | CUDA 12.x recommended, matching the local PyTorch installation |

Note: We use a single-node 8× H20 or 8× A100 setup for production training. With appropriate memory optimization techniques, the released code can also run training and inference experiments on consumer-grade GPUs.

1.2 🧰 Install Dependencies

Install PyTorch first according to your CUDA version, then install the remaining dependencies with install.sh.

cd <PROJECT_ROOT>

# Choose ONE of the following, matching your CUDA version.
# CUDA 12.6
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu126

# CUDA 12.8
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu128

# CUDA 13.0
pip install torch==2.10.0 torchvision==0.25.0 torchaudio==2.10.0 --index-url https://download.pytorch.org/whl/cu130

# Install dependencies
bash install.sh

install.sh installs packages from requirements.txt.

Main dependencies:

accelerate>=1.1.0        # accelerate launch / DDP / DeepSpeed
av>=12.0.0               # video read / write
decord>=0.6.0            # fast video decoding
diffusers>=0.35.0        # custom-modified version (see thirdparty/)
einops>=0.8.0
imageio>=2.36.0
imageio-ffmpeg>=0.5.1
numpy>=1.24,<3
omegaconf>=2.3.0
opencv-python>=4.9.0
packaging>=24.0
pandas>=2.1.0
peft>=0.12.0             # LoRA implementation
Pillow>=10.0.0
pyyaml>=6.0.0
safetensors>=0.4.5
torchdata>=0.8.0
tqdm>=4.66.0
transformers>=4.45.0
wandb>=0.18.0            # offline by default
xformers>=0.0.28.post3   # memory-efficient attention
deepspeed==0.19.2
ftfy==6.3.1

Optional: install FlashAttention 2 or 3

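FlashAttention 2 or 3 can be used, depending on your GPU and CUDA / PyTorch environment; install one compatible version if you want FlashAttention acceleration. FlashAttention 3 targets Hopper GPUs such as the H20, while FlashAttention 2 also supports Ampere GPUs such as the A100.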

# FlashAttention 2
pip install flash-attn --no-build-isolation

# Or build FlashAttention 3 manually
cd thirdparty/flash-attention-3
# Follow its README
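After installation, it is worth confirming what actually landed in the environment before launching a long job. The sketch below uses only the standard library (`importlib.metadata`) to report installed versions; the package list is an assumption drawn from the dependency list above, and `flash-attn` is expected to be missing unless you installed it.

```python
import importlib.util
from importlib import metadata

# PyPI distribution names taken from the dependency list; flash-attn is optional.
PACKAGES = ["torch", "diffusers", "transformers", "accelerate", "peft",
            "deepspeed", "xformers", "flash-attn"]


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:>12}: {version or 'not installed'}")
    if importlib.util.find_spec("torch") is not None:
        import torch
        # torch.version.cuda is None for CPU-only builds.
        print("CUDA build:", torch.version.cuda, "| GPU visible:", torch.cuda.is_available())
```

Reading versions from package metadata avoids importing heavy packages such as DeepSpeed just to check that they exist.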

2. 🗃️ Data Preparation

This release provides a small toy data package for verifying inference, data loading, visualization, and training workflows. The toy data is available from Hugging Face and ModelScope.

Use the one-click downloader:

bash tools/download_tool/download_giga_world.sh \
  --platform hf \
  --target toydata \
  --output-dir ./downloads

For ModelScope, replace --platform hf with --platform modelscope. See tools/download_tool/README.md for all options.

After downloading, place or symlink the toy data under example/:

mkdir -p example
cp -r ./downloads/Giga-World-1-Toydata/* ./example/

Expected project structure:

giga-world-release/
└── example/
    ├── toy_datapipeline_dataset/    # raw LeRobot-format toy dataset
    │   ├── gt/
    │   ├── depth/
    │   ├── plucker/
    │   ├── sketch/
    │   └── labels/
    └── toy_train_dataset/           # model training data
        ├── nano/
        │   ├── dataset_cache.pkl
        │   └── episode_*.pt
        └── pro/
            ├── dataset_cache.pkl
            └── episode_*.pt

toy_train_dataset/ is already in the format used by the training configs: stage_1_post_functrl_wan21.yaml, stage_1_post_functrl_wan22_5b.yaml, and stage_2_dmd_functrl_wan21.yaml.
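Before pointing a training config at the toy data, a quick presence check against the tree above can save a failed launch. This is a minimal standard-library sketch; `check_toy_train_dataset` is a hypothetical helper, and it only checks that files exist, not the contents of `dataset_cache.pkl` or the episode tensors.

```python
from pathlib import Path


def check_toy_train_dataset(root="example/toy_train_dataset"):
    """Return {variant: (has_cache, num_episodes)} for the nano/ and pro/ subsets."""
    summary = {}
    for variant in ("nano", "pro"):
        subset = Path(root) / variant
        has_cache = (subset / "dataset_cache.pkl").is_file()
        # glob() on a missing directory simply yields nothing.
        num_episodes = len(list(subset.glob("episode_*.pt")))
        summary[variant] = (has_cache, num_episodes)
    return summary


if __name__ == "__main__":
    for variant, (has_cache, n) in check_toy_train_dataset().items():
        print(f"{variant}: dataset_cache.pkl={'ok' if has_cache else 'MISSING'}, episodes={n}")
```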

For raw data visualization, start the web tool below, then open http://127.0.0.1:8090/ or http://127.0.0.1:8090/calib (camera calibration view) in a browser:

cd tools/data_vis_tools
python app.py --host 0.0.0.0 --port 8090


For offline latent pre-computation, use get_short-latents-giga-ctrl.py (Nano) or get_short-latents-giga-ctrl-wan22-5b.py (Pro, Wan2.2 5B) in tools/offload_data/. The input directory should contain helios_giga_ctrl.jsonl, videos/, and control_videos/. Each output is a .pt sample holding the precomputed vae_latent, control_latent, and prompt_embed together with related metadata. See tools/offload_data/data_format.md for the full data format.
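Because latent pre-computation is a long, GPU-bound run, a pre-flight check on the input directory can catch missing pieces early. This is a sketch under stated assumptions: `preflight_latent_input` is a hypothetical name, and it only verifies that `videos/` and `control_videos/` exist and that every non-blank line of `helios_giga_ctrl.jsonl` parses as JSON. The required record fields are defined in tools/offload_data/data_format.md and are not checked here.

```python
import json
from pathlib import Path


def preflight_latent_input(data_dir):
    """List problems that would block offline latent pre-computation."""
    root = Path(data_dir)
    problems = []
    for sub in ("videos", "control_videos"):
        if not (root / sub).is_dir():
            problems.append(f"missing directory: {sub}/")
    index = root / "helios_giga_ctrl.jsonl"
    if not index.is_file():
        problems.append("missing index: helios_giga_ctrl.jsonl")
        return problems
    with index.open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as err:
                problems.append(f"line {lineno}: invalid JSON ({err.msg})")
    return problems


if __name__ == "__main__":
    import sys
    issues = preflight_latent_input(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("\n".join(issues) or "input layout looks complete")
```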

3. 🧩 Model Preparation

Released model weights are available from:

| Platform | Repository |
|---|---|
| 🤗 Hugging Face | open-gigaai/Giga-World-1 |
| 🔷 ModelScope | GigaAI/Giga-World-1 |

Use the download helper to fetch the model weights from Hugging Face:

bash tools/download_tool/download_giga_world.sh \
  --platform hf \
  --target model \
  --output-dir ./downloads

Download model weights from ModelScope:

bash tools/download_tool/download_giga_world.sh \
  --platform modelscope \
  --target model \
  --output-dir ./downloads

After downloading, place or symlink the model files under model/:

giga-world-release/
├── model/
│   ├── before_stage1/
│   │   ├── Wan2p1_1p3B-FunContro-GigaRobo-alpha-diffusers/
│   │   ├── Wan2p1_1p3B-FunControl-diffusers/
│   │   └── Wan2p2_5B-FunControl-diffusers/
│   ├── stage1/
│   │   ├── nano/
│   │   └── pro/
│   └── stage2_distill/          # coming soon
└── tools/
    └── download_tool/

If the downloader saves weights under ./downloads/Giga-World-1/, copy or symlink them into the repository:

mkdir -p model
cp -r ./downloads/Giga-World-1/* ./model/
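Copying with `cp -r` duplicates multi-gigabyte weights on disk. Since the layout above also accepts symlinks, here is a minimal sketch that links each top-level entry of the download directory into `model/` instead. `link_tree_entries` is a hypothetical helper, and it assumes a POSIX filesystem where creating symlinks is permitted.

```python
import os
from pathlib import Path


def link_tree_entries(src_dir, dst_dir):
    """Symlink each top-level entry of src_dir into dst_dir; skip names already present."""
    src, dst = Path(src_dir).resolve(), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    linked = []
    for entry in sorted(src.iterdir()):
        target = dst / entry.name
        if target.exists() or target.is_symlink():
            continue  # never overwrite something already in place
        os.symlink(entry, target, target_is_directory=entry.is_dir())
        linked.append(entry.name)
    return linked


if __name__ == "__main__":
    print(link_tree_entries("./downloads/Giga-World-1", "./model"))
```

Absolute link targets (via `resolve()`) keep the links valid regardless of the directory you later run scripts from.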

4. 🚂 Training

flowchart LR
    %% Base Models
    A1["WAN 2.1<br/>1.3B FunControl"]
    A2["WAN 2.2<br/>5B FunControl"]

    %% Before Stage1
    B1["GigaRobo Alpha<br/>Diffusers"]
    B2["WAN2.1 1.3B<br/>Diffusers"]
    B3["WAN2.2 5B<br/>Diffusers"]

    %% Stage1
    C1["Nano"]
    C2["Pro"]

    %% Stage2
    D1["Nano Distill"]
    D2["Pro Distill"]

    %% Connections
    A1 -- "Pretrain on Giga Dataset<br/>Convert to Diffusers" --> B1
    A1 -- "Convert to Diffusers" --> B2

    B1 --> C1
    B2 --> C1
    C1 --> D1

    A2 -- "Convert to Diffusers" --> B3
    B3 --> C2
    C2 --> D2

    subgraph S0["Before Stage 1"]
        subgraph S0A["Interchangeable"]
            B1
            B2
        end
        B3
    end

    subgraph S1["Stage 1"]
        C1
        C2
    end

    subgraph S2["Stage 2"]
        D1
        D2
    end

    classDef base    fill:#4F8EF7,color:#fff,stroke:#2D5FD2,stroke-width:3px;
    classDef diff    fill:#7C4DFF,color:#fff,stroke:#5B2DCC,stroke-width:3px;
    classDef stage   fill:#00C853,color:#fff,stroke:#009624,stroke-width:3px;
    classDef distill fill:#FF6D00,color:#fff,stroke:#DD2C00,stroke-width:3px;

    style S0A fill:none,stroke:#FF0000,stroke-width:3px,stroke-dasharray:8 4,color:#FF0000;

    class A1,A2 base;
    class B1,B2,B3 diff;
    class C1,C2 stage;
    class D1,D2 distill;
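The diagram can be read as a small dependency graph. The sketch below encodes its edges as a plain dictionary (node labels copied from the chart; `PARENTS` and `lineages` are hypothetical names, not part of the codebase) and enumerates every path to a given checkpoint. This makes the "Interchangeable" box explicit: Nano can start from either the GigaRobo Alpha or the plain WAN2.1 1.3B Diffusers weights.

```python
# child -> possible parents, transcribed from the flowchart above
PARENTS = {
    "GigaRobo Alpha Diffusers": ["WAN 2.1 1.3B FunControl"],
    "WAN2.1 1.3B Diffusers": ["WAN 2.1 1.3B FunControl"],
    "WAN2.2 5B Diffusers": ["WAN 2.2 5B FunControl"],
    "Nano": ["GigaRobo Alpha Diffusers", "WAN2.1 1.3B Diffusers"],  # interchangeable bases
    "Pro": ["WAN2.2 5B Diffusers"],
    "Nano Distill": ["Nano"],
    "Pro Distill": ["Pro"],
}


def lineages(node):
    """Enumerate every root-to-node path through the graph."""
    parents = PARENTS.get(node)
    if not parents:
        return [[node]]  # a base model with no upstream node
    return [path + [node] for parent in parents for path in lineages(parent)]


if __name__ == "__main__":
    for path in lineages("Nano Distill"):
        print(" -> ".join(path))
```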

Training runs are defined by paired files: each launcher script wraps one accelerate launch command and points it at one YAML config.

4.1 Stage-1 Training (Controllable Pre-training)

| Entrypoint / Config | Launcher | Note |
|---|---|---|
| Trainer: train_gigaworld_functrl_uni_stage1.py | | Unified trainer handling both Nano and Pro |
| Config: stage_1_post_functrl_wan21.yaml | train_deepspeed_stage1_functrl_wan21.sh | Nano (1.3B) |
| Config: stage_1_post_functrl_wan22_5b.yaml | train_deepspeed_stage1_functrl_wan22_5b.sh | Pro (5B) |

Launch Nano:

bash scripts/training/stage1/train_deepspeed_stage1_functrl_wan21.sh

Launch Pro:

bash scripts/training/stage1/train_deepspeed_stage1_functrl_wan22_5b.sh

Default output layout:

output/
├── exp/
│   ├── Giga-world-Nano-Train-Stage-1/
│   └── Giga-world-Pro-Train-Stage-1/
└── logs/
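When scripting the hand-off from Stage-1 training to the merge tool in §6, it helps to keep the per-variant paths in one place and to pick the newest checkpoint automatically. A hedged sketch: the values are copied from the Stage-1 table, the output layout above, and the merge commands; the `scripts/training/configs/` location follows the repository layout; and the assumption that checkpoints are saved as `<exp_dir>/checkpoint-<step>/` is inferred from the merge example's `checkpoint-XXXX` placeholder. All helper names are hypothetical.

```python
import re
from pathlib import Path

STAGE1 = {
    "nano": {
        "launcher": "scripts/training/stage1/train_deepspeed_stage1_functrl_wan21.sh",
        "config": "scripts/training/configs/stage_1_post_functrl_wan21.yaml",
        "exp_dir": "output/exp/Giga-world-Nano-Train-Stage-1",
        "model_type": "wan2.1",
    },
    "pro": {
        "launcher": "scripts/training/stage1/train_deepspeed_stage1_functrl_wan22_5b.sh",
        "config": "scripts/training/configs/stage_1_post_functrl_wan22_5b.yaml",
        "exp_dir": "output/exp/Giga-world-Pro-Train-Stage-1",
        "model_type": "wan2.2_5b",
    },
}

# Assumption: checkpoints are saved as <exp_dir>/checkpoint-<step>/.
_CKPT_RE = re.compile(r"checkpoint-(\d+)")


def latest_checkpoint(exp_dir):
    """Return the checkpoint-<step> directory with the highest numeric step, or None."""
    candidates = []
    for child in Path(exp_dir).glob("checkpoint-*"):
        match = _CKPT_RE.fullmatch(child.name)
        if match and child.is_dir():
            candidates.append((int(match.group(1)), child))
    # Integer comparison, so checkpoint-1000 beats checkpoint-900.
    return max(candidates)[1] if candidates else None


def merge_command(variant, base_model, save_dir, ckpt_dir):
    """argv for tools/ckpt_tools/uni_merge_lora_for_giga_world_1.py."""
    return ["python", "tools/ckpt_tools/uni_merge_lora_for_giga_world_1.py",
            "--base_model", str(base_model),
            "--save_dir", str(save_dir),
            "--ckpt_dir", str(ckpt_dir),
            "--model_type", STAGE1[variant]["model_type"]]


if __name__ == "__main__":
    print("latest Nano checkpoint:", latest_checkpoint(STAGE1["nano"]["exp_dir"]))
```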

4.2 Stage-2 DMD Training (Acceleration Distillation)

| Entrypoint / Config | Launcher |
|---|---|
| Trainer: train_gigaworld_functrl_uni_stage2_dmd.py | |
| Config: stage_2_dmd_functrl_wan21.yaml | train_deepspeed_stage2_functrl_wan21.sh |
| Config: stage_2_dmd_functrl_wan22_5b.yaml | train_deepspeed_stage2_functrl_wan22_5b.sh |

DMD2 compresses the denoising loop from 20 steps to 4–6 steps (the Stage-2 config sets num_inference_steps: 6) and aligns with a frozen real score model via a critic_lora.
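The step reduction and the generator objective can be written out as follows. This is a sketch based on the published DMD / DMD2 formulation, not a transcription of train_gigaworld_functrl_uni_stage2_dmd.py. It assumes each denoising step costs roughly the same (ignoring VAE decoding and text encoding), takes s_real to be the frozen real score model and s_fake the critic (presumably the critic_lora), and treats w_t as a time-dependent weight whose exact form in this trainer is not documented here.

```latex
\begin{aligned}
\frac{N_{\text{Stage-1}}}{N_{\text{DMD}}} &= \frac{20}{6} \approx 3.3
  \qquad \Bigl(\tfrac{20}{4} = 5 \text{ at 4 steps}\Bigr) \\
\nabla_\theta \mathcal{L}_{\text{DMD}} &\approx
  \mathbb{E}_{z,\,t,\,x_t}\Bigl[\, w_t \bigl(s_{\text{fake}}(x_t, t) - s_{\text{real}}(x_t, t)\bigr)
  \frac{\partial G_\theta(z)}{\partial \theta} \Bigr],
  \qquad x_t \sim q\bigl(x_t \mid G_\theta(z)\bigr)
\end{aligned}
```

Descending this gradient moves generated samples along s_real − s_fake, toward regions the frozen teacher scores as more likely; the critic must keep tracking the generator's current output distribution, which is why it is trained alongside the generator.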

Launch Nano DMD:

bash scripts/training/stage2/train_deepspeed_stage2_functrl_wan21.sh

Example output:

output/exp/Giga-world-Nano-Train-DMD/

4.3 Training on Other Domains

To adapt GigaWorld-1 to a new domain, no code changes are required. In practice, you only need to construct training data in the same format as the provided GigaWorld data, then launch the existing Stage-1 training pipeline with the corresponding config.

For most new domains, a few thousand training steps on 8 GPUs are sufficient, and such a run usually finishes within one day.
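To turn "a few thousand steps within one day" into a per-step budget you can compare against your own training logs, divide one day of wall-clock time by the step count. Taking 3,000 and 5,000 steps as illustrative values (the README does not give an exact number):

```latex
\frac{24 \times 3600\ \text{s}}{3000\ \text{steps}} = 28.8\ \text{s/step},
\qquad
\frac{24 \times 3600\ \text{s}}{5000\ \text{steps}} = 17.28\ \text{s/step}
```

If a measured step on your 8-GPU node is slower than the budget for your target step count, expect the run to take longer than a day.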

5. 🎬 Inference

| Script | Mode | Model |
|---|---|---|
| run_infer_nano_i2v.sh | i2v | Nano 1.3B |
| run_infer_nano_t2v.sh | t2v | Nano 1.3B |
| run_infer_pro_i2v.sh | i2v | Pro 5B |
| run_infer_pro_t2v.sh | t2v | Pro 5B |

Usage:

# Nano i2v: first frame + control video + text prompt
bash scripts/infer/run_infer_nano_i2v.sh

# Pro t2v: text prompt only (omit --image_path → t2v mode)
bash scripts/infer/run_infer_pro_t2v.sh

The underlying Python entrypoint, infer/infer_giga_world.py, exposes the following arguments:

| Argument | Default | Description |
|---|---|---|
| --config | (no default) | Training / inference YAML config (drives model type and hyperparameters) |
| --base_model_path | (no default) | Base diffusers model directory (VAE / T5 / Transformer) |
| --transformer_model_name_or_path | None | Path to the merged transformer; falls back to --base_model_path if None |
| --checkpoint_path | None | Optional LoRA / partial checkpoint path |
| --image_path | None | First frame for i2v; omit to run in t2v mode |
| --prompt | (no default) | Text prompt |
| --control_video_path | None | Optional control video (Plücker / Ray Map) |
| --output_dir | (no default) | Output root directory |
| --sample_name | sample | Output video name prefix |
| --seed | 42 | Random seed |
| --fps | 10 | Output video FPS |
| --num_frames | 99 | Total frames to generate (the default 99 ≈ 9.9 s at 10 FPS; 330 ≈ 33 s) |
| --height | 480 | Output height |
| --width | 1920 | Output width (typically 640 × 3 = 1920 for three views) |
| --num_inference_steps | 20 | 20 steps for Stage-1; 4–6 for Stage-2 / DMD |
| --guidance_scale | 5.0 | Classifier-free guidance strength |
| --enable_tiling | False | VAE tiling for memory savings |
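For scripted sweeps over prompts, seeds, or step counts, the table maps directly onto a command line. A hedged sketch that assembles argv for infer/infer_giga_world.py from the flags listed above; `build_infer_command` and `rollout_seconds` are hypothetical helpers, the paths in the usage block are placeholders, and treating boolean options such as `--enable_tiling` as store_true switches is an assumption worth confirming in the script's argument parser.

```python
import shlex


def build_infer_command(config, base_model_path, prompt, output_dir,
                        image_path=None, **options):
    """Assemble argv for infer/infer_giga_world.py; leave image_path=None for t2v."""
    argv = ["python", "infer/infer_giga_world.py",
            "--config", str(config),
            "--base_model_path", str(base_model_path),
            "--prompt", prompt,
            "--output_dir", str(output_dir)]
    if image_path is not None:
        argv += ["--image_path", str(image_path)]  # a first frame selects i2v mode
    for name, value in options.items():
        flag = f"--{name}"
        if isinstance(value, bool):
            if value:
                argv.append(flag)  # assumption: store_true-style switch
        else:
            argv += [flag, str(value)]
    return argv


def rollout_seconds(num_frames, fps):
    """Clip length implied by --num_frames and --fps."""
    return num_frames / fps


if __name__ == "__main__":
    cmd = build_infer_command(
        "path/to/config.yaml", "path/to/base_model", "pick up the red cup",
        "outputs", num_frames=330, fps=10, num_inference_steps=20)
    print(shlex.join(cmd))  # inspect first, then run with subprocess.run(cmd, check=True)
    print(rollout_seconds(330, 10), "s")
```

Returning an argv list rather than a single string avoids shell-quoting problems with free-form prompts.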

Inference output example:

| First Frame | Control Video | Generated Rollout |
|---|---|---|
| input image | 🎬 control_video.mp4 | 🎬 i2v_sample.mp4 |
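The default width of 1920 packs three 640-pixel views into each frame. Assuming the views are tiled left to right in equal widths (the README gives the 640 × 3 arithmetic but not the camera order), a decoded frame can be split back into per-view crops. `view_slices` is a hypothetical helper.

```python
def view_slices(width=1920, num_views=3):
    """Column slices for equal-width views tiled side by side in one frame."""
    if width % num_views:
        raise ValueError(f"width {width} is not divisible into {num_views} views")
    view_width = width // num_views
    return [slice(i * view_width, (i + 1) * view_width) for i in range(num_views)]


# Usage with a decoded frame of shape (height, width, channels), e.g. a NumPy array:
#   views = [frame[:, cols] for cols in view_slices(frame.shape[1])]
```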

6. 🔄 Model Merge & Checkpoint Conversion

Use uni_merge_lora_for_giga_world_1.py to merge LoRA or partial checkpoints into a standalone transformer for deployment.

The tool supports both wan2.1 and wan2.2_5b, automatically resolves checkpoint files from the specified checkpoint directory, and saves the merged transformer to --save_dir.

python tools/ckpt_tools/uni_merge_lora_for_giga_world_1.py \
  --base_model <PATH_TO_BASE_NANO> \
  --save_dir   <PATH_TO_STAGE1_MERGED_NANO> \
  --ckpt_dir   /path/to/checkpoint-XXXX \
  --model_type wan2.1

For Pro 5B:

python tools/ckpt_tools/uni_merge_lora_for_giga_world_1.py \
  --base_model <PATH_TO_BASE_PRO> \
  --save_dir   <PATH_TO_STAGE1_MERGED_PRO> \
  --ckpt_dir   /path/to/checkpoint-XXXX \
  --model_type wan2.2_5b
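For the LoRA part of a checkpoint, merging folds the low-rank update back into the base weight, so the deployed transformer needs no adapter at runtime. Below is a minimal pure-Python sketch of the standard LoRA merge rule used by PEFT, W' = W + (lora_alpha / r) · B A, for a single linear layer. It does not reproduce uni_merge_lora_for_giga_world_1.py, which also handles partial checkpoints and resolves checkpoint files itself; `matmul` and `merge_lora` are hypothetical helpers.

```python
def matmul(a, b):
    """Plain nested-list matrix product: (m x k) @ (k x n) -> (m x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) for one linear layer.

    weight: out x in, lora_a: rank x in, lora_b: out x rank.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # out x in, same shape as weight
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


if __name__ == "__main__":
    # 2x2 identity weight with a rank-1 update; alpha / rank = 2.
    print(merge_lora([[1, 0], [0, 1]], [[1, 2]], [[1], [0]], alpha=2, rank=1))
```

With the lora_rank=128 used in training, the scale is lora_alpha / 128 (lora_alpha is not stated in this README). The merged transformer can then be passed to inference through --transformer_model_name_or_path.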

7. 📁 Repository Layout

.
├── gigaworld/                         # Core model, pipeline, data loader, scheduler, and utils
│   ├── dataset/                       #   Stage-1 / Stage-2 / DMD data loaders
│   ├── modules/                       #   Transformer + custom Triton / Flash kernels
│   │   ├── gigaworld_kernels/         #     fp32_rmsnorm, tiled_linear, triton_norm, triton_rope
│   │   ├── transformer_gigaworld.py
│   │   └── transformer_functrl_gigaworld.py
│   ├── pipelines/                     #   i2v / t2v main pipelines
│   ├── scheduler/                     #   custom schedulers
│   ├── utils/                         #   TrainConfig, EMA, recycle batch, etc.
│   └── videoalign/                    #   reward / VLM training & inference
├── infer/
│   └── infer_giga_world.py            # Python inference entrypoint
├── scripts/
│   ├── accelerate_configs/            # DeepSpeed ZeRO-2 / ZeRO-3 configs
│   ├── infer/                         #   i2v / t2v × Nano / Pro one-click scripts
│   ├── training/
│   │   ├── configs/                   #   Stage-1 / Stage-2 YAMLs
│   │   ├── stage1/                    #   Nano / Pro Stage-1 launchers
│   │   └── stage2/                    #   Nano / Pro Stage-2 DMD launchers
├── tools/
│   ├── ckpt_tools/                    #   LoRA merge and checkpoint utilities
│   ├── datapipeline/                  #   LeRobot-style data preprocessing
│   ├── data_vis_tools/                #   Web URDF + camera calibration viewer
│   ├── download_tool/                 #   one-click HF / ModelScope downloader
│   ├── offload_data/                  #   offline latent pre-computation / format conversion
│   └── others/                        #   misc conversion tools
├── assets/                            #   README figures and demo media
│   ├── main_page.png                  #   main teaser image
│   ├── data_vis.gif                   #   data visualization demo
│   ├── input_image.png                #   inference first frame
│   ├── control_video.mp4              #   inference control video
│   └── i2v_sample.mp4                 #   generated rollout sample
├── example/
│   ├── toy_train_dataset/             #   Nano / Pro toy training datasets
│   └── toy_datapipeline_dataset/      #   toy preprocessing output (gt / depth / plucker / sketch)
├── model/
│   ├── before_stage1/                 #   Diffusers-converted base checkpoints
│   ├── stage1/                        #   Nano / Pro Stage-1 checkpoints
│   └── stage2_distill/                #   distilled checkpoints (coming soon)
├── train_gigaworld_functrl_uni_stage1.py
├── train_gigaworld_functrl_uni_stage2_dmd.py
├── requirements.txt
└── install.sh

🙏 Acknowledgements

GigaWorld-1 stands on the shoulders of a vibrant open-source ecosystem. We are deeply grateful to the following communities and projects that made this work possible:

🤗 Foundation Models & Architectures

  • Wan (Alibaba) — the wan2.1 and wan2.2_5b backbones that power GigaWorld-1 Nano and Pro
  • Diffusers — the modular diffusion framework we extend with custom attention processors and pipelines
  • Helios — a video generation model that achieves minute-scale, high-quality video synthesis
  • Genesis — a generative universal physics engine and robotics/embodied AI simulation platform
  • Hugging Face 🤗 — hosting, transformers, accelerate, and the entire model & dataset ecosystem
  • open-gigaai/Giga-World-1 and open-gigaai/Giga-World-1-Toydata — the public Hugging Face model and toy-data repositories

🛠️ Training & Acceleration

  • PyTorch & DeepSpeed — the foundation of our distributed training stack
  • Accelerate — the launcher that ties everything together
  • PEFT — the LoRA implementation behind lora_rank=128 and critic_lora
  • xFormers — memory-efficient attention
  • Flash-Attention — fast and memory-efficient exact attention (FA2 / FA3)
  • Triton — custom kernels for RMSNorm, RoPE, and tiled linears (gigaworld/modules/gigaworld_kernels/)

🗃️ Data & Annotation

  • Qwen3-VL — dense long-caption generation from cam_high videos
  • Depth Anything V2 — monocular depth estimation for all three camera views
  • LeRobot — the LeRobot-style raw data layout our preprocessing pipeline consumes
  • Open X-Embodiment & AgiBot — large-scale robot demonstration datasets


🌟 Community

A heartfelt thank you to everyone who has filed an issue, opened a PR, shared a workflow, or simply starred the project. Open source is a relay race — we are proud to hand the baton forward.

If you find GigaWorld-1 useful, please consider ⭐ starring the repo and citing the paper (see below).


📖 Citation

@article{gigaworld2025,
  title   = {GigaWorld-1: A Roadmap to World Models for Robot Policy Evaluation},
  author  = {{GigaAI}},
  journal = {arXiv preprint},
  year    = {2025},
  eprint  = {2511.19861},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

Built with ❤️ by the GigaWorld Team, GigaAI · CVPR 2026

Released under the Apache 2.0 License.
