
Wan 2.2 camera control training - reproducibility issues #976

@lefreud

Hi,

I have been experimenting with the Wan2.2-Fun-A14B-Control-Camera.sh training script, and I have a few questions:

  1. Why do we have to start from a checkpoint that has already been trained for camera control? In the code, training starts from the PAI/Wan2.2-Fun-A14B-Control-Camera checkpoint rather than from Wan2.2-I2V (or T2V). This seems to partly defeat the purpose of training if we can't start from the base model.
  2. I have tried starting from Wan2.2 TI2V 5B, but I noticed that with the current code the control adapter is not zero-initialized, which leads to instabilities at the beginning of training (see the sketch after this list). Are there plans to add some kind of zero initialization? I'm also curious what initialization was used for training the provided checkpoints.
  3. Do you think the model can be trained without a reference image and with the 5B version?
  4. Can we train on more complex camera paths than simple camera translations at a specified speed (e.g. rotations or other motions)? Or is this conditioning too weak to allow a high degree of control?
  5. Can we know what type and amount of data this model was trained on?
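
To illustrate question 2, here is the kind of ControlNet-style zero initialization I had in mind: a rough sketch, not the repository's actual adapter, with made-up module and argument names. The output projection starts at exactly zero, so the adapter contributes nothing at step 0 and cannot destabilize the pretrained backbone.

import torch
import torch.nn as nn

class CameraControlAdapter(nn.Module):
    # Hypothetical adapter: encodes Plücker-ray maps into a residual for the DiT features.
    def __init__(self, in_channels: int, hidden_channels: int, out_channels: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, hidden_channels, kernel_size=3, padding=1),
            nn.SiLU(),
        )
        # Zero-initialized output projection: the adapter is a no-op at initialization.
        self.proj_out = nn.Conv3d(hidden_channels, out_channels, kernel_size=1)
        nn.init.zeros_(self.proj_out.weight)
        nn.init.zeros_(self.proj_out.bias)

    def forward(self, plucker_maps: torch.Tensor) -> torch.Tensor:
        # Returns a residual that is all zeros at step 0 and grows as training progresses.
        return self.proj_out(self.encoder(plucker_maps))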

Here is how I launch my trainings (note that I modified a few options; for instance, I use my own dataset with high-quality camera poses and Plücker rays, so I don't need dataset_metadata_path or dataset_base_path). A sketch of how I compute the Plücker-ray maps follows the command:

accelerate launch --config_file examples/wanvideo/model_training/full/accelerate_config_5B.yaml examples/wanvideo/model_training/train.py \
  --data_file_keys None \
  --height 480 \
  --width 736 \
  --model_paths '[
    [
      "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/diffusion_pytorch_model-00001-of-00003.safetensors",
      "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/diffusion_pytorch_model-00002-of-00003.safetensors",
      "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/diffusion_pytorch_model-00003-of-00003.safetensors"
    ],
    "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/models_t5_umt5-xxl-enc-bf16.pth",
    "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/Wan2.2_VAE.pth"
  ]' \
  --tokenizer_path "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/google/umt5-xxl" \
  --learning_rate 1e-4 \
  --num_epochs 100 \
  --remove_prefix_in_ckpt "pipe.dit." \
  --output_path "./models/train/$(date +%Y-%m-%d_%H-%M-%S)_Wan2.2-Fun-5B-Control-Camera_full" \
  --trainable_models "dit" \
  --dataset_num_workers 4
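
For reference, here is a simplified sketch of how I build a per-pixel Plücker-ray map (6 channels: unit ray direction plus moment) from an intrinsics matrix and a camera-to-world pose; a standard pinhole camera is assumed, and the function name and tensor layout are just illustrative, not part of this repository:

import torch

def make_plucker_map(K: torch.Tensor, c2w: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """K: 3x3 intrinsics, c2w: 4x4 camera-to-world pose. Returns a (6, H, W) tensor."""
    ys, xs = torch.meshgrid(
        torch.arange(height, dtype=torch.float32) + 0.5,
        torch.arange(width, dtype=torch.float32) + 0.5,
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)         # (H, W, 3) pixel coordinates
    dirs_cam = pix @ torch.linalg.inv(K).T                           # back-project pixels to camera rays
    dirs_world = dirs_cam @ c2w[:3, :3].T                            # rotate rays into the world frame
    dirs_world = dirs_world / dirs_world.norm(dim=-1, keepdim=True)  # unit ray directions
    origin = c2w[:3, 3].expand_as(dirs_world)                        # camera center, broadcast per pixel
    moment = torch.cross(origin, dirs_world, dim=-1)                 # Plücker moment o x d
    return torch.cat([dirs_world, moment], dim=-1).permute(2, 0, 1)  # (6, H, W)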

Apologies for all the questions; I really like this project and would like to get the training working on my side!
