
Wan 2.2 camera control training - reproducibility issues #976

@lefreud

Hi,

I have been experimenting with the Wan2.2-Fun-A14B-Control-Camera.sh training script, and I have a few questions:

  1. Why do we have to start from a checkpoint that has already been trained for camera control? In the code, training starts from the PAI/Wan2.2-Fun-A14B-Control-Camera checkpoint rather than from Wan2.2-I2V (or T2V). This seems to partly defeat the purpose of training if we can't start from the base model.
  2. I have tried starting from Wan2.2 TI2V 5B, but I noticed that with the current code the control adapter is not zero-initialized, which leads to instabilities at the beginning of training (see the sketch after this list). Are there plans to add some kind of zero initialization? I'm also curious what initialization was used for training the provided checkpoints.
  3. Do you think the model can be trained without a reference image and with the 5B version?
  4. Can we train on more complex camera paths than simple camera translations at a specified speed (e.g. rotations or other motions)? Or is this conditioning too weak to allow a high degree of control?
  5. Can we know what type and amount of data this model was trained on?
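
To illustrate question 2, here is the kind of ControlNet-style zero initialization I had in mind: a rough sketch, not the repository's actual adapter, with made-up module and argument names. The output projection starts at exactly zero, so the adapter contributes nothing at step 0 and cannot destabilize the pretrained backbone.

import torch
import torch.nn as nn

class CameraControlAdapter(nn.Module):
    # Hypothetical adapter: encodes Plücker-ray maps into a residual for the DiT features.
    def __init__(self, in_channels: int, hidden_channels: int, out_channels: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, hidden_channels, kernel_size=3, padding=1),
            nn.SiLU(),
        )
        # Zero-initialized output projection: the adapter is a no-op at initialization.
        self.proj_out = nn.Conv3d(hidden_channels, out_channels, kernel_size=1)
        nn.init.zeros_(self.proj_out.weight)
        nn.init.zeros_(self.proj_out.bias)

    def forward(self, plucker_maps: torch.Tensor) -> torch.Tensor:
        # Returns a residual that is all zeros at step 0 and grows as training progresses.
        return self.proj_out(self.encoder(plucker_maps))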

Here is how I launch my trainings (note that I modified a few options; for instance, I use my own dataset with high-quality camera poses and Plücker rays, so I don't need dataset_metadata_path or dataset_base_path). A sketch of how I compute the Plücker-ray maps follows the command:

accelerate launch --config_file examples/wanvideo/model_training/full/accelerate_config_5B.yaml examples/wanvideo/model_training/train.py \
  --data_file_keys None \
  --height 480 \
  --width 736 \
  --model_paths '[
    [
      "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/diffusion_pytorch_model-00001-of-00003.safetensors",
      "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/diffusion_pytorch_model-00002-of-00003.safetensors",
      "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/diffusion_pytorch_model-00003-of-00003.safetensors"
    ],
    "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/models_t5_umt5-xxl-enc-bf16.pth",
    "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/Wan2.2_VAE.pth"
  ]' \
  --tokenizer_path "checkpoints/hf/hub/models--Wan-AI--Wan2.2-TI2V-5B/snapshots/921dbaf3f1674a56f47e83fb80a34bac8a8f203e/google/umt5-xxl" \
  --learning_rate 1e-4 \
  --num_epochs 100 \
  --remove_prefix_in_ckpt "pipe.dit." \
  --output_path "./models/train/$(date +%Y-%m-%d_%H-%M-%S)_Wan2.2-Fun-5B-Control-Camera_full" \
  --trainable_models "dit" \
  --dataset_num_workers 4
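
For reference, here is a simplified sketch of how I build a per-pixel Plücker-ray map (6 channels: unit ray direction plus moment) from an intrinsics matrix and a camera-to-world pose; a standard pinhole camera is assumed, and the function name and tensor layout are just illustrative, not part of this repository:

import torch

def make_plucker_map(K: torch.Tensor, c2w: torch.Tensor, height: int, width: int) -> torch.Tensor:
    """K: 3x3 intrinsics, c2w: 4x4 camera-to-world pose. Returns a (6, H, W) tensor."""
    ys, xs = torch.meshgrid(
        torch.arange(height, dtype=torch.float32) + 0.5,
        torch.arange(width, dtype=torch.float32) + 0.5,
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)         # (H, W, 3) pixel coordinates
    dirs_cam = pix @ torch.linalg.inv(K).T                           # back-project pixels to camera rays
    dirs_world = dirs_cam @ c2w[:3, :3].T                            # rotate rays into the world frame
    dirs_world = dirs_world / dirs_world.norm(dim=-1, keepdim=True)  # unit ray directions
    origin = c2w[:3, 3].expand_as(dirs_world)                        # camera center, broadcast per pixel
    moment = torch.cross(origin, dirs_world, dim=-1)                 # Plücker moment o x d
    return torch.cat([dirs_world, moment], dim=-1).permute(2, 0, 1)  # (6, H, W)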

Apologies for all the questions; I really like this project and would like to get the training working on my side!
