git clone https://github.com/oooolga/Ctrl-V
cd Ctrl-V
pip install -r requirements.txt
python setup.py develop
In the training and evaluation scripts, set `DATASET_PATH` to the root of the dataset folder. Within this folder, you will find the extracted dataset subfolders. The dataset root folder should be organized in the following format:
Datasets/
├── bdd100k
├── kitti
├── vkitti_2.0.3
└── nuscenes
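For example, if the datasets were extracted under `/data/Datasets`, `DATASET_PATH` would simply point at that root (the path below is a placeholder):

```bash
# Placeholder path; the root must directly contain the extracted dataset subfolders.
DATASET_PATH=/data/Datasets   # holds bdd100k/, kitti/, vkitti_2.0.3/, nuscenes/
```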
We have provided a script to render bounding-box frames and save them to your data directory, which can save time during training. However, this step is optional: you can also choose to render the bounding boxes on the fly by setting the `use_preplotted_bbox` parameter to `False` in the `get_dataloader` call.
To render the bounding-box frames before training, run the following command:
python tools/preprocessing/preprocess_dataset.py $DATASET_PATH
A demo script is available at `demo_train_bbox_predict.sh`. To run it, set `$DATASET_PATH` and `$OUT_DIR` to your desired paths, and then execute:
bash ./scripts/train_scripts/demo_train_bbox_predict.sh
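One way to set these is to edit the corresponding lines near the top of the demo script before launching it; the sketch below assumes the variables are defined directly in the script, and the paths are placeholders:

```bash
# Hypothetical excerpt of scripts/train_scripts/demo_train_bbox_predict.sh
DATASET_PATH=/data/Datasets              # dataset root prepared above
OUT_DIR=/data/experiments/bbox_predict   # where checkpoints and logs are written
```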
To resume training, set the `$NAME` variable to the name of the stopped experiment (e.g., `bdd100k_bbox_predict_240616_000000`). Ensure that you include `--resume_from_checkpoint latest` and that all the hyperparameter settings match those of the stopped experiment. After this setup, you can resume training by re-executing the training command.
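For example, resuming the run above might look like the sketch below; exactly how the script consumes `NAME` and where the `--resume_from_checkpoint` flag is added depends on the script, so treat the details as assumptions:

```bash
# Hypothetical excerpt of the demo script for resuming a stopped run.
NAME=bdd100k_bbox_predict_240616_000000   # name of the stopped experiment

# Keep all other hyperparameters identical to the original run, and ensure the
# launch command inside the script includes:  --resume_from_checkpoint latest
# Then simply re-execute the training command:
#   bash ./scripts/train_scripts/demo_train_bbox_predict.sh
```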
To train on different datasets, simply change the `DATASET` variable's value to `kitti`, `vkitti`, or `bdd100k`. You can adjust the number of conditioning input frames for your bounding-box predictor by changing the value of `num_cond_bbox_frames`. To change the last conditioning bounding-box frame to its trajectory frame, enable `if_last_frame_trajectory`.
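As a sketch, switching the bounding-box predictor to Virtual KITTI with three conditioning frames could look like the following; whether these options are script variables or command-line flags depends on the script, so the exact names here are assumptions:

```bash
DATASET=vkitti                # or: kitti, bdd100k
NUM_COND_BBOX_FRAMES=3        # hypothetical variable mirroring num_cond_bbox_frames
IF_LAST_FRAME_TRAJECTORY=1    # hypothetical toggle mirroring if_last_frame_trajectory
bash ./scripts/train_scripts/demo_train_bbox_predict.sh
```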
A demo script is available at `demo_train_video_box2video.sh`.
Prior to training the Box2Video model, results may be improved by finetuning the SVD model on the current dataset. To do this, set `$DATASET_PATH` and `$OUT_DIR` in `demo_train_video_diffusion.sh` to your desired paths, and then execute:
bash ./scripts/train_scripts/demo_train_video_diffusion.sh
To run `demo_train_video_box2video.sh`, set `$DATASET_PATH` and `$OUT_DIR` to your desired paths, set `$FINETUNED_SVD_PATH` to the `$OUT_DIR` from the previous finetuning step, and then execute:
bash ./scripts/train_scripts/demo_train_video_box2video.sh
(Note: if you do not wish to start from a finetuned model, simply remove the `--finetuned_svd_path` argument in `demo_train_video_box2video.sh`; the script will then load the non-finetuned model from `--pretrained_model_name_or_path`.)
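Putting the two steps together, the hand-off from SVD finetuning to Box2Video training is just a matter of reusing the finetuning output directory. The sketch below uses placeholder paths and shows the values as plain shell assignments; depending on how the scripts read them, you may need to edit them inside the scripts instead:

```bash
# Step 1: finetune SVD on the target dataset (checkpoints land in $OUT_DIR).
DATASET_PATH=/data/Datasets
OUT_DIR=/data/experiments/svd_finetune
bash ./scripts/train_scripts/demo_train_video_diffusion.sh

# Step 2: train Box2Video, pointing it at the finetuned SVD checkpoints.
FINETUNED_SVD_PATH=/data/experiments/svd_finetune   # the $OUT_DIR from step 1
OUT_DIR=/data/experiments/box2video
bash ./scripts/train_scripts/demo_train_video_box2video.sh
```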
To resume training, set the `$NAME` variable to the name of the stopped experiment (e.g., `bdd100k_ctrlv_240616_000000`). Ensure that you include `--resume_from_checkpoint latest` and that all the hyperparameter settings match those of the stopped experiment. After this setup, you can resume training by re-executing the training command.
To train on different datasets, simply change the `DATASET` variable's value to `kitti`, `vkitti`, or `bdd100k`.
Demo scripts are available in `scripts/eval_scripts`.
To generate videos using the entire generation pipeline (predict bounding boxes and generate videos based on the predicted bounding-box sequences), set the following variables in the `demo_eval_overall_{}.sh` scripts: `$DATASET_PATH`, `$OUT_DIR`, `$BOX2VIDEO_DIR`, and `$BBOX_MODEL_DIR`, and then execute:
bash ./scripts/eval_scripts/demo_eval_overall_{}.sh
For each input sample, the pipeline will predict five bounding-box sequences and select the one with the highest mask-IoU score to generate the final video. We evaluate bounding-box prediction metrics during the generation process, and the results are uploaded to the W&B dashboard.
The generated videos are also uploaded to the W&B dashboard. You can find a local copy of the generated videos in your W&B folder at `$OUT_DIR/wandb/run-{run_id}/files/media`.
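As a concrete sketch with placeholder paths (the model directories should point at your trained bounding-box predictor and Box2Video checkpoints, and `{}` stands for the dataset name):

```bash
DATASET_PATH=/data/Datasets
OUT_DIR=/data/experiments/eval_overall
BBOX_MODEL_DIR=/data/experiments/bbox_predict   # trained bounding-box predictor
BOX2VIDEO_DIR=/data/experiments/box2video       # trained Box2Video model
bash ./scripts/eval_scripts/demo_eval_overall_{}.sh   # replace {} with the dataset name
```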
A demo script is available at `demo_eval_box2video_tf.sh`.
To generate videos using the ground-truth bounding boxes, set the `$DATASET_PATH` and `$OUT_DIR` variables in the script (`$OUT_DIR` should be the same location you used when training the Box2Video model), and then execute the following command:
bash ./scripts/eval_scripts/demo_eval_box2video_tf.sh
The generated videos are also uploaded to the W&B dashboard. You can find a local copy of the generated videos in your W&B folder at `$OUT_DIR/wandb/run-{run_id}/files/media`.
(See `src/ctrlv/metrics/fvd.py`.)
TODO
To compute the mAP and AP scores, run the following commands:
DATASET_NAME="..." #kitti/vkitti/bdd100k/nuscenes
ABSOLUTE_PATH_TO_WANDB_DIR="/..."
RUN_ID="..."
python tools/run_tracking_metrics.py $ABSOLUTE_PATH_TO_WANDB_DIR/wandb/$RUN_ID/files/media/videos $DATASET_NAME
This command automatically saves the YOLOv8 detection results to `$ABSOLUTE_PATH_TO_WANDB_DIR`.
Our library is built on the work of many brilliant researchers and developers. We're grateful for their contributions, which have helped us with this project. Special thanks to the following repositories for providing valuable tools that have enhanced our project:
- @huggingface's diffusion model library.
- @ultralytics's yolov8 library.
@misc{luo2024ctrlv,
      title={Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion},
      author={Ge Ya Luo and Zhi Hao Luo and Anthony Gosselin and Alexia Jolicoeur-Martineau and Christopher Pal},
      year={2024},
      eprint={2406.05630},
      archivePrefix={arXiv}
}