[![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2Fhohonu-vicml%2FTrailblazer&count_bg=%238B00FB&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false)](https://hits.seeyoufarm.com)


This repository contains the implementation of the following paper:
> **TrailBlazer: Trajectory Control for Diffusion-Based Video Generation**<br>
> [Wan-Duo Kurt Ma](https://www.linkedin.com/in/kurt-ma/)<sup>1</sup>, [J.P. Lewis](http://www.scribblethink.org/)<sup>2</sup>, [W. Bastiaan Kleijn](https://people.wgtn.ac.nz/bastiaan.kleijn)<sup>1</sup>,<br>
> Victoria University of Wellington<sup>1</sup>, NVIDIA Research<sup>2</sup>

## :fire: Overview
![teaser](./assets/teaser.gif)

**TrailBlazer** focuses on enhancing controllability in video synthesis by
employing straightforward bounding boxes to guide the subject in various ways,
all without the need for neural network training, fine-tuning, optimization at
inference time, or the use of pre-existing videos. Our algorithm is built on a
pre-trained text-to-video (T2V) model and is easy to implement. The subject is
directed by a bounding box through the proposed spatial and temporal
attention-map editing. Moreover, we introduce the concept of keyframing,
allowing the subject trajectory and overall appearance to be guided by both a
moving bounding box and corresponding prompts, without the need to provide a
detailed mask. The method is efficient, with negligible additional computation
relative to the underlying pre-trained model. Despite the simplicity of the
bounding-box guidance, the resulting motion is surprisingly natural, with
emergent effects including perspective and movement toward the virtual camera
as the box size increases.

## :fire: Requirements

The codebase is tested on an **NVIDIA GeForce RTX 3090** with the Python
libraries **pytorch-2.1.2+cu121** and **diffusers-0.21.4**. We strongly
recommend pinning this Diffusers version, as the library is evolving quickly;
other PyTorch 2.x versions will likely also work. On the RTX 3090, we followed
this
[post](https://discuss.pytorch.org/t/geforce-rtx-3090-with-cuda-capability-sm-86-is-not-compatible-with-the-current-pytorch-installation/123499)
to avoid the sm_86 compatibility issue.
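
For reference, a minimal environment setup matching the tested versions could
look like the following sketch (the cu121 wheel index URL is our assumption
based on the version tag; adjust it to your CUDA setup):

```bash
# Install the tested versions (a sketch, not an official setup script)
pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu121
pip install diffusers==0.21.4
```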

## :fire: Timeline

- [2024/03/23]: A new arXiv update will be made.

- [2024/03/22]: We release multiple-object synthesis (See
  [link](doc/Command.md#multiple-objects)) and the
  [Peekaboo](https://github.com/microsoft/Peekaboo) integration (See
  [link](doc/Peekaboo.md))

- [2024/02/07]: The Gradio app is updated with a better keyframe interface (See
  [link](assets/gradio/gradio.jpg))

- [2024/02/06]: We now have a Gradio web app at Hugging Face Space!

- [2024/02/01]: The official codebase is released

- [2024/01/03]: Paper released

- [2023/12/31]: Paper submitted to arXiv

## :fire: Usage

#### [Prepare]

First of all, download the pre-trained zeroscope model
([link](https://huggingface.co/cerspense/zeroscope_v2_576w)). You need to
register a Hugging Face account and create an access token
([link](https://huggingface.co/)).

```bash
git clone https://huggingface.co/cerspense/zeroscope_v2_576w ${MODEL_ROOT}/cerspense/zeroscope_v2_576w
```
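
If the clone above fails due to authentication, one way to supply your access
token is the Hugging Face CLI (a sketch; the `huggingface_hub` package is our
assumption, not a listed project dependency):

```bash
# Optional: log in once with your Hugging Face access token
# (the huggingface_hub CLI is an assumption, not a project requirement)
pip install "huggingface_hub[cli]"
huggingface-cli login
```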
Then, clone this repository and move into it:

```bash
git clone https://github.com/hohonu-vicml/Trailblazer && cd Trailblazer
```
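
As a quick sanity check, you can verify that the downloaded weights load with
the pinned Diffusers version (a sketch only, reusing the `MODEL_ROOT` from the
clone step above):

```bash
# Optional sanity check: load the local checkpoint with Diffusers
python -c "from diffusers import DiffusionPipeline; \
p = DiffusionPipeline.from_pretrained('${MODEL_ROOT}/cerspense/zeroscope_v2_576w'); \
print('Loaded', type(p).__name__)"
```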

#### [Run it]

Our executable scripts are located in the `bin` folder, and the core module is
implemented in the `TrailBlazer` folder under the project root. Therefore, no
additional dependencies need to be added to PYTHONPATH; you can simply run the
command below :smirk: :

```bash
python bin/CmdTrailBlazer.py -mr ${MODEL_ROOT} --config config/XXXX.yaml ## single experiment
python bin/CmdTrailBlazer.py -mr ${MODEL_ROOT} --config config/ ## run all yamls in a folder
```

:cupid:**UPDATE**:cupid:: TrailBlazer now ships a Gradio app as an alternative
interface. Please check out our documentation
([Gradio.md](doc/Gradio.md)) for more information. To run the app, simply run:

```bash
python bin/CmdGradio.py ${MODEL_ROOT} # no -mr here
```

When the shell environment variable `ZEROSCOPE_MODEL_ROOT` is set, you can omit
the `-mr` (`--model-root`) argument above:

```bash
export ZEROSCOPE_MODEL_ROOT=/path/to/your/diffusion/root
# then you can omit the -mr flag to simplify the command
python bin/CmdTrailBlazer.py --config config/XXXX.yaml
```

Please see [here](doc/Command.md) for more information about the command set
used in TrailBlazer.

#### [Config]

A list of example config files is stored in the `config` folder. Feel free to
run any of them; the results will be written to the `/tmp` folder. Please visit
[here](doc/Config.md) and [there](config/README.md) for more details about the
config structure and the visual result of each config, respectively.

## :fire: Contribution

This project is still a work in progress, and there are numerous directions in
which it can be improved. Please don't hesitate to contact us if you are
interested, or feel free to make a pull request to strengthen the ideas.

## :fire: TODO

This repository is not yet fully public. Nevertheless, most of the core
modules are available (e.g., single- and multiple-object synthesis, and the
Peekaboo comparison). Our next release will include useful tools for measuring
metrics.

## :fire: Fun

<img src="./assets/figs/Speed-cat.0004.0000.gif" width="256" height="256"> Poor cat: Someone, Stop me!

<img src="./assets/figs/Omg-CatDog.0003.gif" width="256" height="256"> Cat: let's turn into a dog so the authors won't ask me run.
<img src="./assets/v1-TrailBlazer/SpeedKeys-Cat.0000.gif">

Poor cat: Someone, Stop me!

<img src="./assets/v1-TrailBlazer/Cat2Dog.0000.gif">

<img src="./assets/figs/Omg-cat2fish.gif" width="256" height="256"> Cat: I said, dog! NOT fish!
Am I a cat, or a dog...

Please share your funny videos with us if you generate any interesting results!
We will post them here and credit your GitHub id.

## :fire: Citation
