[Model + Pipeline] DragNUWA #6497

a-r-r-o-w · 2024-01-08T21:52:56Z

Model/Pipeline/Scheduler description

DragNUWA enables users to manipulate backgrounds or objects within images directly, and the model seamlessly translates these actions into camera movements or object motions, generating the corresponding video.

Thank you for your amazing and absolutely mind-blowing work at Microsoft Research once again! Can't wait to get into the specifics and learn from your paper ❤️

Code: https://github.com/ProjectNUWA/DragNUWA
Paper: https://arxiv.org/abs/2308.08089
Project Page: https://www.microsoft.com/en-us/research/project/dragnuwa/
Demo: https://huggingface.co/spaces/yinsming/DragNUWA
Authors: @shengming-yin @moymix @tim-learn [Jie Shi] [Houqiang Li] [Gong Ming] @nanduan

Open source status

The model implementation is available.
The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

Regarding implementation: The code base is built upon the SVD backbone. Diffusers probably has the most intuitive implementation of SVD, so adding this should, hopefully, not be too difficult.

@sayakpaul @patrickvonplaten

sayakpaul · 2024-01-09T02:44:43Z

Very cool. You know the drill by now :D

Feel free to open a PR for community examples.

a-r-r-o-w · 2024-01-24T14:04:45Z

Apologies for the delay here. I've been working on my first ComfyUI extension for this but have found it slightly difficult to decipher the code base and add it as a new extension. It seems someone already made a really nice extension recently and beat me to it: link! I will focus on converting to a diffusers-format pipeline now.

It might be interesting to compare DragNUWA against MotionCtrl since both work with the SVD backbone, although the latter also supports VideoCrafter and AnimateDiff as backbones. This feature is comparable to the Multi-Motion Brush product provided by RunwayML, which I believe uses something similar under the hood.

a-r-r-o-w · 2024-01-26T08:07:01Z

@sayakpaul @patil-suraj @patrickvonplaten I need some help converting the SVD checkpoint they provide to diffusers format. I see that we have a script for the conversion, but it is not straightforward to use: it does not expose a CLI, and I've had difficulty running the conversion by calling the code directly. Any pointers on how to go about it would be really helpful, thanks!

sayakpaul · 2024-01-27T04:57:10Z

What problems are you facing exactly when using the conversion script?

It's better to share checkpoints on the Hub rather than Drive :D Why cloud your precious storage space? :D

a-r-r-o-w · 2024-01-27T05:13:52Z

> What problems are you facing exactly when using the conversion script?

The script does not expose a CLI (which maybe I can take up in a PR) for easy conversion of weights. As more SVD checkpoints are appearing (MotionCtrl, DragNUWA), it would be a nice and easy way to get things ready for testing.
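As a rough illustration of the CLI idea above, here is a minimal sketch of an `argparse` wrapper. The flag names (`--checkpoint_path`, `--config_path`, `--output_path`) and the `convert` placeholder are hypothetical and are not taken from the existing diffusers script; the real conversion logic would be called where the placeholder is.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All flag names below are hypothetical, chosen only for illustration.
    parser = argparse.ArgumentParser(
        description="Convert an SVD-style checkpoint to diffusers format (sketch)."
    )
    parser.add_argument("--checkpoint_path", required=True, help="Path to the original checkpoint file.")
    parser.add_argument("--config_path", required=True, help="Path to the original yaml config.")
    parser.add_argument("--output_path", required=True, help="Directory for the converted weights.")
    return parser


def convert(checkpoint_path: str, config_path: str, output_path: str) -> None:
    # Placeholder: the actual conversion functions would be invoked here.
    print(f"would convert {checkpoint_path} using {config_path} into {output_path}")


def main(argv=None) -> None:
    # argv=None makes argparse read sys.argv; a list can be passed for testing.
    args = build_parser().parse_args(argv)
    convert(args.checkpoint_path, args.config_path, args.output_path)
```

Calling `main(["--checkpoint_path", "a.ckpt", "--config_path", "c.yaml", "--output_path", "out"])` exercises the parser without touching the command line.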

The script also does not seem to work directly when loading the original yaml format config of SGM implementation as dict. Converting the dict into python objects (which is what the script expects due to the dot notation access of attributes at places) still seems to give errors. I will spend some time improving it.
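For reference, one way to get dot-notation access from a nested dict is a recursive conversion to `SimpleNamespace`. This is only a sketch of the dict-to-object step described above: the helper name `to_namespace` and the config fragment are made up for illustration, and it does not address whatever other errors the script raises.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert dicts (including dicts nested in lists) to SimpleNamespace."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    return value


# Made-up config fragment for illustration; not the real SGM config.
raw = {"model": {"params": {"network_config": {"params": {"in_channels": 8}}}}}
config = to_namespace(raw)
print(config.model.params.network_config.params.in_channels)
```

Note that a `SimpleNamespace` supports only attribute access, so any code path that also indexes the config with `config["key"]` would still fail after this conversion.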

> It's better to share checkpoints on the Hub rather than Drive :D Why cloud your precious storage space? :D

Weights are by the author :) Researchers should really start using HF to store weights instead since Drive keeps erroring out if too many people access these large files, and it blocks downloads 😆 Fortunately, people have downloaded and pushed to hub, but it is not in diffusers format.

From the implementation perspective, I've spent time diving into both DragNUWA and MotionCtrl and understand most of the paper and code. I also feel somewhat confident about being able to implement a training script for both. So far, from testing with the original codebases, MotionCtrl seems to be better than DragNUWA at object consistency, so I will prioritize that. I will open PRs shortly once I can get the weights converted (both support SVD). The changes should be minimal and self-contained in the pipelines, but will require modifying the spatio-temporal unet code due to the added conditioning from the proposed camera and object modules.

sayakpaul · 2024-01-27T05:17:29Z

> Fortunately, people have downloaded and pushed to hub, but it is not in diffusers format.

Yeah please do provide the relevant link.

Let's maybe start from a community pipeline first.

Let's maybe hold off a bit from the training script as SVD is not commercially friendly unlike SDXL.

For the conversion script, I will defer to @patil-suraj and @DN6.

a-r-r-o-w · 2024-01-27T05:19:46Z

Sure. Here are the ones for DragNUWA I found uploaded to the hub:

Original weights: https://huggingface.co/LanHarmony/DragNUWA
Safetensors format: https://huggingface.co/benjamin-paine/dragnuwa-pruned-safetensors

Also, MotionCtrl (which also requires conversion): https://huggingface.co/TencentARC/MotionCtrl

Maybe it also makes sense to add an implementation for .from_single_file with the mixin.
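To make the single-file idea a bit more concrete: a single-file loader first has to split the flat checkpoint into per-component state dicts. The sketch below shows only that step on a plain dict. The prefixes are an assumption based on the usual SGM checkpoint layout and may not match the DragNUWA or MotionCtrl files, and `split_by_component` is a hypothetical helper, not a diffusers API.

```python
from collections import defaultdict

# Assumed top-level key prefixes of an SGM-style checkpoint (not verified
# against the DragNUWA or MotionCtrl weights).
ASSUMED_PREFIXES = {
    "unet": "model.diffusion_model.",
    "vae": "first_stage_model.",
    "conditioner": "conditioner.embedders.",
}


def split_by_component(state_dict):
    """Group entries by component, stripping the matched prefix from each key."""
    components = defaultdict(dict)
    for key, tensor in state_dict.items():
        for name, prefix in ASSUMED_PREFIXES.items():
            if key.startswith(prefix):
                components[name][key[len(prefix):]] = tensor
                break
        else:
            # Keys with no known prefix are kept so nothing is silently dropped.
            components["unmatched"][key] = tensor
    return dict(components)


# Stand-in values instead of real tensors, to keep the sketch dependency-free.
fake_checkpoint = {
    "model.diffusion_model.input_blocks.0.0.weight": "w0",
    "first_stage_model.encoder.conv_in.weight": "w1",
    "some_new_module.weight": "w2",
}
parts = split_by_component(fake_checkpoint)
print(sorted(parts))
```

Inspecting the `unmatched` group is a quick way to spot extra modules that a fine-tuned checkpoint adds on top of the base SVD weights.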

github-actions · 2024-02-21T15:05:33Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions · 2024-03-17T15:03:55Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

sayakpaul added the community-examples and contributions-welcome labels on Jan 9, 2024

a-r-r-o-w mentioned this issue on Feb 4, 2024: StableVideoDiffusionPipeline cannot use from_single_file #6839 (Open)

a-r-r-o-w mentioned this issue on Feb 11, 2024: [WIP] DragNUWA SVD #6938 (Closed, 6 tasks)

github-actions bot added the stale label ("Issues that haven't received updates") on Feb 21, 2024

yiyixuxu removed the stale label on Feb 21, 2024

github-actions bot added the stale label on Mar 17, 2024

a-r-r-o-w closed this as completed on Aug 30, 2024

This issue was closed.
