[Model + Pipeline] DragNUWA #6497

Closed
2 tasks done
a-r-r-o-w opened this issue Jan 8, 2024 · 9 comments
Labels: community-examples, contributions-welcome, stale

Comments

@a-r-r-o-w
Member

a-r-r-o-w commented Jan 8, 2024

Model/Pipeline/Scheduler description

DragNUWA enables users to manipulate backgrounds or objects within images directly, and the model seamlessly translates these actions into camera movements or object motions, generating the corresponding video.

Thank you for your amazing and absolutely mind-blowing work at Microsoft Research once again! Can't wait to get into the specifics and learn from your paper ❤️

Code: https://github.com/ProjectNUWA/DragNUWA
Paper: https://arxiv.org/abs/2308.08089
Project Page: https://www.microsoft.com/en-us/research/project/dragnuwa/
Demo: https://huggingface.co/spaces/yinsming/DragNUWA
Authors: @shengming-yin @moymix @tim-learn [Jie Shi] [Houqiang Li] [Gong Ming] @nanduan

Open source status

  • The model implementation is available.
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

Regarding implementation: the code base is built on the SVD (Stable Video Diffusion) backbone. Diffusers probably has the most intuitive implementation of SVD, so adding this should hopefully not be too difficult.

@sayakpaul @patrickvonplaten

@sayakpaul
Member

Very cool. You know the drill by now :D

Feel free to open a PR for community examples.

@a-r-r-o-w
Member Author

Apologies for the delay here. I've been working on my first ComfyUI extension for this, but I found it slightly difficult to decipher the code base and add it as a new extension. It seems someone already made a really nice extension recently and beat me to it: link! I will focus on converting to a diffusers-format pipeline now.

It might be interesting to compare DragNUWA against MotionCtrl, since both work with the SVD backbone, although the latter also supports VideoCrafter and AnimateDiff as backbones. This feature is comparable to the Multi-Motion Brush product provided by RunwayML, which I believe uses something similar under the hood.

@a-r-r-o-w
Member Author

a-r-r-o-w commented Jan 26, 2024

@sayakpaul @patil-suraj @patrickvonplaten I need some help converting the SVD checkpoint they provide to diffusers format. I see that we have a script for the conversion, but it is not very straightforward to use because it does not expose a CLI, and I've had difficulty running the conversion by calling the code directly. Any pointers on how to go about it would be really helpful, thanks!

@sayakpaul
Member

What problems are you facing exactly when using the conversion script?

It's better to share checkpoints on the Hub rather than Drive :D Why cloud your precious storage space? :D

@a-r-r-o-w
Member Author

a-r-r-o-w commented Jan 27, 2024

> What problems are you facing exactly when using the conversion script?

The script does not expose a CLI for easy conversion of weights (which maybe I can take up in a PR). As more SVD-based checkpoints appear (MotionCtrl, DragNUWA), a CLI would be a nice and easy way to get things ready for testing.
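As a rough illustration of what such a CLI could look like, here is a minimal `argparse` sketch. Every flag name (`--checkpoint_path`, `--original_config`, `--output_path`, `--half`) is an assumption made up for this example, not an option of the existing script, and the conversion call itself is left out because the script's functions are not shown in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical SVD checkpoint conversion entry point."""
    parser = argparse.ArgumentParser(
        description="Convert an original-format SVD checkpoint to diffusers format (sketch)."
    )
    # All flag names below are illustrative assumptions, not existing options.
    parser.add_argument("--checkpoint_path", required=True,
                        help="Path to the original checkpoint file.")
    parser.add_argument("--original_config", required=True,
                        help="Path to the original SGM YAML config.")
    parser.add_argument("--output_path", required=True,
                        help="Directory where converted weights would be written.")
    parser.add_argument("--half", action="store_true",
                        help="Request float16 weights on save.")
    return parser


def main(argv=None) -> argparse.Namespace:
    """Parse arguments; argv=None falls back to sys.argv[1:]."""
    args = build_parser().parse_args(argv)
    # The real conversion call would go here. It is omitted on purpose:
    # this sketch only covers the argument handling.
    return args
```

A script built this way would end with a call to `main()` and could then be invoked with the three required paths, plus `--half` when float16 output is wanted.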

The script also does not seem to work directly when the original YAML config of the SGM implementation is loaded as a dict. Converting the dict into Python objects (which is what the script expects, since it accesses attributes with dot notation in places) still seems to give errors. I will spend some time improving it.
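For reference, the dict-to-object conversion described above can be sketched with the standard library alone. The config fragment and its key names are hypothetical stand-ins for a parsed YAML file, not the real SGM config.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively turn dicts (including dicts nested in lists) into attribute-style objects."""
    if isinstance(value, dict):
        return SimpleNamespace(**{key: to_namespace(item) for key, item in value.items()})
    if isinstance(value, list):
        return [to_namespace(item) for item in value]
    return value


# Hypothetical fragment standing in for a parsed YAML config; key names are illustrative only.
raw_config = {
    "model": {
        "params": {
            "network_config": {"params": {"in_channels": 8, "num_head_channels": 64}},
        }
    }
}

config = to_namespace(raw_config)
# Dot-notation access now works at every nesting level.
in_channels = config.model.params.network_config.params.in_channels
```

One caveat with this approach: a `SimpleNamespace` supports neither `config["key"]` indexing nor `"key" in config`, so code that mixes attribute access with dictionary-style access would still fail on such objects.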

> It's better to share checkpoints on the Hub rather than Drive :D Why cloud your precious storage space? :D

The weights are from the authors :) Researchers should really start using HF to store weights instead, since Drive keeps erroring out and blocking downloads when too many people access these large files 😆 Fortunately, people have downloaded the weights and pushed them to the Hub, but they are not in diffusers format.

From the implementation perspective, I've spent time diving into both DragNUWA and MotionCtrl and understand most of the papers and code. I also feel somewhat confident about being able to implement a training script for both. So far, from testing with the original codebases, MotionCtrl seems to be better than DragNUWA at object consistency, so I will prioritize that. I will open PRs shortly once I can get the weights converted (both support SVD). The changes should be minimal and self-contained in the pipelines, but they will require modifying the spatio-temporal UNet code because of the added conditioning from the proposed camera and object modules.

@sayakpaul
Member

> Fortunately, people have downloaded and pushed to hub, but it is not in diffusers format.

Yeah please do provide the relevant link.

Let's maybe start from a community pipeline first.

Let's maybe hold off a bit from the training script as SVD is not commercially friendly unlike SDXL.

For the conversion script, I will defer to @patil-suraj and @DN6.

@a-r-r-o-w
Member Author

a-r-r-o-w commented Jan 27, 2024

Sure. Here are the ones for DragNUWA I found uploaded to the hub:

Also, MotionCtrl (which also requires conversion): https://huggingface.co/TencentARC/MotionCtrl

Maybe it also makes sense to add an implementation for .from_single_file with the mixin.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale label Feb 21, 2024
@yiyixuxu yiyixuxu removed the stale label Feb 21, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale label Mar 17, 2024
This issue was closed.