MotionCtrl SVD #6844
Conversation
Link to converted SVD model: https://huggingface.co/a-r-r-o-w/motionctrl-svd/ @wzhouxiff @jiangyzy @xinntao Thank you for your amazing work! Maybe it makes sense to move this to one of the authors' accounts or under the TencentARC organization.
If you have any other questions about this PR, I'm ready to reproduce MotionCtrl. Maybe I can give a little help.
Thanks for trying to look into this @Yanting-K! Nope, I do not have any other questions regarding the implementation. Just haven't found time to debug and fix this yet :( cc @DN6
The mistake above that caused bad results was using the wrong image encoder checkpoint. Since the authors freeze all layers of SVD and just train the attn2 and cc_projection layers in the UNet, we can reuse the image encoder/vae from SVD. Some results (manually downscaled):
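Since only the attn2 and cc_projection layers are trained, a conversion or fine-tuning script would freeze everything else. A minimal sketch of that freezing scheme (the helper name `freeze_except_motionctrl` is hypothetical, not from the PR):

```python
import torch.nn as nn

def freeze_except_motionctrl(unet: nn.Module) -> None:
    # Hypothetical helper: disable gradients for every parameter, then
    # re-enable them only for the layers MotionCtrl actually trains
    # (the attn2 cross-attention blocks and the cc_projection layers).
    for name, param in unet.named_parameters():
        param.requires_grad = "attn2" in name or "cc_projection" in name
```

Everything else (VAE, image encoder, the rest of the UNet) stays frozen, which is why the stock SVD image encoder and VAE can be reused as described above.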
Unfortunately, there is not much object movement here because no checkpoints exist for the Object Motion Control module (OMCM) for SVD. The authors only demonstrate the Camera Motion Control module, which works great, as we can see, but does not provide the level of control the community wants with SVD. More controllability exists in DragNUWA with SVD, which is something I've been working on in parallel and will open a PR for shortly. Maybe the OMCM for VideoCrafter can be used, but I haven't found the time to experiment or look into the details. I will be working on adding the Crafter family of models very soon, so I will take a look at the OMCM then. @DN6 @sayakpaul @patrickvonplaten I believe this is ready for an initial review. I know the changes are not ideal because of the MotionCtrl-specific additions to the UNet/Attention code. Let me know how to go about it for community/core addition. Thanks!
There is one thing I do not understand here, though... maybe @wzhouxiff could help me out. Why do you multiply camera_poses[:, : -1] with [3, 1, 4] for rescaling? The speed factor makes sense to me, but this one seemed a little arbitrary. The results seem to be the same with and without it.
@sayakpaul @DN6 I have verified that the implementation is faithful to the original repository for SVD and should be a good candidate for a community version. Since this pipeline involves some light modification of the UNet attention layers, how would you suggest I convert it to a single-file pipeline? Does a somewhat hacky override of the forward method, pushing the cc_projection layers into the appropriate places, sound good? This is how it's actually done in the original repository.
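For the single-file approach, the forward override could look roughly like the sketch below. The names `attach_cc_projection` and `pose_embedding`, and the dimensions, are my assumptions for illustration, not the PR's actual API:

```python
import types
import torch
import torch.nn as nn

def attach_cc_projection(attn: nn.Module, hidden_dim: int, pose_dim: int) -> None:
    # Hypothetical sketch: bolt a cc_projection layer onto an existing
    # attention module and wrap its forward so camera-pose embeddings
    # are concatenated and projected back to the hidden size.
    attn.cc_projection = nn.Linear(hidden_dim + pose_dim, hidden_dim)
    original_forward = attn.forward  # keep the frozen SVD behavior

    def forward(self, hidden_states, pose_embedding=None, **kwargs):
        out = original_forward(hidden_states, **kwargs)
        if pose_embedding is not None:
            # Concatenate pose features on the channel dim, project back.
            out = self.cc_projection(torch.cat([out, pose_embedding], dim=-1))
        return out

    attn.forward = types.MethodType(forward, attn)
```

Assigning `attn.cc_projection` registers the new layer as a submodule, so its weights can be loaded from the converted checkpoint, and calls without a pose embedding fall through to the original (frozen) behavior.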
You can, for now, add it to
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Thanks for running the tests! It seems everything relevant passed, and the failing tests are unrelated, which means I (hopefully) haven't broken any existing code.

What does this PR do?
Link to Colab notebook
Fixes #6688.
This PR adds MotionCtrl to diffusers. These changes are not really ideal to merge; it's still a WIP, and I wanted to get a working example with diffusers before figuring out how best to add it to core/community. Currently, I've just hacked through the UNet code, adding MotionCtrl-specific changes.
Thanks to ModelsLab for providing GPU support.
Who can review?
@DN6 @sayakpaul @patrickvonplaten