
Conversation

@a-r-r-o-w
Contributor

@a-r-r-o-w a-r-r-o-w commented Feb 4, 2024

What does this PR do?

Link to Colab notebook

Fixes #6688.

This PR adds MotionCtrl to diffusers. These changes are not really in a mergeable state yet. It's still a WIP, and I wanted to get a working example with diffusers before figuring out how best to add it to core/community. Currently, I've just hacked through the UNet code, adding MotionCtrl-specific stuff.

Thanks to ModelsLab for providing GPU support.

Before submitting

Who can review?

@DN6 @sayakpaul @patrickvonplaten

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Feb 4, 2024

Link to converted SVD model: https://huggingface.co/a-r-r-o-w/motionctrl-svd/

@wzhouxiff @jiangyzy @xinntao Thank you for your amazing work! Maybe it makes sense to move this to one of the authors' accounts or under the TencentARC organization.

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Feb 4, 2024

I'd appreciate some help with debugging.

Testing code
import torch

from diffusers.pipelines.stable_video_diffusion.pipeline_stable_video_motionctrl_diffusion import StableVideoMotionCtrlDiffusionPipeline
from diffusers.utils import load_image, export_to_gif

pipe = StableVideoMotionCtrlDiffusionPipeline.from_pretrained(
    "a-r-r-o-w/motionctrl-svd", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png")
image = image.resize((1024, 576))

camera_pose = ...  # use one of the camera pose json files from the authors' repo
num_frames = 14

frames = pipe(
    image=image,
    camera_pose=camera_pose[:num_frames],
    num_frames=num_frames,
    num_inference_steps=25,
    decode_chunk_size=4,
    motion_bucket_id=127,
    min_guidance_scale=1.0,
    max_guidance_scale=2.5,  # matches the authors' sampler config (min_scale 1.0, max_scale 2.5)
    generator=torch.manual_seed(42),
).frames[0]

export_to_gif(frames, "animation.gif")
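
For completeness, here is a rough sketch of how one of those camera pose json files could be loaded. This is my own illustration, not code from the PR: the filename is hypothetical, and it assumes each file is a list of per-frame flattened 3x4 RT matrices (12 floats per frame); adjust if the actual format differs.

import json

import torch

# Hypothetical example: load a camera pose json file from the authors' repo.
# Assumes a list of per-frame flattened 3x4 RT matrices (12 floats per frame).
with open("test_camera_poses.json") as f:  # hypothetical filename
    camera_pose = torch.tensor(json.load(f))  # shape: (num_frames, 12)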
Results with the DDIM and EulerDiscrete schedulers (result GIFs not reproduced here).

The sampling config used in the authors' implementation is:

    sampler_config:
      target: sgm.modules.diffusionmodules.sampling.EulerEDMSampler
      params:
        num_steps: 25
        discretization_config:
          target: sgm.modules.diffusionmodules.discretizer.EDMDiscretization
          params:
            sigma_max: 700.0

        guider_config:
          target: sgm.modules.diffusionmodules.guiders.LinearPredictionGuider
          params:
            num_frames: 14
            max_scale: 2.5
            min_scale: 1.0

I think the sigma_max property is not yet supported by the EulerDiscreteScheduler in diffusers, so that could be one reason for the bad results. I'm hoping the model conversion went correctly, since there were no unexpected/missing key errors with strict mode, but it would be great if anyone could verify.
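
For reference, here is a minimal sketch of the sigma schedule that EDMDiscretization with sigma_max=700 produces. This is just the standard Karras/EDM formula; rho=7 and sigma_min=0.002 are assumed defaults, since the config above only overrides sigma_max.

import numpy as np

# Hedged sketch of the EDM/Karras sigma schedule (Karras et al. discretization).
def edm_sigmas(num_steps=25, sigma_min=0.002, sigma_max=700.0, rho=7.0):
    ramp = np.linspace(0, 1, num_steps)
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    # Interpolate in sigma^(1/rho) space, then raise back to the rho-th power.
    return (max_inv_rho + ramp * (min_inv_rho - max_inv_rho)) ** rho

print(edm_sigmas()[:3])  # starts near sigma_max=700 and decays towards sigma_min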

@Yanting-K

Yanting-K commented Feb 5, 2024

I'd appreciate some help with debugging. […]

If there are any other questions about this PR, I'm ready to try reproducing MotionCtrl myself. Maybe I can give a little help.

@a-r-r-o-w a-r-r-o-w changed the title from [WIP] MotionCtrl to [WIP] MotionCtrl SVD on Feb 5, 2024
@a-r-r-o-w
Contributor Author

If there are any other questions about this PR, I'm ready to try reproducing MotionCtrl myself. Maybe I can give a little help.

Thanks for trying to look into this @Yanting-K! Nope, I do not have any other questions regarding the implementation. Just haven't found time to debug and fix this yet :(

cc @DN6

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Feb 7, 2024

The mistake above that caused bad results was using the wrong image encoder checkpoint. Since the authors freeze all layers of SVD and just train the attn2 and cc_projection layers in the UNet, we can reuse the image encoder/vae from SVD.
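
In case it helps anyone redoing the conversion, here is a rough sketch of what reusing the frozen SVD components could look like. This is only an illustration under assumptions, not the actual conversion or pipeline code from this PR: it assumes the base checkpoint stabilityai/stable-video-diffusion-img2vid and the pipeline class introduced here.

import torch
from transformers import CLIPVisionModelWithProjection
from diffusers.models import AutoencoderKLTemporalDecoder
from diffusers.pipelines.stable_video_diffusion.pipeline_stable_video_motionctrl_diffusion import StableVideoMotionCtrlDiffusionPipeline

# Reuse the frozen image encoder and VAE from the base SVD checkpoint, since
# MotionCtrl only trains the attn2 and cc_projection layers of the UNet.
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid",
    subfolder="image_encoder",
    torch_dtype=torch.float16,
    variant="fp16",
)
vae = AutoencoderKLTemporalDecoder.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid",
    subfolder="vae",
    torch_dtype=torch.float16,
    variant="fp16",
)

pipe = StableVideoMotionCtrlDiffusionPipeline.from_pretrained(
    "a-r-r-o-w/motionctrl-svd",
    image_encoder=image_encoder,
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")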

Some results (manually downscaled):

Unfortunately, there is not much object movement here because there are no checkpoints for the Object Motion Control module (OMCM) for SVD. The authors only demonstrate the camera control module, which works great as we can see, but it isn't the level of control the community wants with SVD. More controllability exists in DragNUWA with SVD, which is something I've been working on in parallel and will open a PR for shortly. Maybe the OMCM for VideoCrafter can be used, but I haven't found the time to experiment or look into the details. I will be working on adding the Crafter family of models very soon, and will take a look at the OMCM stuff then.

@DN6 @sayakpaul @patrickvonplaten I believe this is ready for an initial review. I know the changes are not ideal because of the MotionCtrl-specific additions to the UNet/attention code. Let me know how you'd like to go about adding it to community/core. Thanks!

@a-r-r-o-w a-r-r-o-w changed the title from [WIP] MotionCtrl SVD to MotionCtrl SVD on Feb 7, 2024
@a-r-r-o-w
Contributor Author

There is one thing I do not understand here though... maybe @wzhouxiff could help me out. Why do you multiply camera_poses[:, :-1] by [3, 1, 4] for rescaling? The speed makes sense to me, but this just seemed a little arbitrary. The results seem to be the same with/without it.

@a-r-r-o-w
Contributor Author

a-r-r-o-w commented Feb 14, 2024

@sayakpaul @DN6 I have verified that the implementation is faithful to the original repository for SVD, so it should be a good candidate for a community version. Since this pipeline involves some light modification to the UNet attention layers, how would you suggest I convert it to a single-file pipeline? Does a somewhat hacky override of the forward method, pushing the cc_projection layers into the appropriate places, sound okay? This is how it's actually done in the original repository.
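
To make the idea concrete, here is a minimal sketch of the kind of modification involved. This is my own illustration, not code from this PR or the original repo: the class and argument names are hypothetical, and the pose is assumed to be a flattened 3x4 RT matrix per frame.

import torch
import torch.nn as nn

class CameraPoseProjection(nn.Module):
    # Sketch of the "cc_projection" idea: concatenate a per-frame camera pose
    # (flattened 3x4 RT matrix, 12 values) to the hidden states entering an
    # attention layer, then project back to the original channel dimension with
    # a learned linear layer. Only this layer and attn2 would be trained.
    def __init__(self, hidden_dim: int, pose_dim: int = 12):
        super().__init__()
        self.cc_projection = nn.Linear(hidden_dim + pose_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor, camera_pose: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch * num_frames, seq_len, hidden_dim)
        # camera_pose:   (batch * num_frames, pose_dim), broadcast over the sequence
        pose = camera_pose[:, None, :].expand(-1, hidden_states.shape[1], -1)
        return self.cc_projection(torch.cat([hidden_states, pose], dim=-1))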

@sayakpaul
Member

For now, you can add it to research_projects with all the modeling changes and pipelining code. This is how it's done for ControlNetXS, for example.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w
Contributor Author

Thanks for running the tests! Seems like everything passed; the failing tests are unrelated, which means I (hopefully) haven't broken any existing code.

@a-r-r-o-w a-r-r-o-w mentioned this pull request Feb 18, 2024
@a-r-r-o-w a-r-r-o-w closed this Feb 21, 2024