Add SVD #5895

patil-suraj · 2023-11-22T12:38:32Z

What does this PR do?

Adds Stable Video Diffusion.

HuggingFaceDocBuilderDev · 2023-11-22T15:03:06Z

The documentation is not available anymore as the PR was closed or merged.

drhead · 2023-11-22T16:09:04Z

Is this PR going to add support for the temporally-aware VAE? I am currently working on porting that module and don't want to end up creating any conflicts.

Edit: please disregard. I can now see that once the model components implemented here are complete, implementing the VAE decoder itself will be trivial.

patil-suraj · 2023-11-23T10:23:07Z

@drhead Yes, this PR will support everything related to SVD.

tin2tin · 2023-11-24T04:24:34Z

fp16 weights (not mine): https://huggingface.co/becausecurious/stable-video-diffusion-img2vid-fp16/tree/main

patrickvonplaten · 2023-11-29T18:48:57Z

docs/source/en/using-diffusers/svd.md

+```
+
+<video width="1024" height="576" controls>
+  <source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket_generated.mp4?download=true" type="video/mp4">


Remove the ?download=true suffix from the source URL.
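The requested change can also be made programmatically when many documentation URLs are affected. The following is a minimal sketch using only the Python standard library; the helper name `strip_query_param` is hypothetical and is not part of diffusers or of this PR.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_query_param(url: str, name: str) -> str:
    """Return `url` without the query parameter `name` (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep every query pair except the one being removed.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    # An empty query string makes urlunsplit drop the "?" entirely.
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = (
    "https://huggingface.co/datasets/huggingface/documentation-images/"
    "resolve/main/diffusers/svd/rocket_generated.mp4?download=true"
)
cleaned = strip_query_param(url, "download")
print(cleaned)
```

Other query parameters, if any, are left in place, so the helper only removes what the review comment asks for.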

liuquande · 2023-12-05T02:36:46Z

Hi, would this PR consider adding a model training/fine-tuning script for Stable Video Diffusion? Thanks!

jeff-da · 2023-12-18T00:18:24Z

Should FPS be set as a constant somewhere? I see both 7 and 8 used.

diffusers/src/diffusers/utils/export_utils.py

Line 118 in 9cef07d

def export_to_video(
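One possible way to address the mismatch raised here is to define the frame rate once and reuse it as the default for both the generation step and the export step. The sketch below is self-contained and purely illustrative: `DEFAULT_FPS`, `VideoClip`, `generate_frames`, and `write_video` are hypothetical stand-ins, not diffusers functions, and the value 7 is simply one of the two values mentioned in the comment.

```python
from dataclasses import dataclass
from typing import List

# Single source of truth for the frame rate (hypothetical name; 7 is one of
# the two values mentioned in the comment above).
DEFAULT_FPS = 7


@dataclass
class VideoClip:
    frames: List[str]
    fps: int

    @property
    def duration_seconds(self) -> float:
        # Playback length follows directly from frame count and frame rate.
        return len(self.frames) / self.fps


def generate_frames(num_frames: int, fps: int = DEFAULT_FPS) -> List[str]:
    """Stand-in for a generation call that is conditioned on a frame rate."""
    return [f"frame-{i}@{fps}fps" for i in range(num_frames)]


def write_video(frames: List[str], fps: int = DEFAULT_FPS) -> VideoClip:
    """Stand-in for an export helper; it reuses the same default."""
    return VideoClip(frames=frames, fps=fps)


clip = write_video(generate_frames(num_frames=14))
print(clip.fps, clip.duration_seconds)
```

Because both defaults point at the same constant, a caller who omits `fps` gets a video that plays back at the rate it was generated for; passing an explicit `fps` to only one of the two calls remains possible but becomes a visible choice.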

shliu0 · 2023-12-20T04:10:18Z

Hi, is this PR going to support LCM-LoRA, as has been done for the SD image models?

* begin model
* finish blocks
* add_embedding
* addition_time_embed_dim
* use TimestepEmbedding
* fix temporal res block
* fix time_pos_embed
* fix add_embedding
* add conversion script
* fix model
* up
* add new resnet blocks
* make forward work
* return sample in original shape
* fix temb shape in TemporalResnetBlock
* add spatio temporal transformers
* add vae blocks
* fix blocks
* update
* update
* fix shapes in Alphablender and add time activation in res blcok
* use new blocks
* style
* fix temb shape
* fix SpatioTemporalResBlock
* reuse TemporalBasicTransformerBlock
* fix TemporalBasicTransformerBlock
* use TransformerSpatioTemporalModel
* fix TransformerSpatioTemporalModel
* fix time_context dim
* clean up
* make temb optional
* add blocks
* rename model
* update conversion script
* remove UNetMidBlockSpatioTemporal
* add in init
* remove unused arg
* remove unused arg
* remove more unsed args
* up
* up
* check for None
* update vae
* update up/mid blocks for decoder
* begin pipeline
* adapt scheduler
* add guidance scalings
* fix norm eps in temporal transformers
* add temporal autoencoder
* make pipeline run
* fix frame decodig
* decode in float32
* decode n frames at a time
* pass decoding_t to decode_latents
* fix decode_latents
* vae encode/decode in fp32
* fix dtype in TransformerSpatioTemporalModel
* type image_latents same as image_embeddings
* allow using differnt eps in temporal block for video decoder
* fix default values in vae
* pass num frames in decode
* switch spatial to temporal for mixing in VAE
* fix num frames during split decoding
* cast alpha to sample dtype
* fix attention in MidBlockTemporalDecoder
* fix typo
* fix guidance_scales dtype
* fix missing activation in TemporalDecoder
* skip_post_quant_conv
* add vae conversion
* style
* take guidance scale as input
* up
* allow passing PIL to export_video
* accept fps as arg
* add pipeline and vae in init
* remove hack
* use AutoencoderKLTemporalDecoder
* don't scale image latents
* add unet tests
* clean up unet
* clean TransformerSpatioTemporalModel
* add slow svd test
* clean up
* make temb optional in Decoder mid block
* fix norm eps in TransformerSpatioTemporalModel
* clean up temp decoder
* clean up
* clean up
* use c_noise values for timesteps
* use math for log
* update
* fix copies
* doc
* upcast vae
* update forward pass for gradient checkpointing
* make added_time_ids is tensor
* up
* fix upcasting
* remove post quant conv
* add _resize_with_antialiasing
* fix _compute_padding
* cleanup model
* more cleanup
* more cleanup
* more cleanup
* remove freeu
* remove attn slice
* small clean
* up
* up
* remove extra step kwargs
* remove eta
* remove dropout
* remove callback
* remove merge factor args
* clean
* clean up
* move to dedicated folder
* remove attention_head_dim
* docstr and small fix
* update unet doc strings
* rename decoding_t
* correct linting
* store c_skip and c_out
* cleanup
* clean TemporalResnetBlock
* more cleanup
* clean up vae
* clean up
* begin doc
* more cleanup
* up
* up
* doc
* Improve
* better naming
* better naming
* better naming
* better naming
* better naming
* better naming
* better naming
* better naming
* Apply suggestions from code review
* Default chunk size to None
* add example
* Better
* Apply suggestions from code review
* update doc
* Update src/diffusers/pipelines/stable_diffusion_video/pipeline_stable_diffusion_video.py Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* style
* Get torch compile working
* up
* rename
* fix doc
* add chunking
* torch compile
* torch compile
* add modelling outputs
* torch compile
* Improve chunking
* Apply suggestions from code review
* Update docs/source/en/using-diffusers/svd.md
* Close diff tag
* remove slicing
* resnet docstr
* add docstr in resnet
* rename
* Apply suggestions from code review
* update tests
* Fix output type latents
* fix more
* fix more
* Update docs/source/en/using-diffusers/svd.md
* fix more
* add pipeline tests
* remove unused arg
* clean up
* make sure get_scaling receives tensors
* fix euler scheduler
* fix get_scalings
* simply euler for now
* remove old test file
* use randn_tensor to create noise
* fix device for rand tensor
* increase expected_max_difference
* fix test_inference_batch_single_identical
* actually fix test_inference_batch_single_identical
* disable test_save_load_float16
* skip test_float16_inference
* skip test_inference_batch_single_identical
* fix test_xformers_attention_forwardGenerator_pass
* Apply suggestions from code review
* update StableVideoDiffusionPipelineSlowTests
* update image
* add diffusers example
* fix more

---------

Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
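Several entries in the commit message above ("decode n frames at a time", "add chunking", "Default chunk size to None") refer to decoding the video latents in chunks rather than all at once. The general idea can be sketched as follows. This is an illustrative, framework-free approximation, not the code merged in this PR: `decode_in_chunks`, `decode_fn`, and `toy_decode` are hypothetical names, and plain Python lists stand in for tensors.

```python
from typing import Callable, List, Optional, Sequence


def decode_in_chunks(
    latents: Sequence[float],
    decode_fn: Callable[[Sequence[float]], List[float]],
    chunk_size: Optional[int] = None,
) -> List[float]:
    """Decode `latents` a few frames at a time to bound peak memory.

    `chunk_size=None` decodes every frame in a single call.
    """
    if chunk_size is None:
        # max(..., 1) keeps the range step valid for an empty input.
        chunk_size = max(len(latents), 1)
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")

    frames: List[float] = []
    for start in range(0, len(latents), chunk_size):
        chunk = latents[start : start + chunk_size]
        frames.extend(decode_fn(chunk))
    return frames


# Toy decoder that records how many frames it received per call.
calls: List[int] = []


def toy_decode(chunk: Sequence[float]) -> List[float]:
    calls.append(len(chunk))
    return [2 * value for value in chunk]


decoded = decode_in_chunks(list(range(5)), toy_decode, chunk_size=2)
print(decoded, calls)
```

With five frames and a chunk size of two, the decoder is called three times (on 2, 2, and 1 frames), and the concatenated result matches what a single call would produce. A smaller chunk size trades more decoder calls for lower peak memory per call.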

Kaihua-Chen · 2024-04-03T19:03:26Z

Thanks for supporting Stable Video Diffusion! Should we consider this the official implementation (e.g., has its performance been verified against the original paper)?

patil-suraj added 5 commits November 21, 2023 16:39

begin model

2f56481

finish blocks

58883ee

add_embedding

7de5d7c

addition_time_embed_dim

cad51d4

use TimestepEmbedding

45c9b56

patil-suraj added 6 commits November 22, 2023 17:44

fix temporal res block

669824e

fix time_pos_embed

ee9d7b8

fix add_embedding

ac94731

add conversion script

5df09ef

fix model

c93606c

up

7b64d3a

DN6 and others added 9 commits November 23, 2023 10:53

add new resnet blocks

edf7121

Merge branch 'test-v' of https://github.com/huggingface/diffusers int…

1bd09b1

…o test-v

make forward work

d4cdfa3

return sample in original shape

165ed7c

fix temb shape in TemporalResnetBlock

28dee6e

add spatio temporal transformers

85846f7

add vae blocks

8ee2807

fix blocks

5218f46

update

47684da

update

9c9d467

tin2tin mentioned this pull request Nov 24, 2023

Stable video diffusion #5889

Closed

2 tasks

patil-suraj added 4 commits November 24, 2023 08:57

fix shapes in Alphablender and add time activation in res blcok

6f87490

use new blocks

ffd9e26

style

c8ec445

fix temb shape

678d19f

patil-suraj added 12 commits November 29, 2023 12:20

fix get_scalings

206f457

simply euler for now

877e8bd

remove old test file

5619c72

use randn_tensor to create noise

c888b98

fix device for rand tensor

109971b

increase expected_max_difference

f1be9ce

fix test_inference_batch_single_identical

4e75f06

actually fix test_inference_batch_single_identical

46b129b

disable test_save_load_float16

367426e

skip test_float16_inference

d0895b1

skip test_inference_batch_single_identical

614f9ad

fix test_xformers_attention_forwardGenerator_pass

60625db

patrickvonplaten reviewed Nov 29, 2023

src/diffusers/models/unet_spatio_temporal_condition.py (outdated, resolved)

patrickvonplaten and others added 7 commits November 29, 2023 16:42

Apply suggestions from code review

8fc51ab

update StableVideoDiffusionPipelineSlowTests

fcf0790

Merge branch 'test-v' of https://github.com/huggingface/diffusers int…

66ded24

…o test-v

update image

9962f91

add diffusers example

fbb131c

Merge branch 'test-v' of https://github.com/huggingface/diffusers int…

896485a

…o test-v

fix more

4c04ca2

patrickvonplaten merged commit 63f767e into main Nov 29, 2023
22 checks passed

patil-suraj deleted the test-v branch November 29, 2023 18:14

patrickvonplaten reviewed Nov 29, 2023

a-r-r-o-w mentioned this pull request Feb 4, 2024

StableVideoDiffusionPipeline cannot use from_single_file #6839

Open
