Add SVD #5895
Merged
Changes from 186 commits
Commits
228 commits
2f56481
begin model
patil-suraj 58883ee
finish blocks
patil-suraj 7de5d7c
add_embedding
patil-suraj cad51d4
addition_time_embed_dim
patil-suraj 45c9b56
use TimestepEmbedding
patil-suraj 669824e
fix temporal res block
patil-suraj ee9d7b8
fix time_pos_embed
patil-suraj ac94731
fix add_embedding
patil-suraj 5df09ef
add conversion script
patil-suraj c93606c
fix model
patil-suraj 7b64d3a
up
patil-suraj edf7121
add new resnet blocks
DN6 1bd09b1
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 d4cdfa3
make forward work
patil-suraj 165ed7c
return sample in original shape
patil-suraj 28dee6e
fix temb shape in TemporalResnetBlock
patil-suraj 85846f7
add spatio temporal transformers
DN6 8ee2807
add vae blocks
DN6 5218f46
fix blocks
DN6 47684da
update
DN6 9c9d467
update
DN6 6f87490
fix shapes in Alphablender and add time activation in res blcok
patil-suraj ffd9e26
use new blocks
patil-suraj c8ec445
style
patil-suraj 678d19f
fix temb shape
patil-suraj b0fc4fd
fix SpatioTemporalResBlock
patil-suraj 5a523e2
reuse TemporalBasicTransformerBlock
patil-suraj 20efe54
fix TemporalBasicTransformerBlock
patil-suraj 6610331
use TransformerSpatioTemporalModel
patil-suraj 29551f8
fix TransformerSpatioTemporalModel
patil-suraj af1e86a
fix time_context dim
patil-suraj 9117547
clean up
DN6 8c3fd58
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 6481e94
make temb optional
DN6 6c69c7a
add blocks
patil-suraj 8e1851a
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj f976f5a
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj 1f34311
rename model
patil-suraj f1457b7
update conversion script
patil-suraj 576fa1c
remove UNetMidBlockSpatioTemporal
patil-suraj f9def2a
add in init
patil-suraj 6c28367
remove unused arg
patil-suraj d8c9e67
remove unused arg
patil-suraj 9f22651
remove more unsed args
patil-suraj dff26ce
up
patil-suraj 0c4192b
up
patil-suraj 24b5c43
check for None
patil-suraj e684243
update vae
DN6 05eaec2
Merge branch 'test-v-old' into test-v
DN6 eefed8a
update up/mid blocks for decoder
DN6 37c428a
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 122a6bd
begin pipeline
patil-suraj 3e47d3c
adapt scheduler
patil-suraj b336529
add guidance scalings
patil-suraj 2f35e8c
fix norm eps in temporal transformers
patil-suraj 132fe97
add temporal autoencoder
DN6 beaaf18
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 efb1e5e
make pipeline run
patil-suraj e779833
fix frame decodig
patil-suraj f9954a0
decode in float32
patil-suraj 4d4469e
decode n frames at a time
patil-suraj 9da55b3
pass decoding_t to decode_latents
patil-suraj 4346ddd
fix decode_latents
patil-suraj 7ddd14b
vae encode/decode in fp32
patil-suraj df98627
fix dtype in TransformerSpatioTemporalModel
patil-suraj 0cf6c6b
type image_latents same as image_embeddings
patil-suraj d0017d9
allow using differnt eps in temporal block for video decoder
patil-suraj 9af07d1
fix default values in vae
patil-suraj 5316fb5
pass num frames in decode
patil-suraj b071aaa
switch spatial to temporal for mixing in VAE
patil-suraj 8bcf43d
fix num frames during split decoding
patil-suraj 268ffea
cast alpha to sample dtype
patil-suraj d930977
fix attention in MidBlockTemporalDecoder
patil-suraj 21148de
fix typo
patil-suraj 712b995
fix guidance_scales dtype
patil-suraj cf70b9a
fix missing activation in TemporalDecoder
patil-suraj c3bdeb8
skip_post_quant_conv
patil-suraj 6827a1d
add vae conversion
patil-suraj 96af28f
style
patil-suraj e34e9d9
take guidance scale as input
patil-suraj 2a46326
up
patil-suraj fdd182f
allow passing PIL to export_video
patil-suraj 1ce8ff5
accept fps as arg
patil-suraj cb49cbd
add pipeline and vae in init
patil-suraj 13b646e
remove hack
patil-suraj d614a33
use AutoencoderKLTemporalDecoder
patil-suraj f651c12
don't scale image latents
patil-suraj 760333d
add unet tests
DN6 af85fb1
clean up unet
patil-suraj 6adae54
clean TransformerSpatioTemporalModel
patil-suraj 7b6a0d4
add slow svd test
DN6 ab8076f
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 f7cf8c3
clean up
DN6 3fbe123
make temb optional in Decoder mid block
DN6 b8d84c4
fix norm eps in TransformerSpatioTemporalModel
patil-suraj 1b3cf2d
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj 403a81c
clean up temp decoder
DN6 26ed460
clean up
DN6 82cf608
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 c9d1727
clean up
DN6 a193e49
use c_noise values for timesteps
patil-suraj 804bdeb
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj a08ef00
use math for log
patil-suraj 3178b16
update
DN6 847bd0a
fix copies
patil-suraj 18930e0
doc
patil-suraj 90d8e83
upcast vae
patil-suraj 8620851
update forward pass for gradient checkpointing
DN6 ee9f7d2
make added_time_ids is tensor
patil-suraj c452d9c
up
patil-suraj 55b4d09
fix upcasting
patil-suraj 8bc4251
remove post quant conv
DN6 c5941a2
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 56e8fca
Merge branch 'main' into test-v
DN6 63335d2
add _resize_with_antialiasing
patil-suraj 479f58c
fix _compute_padding
patil-suraj da3e46b
cleanup model
patil-suraj 04171ee
more cleanup
patil-suraj ad213ee
more cleanup
patil-suraj 18cb6d5
more cleanup
patil-suraj 25cfe79
remove freeu
patil-suraj 58f1d61
remove attn slice
patil-suraj 924813a
small clean
patil-suraj ddad380
up
patil-suraj 0567fd0
up
patil-suraj 05c631e
remove extra step kwargs
patil-suraj ac00e32
remove eta
patil-suraj 200314d
remove dropout
patil-suraj 782205e
remove callback
patil-suraj b095e2e
remove merge factor args
patil-suraj e60b2fe
clean
patil-suraj 3d03e44
clean up
DN6 2613335
move to dedicated folder
DN6 0e64d43
remove attention_head_dim
patil-suraj 8e33cb3
docstr and small fix
patil-suraj 2dc556c
update unet doc strings
patil-suraj e3404fa
rename decoding_t
patil-suraj 73386b4
correct linting
patrickvonplaten be346ac
store c_skip and c_out
patil-suraj b74e587
cleanup
patil-suraj b5e6097
clean TemporalResnetBlock
patil-suraj 783f18d
more cleanup
patil-suraj 51aa79a
clean up vae
DN6 c5fc4f0
clean up
DN6 e10e159
begin doc
patil-suraj ad50592
more cleanup
patil-suraj a5c7782
up
patil-suraj a4ba8ef
up
patil-suraj 169ae20
doc
patil-suraj c2d83f0
Improve
patrickvonplaten dda9337
better naming
patrickvonplaten d7a71ed
better naming
patrickvonplaten 550b73f
better naming
patrickvonplaten 9cbe7d6
better naming
patrickvonplaten 1a1067a
better naming
patrickvonplaten 532b861
better naming
patrickvonplaten 878e3ea
better naming
patrickvonplaten 29e57f4
better naming
patrickvonplaten 036c04f
Apply suggestions from code review
patrickvonplaten 889b9e9
Default chunk size to None
patrickvonplaten eb30dde
add example
patil-suraj fbe0936
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj 724a134
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj aed458f
Better
patrickvonplaten 4ca4b33
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patrickvonplaten e732921
Apply suggestions from code review
patrickvonplaten 994bf57
update doc
patil-suraj ffc2a1e
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj ad87aa4
Update src/diffusers/pipelines/stable_diffusion_video/pipeline_stable…
patil-suraj 4e60bb7
style
patil-suraj f37b782
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj 36df75c
Get torch compile working
patrickvonplaten f107be7
up
patil-suraj dbc2d2d
rename
patil-suraj b69e753
fix doc
patil-suraj 57f11d6
add chunking
DN6 9bce8bb
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 43b63d6
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patrickvonplaten d27999e
torch compile
patrickvonplaten e17dda8
torch compile
patrickvonplaten 381ea56
add modelling outputs
DN6 0df06dd
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 79fbd84
torch compile
patrickvonplaten 4601fc1
Improve chunking
patrickvonplaten 562d9d0
Apply suggestions from code review
patrickvonplaten 2d513f7
Update docs/source/en/using-diffusers/svd.md
patrickvonplaten 5f3a2b8
Close diff tag
apolinario 6aba6e5
remove slicing
patil-suraj efd0a72
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj 9162734
resnet docstr
patil-suraj a7342a1
add docstr in resnet
patil-suraj d409239
rename
patil-suraj 52ab94b
Apply suggestions from code review
patrickvonplaten eac5399
update tests
DN6 9fa5d12
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 ecc7882
Fix output type latents
patrickvonplaten 5143e01
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patrickvonplaten 58814b0
fix more
patrickvonplaten 21e627f
fix more
patrickvonplaten 8510c7e
Update docs/source/en/using-diffusers/svd.md
patrickvonplaten 557f638
fix more
patrickvonplaten deee57e
add pipeline tests
DN6 b33e42e
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 3b76055
remove unused arg
patil-suraj 5f278af
clean up
DN6 9320cb7
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
DN6 d73fa34
make sure get_scaling receives tensors
patil-suraj 7e42e28
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj 5857cbc
fix euler scheduler
patil-suraj 206f457
fix get_scalings
patil-suraj 877e8bd
simply euler for now
patil-suraj 5619c72
remove old test file
patil-suraj c888b98
use randn_tensor to create noise
patil-suraj 109971b
fix device for rand tensor
patil-suraj f1be9ce
increase expected_max_difference
patil-suraj 4e75f06
fix test_inference_batch_single_identical
patil-suraj 46b129b
actually fix test_inference_batch_single_identical
patil-suraj 367426e
disable test_save_load_float16
patil-suraj d0895b1
skip test_float16_inference
patil-suraj 614f9ad
skip test_inference_batch_single_identical
patil-suraj 60625db
fix test_xformers_attention_forwardGenerator_pass
patil-suraj 8fc51ab
Apply suggestions from code review
patrickvonplaten fcf0790
update StableVideoDiffusionPipelineSlowTests
patil-suraj 66ded24
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patil-suraj 9962f91
update image
patil-suraj fbb131c
add diffusers example
patrickvonplaten 896485a
Merge branch 'test-v' of https://github.com/huggingface/diffusers int…
patrickvonplaten 4c04ca2
fix more
patrickvonplaten
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
<!--Copyright 2023 The HuggingFace Team. All rights reserved. | ||
|
||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | ||
the License. You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | ||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations under the License. | ||
--> | ||
|
||
# Stable Video Diffusion | ||
|
||
[[open-in-colab]] | ||
|
||
[Stable Video Diffusion](https://static1.squarespace.com/static/6213c340453c3f502425776e/t/655ce779b9d47d342a93c890/1700587395994/stable_video_diffusion.pdf) is a powerful image-to-video generation model that can generate high-resolution (576x1024), 2-4 second videos conditioned on an input image. | ||
|
||
This guide will show you how to use SVD to generate short videos from images. | ||
|
||
Before you begin, make sure you have the following libraries installed: | ||
|
||
```py | ||
!pip install -q -U diffusers transformers accelerate | ||
``` | ||
|
||
## Image to Video Generation | ||
|
||
There are two variants of SVD: [SVD](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) | ||
and [SVD-XT](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt). The SVD checkpoint is trained to generate 14 frames, and the SVD-XT checkpoint is further | ||
fine-tuned to generate 25 frames. | ||
|
||
We will use the `svd-xt` checkpoint for this guide. | ||
|
||
```python | ||
import torch | ||
|
||
from diffusers import StableVideoDiffusionPipeline | ||
from diffusers.utils import load_image, export_to_video | ||
|
||
pipe = StableVideoDiffusionPipeline.from_pretrained( | ||
"stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16" | ||
) | ||
pipe.enable_model_cpu_offload() | ||
|
||
# Load the conditioning image | ||
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true") | ||
image = image.resize((1024, 576)) | ||
|
||
generator = torch.manual_seed(42) | ||
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0] | ||
|
||
export_to_video(frames, "generated.mp4", fps=7) | ||
``` | ||
|
||
<video width="1024" height="576" controls> | ||
<source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket_generated.mp4?download=true" type="video/mp4"> | ||
</video> | ||
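As a quick sanity check on the "2-4 second" figure, the clip length follows directly from the frame count and the `fps` passed to `export_to_video`. The helper below is a minimal sketch; `clip_duration_seconds` is a hypothetical name used for illustration and is not part of diffusers.

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Playback length of a clip with `num_frames` frames shown at `fps` frames per second."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# Frame counts are the ones quoted for the two checkpoints; fps=7 matches the export call above.
print(clip_duration_seconds(14, 7))            # SVD: 2.0
print(round(clip_duration_seconds(25, 7), 2))  # SVD-XT: 3.57
```

Exporting the same frames with a higher `fps` shortens the clip without changing how many frames the model generates.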
|
||
<Tip> | ||
Generating videos is memory intensive, so you can use the `decode_chunk_size` argument to control how many frames are decoded at once, which reduces memory usage. It's recommended to tune this value based on your GPU memory. | ||
Setting `decode_chunk_size=1` decodes one frame at a time and uses the least memory, but the video might show some flickering. | ||
|
||
Additionally, we use [model CPU offloading](../../optimization/memory#model-offloading) to reduce memory usage. | ||
</Tip> | ||
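To make the effect of `decode_chunk_size` concrete, the sketch below shows how a fixed number of frames would be split into groups of at most `chunk_size`. It only illustrates the idea of chunked decoding; `chunk_ranges` is a hypothetical helper, and the pipeline's internal loop may be organized differently.

```python
from typing import List, Tuple


def chunk_ranges(num_frames: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs covering `num_frames` in groups of at most `chunk_size`."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, num_frames))
        for start in range(0, num_frames, chunk_size)
    ]


# 25 frames decoded 8 at a time: three full chunks and one final chunk with a single frame.
print(chunk_ranges(25, 8))  # [(0, 8), (8, 16), (16, 24), (24, 25)]

# chunk_size=1 means one decoding pass per frame.
print(len(chunk_ranges(25, 1)))  # 25
```

Smaller chunks mean more decoding passes with fewer frames held in memory at once, which is the trade-off described in the tip above.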
|
||
|
||
|
||
### torch.compile | ||
|
||
You can achieve a 20-25% speed-up at the expense of slightly increased memory by compiling the UNet as follows: | ||
|
||
```diff | ||
- pipe.enable_model_cpu_offload() | ||
+ pipe.to("cuda") | ||
+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True) | ||
``` | ||
|
||
### Micro-conditioning | ||
|
||
|
||
Along with the conditioning image, Stable Video Diffusion also accepts micro-conditioning, which allows more control over the generated video. | ||
The pipeline accepts the following arguments: | ||
|
||
- `fps`: The frames per second of the generated video. | ||
- `motion_bucket_id`: The motion bucket id to use for the generated video. This can be used to control the motion of the generated video. Increasing the motion bucket id will increase the motion of the generated video. | ||
- `noise_aug_strength`: The amount of noise added to the conditioning image. The higher the value, the less the video will resemble the conditioning image. Increasing this value will also increase the motion of the generated video. | ||
|
||
Here is an example of using micro-conditioning to generate a video with more motion. | ||
|
||
```python | ||
import torch | ||
|
||
from diffusers import StableVideoDiffusionPipeline | ||
from diffusers.utils import load_image, export_to_video | ||
|
||
pipe = StableVideoDiffusionPipeline.from_pretrained( | ||
"stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16, variant="fp16" | ||
) | ||
pipe.enable_model_cpu_offload() | ||
|
||
# Load the conditioning image | ||
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true") | ||
image = image.resize((1024, 576)) | ||
|
||
generator = torch.manual_seed(42) | ||
frames = pipe(image, decode_chunk_size=8, generator=generator, motion_bucket_id=180, noise_aug_strength=0.1).frames[0] | ||
export_to_video(frames, "generated.mp4", fps=7) | ||
``` | ||
|
||
<video width="1024" height="576" controls> | ||
<source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket_generated_motion.mp4?download=true" type="video/mp4"> | ||
</video> | ||
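To compare several `motion_bucket_id` values side by side, one option is to call the pipeline once per value while holding every other argument fixed. The helper below is a minimal sketch: `sweep_motion_bucket` is a hypothetical name, not a diffusers function, and it takes any callable so it can be tried without a GPU.

```python
from typing import Any, Callable, Dict, Iterable


def sweep_motion_bucket(
    run: Callable[..., Any],
    motion_bucket_ids: Iterable[int],
    **fixed_kwargs: Any,
) -> Dict[int, Any]:
    """Call `run` once per motion bucket id, passing the same fixed keyword arguments each time."""
    return {
        bucket_id: run(motion_bucket_id=bucket_id, **fixed_kwargs)
        for bucket_id in motion_bucket_ids
    }


# Hypothetical usage with the pipeline from the example above (requires `pipe`, `image`, and torch):
#
# results = sweep_motion_bucket(
#     lambda **kw: pipe(
#         image, decode_chunk_size=8, generator=torch.manual_seed(42), **kw
#     ).frames[0],
#     [60, 120, 180],
#     noise_aug_strength=0.1,
# )
```

Re-creating the generator inside the callable keeps the seed identical across runs, so differences between the videos come from `motion_bucket_id` alone.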
|
remove `?download=true`