
@a-r-r-o-w (Contributor)

What does this PR do?

It turns out we missed adding the pipeline docs in the Latte PR and only added docs for the Latte Transformer 3D model. @maxin-cn, let me know if you want anything else to be added.

Maybe a line about #8842 (feed-forward chunked inference) could be added once that's merged.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@yiyixuxu

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@maxin-cn (Contributor)

Thank you for continuing to refine the inference process of Latte. I don't have anything extra to add at the moment.

@a-r-r-o-w requested a review from @sayakpaul on July 15, 2024 09:42

*We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model video distribution in the latent space. In order to model a substantial number of tokens extracted from videos, four efficient variants are introduced from the perspective of decomposing the spatial and temporal dimensions of input videos. To improve the quality of generated videos, we determine the best practices of Latte through rigorous experimental analysis, including video clip patch embedding, model variants, timestep-class information injection, temporal positional embedding, and learning strategies. Our comprehensive evaluation demonstrates that Latte achieves state-of-the-art performance across four standard video generation datasets, i.e., FaceForensics, SkyTimelapse, UCF101, and Taichi-HD. In addition, we extend Latte to text-to-video generation (T2V) task, where Latte achieves comparable results compared to recent T2V models. We strongly believe that Latte provides valuable insights for future research on incorporating Transformers into diffusion models for video generation.*

**Highlights**: Latte is a latent diffusion transformer proposed as a backbone for modeling different modalities (trained for text-to-video generation here). It achieves state-of-the-art performance across four standard video benchmarks.
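The abstract says Latte extracts spatio-temporal tokens from videos and that its variants decompose the spatial and temporal dimensions. Below is a minimal, framework-free sketch of that idea under simplifying assumptions: single-channel frames, non-overlapping square patches taken frame by frame, and no learned projection or positional embedding. All function names are hypothetical, and this is not the code of `LatteTransformer3DModel` in diffusers. It only shows how one set of `[frame][patch]` tokens can be regrouped into per-frame sequences (for spatial attention) or per-location sequences (for temporal attention).

```python
from typing import List

# Simplifying assumptions for this sketch (not Latte's real data layout):
Frame = List[List[float]]  # one single-channel H x W frame
Token = List[float]        # one flattened p x p patch


def patchify_frame(frame: Frame, p: int) -> List[Token]:
    """Split one H x W frame into non-overlapping p x p patches in row-major order."""
    h, w = len(frame), len(frame[0])
    if h % p or w % p:
        raise ValueError("frame height and width must be divisible by the patch size")
    tokens = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            # Flatten the patch row by row into a single token vector.
            tokens.append([frame[top + i][left + j] for i in range(p) for j in range(p)])
    return tokens


def video_to_tokens(video: List[Frame], p: int) -> List[List[Token]]:
    """Return T x N spatio-temporal tokens, indexed as [frame][patch]."""
    return [patchify_frame(frame, p) for frame in video]


def spatial_sequences(tokens: List[List[Token]]) -> List[List[Token]]:
    """One sequence per frame: attention would mix the N patches of a single frame."""
    return [list(frame_tokens) for frame_tokens in tokens]


def temporal_sequences(tokens: List[List[Token]]) -> List[List[Token]]:
    """One sequence per patch location: attention would mix the T frames at that location."""
    num_patches = len(tokens[0])
    return [[frame_tokens[n] for frame_tokens in tokens] for n in range(num_patches)]


if __name__ == "__main__":
    # Toy video: 2 frames of 4 x 4 pixels, patch size 2 -> 4 patches per frame.
    video = [[[float(t * 16 + r * 4 + c) for c in range(4)] for r in range(4)] for t in range(2)]
    tokens = video_to_tokens(video, 2)
    print(len(spatial_sequences(tokens)), len(temporal_sequences(tokens)))  # prints "2 4"
```

In a factorized design, blocks would alternate between the two groupings, so each attention call sees either N or T tokens rather than all T × N at once.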
Member

Can we hyperlink the benchmarks section?

@a-r-r-o-w (Contributor, Author), Jul 15, 2024

Do you mean a hyperlink to the benchmarks section of the arXiv paper here, or individual links to each of the four benchmarks?

@sayakpaul (Member) left a comment

Some minor comments.

@a-r-r-o-w (Contributor, Author)

@sayakpaul, requesting another review of the latest changes before merging.

@sayakpaul merged commit 12625c1 into main on Jul 18, 2024
@sayakpaul deleted the latte/pipeline-documentation branch on July 18, 2024 03:57
Disty0 pushed a commit to Disty0/diffusers that referenced this pull request Jul 18, 2024
* add pipeline docs for latte

* add inference time to latte docs

* apply review suggestions
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* add pipeline docs for latte

* add inference time to latte docs

* apply review suggestions