Latte, a novel latent diffusion transformer for video generation, extracts spatio-temporal tokens from input videos and uses a series of Transformer blocks to model the distribution of videos in the latent space. Latte achieves state-of-the-art performance on four standard video generation datasets: FaceForensics, SkyTimelapse, UCF101, and Taichi-HD.
Latte resources: paper, code, pretrained models.
However, Latte still falls short of Sora in both the length and the quality of the videos it generates. To approach Sora-level training and generation results, Latte needs more high-quality text-video paired data. We therefore built VidFetch, an open-source dataset download tool that fetches copyright-free videos from a range of free video websites.
Supported websites and platforms (✔ = supported, 📆 = planned):

| Website | Windows | macOS | Linux |
|---|---|---|---|
| Pexels | ✔ | 📆 | 📆 |
| Mazwai | 📆 | 📆 | 📆 |
| Mixkit | ✔ | 📆 | ✔ |
| Pixabay | ✔ | 📆 | 📆 |
| Coverr | 📆 | 📆 | 📆 |
You can install the stable release from PyPI:
$ pip install vidfetch
or get the latest version by running:
$ pip install -U https://github.com/heatingma/VidFetch/archive/master.zip # with --user for user install (no root)
The following packages are required and will be installed automatically by pip:
aiohttp>=3.9.3
async_timeout>=4.0.3
tqdm>=4.66.2
texttable>=1.7.0
moviepy>=1.0.3
bs4>=0.0.2
selenium>=4.18.1
requests>=2.31.0
huggingface_hub>=0.22.2
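If pip could not resolve one of these dependencies (for example in an environment with pinned packages), it can help to check what is actually installed. The sketch below is a hypothetical helper, not part of VidFetch: it reads installed versions with the standard-library `importlib.metadata.version` and compares them against the minimum versions listed above. The version comparison is deliberately simplified (it keeps only the leading integers of each dotted component, so `1.0.3rc1` is treated as `1.0.3`); for exact PEP 440 semantics use the `packaging` library instead. On older Python versions you may need to spell distribution names as they appear on PyPI (e.g. `async-timeout`, `huggingface-hub`).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above.
REQUIREMENTS = {
    "aiohttp": "3.9.3",
    "async_timeout": "4.0.3",
    "tqdm": "4.66.2",
    "texttable": "1.7.0",
    "moviepy": "1.0.3",
    "bs4": "0.0.2",
    "selenium": "4.18.1",
    "requests": "2.31.0",
    "huggingface_hub": "0.22.2",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '4.66.2' or '1.0.3rc1' into leading integers, e.g. (4, 66, 2).

    Simplified: pre-/post-release suffixes are ignored.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if installed >= minimum, padding the shorter tuple with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements: dict[str, str]) -> dict[str, str]:
    """Map each missing or too-old package to a short problem description."""
    problems = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    issues = check_requirements(REQUIREMENTS)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("All requirements satisfied.")
```

Running the script prints one line per missing or outdated package, so an empty report means the environment already meets every listed minimum.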
You only need three lines of code to start downloading videos, for example from Mixkit:
from vidfetch.website import MixkitVideoDataset
mixkit = MixkitVideoDataset(root_dir="mixkit")
mixkit.download(platform="windows")
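The `platform` argument appears to select the host operating system, and the support table above shows that not every website works on every OS yet. The following is a minimal sketch that picks the value from `sys.platform` and checks it against that table before downloading. It assumes `platform` accepts the lowercase names from the table header; only `"windows"` is confirmed by the example above, so `"macos"` and `"linux"` are assumptions. `SUPPORT`, `detect_platform`, and `is_supported` are hypothetical helpers, not part of VidFetch.

```python
import sys

# Support matrix transcribed from the table above:
# True = supported (✔), False = planned (📆).
SUPPORT = {
    "pexels":  {"windows": True,  "macos": False, "linux": False},
    "mazwai":  {"windows": False, "macos": False, "linux": False},
    "mixkit":  {"windows": True,  "macos": False, "linux": True},
    "pixabay": {"windows": True,  "macos": False, "linux": False},
    "coverr":  {"windows": False, "macos": False, "linux": False},
}


def detect_platform(sys_platform: str = sys.platform) -> str:
    """Map sys.platform to the (assumed) platform names used in the table."""
    if sys_platform.startswith("win"):
        return "windows"
    if sys_platform == "darwin":
        return "macos"
    if sys_platform.startswith("linux"):
        return "linux"
    raise ValueError(f"unsupported platform: {sys_platform}")


def is_supported(website: str, platform: str) -> bool:
    """Return True if the table marks this website/platform pair as supported."""
    return SUPPORT.get(website.lower(), {}).get(platform, False)


if __name__ == "__main__":
    platform = detect_platform()
    print(platform, "-> mixkit supported:", is_supported("mixkit", platform))
```

With these helpers, the download call could become `mixkit.download(platform=detect_platform())`, guarded by `is_supported("mixkit", detect_platform())`. Keeping the matrix as a plain dictionary means it only needs a one-line change when a 📆 entry in the table becomes ✔.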