VideoCC

VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models.

It is created using an automatic pipeline starting from the Conceptual Captions Image-Captioning Dataset.

Paper

More details are available in this paper at ECCV 2022. Please cite the paper if you use or discuss this dataset in your work.

@inproceedings{nagrani2022learning,
  title = {Learning Audio Video Modalities from Image Captions},
  author = {Nagrani, Arsha and Hongsuck Seo, Paul and Seybold, Bryan, and Hauth Anja, and Santiago, Manen, and Chen, Sun and Schmid, Cordelia},
  booktitle = {ECCV},
  year = {2022},
}

Data Format for VideoCC

The data is provided here(note the file should start downloading immediately) as a single CSV file with the following columns:

Video URL, Start timestamp (microseconds), End timestamp(microseconds), Caption.

The data will not be exactly the same as the dataset used to train models in the paper but should be similar. Note that some sections of YouTube videos might contain static frames that are panned in or out. These can be filtered out using a motion filtering tool.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
assets/images		assets/images
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VideoCC

Paper

Data Format for VideoCC

About

Releases

Packages

License

google-research-datasets/videoCC-data

Folders and files

Latest commit

History

Repository files navigation

VideoCC

Paper

Data Format for VideoCC

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages