The full dataset is over 19 TB; fortunately, Meta provides a smaller subset for download.
First, you need to clone the repository:
```bash
git clone git@github.com:facebookresearch/uco3d.git
```
Then you can download the subset of the dataset, which is roughly 10 GB:
```bash
python dataset_download/download_dataset.py --download_small_subset --download_folder <SMALL_SUBSET_DESTINATION_FOLDER>
```
To parse the dataset into FiftyOne format, you can run the following command:
```bash
python parse_dataset.py
```
This will create a FiftyOne dataset named "UCO3D_Dataset" containing all the objects found in the current directory. Each object will have:
- RGB video view
- Segmentation mask video
- 3D point cloud scene
The script will:
- Convert any PLY point clouds to FiftyOne 3D scenes (.fo3d files)
- Create a persistent dataset that groups the different views of each object
- Add metadata for each sample including category, subcategory and object ID
- Save the dataset for future use
The script will print progress as it processes files and output a summary of how many objects were added.
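For reference, a minimal sketch of such a parsing script using FiftyOne's grouped datasets might look like the following. The per-object file layout (`rgb.mp4`, `mask.mp4`, `points.ply`) and the `object_id` field are assumptions for illustration; adapt them to the files the download script actually produces.

```python
import os

import fiftyone as fo

# Create a persistent grouped dataset with one group per object
dataset = fo.Dataset("UCO3D_Dataset", persistent=True)
dataset.add_group_field("group", default="rgb")

# Assumed per-object layout: <object_dir>/rgb.mp4, mask.mp4, points.ply
for obj_dir in sorted(os.listdir(".")):
    if not os.path.isdir(obj_dir):
        continue

    # Convert the PLY point cloud into a FiftyOne 3D scene (.fo3d)
    scene = fo.Scene()
    scene.add(
        fo.PlyMesh("points", os.path.join(obj_dir, "points.ply"), is_point_cloud=True)
    )
    fo3d_path = os.path.join(obj_dir, "scene.fo3d")
    scene.write(fo3d_path)

    # Group the RGB video, mask video, and 3D scene as views of one object
    group = fo.Group()
    samples = [
        fo.Sample(filepath=os.path.join(obj_dir, "rgb.mp4"), group=group.element("rgb")),
        fo.Sample(filepath=os.path.join(obj_dir, "mask.mp4"), group=group.element("mask")),
        fo.Sample(filepath=fo3d_path, group=group.element("pcd")),
    ]
    for sample in samples:
        sample["object_id"] = obj_dir  # assumed metadata; add category/subcategory similarly
    dataset.add_samples(samples)

dataset.save()
```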
Note: The parsed FiftyOne dataset does not include camera intrinsics or extrinsics, so the point clouds are not necessarily aligned with the RGB videos.
uCO3D (UnCommon Objects in 3D) is a large-scale dataset of high-quality 360° videos of real-life objects with comprehensive 3D annotations.
It contains 170,000 objects across 1,070 diverse object categories derived from the LVIS taxonomy. Each object is captured in a turntable-like video with full 360° coverage, ensuring all sides of the object are visible. The dataset provides accurate 3D annotations including camera parameters, point clouds, and complete 3D Gaussian Splatting reconstructions. All objects are normalized to a canonical reference frame, which facilitates training generative models.
With its combination of scale, diversity, and quality, uCO3D addresses limitations in existing datasets, providing a superior resource for 3D deep learning research and applications.
- Curated by: Xingchen Liu, Piyush Tayal, Jianyuan Wang, Jesus Zarzar, Tom Monnier, Konstantinos Tertikas, Jiali Duan, Antoine Toisoul, Jason Y. Zhang, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, David Novotny (Meta AI and collaborating institutions)
- Funded by: Meta AI
- Shared by: Meta AI
- Language(s) (NLP): en
- License: Not explicitly stated in the paper (likely Meta AI research license)
- Repository: https://github.com/facebookresearch/uco3d
- Hugging Face Dataset Repo: https://huggingface.co/datasets/facebook/uco3d
- Paper: arXiv:2501.07574v1 [cs.CV] (Jan 13, 2025)
- Demo: https://uco3d.github.io
The uCO3D dataset is specifically designed for:
- Training feedforward few-view 3D reconstruction models (like LightplaneLRM)
- Novel view synthesis using diffusion models (CAT3D-like approaches)
- Text-to-3D generation (Instant3D-like pipelines)
- Training large 3D generative models
- Digital twinning applications (creating 3D models of real-life objects)
- Research in 3D representation learning
- Transfer learning for 3D vision tasks
- Benchmarking 3D reconstruction algorithms
The dataset is not intended for:
- Real-time applications without appropriate downstream models
- Human activity or behavioral analysis
- Surveillance or security applications
- Generating photorealistic synthetic human identities
- Tasks involving classes not represented in the 1,070 LVIS categories
- Medical or safety-critical applications
- Applications requiring perfect geometric accuracy (as SfM reconstructions have inherent limitations)
The full dataset consists of 170,000 video sequences organized by object category. Each sequence contains the following (sketched as a data structure after this list):
- High-resolution video (more than 60% of videos at 1080p or higher) with full 360° coverage of a single object
- 200 uniformly sampled frames with corresponding camera parameters (intrinsics and extrinsics)
- Object segmentation masks for each frame
- Sparse and dense SfM point clouds
- Complete 3D Gaussian Splat reconstruction
- Textual caption describing the object
- Normalized object-centric coordinate system alignment
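As a mental model, one sequence record could be organized along these lines. This is an illustrative sketch only; the field names and array shapes are assumptions, not the dataset's actual schema (see the official repository for the real loaders).

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SequenceRecord:
    """Hypothetical per-sequence record mirroring the bullets above."""

    video_path: str             # full 360° video of a single object
    intrinsics: np.ndarray      # (200, 3, 3) per-frame camera intrinsics
    extrinsics: np.ndarray      # (200, 4, 4) per-frame camera extrinsics
    mask_paths: list[str]       # per-frame object segmentation masks
    sparse_points: np.ndarray   # (N, 3) sparse SfM point cloud
    dense_points: np.ndarray    # (M, 3) dense SfM point cloud
    gaussian_splat_path: str    # complete 3D Gaussian Splat reconstruction
    caption: str                # textual description of the object
    category: str               # one of the 1,070 LVIS-derived categories
```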
Preview Subset: Users can download a small 52-video subset of uCO3D for preview and debugging purposes. This subset represents a random sample of categories and takes approximately 9.6 GB, which is more than 1000x smaller than the full dataset. This preview subset is ideal for initial experimentation and workflow development before committing to the full dataset.
The 1,070 object categories are organized into 50 super-categories, with approximately 20 subcategories per super-category.
uCO3D was created to address limitations in existing 3D datasets:
- CO3Dv2 has limited diversity (only 50 categories) and variable quality
- MVImgNet lacks full 360° coverage of objects
- High-quality 3D scan datasets like OmniObject3D have very limited scale
- Synthetic datasets like Objaverse lack realism
uCO3D aims to provide the optimal balance between quality, scale, and diversity for real object-centric 3D learning.
- Videos were captured by workers on Amazon Mechanical Turk
- Workers followed a sine-wave trajectory (varying elevation) instead of a simple circular path
- Each video was manually assessed for quality
- Object segmentation used text-conditioned Segment-Anything (langSAM) with XMem for temporal consistency
- 3D reconstruction used VGGSfM (more accurate than COLMAP used in previous datasets)
- Scene alignment procedure normalized objects to canonical reference frames
- 3D Gaussian Splat reconstruction was performed for each object using gsplat
- Captioning used a vision-language model for per-view captions, summarized with LLAMA3
Amazon Mechanical Turk workers captured the videos using consumer cameras. The workers were tasked with recording full 360° turntable-like videos of various objects, ensuring high resolution and complete coverage.
- Camera parameters: VGGSfM estimated intrinsic and extrinsic parameters for 200 frames per video
- Point clouds: VGGSfM generated sparse and dense point clouds
- Segmentation: langSAM provided initial segmentations, refined by XMem for temporal consistency
- 3D reconstruction: Each scene was reconstructed using 3D Gaussian Splatting
- Alignment: Objects were aligned to canonical reference frames through a multi-step procedure
- Captioning: Vision-language models generated per-view captions, synthesized by LLAMA3
The annotation quality was validated through both automated metrics and manual verification.
Annotations were produced by a combination of:
- Automated systems (VGGSfM, langSAM, XMem, 3D Gaussian Splatting, vision-language models)
- Meta AI research team (for quality control and validation)
- Amazon Mechanical Turk workers (for initial video capture)
The dataset focuses on object-centric videos and does not explicitly contain personal information. The objects captured are everyday items, and care was taken during the collection process to focus solely on the objects themselves.
- Category distribution may not be perfectly balanced
- Quality variations may exist across different object categories
- Object selection may reflect cultural biases of the data collectors
- SfM and 3DGS reconstructions have inherent technical limitations
- Some complex object materials (transparent, reflective) may have lower quality reconstructions
- The dataset is limited to static objects and does not include articulated or deformable objects
- Training on this dataset alone may not generalize to all real-world scenarios
- Consider the distribution of categories when training models
- Evaluate models on diverse test sets beyond this dataset
- Combine with synthetic data for categories underrepresented in this dataset
- Be aware of potential biases in object representation
- For highest quality results, filter by reconstruction quality metrics (see the sketch after this list)
- For generative models, consider augmenting with other datasets for improved diversity
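For example, quality-based filtering could look like the following in FiftyOne. The `reconstruction_quality` field name and threshold are hypothetical; substitute whatever quality metric your parsed dataset actually stores.

```python
import fiftyone as fo
from fiftyone import ViewField as F

dataset = fo.load_dataset("UCO3D_Dataset")

# Keep only samples whose (hypothetical) quality score clears a threshold
high_quality = dataset.match(F("reconstruction_quality") > 0.8)
print(f"Kept {len(high_quality)} of {len(dataset)} samples")
```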
This dataset card is based on information from the paper "UnCommon Objects in 3D" by Xingchen Liu et al.
For more information, contact the authors via the GitHub repository: https://github.com/facebookresearch/uco3d
```bibtex
@inproceedings{liu24uco3d,
  author    = {Liu, Xingchen and Tayal, Piyush and Wang, Jianyuan
               and Zarzar, Jesus and Monnier, Tom and Tertikas, Konstantinos
               and Duan, Jiali and Toisoul, Antoine and Zhang, Jason Y.
               and Neverova, Natalia and Vedaldi, Andrea
               and Shapovalov, Roman and Novotny, David},
  booktitle = {arXiv},
  title     = {UnCommon Objects in 3D},
  year      = {2024},
}
```
APA: Liu, X., Tayal, P., Wang, J., Zarzar, J., Monnier, T., Tertikas, K., Duan, J., Toisoul, A., Zhang, J. Y., Neverova, N., Vedaldi, A., Shapovalov, R., & Novotny, D. (2025). UnCommon Objects in 3D. arXiv preprint arXiv:2501.07574.