# Using VGGT as a FiftyOne Remote Source Zoo Model

Let's start by downloading a dataset from the Hugging Face Hub. This one is Total-Text, a collection of 1,555 images containing scene text. Let's see how well the model handles it.

In [1]:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub("Voxel51/Total-Text-Dataset")

Downloading config file fiftyone.yml from Voxel51/Total-Text-Dataset



Loading dataset
Importing samples...
 100% |███████████████| 1555/1555 [62.9ms elapsed, 0s remaining, 24.7K samples/s]   
Migrating dataset 'Voxel51/Total-Text-Dataset' to v1.7.0
Downloading 1555 media files...


100%|██████████| 16/16 [00:56<00:00,  3.55s/it]


## Set up the model source

First, you need to register the model source. You can do so as shown here:

In [2]:
import fiftyone.zoo as foz

foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/vggt",
    overwrite=True
)

Downloading https://github.com/harpreetsahota204/vggt...
 100% |████|  128.5Kb/128.5Kb [49.1ms elapsed, 0s remaining, 2.7Mb/s] 
Overwriting existing model source '/home/harpreet/fiftyone/__models__/VGGT'


Next, you need to instantiate the model:

In [None]:
model = foz.load_zoo_model(
    "facebook/VGGT-1B",
    mode="crop",  # you can also pass "pad"
    confidence_threshold=0.7,
)
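
To get a feel for what `mode` might change, here is a rough sketch of the output image size under each setting. It assumes the wrapper forwards `mode` to the upstream VGGT preprocessing (518-pixel target, dimensions rounded to multiples of 14), which this notebook does not confirm. `preprocessed_size` is a hypothetical helper, not part of FiftyOne or the model source.

```python
TARGET = 518  # target resolution in upstream VGGT preprocessing (assumed to apply here)
PATCH = 14    # dimensions are rounded to a multiple of the patch size


def preprocessed_size(width, height, mode="crop"):
    """Return the (width, height) an image would have after preprocessing."""
    if mode == "pad":
        # Longest side is scaled to TARGET, then the image is padded to a square
        return TARGET, TARGET
    if mode != "crop":
        raise ValueError(f"unknown mode: {mode!r}")
    # Width is scaled to TARGET; height follows the aspect ratio,
    # rounded to a multiple of PATCH
    new_height = round(height * (TARGET / width) / PATCH) * PATCH
    # Heights above TARGET are center-cropped down to TARGET
    return TARGET, min(new_height, TARGET)


print(preprocessed_size(1024, 768, "crop"))  # landscape image
print(preprocessed_size(768, 1024, "crop"))  # portrait image, height gets cropped
print(preprocessed_size(768, 1024, "pad"))   # portrait image, padded instead
```

If that assumption holds, `"crop"` discards the top and bottom of tall images, while `"pad"` keeps the full frame at the cost of padded borders.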

Finally, you can apply the model to your dataset:

In [None]:
dataset.apply_model(model, "depth_map_path")


  82% |████████████|--| 1269/1555 [6.6m elapsed, 1.5m remaining, 3.3 samples/s]   

Note that we save only the path to each depth map, in a placeholder field. The depth maps themselves are not part of the original dataset; instead, we will create a grouped dataset that includes them (shown below):

In [None]:
import fiftyone as fo
import os
from pathlib import Path

# Get filepaths from your existing dataset
filepaths = dataset.values("filepath")

# Create a new grouped dataset
grouped_dataset = fo.Dataset("vggt_results", overwrite=True)
grouped_dataset.add_group_field("group", default="rgb")

# Process each filepath and create the group structure
samples = []
for filepath in filepaths:
    # Extract base information from the filepath
    path = Path(filepath)
    base_dir = path.parent
    base_name = path.stem
    
    # Create paths for each modality following your pattern
    rgb_path = filepath  # Original filepath (RGB)
    depth_path = os.path.join(base_dir, f"{base_name}_depth.png")  # Depth map
    threed_path = os.path.join(base_dir, f"{base_name}.fo3d")  # 3D point cloud
    
    # Create a group for these related samples
    group = fo.Group()
    
    # Create a sample for each modality with the appropriate group element
    rgb_sample = fo.Sample(filepath=rgb_path, group=group.element("rgb"))
    depth_sample = fo.Sample(filepath=depth_path, group=group.element("depth"))
    threed_sample = fo.Sample(filepath=threed_path, group=group.element("threed"))
    
    # Add samples to the list
    samples.extend([rgb_sample, depth_sample, threed_sample])

# Add all samples to the dataset
grouped_dataset.add_samples(samples)
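
The grouping code above assumes that every image has a `<name>_depth.png` and a `<name>.fo3d` file next to it. If the model run was interrupted, some of those files may be missing. Here is a minimal sketch of a check you could run first; `expected_outputs` and `find_missing_outputs` are hypothetical helper names, and the naming pattern is taken from the cell above rather than from the model source's documentation.

```python
from pathlib import Path


def expected_outputs(filepath):
    """Return the depth-map and 3D-scene paths the grouping code assumes."""
    path = Path(filepath)
    return {
        "depth": path.with_name(f"{path.stem}_depth.png"),
        "threed": path.with_name(f"{path.stem}.fo3d"),
    }


def find_missing_outputs(filepaths):
    """Map each image path to the expected output files that do not exist."""
    missing = {}
    for filepath in filepaths:
        absent = [
            str(p) for p in expected_outputs(filepath).values() if not p.exists()
        ]
        if absent:
            missing[str(filepath)] = absent
    return missing
```

For example, `find_missing_outputs(dataset.values("filepath"))` returns an empty dict when every image has both outputs on disk.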

Now we can view the results in the FiftyOne App:

In [None]:
fo.launch_app(grouped_dataset)