# 🏥 Visual AI in Healthcare with FiftyOne – VLMs and C-RADIO for ARCADE Dataset
**Exploring visual representations in medical datasets with VLMs and embedding search**

This notebook is part of the **“Visual AI in Healthcare with FiftyOne”** workshop. Through hands-on examples, we explore how to analyze medical datasets using Visual Language Models (VLMs) and embedding-based search to select representative samples for downstream tasks like fine-tuning object detection models.

🔬 **What you’ll learn in this notebook:**

- How to **load the ARCADE dataset** from Hugging Face using FiftyOne utilities  
- How to **extract embeddings** using NVIDIA’s **C-RADIO** embedding model  
- How to **perform similarity and uniqueness queries** with FiftyOne Brain  
- How to **select the most unique and representative images** for training  
- How to **visualize the filtered results** interactively in the FiftyOne App  
- How to **export curated datasets** and share them on Hugging Face  

📚 **Part of the notebook series:**
1. `01_load_arcade_dataset.ipynb` – Load and visualize the ARCADE dataset.  
2. `02_load_deeplesion_balanced.ipynb` – Curate and balance the DeepLesion dataset.  
3. `03_vlms_analysis_arcade.ipynb` – Use VFMs like NVLabs_CRADIOV3 for dataset understanding on ARCADE.  
4. `04_finetune_yolo8_stenosis.ipynb` – Train and integrate YOLOv8 for stenosis detection.  
5. `05_medsam2_ct_scan.ipynb` – Run MedSAM2 on CT scans for segmentation.  
6. `06_nvidia_vista_segmentation.ipynb` – Explore NVIDIA-VISTA-3D.  
7. `07_medgemma_vqa.ipynb` – Perform visual question answering and classification with MedGemma.

All notebooks are standalone but are best experienced sequentially.


## 📥 Load the ARCADE dataset from Hugging Face Hub

We begin by importing the ARCADE dataset, an X-ray coronary angiography dataset curated for stenosis detection, directly from the Hugging Face Hub using FiftyOne's `load_from_hub()` utility.

Before loading, we check if a dataset with the same name already exists in your local FiftyOne environment. If it does, we delete it to ensure a clean workspace for this notebook.

Key points:

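- Uses `Voxel51/ARCADE_FO`, the Hugging Face repo ID, as the local FiftyOne dataset name (the name the loaded dataset reports when printed later in this notebook)
- Ensures no name conflict by deleting any previously loaded dataset with the same name
- Requests the `train` split from Hugging Face for interactive querying and filtering; note that the load log below reports the `splits` parameter as unsupported for this importer, and all 3000 samples are imported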


In [2]:
import fiftyone as fo

# Local FiftyOne dataset name; here it matches the Hugging Face repo ID, slash included
dataset_name = "Voxel51/ARCADE_FO"

# Delete the dataset if it exists
if fo.dataset_exists(dataset_name):
    fo.delete_dataset(dataset_name)
    print(f"Deleted existing dataset: {dataset_name}")
else:
    print(f"No dataset found with name: {dataset_name}")

Deleted existing dataset: Voxel51/ARCADE_FO


In [3]:
import fiftyone as fo

from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub(
    "Voxel51/ARCADE_FO", split="train"
    )

Downloading config file fiftyone.yml from Voxel51/ARCADE_FO
Loading dataset
Ignoring unsupported parameter 'splits' for importer type <class 'fiftyone.utils.data.importers.FiftyOneDatasetImporter'>
Importing samples...
 100% |███████████████| 3000/3000 [115.2ms elapsed, 0s remaining, 26.2K samples/s]  


## 🔌 Registering a Remotely-Sourced Zoo Model: C-RADIOv3

FiftyOne allows you to register and use **remotely-sourced zoo models**, meaning model definitions and configurations can be hosted outside of the FiftyOne codebase — for example, in GitHub repositories or public URLs.

This capability enables developers and researchers to contribute and share models that are easily accessible and fully compatible with FiftyOne’s Zoo API.

### 💡 What is a Remotely-Sourced Zoo Model?

Instead of being built-in, a remotely-sourced model is hosted externally but can be registered and used just like native FiftyOne zoo models. You only need to provide a URL to the model's repository or archive.

📌 You can register a model source via:
- GitHub repository (e.g. `https://github.com/<user>/<repo>`)
- GitHub refs (`/tree/<branch>` or `/commit/<sha>`)
- Archive URLs (e.g. `.zip`, `.tar.gz`)

---

### 🤖 C-RADIOv3 Models

In this notebook, we use NVIDIA's C-RADIOv3 model family through the [NVLabs_CRADIOV3](https://github.com/harpreetsahota204/NVLabs_CRADIOV3) integration created by Harpreet Sahota. These models produce **semantic image embeddings** using Vision Transformers and come in several sizes that trade inference speed against accuracy.

| Model Name              | Description        | Architecture | Patch Size | Best For                          |
|-------------------------|--------------------|--------------|------------|-----------------------------------|
| `nv_labs/c-radio_v3-b`  | C-RADIOv3-B        | ViT-B/16     | 16×16      | Fast inference, moderate accuracy |
| `nv_labs/c-radio_v3-l`  | C-RADIOv3-L        | ViT-L/16     | 16×16      | Balanced performance              |
| `nv_labs/c-radio_v3-h`  | C-RADIOv3-H        | ViT-H/16     | 16×16      | High accuracy (recommended)       |
| `nv_labs/c-radio_v3-g`  | C-RADIOv3-G        | ViT-g/14     | 14×14      | Maximum performance               |

Once registered, these models can be directly used via FiftyOne’s `load_zoo_model()` API for embedding generation, visualization, and semantic search.

We’ll explore those capabilities in the next steps.
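
The patch size in the table determines how many patch tokens the Vision Transformer processes per image, and therefore the resolution of any spatial feature map. The sketch below works through that arithmetic. `patch_grid` is a hypothetical helper, and the 512 and 448 pixel input sizes are illustrative assumptions rather than values taken from the model's preprocessing.

```python
def patch_grid(height: int, width: int, patch_size: int) -> tuple:
    """Return (rows, cols, num_patches) for an image cut into square patches."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = height // patch_size, width // patch_size
    return rows, cols, rows * cols


# A 16x16 patch model on a 512x512 image
print(patch_grid(512, 512, 16))  # (32, 32, 1024)

# A 14x14 patch model on a 448x448 image
print(patch_grid(448, 448, 14))  # (32, 32, 1024)
```

In practice the model's own preprocessing decides the input resolution, so treat these numbers only as a way to reason about relative cost.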


Next, you register the zoo model source:


In [4]:
import fiftyone.zoo as foz

foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/NVLabs_CRADIOV3"
    )

Finally, instantiate the model. Let's start with computing embeddings.

Note: Refer to the [README](https://github.com/harpreetsahota204/NVLabs_CRADIOV3/blob/main/README.md) for available model checkpoints.

In [5]:
radio_embeddings_model = foz.load_zoo_model(
    "nv_labs/c-radio_v3-h",
    feature_format="NCHW", # you can also pass NLC here
)

Using cache found in /Users/paularamos/.cache/torch/hub/NVlabs_RADIO_main
Using cache found in /Users/paularamos/.cache/torch/hub/NVlabs_RADIO_main


You can compute embeddings as follows:

In [6]:
dataset.compute_embeddings(
    model=radio_embeddings_model,
    embeddings_field="radio_embeddings",
)

 100% |███████████████| 3000/3000 [47.1m elapsed, 0s remaining, 1.1 samples/s]      


Once the embeddings are computed, you can project them to 2D and explore the result in the FiftyOne App:

In [14]:
import fiftyone.brain as fob

 
results = fob.compute_visualization(
    dataset,
    model=radio_embeddings_model,
    method="umap",  # "umap", "tsne", "pca", etc
    brain_key="radio_viz",
    embeddings="radio_embeddings"
)

Generating visualization...
UMAP( verbose=True)
Fri Jul  4 08:41:46 2025 Construct fuzzy simplicial set
Fri Jul  4 08:41:50 2025 Finding Nearest Neighbors
Fri Jul  4 08:41:52 2025 Finished Nearest Neighbor Search
Fri Jul  4 08:41:53 2025 Construct embedding
	completed  0  /  500 epochs
	completed  50  /  500 epochs
	completed  100  /  500 epochs
	completed  150  /  500 epochs
	completed  200  /  500 epochs
	completed  250  /  500 epochs
	completed  300  /  500 epochs
	completed  350  /  500 epochs
	completed  400  /  500 epochs
	completed  450  /  500 epochs
Fri Jul  4 08:41:54 2025 Finished embedding

In [20]:
dataset.reload()
dataset.persistent = True

# Make sure this is run after all processing
session = fo.launch_app(dataset, port=5151, auto=False)

Session launched. Run `session.show()` to open the App in a cell output.


You can also build a similarity index over the embeddings to find similar samples in your dataset:

In [None]:
import fiftyone.brain as fob

results = fob.compute_similarity(
    dataset,
    model=radio_embeddings_model,
    backend="sklearn",  # "sklearn", "qdrant", "redis", etc
    brain_key="radio_sim",
    embeddings="radio_embeddings",  # name of the field holding the precomputed embeddings
)
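
Conceptually, a similarity index answers nearest-neighbor queries over the embedding vectors. The following is a minimal pure-Python sketch of that idea using cosine similarity on toy 2D vectors. It is not how FiftyOne Brain or its backends implement the index, and the sample IDs and vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, k=2):
    """Return the IDs of the k samples most similar to the query sample."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vec), sample_id)
        for sample_id, vec in embeddings.items()
        if sample_id != query_id  # do not return the query itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [sample_id for _, sample_id in scored[:k]]


# Hypothetical sample IDs mapped to toy embeddings
toy_embeddings = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.7, 0.7],
}

neighbors = most_similar("a", toy_embeddings, k=2)
print(neighbors)  # ['b', 'd']
```

A real index avoids this brute-force scan for large datasets, which is one reason to pick a dedicated backend.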

With the computed embeddings you can also run other embedding-based workflows, such as computing uniqueness values:

In [16]:
import fiftyone.brain as fob

fob.compute_uniqueness(
    dataset,
    model=radio_embeddings_model,
    uniqueness_field="radio_uniqueness",
    similarity_index="radio_sim"
    )

Retrieving embeddings from similarity index...
Computing uniqueness...
Uniqueness computation complete
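
One way to build intuition for a uniqueness score is to measure how far each sample sits from its nearest neighbors in embedding space: isolated samples score high, samples in dense regions score low. The sketch below implements that simplified idea on toy 2D points. It is an assumption-laden illustration, and the exact algorithm used by `compute_uniqueness()` may differ.

```python
import math


def knn_uniqueness(points, k=2):
    """Score each point by its mean distance to its k nearest neighbors,
    then rescale the scores to the [0, 1] range."""
    raw = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        raw.append(sum(dists[:k]) / k)
    lo, hi = min(raw), max(raw)
    if hi == lo:  # all points equally spaced; no point stands out
        return [0.0 for _ in raw]
    return [(r - lo) / (hi - lo) for r in raw]


# Three points packed together plus one far-away outlier
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
scores = knn_uniqueness(points, k=2)

most_unique = scores.index(max(scores))
print(most_unique)  # 3, the outlier
```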


You can also compute representativeness scores:

In [17]:
import fiftyone.brain as fob

fob.compute_representativeness(
    dataset,
    model=radio_embeddings_model,
    representativeness_field="radio_representativeness",
    similarity_index="radio_sim"
    )

Retrieving embeddings from similarity index...
Computing representativeness...
Computing clusters for 3000 embeddings; this may take awhile...
Representativeness computation complete
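
The log above mentions clustering, which suggests thinking of representativeness as closeness to the center of a cluster. The sketch below illustrates that idea for a single toy cluster: the point nearest the centroid gets the highest score. This is a simplified assumption for intuition only, not the implementation behind `compute_representativeness()`.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def representativeness(points):
    """Score each point by how close it is to the cluster centroid (1 = closest possible)."""
    center = centroid(points)
    dists = [math.dist(p, center) for p in points]
    max_dist = max(dists)
    if max_dist == 0:  # all points coincide with the centroid
        return [1.0 for _ in points]
    return [1.0 - d / max_dist for d in dists]


# One toy cluster: the third point sits near the middle, the fourth is an outlier
cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1), (1.0, 3.0)]
rep_scores = representativeness(cluster)

most_representative = rep_scores.index(max(rep_scores))
print(most_representative)  # 2
```

Under this view, the most unique samples and the most representative samples tend to be different images, which is why the notebook tags both groups.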


In [22]:
# Top 20 most unique
unique_view = dataset.sort_by("radio_uniqueness", reverse=True).limit(20)
unique_view.tag_samples("top_unique")

# Top 20 most representative
rep_view = dataset.sort_by("radio_representativeness", reverse=True).limit(20)
rep_view.tag_samples("top_representative")

In [24]:
dataset.reload()
print(dataset)

Name:        Voxel51/ARCADE_FO
Media type:  image
Num samples: 3000
Persistent:  True
Tags:        []
Sample fields:
    id:                       fiftyone.core.fields.ObjectIdField
    filepath:                 fiftyone.core.fields.StringField
    tags:                     fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:                 fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:               fiftyone.core.fields.DateTimeField
    last_modified_at:         fiftyone.core.fields.DateTimeField
    phase:                    fiftyone.core.fields.StringField
    task:                     fiftyone.core.fields.StringField
    subset_name:              fiftyone.core.fields.StringField
    segmentations:            fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    coco_id:                  fiftyone.core.fields.IntField
    default_embedding:        fiftyone.core.fields.VectorField
 

### Computing Spatial Features

You can also compute spatial features. To do so, set `output_type="spatial"` when loading the model. Spatial features support only `feature_format="NCHW"`.

To apply Gaussian smoothing to the features, set `apply_smoothing=True` and choose a value for `smoothing_sigma`.

In [7]:
radio_spatial_model = foz.load_zoo_model(
    "nv_labs/c-radio_v3-h",
    output_type="spatial",
    apply_smoothing=True,  # optional Gaussian smoothing of the spatial features
    smoothing_sigma=0.8,  # smoothing strength (Gaussian sigma)
    feature_format="NCHW",  # required for spatial (heatmap) output
) 

Using cache found in /Users/paularamos/.cache/torch/hub/NVlabs_RADIO_main
Using cache found in /Users/paularamos/.cache/torch/hub/NVlabs_RADIO_main
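
To see what a sigma of 0.8 means, the sketch below builds a normalized 1D Gaussian kernel and applies it to a single spike. It shows the general technique only: the plugin's actual smoothing code is not shown in this notebook, and the helper names, the three-sigma kernel radius, and the edge clamping are assumptions made for the illustration.

```python
import math


def gaussian_kernel_1d(sigma, radius=None):
    """Normalized 1D Gaussian weights; the default radius covers about 3 sigma."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    weights = [
        math.exp(-(x * x) / (2 * sigma * sigma))
        for x in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_row(values, kernel):
    """Convolve a row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = 0.0
        for offset, w in enumerate(kernel):
            j = min(max(i + offset - radius, 0), len(values) - 1)
            acc += w * values[j]
        out.append(acc)
    return out


kernel = gaussian_kernel_1d(0.8)
row = [0.0, 0.0, 1.0, 0.0, 0.0]
smoothed = smooth_row(row, kernel)

# The spike is spread onto its neighbors but stays the largest value
print([round(v, 3) for v in smoothed])
```

For a 2D feature map, the same kernel can be applied along rows and then along columns. A larger sigma spreads activations further and gives a softer heatmap.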


Notice that we are using the `apply_model` method here, as we are not computing 1D embeddings like above.

In [8]:
dataset.apply_model(
    radio_spatial_model,
    "radio_spatial_features"
)

 100% |███████████████| 3000/3000 [43.1m elapsed, 0s remaining, 1.2 samples/s]      


You can view the results in the FiftyOne App like so:

In [19]:
fo.launch_app(dataset, port=5151, auto=False)

Session launched. Run `session.show()` to open the App in a cell output.


Dataset:          Voxel51/ARCADE_FO
Media type:       image
Num samples:      3000
Selected samples: 0
Selected labels:  0
Session URL:      http://localhost:5151/

In [11]:
dataset.persistent=True
print(dataset)

Name:        Voxel51/ARCADE_FO
Media type:  image
Num samples: 3000
Persistent:  True
Tags:        []
Sample fields:
    id:                     fiftyone.core.fields.ObjectIdField
    filepath:               fiftyone.core.fields.StringField
    tags:                   fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:               fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
    created_at:             fiftyone.core.fields.DateTimeField
    last_modified_at:       fiftyone.core.fields.DateTimeField
    phase:                  fiftyone.core.fields.StringField
    task:                   fiftyone.core.fields.StringField
    subset_name:            fiftyone.core.fields.StringField
    segmentations:          fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
    coco_id:                fiftyone.core.fields.IntField
    default_embedding:      fiftyone.core.fields.VectorField
    radio_embeddings:    

In [26]:
## Creating a subset for finetuning:

# 1. Filter samples where task == "stenosis"
stenosis_view = dataset.match({"task": "stenosis"})

# 2. Build a combined score from uniqueness and representativeness
# Min-max normalize each score so both contribute on the same [0, 1] scale

uniqueness_values = [s["radio_uniqueness"] for s in stenosis_view if s["radio_uniqueness"] is not None]
representativeness_values = [s["radio_representativeness"] for s in stenosis_view if s["radio_representativeness"] is not None]

# Min-max normalization
min_u, max_u = min(uniqueness_values), max(uniqueness_values)
min_r, max_r = min(representativeness_values), max(representativeness_values)

def normalize(val, min_val, max_val):
    return (val - min_val) / (max_val - min_val + 1e-8)

# Add normalized score for ranking
for sample in stenosis_view:
    if sample.radio_uniqueness is not None and sample.radio_representativeness is not None:
        u = normalize(sample.radio_uniqueness, min_u, max_u)
        r = normalize(sample.radio_representativeness, min_r, max_r)
        score = u + r
        sample["combined_score"] = score
        sample.save()

# 3. Select top 300 samples by combined score
top300_view = stenosis_view.sort_by("combined_score", reverse=True).limit(300)
top300_view.tag_samples("top300_for_yolo")

# Optional: Launch view
session = fo.launch_app(top300_view, port=5151, auto=False)



Session launched. Run `session.show()` to open the App in a cell output.
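
The combined score used above can be checked on a handful of values. This sketch repeats the same min-max normalization and sum on plain Python lists. The four uniqueness and representativeness values are made up for illustration and are not taken from the dataset.

```python
def min_max(values, eps=1e-8):
    """Rescale values to roughly [0, 1]; eps guards against division by zero."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo + eps) for v in values]


def combined_scores(uniqueness, representativeness):
    """Sum of the two min-max normalized scores, one value per sample."""
    u = min_max(uniqueness)
    r = min_max(representativeness)
    return [ui + ri for ui, ri in zip(u, r)]


# Toy scores for four hypothetical samples
uniq = [0.2, 0.9, 0.5, 0.6]
rep = [0.3, 0.1, 0.6, 0.5]

scores = combined_scores(uniq, rep)
ranking = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(ranking)  # [2, 3, 1, 0]
```

Note that sample 1 has the highest uniqueness yet ranks third: an equal-weight sum favors samples that do reasonably well on both criteria. If you want to lean toward one criterion, a weighted sum is a simple variation.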


In [35]:
import fiftyone as fo
import fiftyone.utils.random as four

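# Step 1: Split using tags (this modifies the samples in-place)
# Clear split tags left over from earlier runs first; otherwise re-running
# this cell accumulates tags and the train/val views overlap
top300_view.untag_samples(["train", "val"])
four.random_split(top300_view, {"train": 0.8, "val": 0.2}, seed=51)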

# Step 2: Create views using tags
train_view = top300_view.match_tags("train")
val_view = top300_view.match_tags("val")

# Step 3: Extract class labels
label_field = "segmentations"
labels_set = set()
for sample in top300_view:
    if sample[label_field]:
        labels_set.update(d.label for d in sample[label_field].detections)

classes = sorted(labels_set)

# Step 4: Export each split individually
export_dir = "arcade_yolo_subset"
for split_name, split_view in [("train", train_view), ("val", val_view)]:
    split_view.export(
        export_dir=export_dir,
        dataset_type=fo.types.YOLOv5Dataset,
        label_field=label_field,
        classes=classes,
        split=split_name,
        overwrite=False,
    )


 100% |█████████████████| 299/299 [484.5ms elapsed, 0s remaining, 617.1 samples/s]      
Directory 'arcade_yolo_subset' already exists; export will be merged with existing files
 100% |█████████████████| 179/179 [252.4ms elapsed, 0s remaining, 709.2 samples/s]      
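
For reference, a YOLO label file holds one line per object: a class index followed by the box center and size, all normalized to the image dimensions. FiftyOne stores boxes as `[top-left-x, top-left-y, width, height]` in relative coordinates, so the conversion is a small shift. The sketch below shows that arithmetic, assuming each label is a plain bounding box. `to_yolo_line` is a hypothetical helper and not part of the exporter.

```python
def to_yolo_line(class_index, bounding_box):
    """Convert a relative [x, y, w, h] box (top-left origin) to a YOLO label line."""
    x, y, w, h = bounding_box
    cx, cy = x + w / 2, y + h / 2  # YOLO uses the box center, not the corner
    return f"{class_index} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


line = to_yolo_line(0, [0.1, 0.2, 0.4, 0.6])
print(line)  # 0 0.300000 0.500000 0.400000 0.600000
```

The class index is the position of the label in the `classes` list passed to `export()`, which is why that list is built and sorted once and shared by both splits.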


In [36]:
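# Export each split to its own directory. The COCO exporter does not support
# `split` (the log from the original run below reports it as ignored), so a
# shared directory would mix the train and val exports
import os

export_root = "arcade_yolo_subset_coco"
for split_name, split_view in [("train", train_view), ("val", val_view)]:
    split_view.export(
        export_dir=os.path.join(export_root, split_name),
        dataset_type=fo.types.COCODetectionDataset,
        label_field=label_field,
        classes=classes,
        overwrite=False,
    )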

Ignoring unsupported parameter 'split'
 100% |█████████████████| 299/299 [2.1s elapsed, 0s remaining, 305.7 samples/s]      
Directory 'arcade_yolo_subset_coco' already exists; export will be merged with existing files
Ignoring unsupported parameter 'split'
 100% |█████████████████| 179/179 [544.7ms elapsed, 0s remaining, 328.6 samples/s]      
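
COCO annotations store boxes as `[x, y, width, height]` in absolute pixels, while FiftyOne keeps the same layout in relative coordinates. The sketch below shows the scaling involved. `to_coco_bbox` is a hypothetical helper, and the 512×512 image size is an illustrative assumption.

```python
def to_coco_bbox(bounding_box, image_width, image_height):
    """Scale a relative [x, y, w, h] box to absolute pixel units."""
    x, y, w, h = bounding_box
    return [x * image_width, y * image_height, w * image_width, h * image_height]


bbox = to_coco_bbox([0.25, 0.5, 0.5, 0.25], 512, 512)
print(bbox)  # [128.0, 256.0, 256.0, 128.0]
```

Because the conversion needs the image dimensions, it relies on each sample's image metadata being available.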
