# Using Embeddings for Deeper Dataset Understanding

You can use embeddings to gain a deeper understanding of the images in this dataset.
Visual embeddings can help analyze car damage images in several key ways:

1. **Relationship Visualization:** Use dimensionality reduction (such as UMAP) to visualize how different damage types cluster together and to identify patterns.

2. **Model Comparison:** Compare how different vision models encode and interpret car damage, revealing their unique perspectives and strengths.

3. **Category Analysis:** Explore visual similarities and differences between the six damage types (dent, scratch, crack, glass shatter, tire flat, lamp broken).

4. **Variation Study:** Understand how factors like shooting angle and vehicle color affect damage representation in embedding space.

5. **Feature Detection:** Identify subtle visual features that distinguish different types of damage, which might not be obvious in the annotations.

For this analysis, we’ll use these models:

• CLIP

• SigLIP 2

Note that both of these models can be used for zero-shot classification. We won't discuss their use for that task here, but you're encouraged to [learn more about zero-shot classification in this tutorial](https://github.com/harpreetsahota204/getting-started-fo-experiences/blob/main/zero-shot-prediction/zero-shot-classification.ipynb).

Start by loading the Dataset:

In [97]:
import fiftyone as fo

dataset = fo.load_dataset("cardd_from_hub")

Next, instantiate the models and compute embeddings.

# Open CLIP Integration

FiftyOne [integrates natively with the OpenCLIP library](https://beta-docs.voxel51.com/integrations/openclip/), an open source implementation of OpenAI’s CLIP (Contrastive Language-Image Pre-training) model that you can use to run inference on your FiftyOne datasets with a few lines of code!

To use models from OpenCLIP you need to ensure you have installed the `open_clip_torch` package (which is part of the `requirements.txt` for this workshop).

In [98]:
import torch 

import fiftyone.zoo as foz

clip_model = foz.load_zoo_model(
    "zero-shot-classification-transformer-torch",
    name_or_path="openai/clip-vit-base-patch32", 
    device="cuda" if torch.cuda.is_available() else "cpu",
    use_fast=True,
    # install_requirements=True # uncomment this line if you are running this code for the first time
    )

You can also specify model architectures and pretrained weights by passing in optional parameters. Pretrained models can be loaded directly from OpenCLIP with the following syntax:


```python
meta_clip = foz.load_zoo_model(
    name_or_url="open-clip-torch",
    clip_model="ViT-B-32-quickgelu",
    pretrained="metaclip_400m",
)
```


Alternatively, you can load a model from Hugging Face’s Model Hub with the following syntax:


```python
import fiftyone.zoo as foz

open_clip_model = foz.load_zoo_model(
    name_or_url="open-clip-torch",
    clip_model="hf-hub:repo-name/model-name",
    pretrained="",
)
```

As a concrete example, if you were interested in the [StreetCLIP model](https://huggingface.co/geolocal/StreetCLIP) you would use:

```python
street_clip_model = foz.load_zoo_model(
    name_or_url="open-clip-torch",
    pretrained="",
    clip_model="hf-hub:geolocal/StreetCLIP"
)
```



# Hugging Face Integration


You can also run models from Hugging Face as a Zoo Model with [FiftyOne's Hugging Face Integration](https://beta-docs.voxel51.com/integrations/huggingface/#zero-shot-classification). 

To load a model from the Hugging Face Hub, set `name_or_url="zero-shot-classification-transformer-torch"`. This specifies that you want to load a zero-shot image classification model from the Hugging Face Transformers library. You can then specify the model via the `name_or_path` argument, which should be the repository name or model identifier of the model you want to load:


In [99]:
import torch 

import fiftyone.zoo as foz

siglip_model = foz.load_zoo_model(
    "zero-shot-classification-transformer-torch",
    name_or_path="google/siglip2-so400m-patch14-384",
    device="cuda" if torch.cuda.is_available() else "cpu",
    use_fast=True,
    # install_requirements=True # uncomment this line if you are running this code for the first time
    )

# Computing embeddings

We can use [the `compute_embeddings`](https://beta-docs.voxel51.com/api/fiftyone.core.collections.SampleCollection.html#compute_embeddings) method of the Dataset as follows to compute embeddings for the images in the Dataset.

This method supports all the following cases:

• Using an image model to compute image embeddings for an image collection

• Using an image model to compute frame embeddings for a video collection

• Using a video model to compute embeddings for a video collection

In [None]:
dataset.compute_embeddings(
    model=clip_model,
    embeddings_field="clip_embeddings"
)

In [None]:
dataset.compute_embeddings(
    model=siglip_model,
    embeddings_field="siglip_embeddings"
)

# Computing Visualization

Now, we can use UMAP to reduce the dimensionality of the embeddings and explore them in the FiftyOne app. 

> Note that you will need to have the `umap-learn` package installed for this, which is also listed as a requirement in the `requirements.txt` file of this repository.

We can use the [FiftyOne Brain](https://beta-docs.voxel51.com/fiftyone_concepts/brain/) to perform [dimensionality reduction](https://beta-docs.voxel51.com/tutorials/dimension_reduction/) so that we can visualize the embeddings in the FiftyOne App.

In [None]:
import fiftyone.brain as fob

embedding_fields = [
    "clip_embeddings",
    "siglip_embeddings"
]

# Compute UMAP for each embedding

for field in embedding_fields:
    _fname = field.split("_embeddings")[0]
    brain_key = f"{_fname}_viz"
    
    results = fob.compute_visualization(
        dataset,
        embeddings=field,
        method="umap",
        brain_key=brain_key,
        num_dims=2,
    )
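
The loop above derives each brain key from the name of its embeddings field. As a quick check of which keys to look for in the App, the same string logic can be run on its own (the `brain_keys` dictionary is only an illustrative name, not something FiftyOne creates):

```python
embedding_fields = ["clip_embeddings", "siglip_embeddings"]

# Same naming rule as the loop: strip the "_embeddings" suffix, append "_viz"
brain_keys = {
    field: f"{field.split('_embeddings')[0]}_viz"
    for field in embedding_fields
}

for field, key in brain_keys.items():
    print(f"{field} -> {key}")
```

These are the names that should appear in the brain key dropdown of the embeddings panel.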

### Visualizing Embeddings

You can launch the app in a notebook by running:

```python
import fiftyone as fo

# `dataset` is the Dataset loaded at the start of this section
fo.launch_app(dataset)
```

Or, you can open your terminal and execute `fiftyone app launch`. This will open the App in a browser window. Then select your Dataset from the dropdown menu, open the embeddings panel by clicking the `+` next to the Samples viewer, and choose the embeddings you want to display from the dropdown menu in the embeddings panel.

## Patch embeddings

Note that these embeddings are computed for the entire image. You may find it interesting and useful to compute embeddings for each *patch* of an image, that is, for each segmentation mask in the Dataset.

You can use the more explicit pattern defined above, where we:

- instantiated a model

- called the `compute_embeddings` method of the Dataset with that model

- called FiftyOne Brain's `compute_visualization` method with those embeddings

However, you can also compute and visualize embeddings in a single call with any Zoo Model using the following pattern:


In [None]:
fob.compute_visualization(
    dataset,
    model="zero-shot-classification-transformer-torch",
    name_or_path="google/siglip2-so400m-patch14-384",    
    patches_field="ground_truth",
    brain_key="siglip_viz_patches",
)

# Visualize text embeddings

You can also compute embeddings for the `damage_report` field we generated using the VLM and visualize those in the App:

In [None]:
import os
import torch
import fiftyone.brain as fob
from transformers import AutoModel

# Set an environment variable to silence the tokenizers parallelism warning.
# Note: this setting belongs to the `transformers` and `tokenizers` libraries; it is not a FiftyOne-specific variable.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

jina_embeddings_model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v3", 
    trust_remote_code=True,
    device_map = "cuda" if torch.cuda.is_available() else "cpu"
    )

for sample in dataset.iter_samples(autosave=True):
    text_embeddings = jina_embeddings_model.encode(
        sentences = [sample["damage_report"]], # model expects a list of strings
        task="separation"
        )
    sample["text_embeddings"] = text_embeddings.squeeze()

results = fob.compute_visualization(
        dataset,
        embeddings="text_embeddings",
        method="umap",
        brain_key="text_embeddings_viz",
        num_dims=2,
        )

At this point, it helps to install the [Caption View Plugin](https://github.com/mythrandire/caption-viewer) so that you can more easily read the captions as you explore them in embedding space.

# Search your images with natural language

When you create a similarity index powered by the CLIP model, you can also search by arbitrary natural language queries natively in the App.

Note: For the models we used to compute embeddings above, FiftyOne's implementation uses those models to extract image embeddings. Those models don't *currently* support text prompts, so here we use a model whose implementation in FiftyOne supports both text and images.

In [None]:
import fiftyone.brain as fob

text_img_index = fob.compute_similarity(
    dataset,
    model="clip-vit-base32-torch",
    brain_key="text_img_sim",
)

The above will allow you to search images "holistically", that is, considering the entire content of an image.

However, you can also create an index at the *patch* level and search that via natural language:

In [None]:
import fiftyone.brain as fob

patch_text_img_index = fob.compute_similarity(
    dataset,
    patches_field="ground_truth",
    model="clip-vit-base32-torch",
    brain_key="patch_text_img_sim",
)

#### A quick note on managing Brain runs

You can keep track of the Brain runs you have by calling: `dataset.list_brain_runs()`

If you were to rerun the cell above, you'd see the following error: `ValueError: Brain method run with key 'patch_text_img_sim' already exists`


If you ever want to delete a specific Brain run, for example because you really like the name `patch_text_img_sim` and want to use it again, you can run: `dataset.delete_brain_run("patch_text_img_sim")`

⚠️ To delete ALL the Brain runs on a Dataset, you can call (though it is not recommended): `dataset.delete_brain_runs()`
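
If you rerun indexing cells often, a small guard can save you from the "already exists" error. The sketch below is one possible pattern, not a FiftyOne utility: `reset_brain_run` is a hypothetical helper name, and it relies only on the `list_brain_runs()` and `delete_brain_run()` methods mentioned in this note.

```python
def reset_brain_run(dataset, brain_key):
    """Delete `brain_key` from `dataset` if it exists.

    Hypothetical helper: returns True if a run was deleted, False otherwise.
    """
    if brain_key in dataset.list_brain_runs():
        dataset.delete_brain_run(brain_key)
        return True
    return False
```

Calling `reset_brain_run(dataset, "patch_text_img_sim")` right before `fob.compute_similarity(...)` would then let the cell be rerun with the same key. Keep in mind that this discards the previous results for that key.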

Let's turn to the app and perform some natural language search. Click on the `🔎` icon in the Samples viewer and start searching!
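
To build intuition for what a natural language search does, here is a toy sketch of the general idea: the text query is embedded into the same space as the images, and images are ranked by how similar their embeddings are to the query embedding. Everything in it is an assumption made for illustration: the three-dimensional vectors are made up, the file names are hypothetical, cosine similarity is assumed as the metric, and `rank_by_text_query` is not a FiftyOne function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_text_query(query_embedding, image_embeddings, k=3):
    """Return the ids of the k images whose embeddings best match the query."""
    ranked = sorted(
        image_embeddings,
        key=lambda image_id: cosine_similarity(
            query_embedding, image_embeddings[image_id]
        ),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-d vectors standing in for real image embeddings
toy_image_embeddings = {
    "flat_tire.jpg": (0.9, 0.1, 0.0),
    "shattered_glass.jpg": (0.1, 0.9, 0.1),
    "door_dent.jpg": (0.0, 0.2, 0.9),
}

# Pretend this is the embedding of the prompt "broken windshield"
toy_query = (0.2, 0.8, 0.0)

top_matches = rank_by_text_query(toy_query, toy_image_embeddings, k=2)
print(top_matches)
```

In the real workflow, the similarity index handles both the embedding and the ranking for you; the sketch only shows why a model that can embed text and images into a shared space is required.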

# Other embeddings-based workflows

### Computing Representativeness

You can use the [`compute_representativeness` method](https://docs.voxel51.com/api/fiftyone.brain.html#fiftyone.brain.compute_representativeness) from FiftyOne Brain to compute a representativeness score for each sample.

**What is Representativeness?**

A measure that identifies how well a sample represents typical patterns in your dataset, scored from 0 to 1 (1 being most representative).

**Key Uses:**
1. Find outliers (low scores)
2. Identify typical examples (high scores)
3. Guide data augmentation
4. Evaluate model performance on typical vs. atypical cases
5. Prioritize diverse data labeling

**How it Works:**
- Uses clustering to group similar images
- Scores samples based on proximity to cluster centers
- Two methods available:
  - `cluster-center`: Favors samples close to cluster centers
  - `cluster-center-downweight`: Promotes more diversity
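
The idea of scoring by proximity to cluster centers can be illustrated with a toy sketch. This is not FiftyOne's implementation: the cluster centers are assumed to come from some earlier clustering step, Euclidean distance is an assumption, and the linear rescaling to the 0 to 1 range is a simplification chosen for readability.

```python
import math


def representativeness_scores(embeddings, centers):
    """Score each vector by how close it sits to its nearest cluster center."""
    # Distance from each sample to the closest of the given centers
    nearest = [min(math.dist(vec, c) for c in centers) for vec in embeddings]
    farthest = max(nearest)
    if farthest == 0:
        return [1.0 for _ in nearest]
    # Invert and rescale so that samples near a center score close to 1
    return [1.0 - d / farthest for d in nearest]


# Two tight groups plus one point that sits between them
toy_embeddings = [(0.0, 0.0), (0.2, 0.0), (4.0, 4.0), (4.0, 4.3), (2.0, 2.0)]
toy_centers = [(0.1, 0.0), (4.0, 4.15)]  # assumed output of a clustering step

scores = representativeness_scores(toy_embeddings, toy_centers)
for vec, score in zip(toy_embeddings, scores):
    print(vec, round(score, 3))
```

The in-between point receives the lowest score, which matches the intuition that low representativeness flags atypical samples.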

**Implementation:**
Call `fob.compute_representativeness()` on your dataset; no pre-trained model predictions are needed.


In [None]:
import fiftyone as fo
import fiftyone.brain as fob

fob.compute_representativeness(
    dataset,
    embeddings="siglip_embeddings",
    representativeness_field="siglip_representativeness",
    )

You can use this method for any Field on your Dataset which contains embeddings. Let's also do this for `text_embeddings`:

In [None]:
import fiftyone as fo
import fiftyone.brain as fob

fob.compute_representativeness(
    dataset,
    embeddings="text_embeddings", # you can also use "clip_embeddings"
    representativeness_field="text_representativeness",
    )

### Uniqueness

You can use [the `compute_uniqueness` method](https://docs.voxel51.com/api/fiftyone.brain.html#fiftyone.brain.compute_uniqueness) from FiftyOne Brain to help you quantify how distinct or exceptional each sample is compared to others in a dataset, revealing outliers and unusual cases that might otherwise remain hidden in the data.

The algorithm identifies which samples are "outliers" or unusual compared to the rest of the dataset. A sample is considered unique when it's far from other samples in the feature space. The more isolated a sample is in the feature space (greater distances to neighbors), the higher its uniqueness score will be. This differs from "representativeness" which would emphasize samples central to clusters.

**How it works:**

1. **Generate Embeddings**: 
   - Each sample is embedded into a vector space using either:
     - A machine learning model (defaults to "simple-resnet-cifar10")
     - Pre-computed embeddings 
     - A similarity index

2. **Find Nearest Neighbors**:
   - For each sample, find its K nearest neighbors (K=3)
   - Calculate distances to these neighbors

3. **Score Uniqueness**:
   - Compute a weighted average of distances to nearest neighbors
   - Weights [0.6, 0.3, 0.1] give more importance to the closest neighbor
   - Normalize scores by dividing by the maximum (resulting in 0-1 range)
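
Steps 2 and 3 can be sketched in plain Python on toy data. This is a minimal illustration of the scoring described above, not FiftyOne's actual code: Euclidean distance is an assumption, the brute-force neighbor search is only suitable for tiny inputs, and `uniqueness_scores` is a hypothetical function name.

```python
import math


def uniqueness_scores(embeddings, weights=(0.6, 0.3, 0.1)):
    """Weighted average of distances to the nearest neighbors, scaled to 0-1."""
    k = len(weights)
    raw = []
    for i, vec in enumerate(embeddings):
        # Sorted distances from this sample to every other sample
        dists = sorted(
            math.dist(vec, other)
            for j, other in enumerate(embeddings)
            if j != i
        )
        # The closest neighbor gets the largest weight
        raw.append(sum(w * d for w, d in zip(weights, dists[:k])))
    top = max(raw)
    # Normalize by the maximum so that scores fall in the 0-1 range
    return [r / top if top > 0 else 0.0 for r in raw]


toy_embeddings = [
    (0.0, 0.0),
    (0.1, 0.0),
    (0.0, 0.1),
    (0.1, 0.1),
    (5.0, 5.0),  # far from everything else
]

scores = uniqueness_scores(toy_embeddings)
most_unique = max(range(len(scores)), key=scores.__getitem__)
print(most_unique, [round(s, 3) for s in scores])
```

The isolated point ends up with the highest score, while the four tightly packed points score near zero.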


In [None]:
import fiftyone.brain as fob

fob.compute_uniqueness(
    dataset,
    embeddings="siglip_embeddings",
    uniqueness_field="siglip_uniqueness",
)