In [None]:
!fiftyone plugins download https://github.com/jacobmarks/reverse-image-search-plugin

In [None]:
import os

import fiftyone as fo
import fiftyone.brain as fob
import fiftyone.zoo as foz

FiftyOne integrates with Hugging Face, so you can load datasets directly from the Hub! Learn more about the integration [here](https://docs.voxel51.com/integrations/huggingface.html) and about loading datasets from the Hub [here](https://docs.voxel51.com/integrations/huggingface.html#loading-datasets-from-the-hub).

In [None]:
import fiftyone.utils.huggingface as fouh

stanford_cars_dataset = fouh.load_from_hub(
    "Multimodal-Fatima/StanfordCars_train",
    split="train",
    format="ParquetFilesDataset",
    max_samples=1000,
    overwrite=True,
)

For this example, you'll use a version of the [Stanford Cars dataset](https://ai.stanford.edu/~jkrause/papers/fgvc13.pdf) that a Hugging Face community member uploaded.

In [None]:
stanford_cars_dataset = fo.load_dataset("Multimodal-Fatima/StanfordCars_train-1000")

The dataset ships with several fields that this example doesn't need, so the next cell removes them.

In [None]:
stanford_cars_dataset.delete_sample_fields(
    [
        "clip_tags_ViT_L_14",
        "LLM_Description_gpt3_downstream_tasks_ViT_L_14",
        "LLM_Description_gpt3_downstream_tasks_visual_genome_ViT_L_14",
        "blip_caption_beam_5",
        "Attributes_ViT_L_14_text_davinci_003_full",
        "Attributes_ViT_L_14_text_davinci_003_stanfordcars",
        "clip_tags_ViT_L_14_with_openai_classes",
        "clip_tags_ViT_L_14_wo_openai_classes",
        "clip_tags_ViT_L_14_simple_specific",
        "clip_tags_ViT_L_14_ensemble_specific",
        "clip_tags_ViT_B_16_simple_specific",
        "clip_tags_ViT_B_32_ensemble_specific",
        "Attributes_ViT_B_16_descriptors_text_davinci_003_full",
        "Attributes_LAION_ViT_H_14_2B_descriptors_text_davinci_003_full",
        "clip_tags_LAION_ViT_H_14_2B_simple_specific",
        "clip_tags_LAION_ViT_H_14_2B_ensemble_specific",
        "Attributes_ViT_L_14_descriptors_text_davinci_003_full",
        "clip_tags_ViT_B_16_ensemble_specific"
        ]
)



## **🦒 FiftyOne Model Zoo**

The [FiftyOne Model Zoo 🦒](https://docs.voxel51.com/user_guide/model_zoo/models.html) is a collection of over 70 pre-trained models that can be easily downloaded and run on FiftyOne Datasets. 

📂 This convenient resource provides a consistent interface for a wide variety of models, making it simple to integrate pre-trained models into your workflow.

**💻 Basic Workflow**

To get started, follow these steps:

🔓 **Step 1: Load a Model**: Load a model from the zoo using `foz.load_zoo_model()`.

📁 **Step 2: Load a Dataset**: Load a dataset (or subset of one) to run the model on.

💡 **Step 3: Generate Predictions**: Use methods like `apply_model()` to generate predictions, which are stored in the dataset.

**🔍 Advanced Features**

-  **Embeddings**: Many zoo models can generate embeddings for samples or patches using the `compute_embeddings()` method.
-  **Prediction Logits**: Many zoo models can optionally store prediction logits by passing `store_logits=True` to `apply_model()`. This enables running Brain methods that need logits, like `compute_mistakenness()` and `compute_hardness()`.
-  **Custom Models**: Custom models, such as PyTorch models, can be wrapped to implement the `Model` interface and used with builtin methods like `apply_model()`.
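To see why stored logits are useful, note that uncertainty measures can be derived from them. The snippet below is a minimal sketch, assuming the entropy of the softmax distribution as the uncertainty measure. The function name `softmax_entropy` and the toy logits are hypothetical, and this is not necessarily how the Brain computes hardness.

```python
import numpy as np


def softmax_entropy(logits: np.ndarray) -> float:
    """Entropy (in nats) of the softmax distribution over a 1D logits vector."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # Clip probabilities to avoid taking log(0)
    return float(-(probs * np.log(np.clip(probs, 1e-12, None))).sum())


# Toy logits (assumed values): one confident prediction, one maximally unsure
confident = softmax_entropy(np.array([10.0, 0.0, 0.0]))
uncertain = softmax_entropy(np.array([1.0, 1.0, 1.0]))
print(confident < uncertain)  # True
```

A prediction whose logits are nearly equal across classes gets a high entropy, which is the kind of signal a "hard sample" score can build on.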

In [None]:
clip_model = foz.load_zoo_model(
    name="clip-vit-base32-torch",
    install_requirements=True,
)

shufflenet_model = foz.load_zoo_model(
    name="shufflenetv2-0.5x-imagenet-torch",
    install_requirements=True,
)

resnet_model = foz.load_zoo_model(
    name="resnet50-imagenet-torch",
    install_requirements=True,
)

[FiftyOne Brain](https://docs.voxel51.com/user_guide/brain.html) can generate embeddings for images, and for objects or patches within images, and use them to build visualizations and similarity indexes. It is compatible with various embedding models, dimensionality reduction techniques, and similarity backends.
    
🧠 With the Brain you can:

- 👁️ Visualize your dataset in a low-dimensional embedding space to reveal patterns and clusters

- 🗂️ Index your data by similarity to easily find similar examples

- 🦄 Compute uniqueness measures for images to identify the most valuable unlabeled data to annotate

- ⚠️ Identify possible label mistakes in your annotations

- 💡 Find the hardest samples for your model to learn from

Brain runs are tracked and can be listed, loaded, renamed, and deleted via `Dataset` methods such as `list_brain_runs()`, `load_brain_results()`, `rename_brain_run()`, and `delete_brain_run()`.

In [None]:
stanford_cars_dataset.compute_embeddings(
    model=clip_model,
    embeddings_field="clip_embeddings",
    progress=True,  
)

stanford_cars_dataset.compute_embeddings(
    model=shufflenet_model,
    embeddings_field="shufflenet_embeddings",
    progress=True,  
)

stanford_cars_dataset.compute_embeddings(
    model=resnet_model,
    embeddings_field="resnet_embeddings",
    progress=True,  
)

### **📊 Compute Visualization**

The `compute_visualization()` method 📊 generates interactive visualizations of your dataset or patches in a low-dimensional space using a dimensionality reduction technique such as UMAP, t-SNE, or PCA.

Visualizing datasets in low-dimensional embedding spaces helps reveal:

- 🔹 Patterns and clusters that can help identify failure modes
- 🔹 Similar examples and outliers
- 🔹 New samples to add to your training set, helping you improve model performance
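To make the idea of a low-dimensional projection concrete, here is a minimal NumPy sketch of PCA, the simplest of the techniques named above. It is an illustration under stated assumptions, not FiftyOne's implementation: the helper `pca_project` is hypothetical, and random vectors stand in for real image embeddings.

```python
import numpy as np


def pca_project(embeddings: np.ndarray, num_dims: int = 2) -> np.ndarray:
    """Project row-vector embeddings onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:num_dims].T


rng = np.random.default_rng(0)
# Toy stand-in (assumed data) for 512-dimensional embeddings of 100 images
toy_embeddings = rng.normal(size=(100, 512))
points = pca_project(toy_embeddings, num_dims=2)
print(points.shape)  # (100, 2)
```

Each image ends up as one 2D point, which is what an embeddings plot displays. UMAP and t-SNE produce points of the same shape but preserve neighborhood structure rather than global variance.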

In [None]:
fob.compute_visualization(
    stanford_cars_dataset,
    embeddings="clip_embeddings",
    method="umap",
    brain_key="umap_2d_clip",
    num_dims=2,
    num_workers=os.cpu_count(),
    progress=True,
)

fob.compute_visualization(
    stanford_cars_dataset,
    embeddings="shufflenet_embeddings",
    method="umap",
    brain_key="umap_2d_shufflenet",
    num_dims=2,
    num_workers=os.cpu_count(),
    progress=True,
)

fob.compute_visualization(
    stanford_cars_dataset,
    embeddings="resnet_embeddings",
    method="umap",
    brain_key="umap_2d_resnet",
    num_dims=2,
    num_workers=os.cpu_count(),
    progress=True,
)

### **🦄 Compute Uniqueness**

The `compute_uniqueness()` method 📊 computes a uniqueness score for each image, comparing its content to all other images in the dataset.


Computing uniqueness scores for images helps you:

✨ Identify the most valuable data to annotate in the early stages of a machine learning workflow
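As a rough intuition for what a uniqueness score captures, the sketch below scores each sample by its mean distance to its nearest neighbors in embedding space. This is an assumption made for illustration (the helper `knn_uniqueness`, the choice of `k`, and the toy points are all hypothetical) and not the Brain's actual algorithm.

```python
import numpy as np


def knn_uniqueness(embeddings: np.ndarray, k: int = 3) -> np.ndarray:
    """Score each row by its mean distance to its k nearest neighbors, scaled to [0, 1]."""
    # Pairwise Euclidean distances, shape (n, n)
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore each sample's distance to itself
    nearest = np.sort(dists, axis=1)[:, :k]
    scores = nearest.mean(axis=1)
    return scores / scores.max()


# Four near-duplicate points plus one outlier (assumed toy data)
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
scores = knn_uniqueness(toy, k=2)
print(scores.argmax())  # 4: the outlier is the most unique sample
```

Near-duplicates score low because they sit close to their neighbors, while the outlier scores highest, which is why sorting by uniqueness surfaces the samples that add the most new information.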

In [None]:
fob.compute_uniqueness(
    stanford_cars_dataset,
    embeddings="clip_embeddings",
    uniqueness_field="car_uniqueness",
    num_workers=os.cpu_count(),
    progress=True,
)

### **🔍 Compute Similarity**

The `compute_similarity()` method 🔍 indexes your data by similarity, allowing you to efficiently search for similar samples or objects both programmatically and via point-and-click in the App.

Choose from multiple backends to power your similarity index, including:

- 🔹 Qdrant
- 🔹 Redis
- 🔹 Pinecone
- 🔹 LanceDB (used in this example)

...and more!
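Whatever the backend, the core operation is a nearest-neighbor lookup in embedding space. The sketch below does this by brute force with cosine similarity in NumPy. It is illustrative only: `cosine_top_k` and the toy vectors are hypothetical, and dedicated vector databases use their own index structures rather than a full scan.

```python
import numpy as np


def cosine_top_k(query: np.ndarray, embeddings: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the indices of the k rows most similar to the query, best first."""
    # Normalize so that a dot product equals cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = normed @ q
    return np.argsort(-sims)[:k]


# Toy 2D "embeddings" (assumed data); real ones have hundreds of dimensions
toy_index = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
neighbors = cosine_top_k(np.array([2.0, 0.0]), toy_index, k=2)
print(neighbors)  # [0 1]
```

Cosine similarity ignores vector length, so the query `[2.0, 0.0]` still matches row 0 exactly. This corresponds to the `metric="cosine"` choice in the cell that follows.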

In [None]:
sim_index = fob.compute_similarity(
    stanford_cars_dataset,
    model="clip-vit-base32-torch",
    embeddings="clip_embeddings",
    brain_key="clip_sim_index",
    backend="lancedb",
    num_workers=os.cpu_count(),
    metric="cosine",
    progress=True,
)

In [None]:
stanford_cars_dataset.persistent = True

In [None]:
session = fo.launch_app(stanford_cars_dataset)