# FastVLM Models with FiftyOne
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/harpreetsahota204/fast_vlm/blob/main/using_fastvlm_in_fiftyone.ipynb)

This notebook demonstrates how to use Apple's FastVLM models for visual question answering and creative tasks using FiftyOne.

## Setup

First, let's install the required packages:


In [None]:
%pip install fiftyone torch transformers

## Import Dependencies


In [None]:
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.huggingface as fouh

## Register and Download FastVLM Model

We'll use the 1.5B-parameter model for this example, as it provides a good balance between output quality and resource usage.


In [None]:
# Register the model source
foz.register_zoo_model_source(
    "https://github.com/harpreetsahota204/fast_vlm",
    overwrite=True
)

# Download the model (first time only)
foz.download_zoo_model(
    "https://github.com/harpreetsahota204/fast_vlm",
    model_name="apple/FastVLM-1.5B"
)
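
The model name appears in more than one call (download and load), so it is easy for the copies to drift apart. As a minimal sketch, the three zoo calls used in this notebook can be wrapped behind a single pair of constants. The helper name `load_fastvlm` and the constants `MODEL_SOURCE` and `MODEL_NAME` are hypothetical and not part of FiftyOne or the FastVLM integration.

```python
# Single source of truth so that register, download, and load all agree.
# These constant names are illustrative, not part of any library.
MODEL_SOURCE = "https://github.com/harpreetsahota204/fast_vlm"
MODEL_NAME = "apple/FastVLM-1.5B"


def load_fastvlm(source=MODEL_SOURCE, name=MODEL_NAME):
    """Register the remote source, download the weights, and load the model."""
    # Imported lazily so the helper can be defined before FiftyOne is installed.
    import fiftyone.zoo as foz

    foz.register_zoo_model_source(source, overwrite=True)
    foz.download_zoo_model(source, model_name=name)
    return foz.load_zoo_model(name)
```

Calling `load_fastvlm()` would then replace the separate register, download, and load cells; switching to another variant means changing `MODEL_NAME` in one place.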


## Load Sample Dataset

We'll use the MashUpVQA dataset from the Hugging Face Hub, which pairs diverse images with questions.


In [None]:
# Load a small subset for demonstration
dataset = fouh.load_from_hub(
    "Voxel51/MashUpVQA",
    max_samples=10,
    overwrite=True
)

print(f"Loaded {len(dataset)} samples")


## Example 1: Basic Visual Question Answering

Let's start with a simple example where we ask the same question for all images.


In [None]:
# Load the model registered above
model = foz.load_zoo_model("apple/FastVLM-1.5B")

# Set the prompt that will be applied to every image
model.prompt = "Describe the main activity or event happening in this image."

# Apply to dataset
dataset.apply_model(model, label_field="activity_description")

# View a sample result
sample = dataset.first()
print("Sample Image Description:")
print(sample.activity_description)
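
Printing `dataset.first()` shows only one result. To skim a few outputs at once, a small helper along these lines may be handy. It is a sketch that relies on FiftyOne's `limit()` and `values()` methods; the helper name `preview_field` is hypothetical, and the exact type stored in the label field depends on the FastVLM integration.

```python
def preview_field(dataset, field, limit=3):
    """Return (filepath, value) pairs for the first `limit` samples."""
    # values() with a list of field names returns one list per field
    filepaths, values = dataset.limit(limit).values(["filepath", field])
    return list(zip(filepaths, values))
```

For example, `for path, text in preview_field(dataset, "activity_description"): print(path, text)` would list the first three descriptions next to their image paths.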


## Example 2: Using Dataset Questions

Now let's use the per-sample questions that come with the dataset instead of a single fixed prompt.


In [None]:
# Read each sample's prompt from its "question" field
dataset.apply_model(
    model,
    prompt_field="question",
    label_field="model_answer"
)

# View sample Q&A
sample = dataset.first()
print("Question:", sample.question)
print("Answer:", sample.model_answer)
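
Example 2 relies on every sample exposing a `question` field. When adapting it to another dataset, it may be worth checking the schema first. The sketch below uses FiftyOne's `get_field_schema()`; the helper name `missing_fields` is hypothetical.

```python
def missing_fields(dataset, *names):
    """Return the requested field names that are absent from the dataset schema."""
    # get_field_schema() returns a dict keyed by field name
    schema = dataset.get_field_schema()
    return [name for name in names if name not in schema]
```

With this in place, `missing_fields(dataset, "question")` returning an empty list suggests the `prompt_field="question"` call has the field it needs.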


## Example 3: Creative Generation

FastVLM can also generate creative content based on images.


In [None]:
# Configure model for creative generation
model.prompt = "Write a short, creative poem about what you see in this image."
model.temperature = 0.9  # Higher temperature gives more varied output
model.max_new_tokens = 100  # Allow responses of up to 100 new tokens

# Generate poems
dataset.apply_model(model, label_field="poem")

# View a sample poem
sample = dataset.first()
print("Generated Poem:")
print(sample.poem)
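
To see why a higher temperature tends to produce more varied text, here is a generic illustration of temperature-scaled softmax sampling probabilities. It is not FastVLM's or the integration's actual decoding code: the logits are made up, and the assumption is only that `model.temperature` feeds a standard sampling step of this kind.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
cool = softmax_with_temperature(logits, temperature=0.2)
warm = softmax_with_temperature(logits, temperature=0.9)
```

At the lower temperature almost all probability mass sits on the top-scoring token, while at 0.9 the alternatives keep a meaningful share, which is what lets sampled output vary between runs.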


## Example 4: Detailed Scene Analysis

Let's use a structured prompt to get detailed scene analysis.


In [None]:
# Configure model for detailed analysis
model.prompt = """
Analyze this image and provide:
1. Main subjects/objects
2. Actions/activities
3. Setting/environment
4. Notable details
5. Overall mood/atmosphere
""".strip()

# Generate analysis
dataset.apply_model(model, label_field="detailed_analysis")

# View sample analysis
sample = dataset.first()
print("Detailed Analysis:")
print(sample.detailed_analysis)


## Visualize Results

Install the caption viewer plugin, then launch the FiftyOne App to explore the results interactively.


In [None]:
# Install the caption viewer plugin
!fiftyone plugins download https://github.com/mythrandire/caption-viewer

In [None]:
session = fo.launch_app(dataset)


## Cleanup

Close the FiftyOne App session when done.


In [None]:
session.close()
