# [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/harpreetsahota204/moondream3/blob/main/using_moondream3_zoo_model.ipynb)

If opening in Colab, be sure to install:

`pip install fiftyone`

# Using Moondream3 as a Remotely Sourced Zoo Model


<div style="background-color: #fff3cd; border: 1px solid #856404; border-radius: 5px; padding: 15px; margin: 10px 0; color: #856404;">
<strong>⚠️ NOTE:</strong> This is a gated model. Request access to it on Hugging Face, then log in with your access token by running <code>hf auth login</code> in your terminal.
</div>

In [None]:
import fiftyone as fo
import fiftyone.utils.huggingface as fouh
import fiftyone.zoo as foz

# Load a 50-sample subset of the dataset from the Hugging Face Hub
dataset = fouh.load_from_hub(
    "Voxel51/GQA-Scene-Graph",
    max_samples=50,
    overwrite=True,
)

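# One list of detection labels per sample (None if a sample has no detections)
sample_objects = dataset.values("detections.detections.label")

# Keep only the unique labels for each sample
sample_level_objects = [list(set(obj)) if obj else [] for obj in sample_objects]

# Store the unique labels in a sample-level field to use as prompts later
dataset.set_values("sample_level_objects", sample_level_objects)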
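
To make the label step concrete, here is a minimal sketch of the same deduplication on toy data, with no FiftyOne required. The toy label lists are made up for illustration, and the `None` entry is an assumption about how a sample without detections would appear.

```python
# Toy stand-in for dataset.values("detections.detections.label"):
# one list of labels per sample, or None when a sample has no detections.
sample_objects = [
    ["horse", "man", "horse", "saddle"],
    None,
    ["tree", "tree"],
]

# Deduplicate per sample. sorted() is used here only to make the output
# deterministic, since set() does not guarantee an order.
sample_level_objects = [
    sorted(set(labels)) if labels else [] for labels in sample_objects
]

print(sample_level_objects)
# [['horse', 'man', 'saddle'], [], ['tree']]
```

Each inner list is what the model later receives as the per-sample prompt when `prompt_field="sample_level_objects"` is used.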

View the first image for context:

In [None]:
from PIL import Image

Image.open(dataset.first().filepath)

# Setup Zoo Model

In [None]:
import fiftyone as fo
import fiftyone.zoo as foz
foz.register_zoo_model_source("https://github.com/harpreetsahota204/moondream3", overwrite=True)

In [None]:
import fiftyone as fo
import fiftyone.zoo as foz

foz.download_zoo_model(
    "https://github.com/harpreetsahota204/moondream3",
    model_name="moondream/moondream3-preview"
)

Note that Moondream2 receives frequent updates. You can check its versions [here](https://huggingface.co/vikhyatk/moondream2/blob/main/versions.txt) and pass the most recent one or any previous version.

In [None]:
import fiftyone as fo
import fiftyone.zoo as foz

model = foz.load_zoo_model(
    "moondream/moondream3-preview",
    )

# Use Moondream3 for Zero-Shot Classification

In [None]:
model.operation = "classify"
model.prompt = "Pick one of the animals in the image: horse, giraffe, elephant, shark"

dataset.apply_model(
    model, 
    label_field="classification",
)

In [None]:
dataset.first()['classification']
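
If the candidate classes live in a Python list, the prompt can be assembled from it in the same comma-separated form as the prompt above. This is only a sketch: `build_classification_prompt` is a hypothetical helper, not part of FiftyOne or the Moondream3 integration.

```python
def build_classification_prompt(classes):
    """Return a prompt asking the model to choose one label from `classes`."""
    if not classes:
        raise ValueError("classes must contain at least one label")
    # Hypothetical wording; adjust it to the kind of labels you pass in.
    return "Pick one of the animals in the image: " + ", ".join(classes)


prompt = build_classification_prompt(["horse", "giraffe", "elephant", "shark"])
print(prompt)
```

The resulting string can then be assigned with `model.prompt = prompt` before calling `dataset.apply_model`.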

# Use Moondream3 for Captions

Captioning takes no prompt. Set `model.operation` to `caption` and choose the caption length with `model.length`.

Supported `length` values:

* `short`
* `normal`
* `long`

In [None]:
model.operation = "caption"
model.length = "short"

dataset.apply_model(
    model, 
    label_field="short_captions",
)

In [None]:
dataset.first()['short_captions']

In [None]:
model.length = "long"

dataset.apply_model(
    model, 
    label_field="long_captions",
)

In [None]:
dataset.first()['long_captions']
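
The cells above repeat the same steps for each caption length, so they can be folded into a loop. The sketch below assumes only what the notebook already uses (`model.operation`, `model.length`, and `dataset.apply_model(model, label_field=...)`). The helper name `caption_all_lengths` and the `<length>_captions` field naming are illustrative choices.

```python
CAPTION_LENGTHS = ("short", "normal", "long")


def caption_all_lengths(dataset, model, lengths=CAPTION_LENGTHS):
    """Apply the caption operation once per length; return the fields written."""
    fields = []
    model.operation = "caption"
    for length in lengths:
        if length not in CAPTION_LENGTHS:
            raise ValueError(f"unsupported caption length: {length!r}")
        model.length = length
        # e.g. "short" -> "short_captions", matching the cells above
        label_field = f"{length}_captions"
        dataset.apply_model(model, label_field=label_field)
        fields.append(label_field)
    return fields
```

Calling `caption_all_lengths(dataset, model)` would populate `short_captions`, `normal_captions`, and `long_captions`, at the cost of one full inference pass over the dataset per length.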

# Use Moondream3 for Detection


Set `prompt_field` to a field that lists the objects to detect in each sample. The results are stored as `Detections` objects containing bounding boxes and labels:

In [None]:
model.operation="detect"

dataset.apply_model(model, prompt_field="sample_level_objects", label_field="detections")

In [None]:
dataset.first()['detections']
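
FiftyOne stores each `Detection.bounding_box` as `[top-left-x, top-left-y, width, height]` in relative coordinates between 0 and 1. Assuming this integration follows that convention, the sketch below converts a box to pixel corners. `to_pixel_box` is a hypothetical helper and the example values are made up.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a relative [x, y, w, h] box to pixel (x1, y1, x2, y2) corners."""
    x, y, w, h = bounding_box
    x1 = round(x * image_width)
    y1 = round(y * image_height)
    x2 = round((x + w) * image_width)
    y2 = round((y + h) * image_height)
    return x1, y1, x2, y2


# A box covering the middle half of the width and a quarter of the height
print(to_pixel_box([0.25, 0.5, 0.5, 0.25], image_width=640, image_height=480))
# (160, 240, 480, 360)
```

The image size can be read from `sample.metadata.width` and `sample.metadata.height` once `dataset.compute_metadata()` has been run.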

The `detect` operation also accepts a Python list of object names as the prompt, which is applied to every sample:

In [None]:
model.prompt = ["horse", "house", "saddle", "man", "black jacket"]

dataset.apply_model(model, label_field="list_detections")

In [None]:
dataset.first()['list_detections']

# Use Moondream3 for Keypoints


In [None]:
model.operation="point"

dataset.apply_model(model, prompt_field="sample_level_objects", label_field="pointings")

In [None]:
dataset.first()['pointings']

The `point` operation also accepts a Python list of object names as the prompt:

In [None]:
model.prompt = ["horse", "house", "saddle", "man", "black jacket"]

dataset.apply_model(model, label_field="list_points")

In [None]:
dataset.first()["list_points"]

# Use Moondream3 for VQA


In [None]:
model.operation = "query"

model.prompt = "What is in the background of the image?"

dataset.apply_model(model, label_field="vqa_response")

In [None]:
dataset.first()['vqa_response']

To take the prompt from a sample field instead of a single fixed string, store one question per sample in a field and pass that field's name as `prompt_field`:

In [None]:
dataset.set_values("questions", ["Where is the general location of this scene?"]*len(dataset))

In [None]:
dataset.first()['questions']

In [None]:
dataset.apply_model(
    model,
    label_field="query_field_response",
    prompt_field="questions"
)

In [None]:
dataset.first()['query_field_response']
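
A prompt field is most useful when the question differs per sample. As a rough sketch, the questions could be derived from each sample's object list. `build_questions` and the question wording are hypothetical, and the fallback reuses the generic question from the cell above.

```python
def build_questions(sample_level_objects):
    """Return one question per sample, based on that sample's first object."""
    questions = []
    for objects in sample_level_objects:
        if objects:
            questions.append(f"Where is the {objects[0]} in this image?")
        else:
            # Fall back to a generic question when no objects are listed
            questions.append("Where is the general location of this scene?")
    return questions


questions = build_questions([["horse", "man"], [], ["giraffe"]])
print(questions)
```

In the notebook, the input could come from `dataset.values("sample_level_objects")` and the result could be written back with `dataset.set_values("questions", questions)`.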

# Phrase Grounding

This model doesn't support phrase grounding out of the box. As a workaround, you can run the `detect` operation with a caption field as the `prompt_field`:

In [None]:
model.operation="detect"

dataset.apply_model(model, label_field="grounded_detections", prompt_field="short_captions")

In [None]:
dataset.first()['grounded_detections']