# Using Florence2 as a Remotely Sourced Zoo Model

In [1]:
import fiftyone as fo

# Load a dataset
dataset = fo.load_dataset("cardd_from_hub")
dataset = dataset.take(5)

In [None]:
dataset

For context, here is the first image:

In [None]:
from PIL import Image

Image.open(dataset.first().filepath)

# Set Up the Zoo Model

In [None]:
import fiftyone.zoo as foz 
foz.register_zoo_model_source("https://github.com/harpreetsahota204/florence2", overwrite=True)

In [None]:
foz.download_zoo_model(
    "https://github.com/harpreetsahota204/florence2",
    model_name="microsoft/Florence-2-base-ft", 
)

In [None]:
model = foz.load_zoo_model("microsoft/Florence-2-base-ft")

# Use Florence2 for Captions

Captioning requires no arguments beyond the operation type and a `detail_level`, which selects one of three caption styles.

Supported `detail_level` values:

* `basic`
* `detailed`
* `more_detailed`

In [7]:
model.operation = "caption"
model.detail_level = "basic"

In [None]:
dataset.apply_model(model, label_field="captions")

In [None]:
dataset.first()['captions']

To change the caption detail level:

In [None]:
model.detail_level = "more_detailed"

dataset.apply_model(model, label_field="more_detailed_captions")

dataset.first()['more_detailed_captions']
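
To compare all three detail levels on the same samples, the pattern above can be wrapped in a loop. This is a minimal sketch: the helper name `caption_all_levels` and the `<level>_captions` field naming are illustrative assumptions, and it relies only on the `operation` and `detail_level` attributes and the `apply_model` call already shown.

```python
DETAIL_LEVELS = ("basic", "detailed", "more_detailed")


def caption_all_levels(dataset, model, levels=DETAIL_LEVELS):
    """Caption `dataset` once per detail level and return the fields written.

    Hypothetical helper: each level is stored in a "<level>_captions" field.
    """
    fields = []
    model.operation = "caption"
    for level in levels:
        if level not in DETAIL_LEVELS:
            raise ValueError(f"Unsupported detail_level: {level!r}")
        model.detail_level = level
        field = f"{level}_captions"
        dataset.apply_model(model, label_field=field)
        fields.append(field)
    return fields
```

Calling `caption_all_levels(dataset, model)` would populate `basic_captions`, `detailed_captions`, and `more_detailed_captions`, at the cost of one inference pass per level.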

# Use Florence2 for Detection

The `detection`, `dense_region_caption`, and `region_proposal` detection types don't require additional parameters for general use.

However, `open_vocabulary_detection` requires a text prompt, set through `model.prompt` or a `prompt_field`, to guide the detection towards specific objects.

The results are stored as `Detections` objects containing bounding boxes and labels:

In [None]:
model.operation = "detection"

model.detection_type = "open_vocabulary_detection"

model.prompt = "crack, windshield"

dataset.apply_model(model, label_field="ov_prompted_detection")

In [None]:
dataset.first()['ov_prompted_detection']

Alternatively, take the prompt from a sample field, such as the captions generated earlier:

In [None]:
dataset.apply_model(model, label_field="ov_field_detection", prompt_field="captions")

In [None]:
dataset.first()['ov_field_detection']

Dense detection doesn't take a prompt; the model detects everything it can:

In [None]:
model.operation = "detection"

model.detection_type = "dense_region_caption"

dataset.apply_model(model, label_field="dense_detections")

In [None]:
dataset.first()['dense_detections']
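
FiftyOne stores each `Detection` bounding box as relative `[x, y, width, height]` values in `[0, 1]`, measured from the top-left corner of the image. When absolute pixel corners are needed (for cropping with PIL, say), a conversion along these lines may help. `to_pixel_boxes` is a hypothetical helper, not part of FiftyOne or the Florence2 plugin, and the image size is assumed to be supplied by the caller.

```python
def to_pixel_boxes(detections, width, height):
    """Convert relative [x, y, w, h] boxes to (label, (x1, y1, x2, y2)) in pixels.

    `detections` is expected to expose a `.detections` list whose items
    have `.label` and `.bounding_box`, as `fo.Detections` does.
    """
    boxes = []
    for det in detections.detections:
        x, y, w, h = det.bounding_box
        corners = (
            round(x * width),         # left
            round(y * height),        # top
            round((x + w) * width),   # right
            round((y + h) * height),  # bottom
        )
        boxes.append((det.label, corners))
    return boxes
```

For example, with `img = Image.open(sample.filepath)`, the call `to_pixel_boxes(sample["dense_detections"], *img.size)` yields corner tuples in the order that `img.crop` expects.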

# Use Florence2 for Phrase Grounding

Phrase grounding requires either a direct caption or a reference to a caption field. To pass a caption directly, set `model.prompt`:

In [None]:
model.operation = "phrase_grounding"

model.prompt = "cake"

dataset.apply_model(model, label_field="cap_phrase_groundings")

In [None]:
dataset.first()['cap_phrase_groundings']

To ground against a sample field instead, pass its name as `prompt_field`:

In [None]:
dataset.apply_model(
    model,
    label_field="cap_field_phrase_groundings",
    prompt_field="more_detailed_captions",
)

In [None]:
dataset.first()['cap_field_phrase_groundings']
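
`prompt_field` only works if the referenced field was populated by an earlier cell. A small guard can surface a missing field before inference starts. This sketch assumes the collection exposes FiftyOne's `has_sample_field` method; the wrapper name `apply_with_prompt_field` is hypothetical.

```python
def apply_with_prompt_field(dataset, model, label_field, prompt_field):
    """Run `apply_model` only if `prompt_field` exists on the collection."""
    if not dataset.has_sample_field(prompt_field):
        raise ValueError(
            f"Field {prompt_field!r} not found; generate it first "
            "(for example with the captioning operation)."
        )
    dataset.apply_model(
        model, label_field=label_field, prompt_field=prompt_field
    )
```

The guard checks only that the field is declared on the collection, not that every sample has a value in it.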

# Use Florence2 for Segmentation

Segmentation requires either a direct expression or a reference to a field containing expressions.

As with phrase grounding, you can pass the expression directly through `model.prompt`:

In [None]:
model.operation = "segmentation"

model.prompt = "crack"

dataset.apply_model(model, label_field="prompted_segmentations")

In [None]:
dataset.first()['prompted_segmentations']

To take the expression from a sample field instead, pass its name as `prompt_field`:

In [None]:
dataset.apply_model(model, label_field="sample_field_segmentations", prompt_field="captions")

In [None]:
dataset.first()['sample_field_segmentations']

# Use Florence2 for OCR

Basic OCR (`ocr`) requires no additional parameters and returns text strings. For OCR with region information (`ocr_with_region`), set `store_region_info=True` to include a bounding box for each text region:

In [None]:
model.operation = "ocr"

model.store_region_info = True

dataset.apply_model(model, label_field="text_regions")

In [None]:
dataset.first()['text_regions']

In [None]:
model.store_region_info = False

dataset.apply_model(model, label_field="text_regions_no_region_info")

In [None]:
dataset.first()['text_regions_no_region_info']