# ShowUI-2B Tutorial: Multimodal Analysis with FiftyOne

This tutorial demonstrates how to use ShowUI-2B, a vision-language-action model designed for GUI agents, with FiftyOne.

## 1. Load a Sample Dataset

First, let's load a small UI dataset from the Hugging Face Hub.

You can find other GUI grounding datasets [here](https://huggingface.co/datasets?other=gui-grounding).

In [None]:
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

# Load up to 200 shuffled samples from the GroundUI-18k dataset
dataset = load_from_hub(
    "Voxel51/GroundUI-18k",
    max_samples=200,
    shuffle=True,
    overwrite=True
)

To get an idea of what is in this dataset you can launch the FiftyOne App to visualize it:

In [None]:
fo.launch_app(dataset)

Or just look at the first sample:

In [None]:
from PIL import Image

Image.open(dataset.first().filepath)

In [None]:
dataset.first().instruction

## 2. Set Up ShowUI Integration

Register the ShowUI remote zoo model source and load the model.

In [None]:
import fiftyone.zoo as foz

# Register the model source
foz.register_zoo_model_source("https://github.com/harpreetsahota204/ShowUI", overwrite=True)

Load the `ShowUI-2B` model:

In [None]:
model = foz.load_zoo_model(
    "showlab/ShowUI-2B",
    quantized=True,  # only for GPU
    # install_requirements=True,  # pass this to make sure all requirements are installed
    )

Note that for any of the following operations, you can use a field that already exists on your dataset: pass the name of that field as `prompt_field` when you call `apply_model`. For example:

```python
dataset.apply_model(model, prompt_field="<field-name>", label_field="<label-field>")
```

Alternatively, you can run a single prompt across all samples like so:

```python
model.prompt = "Locate the elements of this UI that a user can interact with."
dataset.apply_model(model, label_field="one_prompt")
```
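
If the dataset has no suitable prompt field yet, one option is to build one from values it already holds. The sketch below is plain Python and only illustrates the idea; `build_prompts` and the template text are hypothetical names, not part of FiftyOne or the ShowUI integration.

```python
def build_prompts(instructions, template="Locate the UI element for: {instruction}"):
    """Return one prompt string per instruction (hypothetical helper)."""
    prompts = []
    for instruction in instructions:
        # Fall back to an empty string when a sample has no instruction
        text = (instruction or "").strip()
        prompts.append(template.format(instruction=text))
    return prompts


# Illustrative inputs; in practice these would come from the dataset
prompts = build_prompts(["Click the search bar", None])
print(prompts)
```

With a FiftyOne dataset, the inputs could be read with `dataset.values("instruction")` and the results written back with `dataset.set_values("custom_prompt", prompts)`, after which `prompt_field="custom_prompt"` can be passed to `apply_model`. The field name `custom_prompt` is an assumption chosen for this example.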

## 3. Simple UI Grounding

Ask the model to ground an element in a screenshot with a keypoint.

In [4]:
model.operation = "simple_grounding"

The prompt for this operation is:

In [None]:
print(model.system_prompt)

In [None]:
dataset.apply_model(
    model, 
    prompt_field="instruction", # use a field from the dataset
    label_field="simple_grounding_kps"
    )
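
FiftyOne stores keypoint coordinates as relative values in `[0, 1]`, so using a predicted point for anything pixel-based (cropping, drawing, clicking) needs a conversion. A minimal sketch of that conversion follows; `to_pixels` is a hypothetical helper, not part of FiftyOne.

```python
def to_pixels(point, width, height):
    """Convert a relative (x, y) point in [0, 1] to integer pixel coordinates."""
    x, y = point
    # Clamp so slightly out-of-range predictions still land inside the image
    x = min(max(x, 0.0), 1.0)
    y = min(max(y, 0.0), 1.0)
    # Cap at the last valid index so x == 1.0 does not fall off the edge
    px = min(int(x * width), width - 1)
    py = min(int(y * height), height - 1)
    return px, py


print(to_pixels((0.5, 0.25), width=1920, height=1080))  # (960, 270)
```

Assuming the model wrote a `Keypoints` label to `simple_grounding_kps`, the points for a sample would be available under `sample["simple_grounding_kps"].keypoints[i].points`, and the image size can be read with `Image.open(sample.filepath).size`.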

## 4. Action Grounding

Action grounding is the process of translating high-level task instructions into precise, executable UI actions with specific coordinates and parameters based on visual screen observations.

Here, the model is prompted to format each action as a dictionary with the following keys:
`{'action': 'ACTION_TYPE', 'value': 'element', 'position': [x,y]}`
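
If you ever need to work with a raw response in this format, it can be parsed and validated with the standard library. The sketch below assumes the response is a Python-literal dictionary string shaped like the one above; `parse_action` is a hypothetical helper, the `CLICK` action and coordinates are made-up example values, and the integration may already do equivalent parsing before storing its labels.

```python
import ast

REQUIRED_KEYS = {"action", "value", "position"}


def parse_action(raw):
    """Parse a response shaped like the dictionary above; return None if invalid."""
    try:
        # literal_eval only evaluates Python literals, so it is safer than eval
        parsed = ast.literal_eval(raw.strip())
    except (ValueError, SyntaxError):
        return None
    if not isinstance(parsed, dict) or not REQUIRED_KEYS.issubset(parsed):
        return None
    position = parsed["position"]
    # Expect exactly two numeric coordinates
    if not isinstance(position, (list, tuple)) or len(position) != 2:
        return None
    if not all(isinstance(c, (int, float)) for c in position):
        return None
    return parsed


# Made-up example response, for illustration only
action = parse_action("{'action': 'CLICK', 'value': None, 'position': [0.49, 0.42]}")
print(action)
print(parse_action("not a dictionary"))  # None
```

Returning `None` for malformed output, rather than raising, makes it easy to skip bad responses when looping over many samples.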

In [7]:
model.operation = "action_grounding"

The prompt for this operation is:

In [None]:
print(model.system_prompt)

In [None]:
dataset.apply_model(
    model, 
    prompt_field="instruction", # use a field from the dataset
    label_field="action_grounding_kp",
    )

## 5. View Results

Examine the results for the first sample.

In [None]:
dataset.first()

In [None]:
# Visualize all results in the FiftyOne App
session = fo.launch_app(dataset)

In [13]:
session.freeze()