# OS-Atlas Tutorial with FiftyOne

This tutorial demonstrates how to use OS-Atlas, a family of vision-language-action models designed for GUI agents, with FiftyOne.

## 1. Load a Sample Dataset

First, let's load a small UI dataset from the FiftyOne Hugging Face org.

In [None]:
import fiftyone as fo
import fiftyone.zoo as foz

import fiftyone.utils.huggingface as fouh

dataset = fouh.load_from_hub(
    "Voxel51/GroundUI-18k",
    overwrite=True,
    max_samples=200,
    persistent=True
    )

In [None]:
# if you've already downloaded this dataset you can load it via:

# import fiftyone as fo
# import fiftyone.zoo as foz

# dataset = fo.load_dataset("Voxel51/GroundUI-18k")

Optionally, launch the FiftyOne App to visualize the dataset:

In [None]:
fo.launch_app(dataset)

## 2. Set Up OS-Atlas Integration

Register the OS-Atlas remote zoo model source and load the model.

In [None]:
import fiftyone.zoo as foz

# Register the model source
foz.register_zoo_model_source("https://github.com/harpreetsahota204/os_atlas", overwrite=True)

In [None]:
# Load the OS-Atlas model
model = foz.load_zoo_model(
    "OS-Copilot/OS-Atlas-Base-7B",  # you could also use "OS-Copilot/OS-Atlas-Pro-7B"
    # install_requirements=True,  # pass this to make sure all requirements are installed
)

Note that for any of the following operations, you can use a field that already exists on your dataset as the prompt: just pass that field's name as `prompt_field` when you call `apply_model`. For example:

```python
dataset.apply_model(model, prompt_field="<field-name>", label_field="<label-field>")
```
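For example, you might wrap each sample's `instruction` in a longer prompt and store the result in a new field that you then pass as `prompt_field`. The templating step below is plain Python; the template text, the `build_prompts` helper, and the `templated_prompt` field name are hypothetical. Reading and writing the field would use FiftyOne's `values()` and `set_values()` methods, shown only as comments so the sketch runs without a dataset.

```python
def build_prompts(instructions, template):
    """Hypothetical helper: fill `template` with each instruction.

    Missing (None) instructions stay None so the output list has exactly one
    entry per sample and remains aligned with the dataset order.
    """
    return [
        template.format(instruction=text) if text is not None else None
        for text in instructions
    ]


template = "Find the UI element needed to complete this task: {instruction}"

# With a real dataset (hypothetical field name "templated_prompt"):
#   instructions = dataset.values("instruction")
#   dataset.set_values("templated_prompt", build_prompts(instructions, template))
#   dataset.apply_model(model, prompt_field="templated_prompt", label_field="<label-field>")
instructions = ["click the search button", None]
print(build_prompts(instructions, template))
```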

## 3. Visual Question Answering

Ask the model to describe UI screenshots.

In [None]:
model.operation = "vqa"
model.prompt = "Describe this screenshot and what the user might be doing in it."
dataset.apply_model(model, label_field="vqa_results")

In [None]:
dataset.first()['vqa_results']

Because `model.operation` is still set to `"vqa"`, changing only the prompt gives you straightforward, ungrounded "read the text" style OCR:

In [None]:
model.prompt = "Read the text on this screenshot"
dataset.apply_model(model, label_field="plain_ocr")

In [None]:
dataset.first()['plain_ocr']
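The cell above reuses the `vqa` operation because attributes such as `model.operation` and `model.prompt` stay set on the model object until you change them. If you want to try a one-off setting without affecting later cells, a small context manager can restore the previous values afterwards. This is a minimal sketch: `temporary_settings` is a hypothetical helper (not part of FiftyOne or the OS-Atlas integration), and it only assumes the settings are ordinary attributes, as the assignments in this tutorial suggest. A `SimpleNamespace` stands in for the model so the sketch runs on its own.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_settings(obj, **overrides):
    """Hypothetical helper: temporarily set attributes on `obj`, then restore them."""
    # Remember the current value of every attribute we are about to override
    saved = {name: getattr(obj, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(obj, name, value)
        yield obj
    finally:
        # Restore the original values even if the body raised an exception
        for name, value in saved.items():
            setattr(obj, name, value)


# Stand-in for the loaded model
fake_model = SimpleNamespace(operation="vqa", prompt="Describe this screenshot.")

with temporary_settings(fake_model, prompt="Read the text on this screenshot"):
    inside_prompt = fake_model.prompt

print(inside_prompt)      # Read the text on this screenshot
print(fake_model.prompt)  # Describe this screenshot.
```

With the real model you would wrap a call such as `dataset.apply_model(model, label_field="...")` in `with temporary_settings(model, operation="ocr", prompt="..."):`.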

## 4. Grounded Optical Character Recognition (OCR)

Extract and locate text in the UI.

Note: this operation is slow and can take a very long time to run over the whole dataset!

In [None]:
model.operation = "ocr"
model.prompt = "Read the text for each UI element in this interface only once. Focus on text in toolbars, buttons, menus, and other controls. Do not read the same text more than once."
dataset.apply_model(model, label_field="ocr_results")

In [None]:
dataset.first()['ocr_results']
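Even with the "do not read the same text more than once" instruction, the model may still repeat some strings. If you want to post-process the results, one simple option is to keep only the first occurrence of each normalized string. This sketch assumes each grounded OCR result can be reduced to a `(text, bounding_box)` pair; the `dedupe_text_elements` helper and that pair layout are hypothetical, so adapt them to the label type the integration actually returns. The example values are made up.

```python
def dedupe_text_elements(elements):
    """Hypothetical helper: drop repeated strings from (text, bounding_box) pairs.

    Text is compared after collapsing whitespace and lowercasing; the first
    occurrence of each string (and its box) is kept, preserving order.
    """
    seen = set()
    unique = []
    for text, box in elements:
        key = " ".join(text.split()).lower()
        if not key or key in seen:
            continue  # skip empty strings and repeats
        seen.add(key)
        unique.append((text, box))
    return unique


# Made-up boxes in FiftyOne's relative [top-left-x, top-left-y, width, height] format
elements = [
    ("File", [0.01, 0.01, 0.03, 0.02]),
    ("Edit", [0.05, 0.01, 0.03, 0.02]),
    ("file ", [0.01, 0.50, 0.03, 0.02]),
    ("", [0.20, 0.20, 0.05, 0.05]),
]
print(dedupe_text_elements(elements))  # keeps only the first "File" and "Edit"
```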

## 5. Keypoint Detection

Point to the UI element referenced by each sample's `instruction`.

In [None]:
model.operation = "point"

dataset.apply_model(
    model,
    prompt_field="instruction", # using a field from the dataset
    label_field="ui_keypoints"
    )

In [None]:
dataset.first()['ui_keypoints']
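FiftyOne stores keypoint coordinates relative to the image, in `[0, 1]`. To turn a predicted point into a pixel location (for example, to drive a click), scale it by the image size, which you can read from `sample.metadata` after calling `dataset.compute_metadata()`. A common way to score GUI grounding is to check whether the predicted point falls inside the target element's box. The helpers below are a hypothetical plain-Python sketch; they assume the target box is in FiftyOne's relative `[top-left-x, top-left-y, width, height]` format, and the example numbers are made up.

```python
def to_pixels(point, width, height):
    """Convert a relative (x, y) point in [0, 1] to integer pixel coordinates."""
    x, y = point
    return round(x * width), round(y * height)


def point_in_box(point, box):
    """Return True if a relative (x, y) point lies inside a relative
    [top-left-x, top-left-y, width, height] box (edges included)."""
    x, y = point
    bx, by, bw, bh = box
    return bx <= x <= bx + bw and by <= y <= by + bh


# Made-up values: a predicted point and a target element box, both relative
predicted = (0.5, 0.25)
target_box = [0.45, 0.20, 0.10, 0.10]

print(to_pixels(predicted, 1920, 1080))     # (960, 270)
print(point_in_box(predicted, target_box))  # True
```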

## 6. Classification

Classify the type of UI platform.

In [None]:
model.operation = "classify"
model.prompt = "Classify this UI as coming from one of the following operating systems: android, ios, windows, macos, linux, chromeos, or other"
dataset.apply_model(model, label_field="ui_classifications")

In [None]:
dataset.first()['ui_classifications']

To perform zero-shot classification with labels taken from an existing field on your dataset, first collect that field's distinct values:

In [None]:
classes = dataset.distinct("platform")

You can then use this as part of your prompt:

In [None]:
model.prompt = f"Which of the following platforms is this screenshot from? Pick exactly one of: {', '.join(classes)}"
dataset.apply_model(model, label_field="app_classifications")

In [None]:
dataset.first()['app_classifications']
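A generative model will not always answer with one of the class strings verbatim; it may change capitalization or wrap the label in a sentence. If you need a clean label, one option is to map the raw answer back onto the allowed classes. `match_class` is a hypothetical plain-Python helper; it assumes you can get the answer as a string, and it returns `None` rather than guessing when nothing matches. The class list is the one from the OS classification prompt above.

```python
def match_class(answer, classes):
    """Hypothetical helper: map a free-text answer onto one of `classes`.

    Tries a case-insensitive exact match first, then picks the longest class
    name that appears inside the answer. Returns None if nothing matches.
    """
    normalized = answer.strip().strip(".").lower()
    by_lower = {c.lower(): c for c in classes}
    if normalized in by_lower:
        return by_lower[normalized]
    # Check longer names first so a longer class name beats a shorter one it contains
    for lower in sorted(by_lower, key=len, reverse=True):
        if lower in normalized:
            return by_lower[lower]
    return None


os_classes = ["android", "ios", "windows", "macos", "linux", "chromeos", "other"]
print(match_class("Windows", os_classes))                              # windows
print(match_class("This looks like a macOS screenshot.", os_classes))  # macos
print(match_class("I'm not sure", os_classes))                         # None
```

Plain substring matching can misfire on short names (for example, "ios" inside "scenarios"), so spot-check the mapped labels before relying on them.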

## 7. Agentic

In this dataset, there's an `instruction` field that contains instructions for an agent.

In [None]:
# If your dataset has a field called "instruction" with instructions
model.operation = "agentic"
dataset.apply_model(model, prompt_field="instruction", label_field="agentic_output")

In [None]:
dataset.first()['agentic_output']

## 8. Detection

Localize the UI elements described by each sample's `instruction` with bounding boxes.

In [None]:
# If your dataset has a field called "instruction" with instructions
model.operation = "detect"
dataset.apply_model(model, prompt_field="instruction", label_field="detect_output")

In [None]:
dataset.first()['detect_output']
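FiftyOne stores detection boxes as relative `[top-left-x, top-left-y, width, height]` coordinates in `[0, 1]`. To compare a predicted box with a ground-truth box, intersection over union (IoU) is the standard overlap measure. The function below is a plain-Python sketch of IoU for boxes in that format; which field holds your ground truth depends on your dataset, so no field names are assumed here. For full evaluations, FiftyOne's built-in `evaluate_detections()` method is the more complete option.

```python
def iou(box_a, box_b):
    """IoU of two boxes in relative [top-left-x, top-left-y, width, height] format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis (zero if the boxes do not intersect)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


print(iou([0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]))   # 1.0
print(iou([0.0, 0.0, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]))   # 0.0
print(iou([0.0, 0.0, 0.5, 0.5], [0.25, 0.0, 0.5, 0.5]))  # about 0.333
```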

## 9. Set Your Own System Prompts

You can override the model's system prompt for any of the operations.

In [None]:
# First, clear the existing system prompt
model.system_prompt = None

# Then set your custom system prompt
model.system_prompt = "Your awesome custom system prompt!"

## 10. View Results


In [None]:
# Visualize all results in the FiftyOne App
session = fo.launch_app(dataset)

In [None]:
# Replace the live App with a static screenshot (useful when sharing the notebook)
session.freeze()