This assumes you have cloned the dataset repository from Hugging Face.

You can do that as follows: `git clone https://huggingface.co/datasets/yiye2023/GUIAct`

Once you have done so, load the following `json` and `parquet` files:

In [1]:
import json 

with open('GUIAct/smartphone_test_data.json', 'r') as f:
    json_data = json.load(f)

In [2]:
import pandas as pd 

df = pd.read_parquet('GUIAct/smartphone_test_images.parquet')
df.reset_index(inplace=True)

Import helper functions to process the data and create a FiftyOne dataset from the JSON and parquet files.

These helpers process the test set of the smartphone GUI interaction data by:

1. Converting base64-encoded screenshots to JPEG images

2. Normalizing UI element coordinates from pixel values to [0,1] range

3. Structuring interaction data into episodes and steps

4. Converting XML-style point strings (`<point>x,y</point>`) into coordinate pairs

5. Creating a FiftyOne dataset where each sample contains:
   - Screenshot as image
   - UI elements as Detection objects with normalized bounding boxes
   - Action history as structured text
   - Interaction points as Keypoint objects (single points for taps, dual points for swipes)
   - Metadata: episode_id, step number, question text, current action
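
Step 1 can be sketched as follows. This is a minimal illustration, not the code in `guiact_smartphone_to_fiftyone`: the function name `save_base64_image` is hypothetical, and the sketch assumes the decoded payload is already JPEG data, so it writes the bytes unchanged. If the screenshots were stored in another format, they would need re-encoding with an image library.

```python
import base64
import tempfile
from pathlib import Path


def save_base64_image(b64_string, out_path):
    """Decode a base64 string and write the raw bytes to out_path.

    Hypothetical helper: assumes the payload is already a JPEG file.
    """
    image_bytes = base64.b64decode(b64_string)
    out_path = Path(out_path)
    # Create the target directory if it does not exist yet
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(image_bytes)
    return out_path


# Round-trip demo with placeholder bytes standing in for a real screenshot
payload = base64.b64encode(b"\xff\xd8\xff placeholder").decode("ascii")
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_base64_image(payload, Path(tmp_dir) / "images" / "example.jpg")
    print(saved.name, saved.stat().st_size)
```

Writing images to disk matters because a FiftyOne sample refers to its media by file path rather than holding the image bytes itself.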

The input JSON is expected to contain `uid`, `actions_label`, `image_size`, and UI element data with positions (`x`, `y`, `width`, `height`). The output is a FiftyOne dataset for visualizing and analyzing ML training data.
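
To make steps 2 and 4 concrete, here is a rough sketch of parsing a `<point>x, y</point>` string and scaling pixel values by `image_size`. The names `parse_point`, `normalize_point`, and `normalize_box` are hypothetical and are not taken from `guiact_smartphone_to_fiftyone`; the actual helpers may be implemented differently.

```python
import re

# Hypothetical helpers for illustration; not the helper module's actual API.
_POINT_RE = re.compile(r"<point>\s*([-\d.]+)\s*,\s*([-\d.]+)\s*</point>")


def parse_point(point_str):
    """Extract (x, y) as floats from a '<point>x, y</point>' string."""
    match = _POINT_RE.search(point_str)
    if match is None:
        raise ValueError(f"No <point> tag found in {point_str!r}")
    return float(match.group(1)), float(match.group(2))


def normalize_point(x, y, image_size):
    """Scale pixel coordinates to the [0, 1] range."""
    return x / image_size["width"], y / image_size["height"]


def normalize_box(x, y, width, height, image_size):
    """Return [top-left-x, top-left-y, width, height] in [0, 1],
    the layout FiftyOne uses for Detection bounding boxes."""
    img_w, img_h = image_size["width"], image_size["height"]
    return [x / img_w, y / img_h, width / img_w, height / img_h]


image_size = {"width": 720, "height": 1440}
x, y = parse_point("<point>362, 1412</point>")
nx, ny = normalize_point(x, y, image_size)
print(round(nx, 3), round(ny, 3))  # 0.503 0.981
```

For a 720x1440 screenshot, the absolute tap point `362, 1412` maps to roughly `0.503, 0.981`.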

In [3]:
from guiact_smartphone_to_fiftyone import process_json_data, create_dataset

In [4]:
processed_json = process_json_data(json_data)

In [5]:
processed_json[0]

{'uid': 'uid_episode_10270193012375700035_step_00',
 'image_id': 'uid_episode_10270193012375700035_step_00',
 'image_size': {'width': 720, 'height': 1440},
 'question': 'What is the capital of Brazil?',
 'actions_history': '',
 'logs': '',
 'thoughts': '',
 'actions_label': {'name': 'tap',
  'point': {'absolute': '<point>362, 1412</point>',
   'related': '<point>0.503, 0.981</point>'}},
 'episode_id': 'episode_10270193012375700035',
 'step': 0,
 'current_action': 'tap: <point>0.503, 0.981</point>',
 'structured_history': []}
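
The `episode_id` and `step` fields in this record look like they are derived from the `uid`. A minimal sketch of that split, assuming every `uid` follows the `uid_<episode_id>_step_<nn>` pattern seen above (the function `split_uid` is hypothetical, and `process_json_data` may do this differently):

```python
def split_uid(uid):
    """Split 'uid_<episode_id>_step_<nn>' into (episode_id, step).

    Hypothetical helper based on the uid pattern in the sample record.
    """
    prefix = "uid_"
    if not uid.startswith(prefix) or "_step_" not in uid:
        raise ValueError(f"Unexpected uid format: {uid!r}")
    # Split on the last '_step_' so the episode id is kept intact
    episode_id, step_str = uid[len(prefix):].rsplit("_step_", 1)
    return episode_id, int(step_str)


episode_id, step = split_uid("uid_episode_10270193012375700035_step_00")
print(episode_id, step)  # episode_10270193012375700035 0
```

Grouping records by `episode_id` and sorting them by `step` would then recover the order of actions within an episode.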

In [None]:
dataset = create_dataset(df, processed_json)

 100% |███████████████| 2079/2079 [38.6s elapsed, 0s remaining, 54.0 samples/s]      
Computing metadata...
  48% |███████|-------| 1000/2079 [17.0s elapsed, 18.3s remaining, 58.9 samples/s]     

In [None]:
dataset

In [None]:
dataset.skip(40).first()

In [None]:
import fiftyone as fo

fo.launch_app(dataset)

In [None]:
import fiftyone.utils.huggingface as fouh

fouh.push_to_hub(
    dataset,
    "guiact_smartphone_test",
)