Chaimalas* and Vyšniauskas* and Brostow, EXPLORER: Robust Collection of Interactable GUI Elements, 2025 [*Equal]
In GUI Automation, collecting and labelling realistic, high-quality GUI usability data at scale is challenging, particularly because many benchmarking datasets (e.g. RICO, VINS) are gathered by expensive crowd-sourcing. In this repository, we provide Android tooling written in Kotlin that automatically traverses any GUI application on modern Android devices and collects GUI screenshots with corresponding ground-truth labels for the following downstream GUI Automation tasks:
- Interactable Detection: GUI screenshots with labelled bounding boxes for tappable elements, used to train tappability/clickability detector models (e.g. FCOS, YOLO, RetinaNet, Faster R-CNN)
- Screen Similarity: GUI screenshots labelled into distinct groups, used to train similarity discriminator models
We train Machine Vision models and implement these downstream tasks in our main "Explorer" repository, available at https://github.com/varnelis/Explorer.
activity_main.xml
- Defines UI of the initial app AndroidGUICollection that queries user for storage/MediaProjection permissions
MainActivity.kt
- Launches initial app AndroidGUICollection and sets required storage/MediaProjection permissions
- Listens for ACTION_ACCESSIBILITY_EVENT events asserted by the background MyAccessibilityService and launches ScreenCaptureService as foreground service to capture current screenshot
MyAccessibilityService.kt
- Runs in background as accessibility service
- Collects interactable elements or same-state screens (depending on collection mode), performs random UI action to simulate real user clicking through target app
ScreenCaptureService.kt
- Started by MainActivity as foreground service
- Captures current screenshot using MediaProjection API; this is compliant with API 29+ standard
AndroidManifest.xml
- Defines core structure and permissions of the app including the foreground and accessibility services
accessibility_service_config.xml
- Registers the accessibility service to filter for all ACTION_ACCESSIBILITY_EVENT events
1. Deploy the AndroidGUICollection app to an Android device. You can do this by opening this project in Android Studio (inside top directory AndroidGUICollection) and clicking Run to install and run the project as an app on a connected device (USB Debugging enabled on the device and ADB configured).
2. Open the AndroidGUICollection app and click the Storage Permissions and MediaProj API Permissions buttons. This gives the app permission to take screenshots of the top-level screen using MediaProjection APIs and to save them to internal device storage.
3. Enable the accessibility service AccessibilityScraper, which was installed on the device alongside the AndroidGUICollection app. In Android API 29, this is found in Settings > Accessibility > Installed services.
4. Open a target app.
5. The AccessibilityScraper will run in the background and actuate the target app opened in the foreground.
The behaviour in Step 5 for the target app depends on whether the AccessibilityScraper service in the app has been installed with the Interactable Detection or Screen Similarity configuration. To set this configuration in the source code, set the SCREENSIM private value in app/src/main/java/com/example/androidguicollection/MyAccessibilityService.kt to Boolean false or true respectively, and re-install the app in the Android device.
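As a minimal sketch of this convention (not the repository's actual code), a single Boolean can select the collection mode. Only the name SCREENSIM comes from the description above; CollectionMode and collectionMode are hypothetical names introduced for illustration.

```kotlin
// Hypothetical enum for illustration; the repository may simply branch on the Boolean.
enum class CollectionMode { INTERACTABLE_DETECTION, SCREEN_SIMILARITY }

// Documented convention: false -> Interactable Detection, true -> Screen Similarity.
private const val SCREENSIM = false

// Maps the flag to a mode; the default argument reads the compile-time flag above.
fun collectionMode(screenSim: Boolean = SCREENSIM): CollectionMode =
    if (screenSim) CollectionMode.SCREEN_SIMILARITY else CollectionMode.INTERACTABLE_DETECTION
```

Because the flag is a compile-time value in this sketch, changing it requires rebuilding and re-installing the app, which matches the re-install step described above.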
If SCREENSIM == false, the actuation of the target app is:
1. Screenshot the current screen, save it to Internal Storage within 500ms as <uuid>.png, and keep track of the UUID by updating all_uuid_scraped.json.
2. Traverse the accessibility-tree hierarchy of the active foreground app via BFS in MyAccessibilityService.traverseAccessibilityTree to extract all tappable/clickable and scrollable elements, then get ground-truth tappable bboxes in MyAccessibilityService.getBbox and save them to JSON <uuid>.json as {<uuid>: List[bboxes]}.
3. Do a random UI action:
   - If the device keyboard is visible/selected, 60% chance to inject a 1-6 character random string and 40% chance to tap a random tappable element;
   - Else, 30% chance to scroll down inside a random scrollable element and 70% chance to tap a random tappable element.
4. Wait 4000ms for loading events to settle; this is a heuristic and depends on device speed, internet connection (if applicable to the target app) etc. Change it by setting delayMillis in MyAccessibilityService.kt.
5. Repeat from Step 1.

Note: The UI action performed in Step 3 will trigger an ACTION_ACCESSIBILITY_EVENT, which is detected by MainActivity; it then takes a screenshot and the cycle restarts from Step 1.
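Steps 2 and 3 of this loop can be sketched in plain Kotlin, without Android dependencies. This is an illustration under stated assumptions, not the code in MyAccessibilityService.kt: UiNode and Bbox are hypothetical stand-ins for the accessibility nodes and their screen bounds, the [left, top, right, bottom] JSON layout is assumed, random strings are assumed to be lowercase letters, and falling back to a tap when no scrollable element exists is also an assumption.

```kotlin
import kotlin.random.Random

// Hypothetical stand-ins; the real service reads Android accessibility nodes.
data class Bbox(val left: Int, val top: Int, val right: Int, val bottom: Int)

data class UiNode(
    val bounds: Bbox,
    val clickable: Boolean = false,
    val scrollable: Boolean = false,
    val children: List<UiNode> = emptyList()
)

data class Interactables(val tappable: List<UiNode>, val scrollable: List<UiNode>)

// Step 2: breadth-first traversal collecting tappable and scrollable nodes.
fun collectInteractables(root: UiNode): Interactables {
    val tappable = mutableListOf<UiNode>()
    val scrollable = mutableListOf<UiNode>()
    val queue = ArrayDeque<UiNode>()
    queue.addLast(root)
    while (queue.isNotEmpty()) {
        val node = queue.removeFirst()
        if (node.clickable) tappable += node
        if (node.scrollable) scrollable += node
        node.children.forEach { queue.addLast(it) }
    }
    return Interactables(tappable, scrollable)
}

// Step 2: serialise labels as {<uuid>: [[left, top, right, bottom], ...]} (assumed layout).
fun labelsToJson(uuid: String, boxes: List<Bbox>): String =
    boxes.joinToString(prefix = "{\"$uuid\": [", postfix = "]}") {
        "[${it.left}, ${it.top}, ${it.right}, ${it.bottom}]"
    }

sealed class UiAction
data class Tap(val target: UiNode) : UiAction()
data class ScrollDown(val target: UiNode) : UiAction()
data class TypeText(val text: String) : UiAction()

// Random string of 1 to 6 lowercase letters (the character set is an assumption).
fun randomString(rng: Random): String {
    val length = rng.nextInt(1, 7) // upper bound is exclusive
    return (1..length).map { ('a'..'z').random(rng) }.joinToString("")
}

// Step 3: keyboard visible -> 60% type, 40% tap; otherwise 30% scroll, 70% tap.
// Returns null when nothing suitable exists (assumed fallback behaviour).
fun chooseAction(
    found: Interactables,
    keyboardVisible: Boolean,
    rng: Random = Random.Default
): UiAction? {
    val roll = rng.nextDouble()
    return when {
        keyboardVisible && roll < 0.6 -> TypeText(randomString(rng))
        !keyboardVisible && roll < 0.3 && found.scrollable.isNotEmpty() ->
            ScrollDown(found.scrollable.random(rng))
        found.tappable.isNotEmpty() -> Tap(found.tappable.random(rng))
        else -> null
    }
}
```

Passing a seeded Random into chooseAction makes a collection run reproducible, which can help when debugging a traversal that gets stuck on one screen.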
If SCREENSIM == true, the actuation of the target app is:
1. Screenshot the current screen twice within 5000ms and save to Internal Storage as <uuid1>.png and <uuid2>.png, and keep track of both UUIDs by updating all_uuid_scraped.json.
2. Label the two screenshots into the same group (i.e. labelled same-state) by updating domain_map.json with new group entry {<group-uuid>: [uuid1, uuid2]}.
3. Steps 3-5 are the same as in the SCREENSIM == false case above.
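The same-state grouping in Step 2 can be sketched as a small in-memory map that is serialised to the {<group-uuid>: [uuid1, uuid2]} shape. DomainMap and its methods are hypothetical names for illustration; how the repository actually reads and rewrites domain_map.json on the device is not shown here.

```kotlin
import java.util.UUID

// Hypothetical in-memory model of domain_map.json: group UUID -> screenshot UUIDs.
class DomainMap {
    // Insertion-ordered so the serialised output is stable.
    private val groups = linkedMapOf<String, List<String>>()

    // Registers two screenshots as one same-state group and returns the group UUID.
    fun addSameStatePair(
        uuid1: String,
        uuid2: String,
        groupUuid: String = UUID.randomUUID().toString()
    ): String {
        groups[groupUuid] = listOf(uuid1, uuid2)
        return groupUuid
    }

    // Serialises all groups as {"<group-uuid>": ["<uuid1>", "<uuid2>"], ...}.
    fun toJson(): String =
        groups.entries.joinToString(prefix = "{", postfix = "}") { (group, members) ->
            "\"$group\": " + members.joinToString(prefix = "[", postfix = "]") { "\"$it\"" }
        }
}
```

Each pair gets its own group entry in this sketch, so two screenshots taken moments apart are labelled same-state without comparing them against earlier groups.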
If you utilize our Android-based data collection and labelling in your GUI Automation research, please consider citing our work:
@misc{chaimalas2025explorerrobustcollectioninteractable,
title={Explorer: Robust Collection of Interactable GUI Elements},
author={Iason Chaimalas and Arnas Vyšniauskas and Gabriel Brostow},
year={2025},
eprint={2504.09352},
archivePrefix={arXiv},
primaryClass={cs.HC},
url={https://arxiv.org/abs/2504.09352},
}
Also refer to our main "Explorer" repository for the paper implementation.