This Python script navigates and processes a dataset that organizes application data as screenshots, optionally paired with accessibility tree JSON files. Here's a step-by-step breakdown of what the code does:

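1. **Configuration for OCR**: It sets the path to the Tesseract-OCR executable. `pytesseract` is only a Python wrapper that calls this program, so Tesseract must be installed separately, and the path has to be set explicitly when the `tesseract` executable is not on your `PATH`. OCR (Optical Character Recognition) is used to extract text from images.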

2. **Accessibility Tree Parsing**: It defines a function, `parse_accessibility_tree`, that loads an accessibility tree JSON file. These files typically contain structured data about the UI elements on screen, such as buttons, labels, and other interactive elements. For now the function only loads the JSON and returns it unchanged; it is a placeholder that marks where parsing logic for your specific requirements would go.

3. **Screenshot Processing with OCR**: The `process_screenshot_with_ocr` function processes each image. It reads the screenshot, converts it to grayscale, and applies an inverse binary threshold (pixels brighter than 150 become black, all others white), so that dark text and UI elements on a light background become white foreground. It then runs OCR on the thresholded image and prints any text it finds. Finally, it finds the external contours of the foreground regions (which may correspond to UI elements) and draws them in green on the image. The lines that would display the annotated image are commented out, so by default it is neither shown nor saved.

4. **Dataset Processing**: The `process_dataset` function is the core of the script. It walks the directory layout `<dataset_path>/app_data/<app>/<state>/` and:
   - Iterates through each application's directory within the dataset.
   - For each application, it further iterates through directories that represent different states or screens of the application.
   - Within each state's directory, it looks for both screenshot images (`.png` files) and accessibility tree JSON files.
   - If an accessibility tree JSON file is found, it calls `parse_accessibility_tree` on it. This is where structured data about the UI elements in the screenshot would be extracted; the returned `tree_data` is not used yet.
   - Regardless of whether an accessibility tree is available, it processes every screenshot found using the `process_screenshot_with_ocr` function to apply OCR and basic element detection.

5. **Example Usage**: Finally, the script calls `process_dataset` with the dataset path. Adjust this path to point to where the `IASA_Champ_Final` dataset is located on your system; as written, it is resolved relative to the current working directory.
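
Step 1 hard-codes a Windows path. A more portable option is to look for `tesseract` on `PATH` first and fall back to that path only when needed. The following is a minimal sketch using only the standard library; `find_tesseract` is a hypothetical helper name, not part of `pytesseract`:

```python
import os
import shutil

# The location hard-coded in the script; adjust it if you installed Tesseract elsewhere
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(windows_default=WINDOWS_DEFAULT):
    """Return a path to the tesseract executable, or None if it cannot be found.

    Prefers whatever `tesseract` resolves to on PATH, then falls back to
    the hard-coded Windows location.
    """
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    if os.path.isfile(windows_default):
        return windows_default
    return None


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found; install it or set the path manually.")
    else:
        print("Using Tesseract at:", path)
        # With pytesseract installed, you would then set:
        # pytesseract.pytesseract.tesseract_cmd = path
```

If it returns a path, assign it to `pytesseract.pytesseract.tesseract_cmd`; if it returns `None`, Tesseract is neither on `PATH` nor at the expected location.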
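
Since `parse_accessibility_tree` currently returns the raw JSON, one possible next step is to flatten the nested tree into a list of elements. The dataset's JSON schema is not described here, so this sketch assumes each node is a dict with optional `role`, `name` and `children` keys; those key names, the helper name `flatten_accessibility_tree`, and the sample tree are all hypothetical:

```python
import json


def flatten_accessibility_tree(node, depth=0, out=None):
    """Walk a nested accessibility tree and return a flat list of element records.

    ASSUMPTION: each node is a dict with optional "role", "name" and "children"
    keys. These key names are hypothetical; rename them to match the real JSON.
    """
    if out is None:
        out = []
    if isinstance(node, list):
        # Also accept a list of top-level nodes, in case the root is an array
        for child in node:
            flatten_accessibility_tree(child, depth, out)
        return out
    if not isinstance(node, dict):
        return out
    out.append({"depth": depth, "role": node.get("role"), "name": node.get("name")})
    # `or []` also covers an explicit "children": null
    for child in node.get("children") or []:
        flatten_accessibility_tree(child, depth + 1, out)
    return out


# Made-up sample tree in the assumed schema
sample_tree = json.loads("""
{"role": "window", "name": "General",
 "children": [
   {"role": "button", "name": "Shuffle"},
   {"role": "group", "name": "Application",
    "children": [{"role": "checkbox", "name": "Open at Login"}]}
 ]}
""")

for element in flatten_accessibility_tree(sample_tree):
    print("  " * element["depth"] + f'{element["role"]}: {element["name"]}')
```

In `process_dataset`, this could be applied to the returned `tree_data` (for example `elements = flatten_accessibility_tree(tree_data)`) once the key names match the real files.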
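
The threshold and contour steps in item 3 can be illustrated without OpenCV. The pure-Python sketch below reproduces the `THRESH_BINARY_INV` rule (values above the threshold become 0, the rest become the maximum value) and then computes bounding boxes of 8-connected foreground regions. This only approximates what `cv2.findContours` with `RETR_EXTERNAL` outlines: for example, it also reports regions nested inside holes, which external contour retrieval skips. The function names are hypothetical:

```python
from collections import deque


def threshold_binary_inv(gray, thresh=150, maxval=255):
    """Apply the THRESH_BINARY_INV rule to a 2-D list of pixel values:
    pixels above `thresh` become 0, all others become `maxval`."""
    return [[0 if px > thresh else maxval for px in row] for row in gray]


def foreground_boxes(binary):
    """Bounding boxes (x, y, w, h) of 8-connected non-zero regions, found by BFS."""
    height = len(binary)
    width = len(binary[0]) if binary else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Flood-fill this region while tracking its extent
            queue = deque([(x, y)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if 0 <= nx < width and 0 <= ny < height and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


# Tiny grayscale "screenshot": two dark marks on a white background
gray = [
    [255, 255, 255, 255, 255],
    [255,  20,  30, 255, 255],
    [255, 255, 255, 255,  40],
    [255, 255, 255, 255,  50],
]
binary = threshold_binary_inv(gray)
print(foreground_boxes(binary))  # [(1, 1, 2, 1), (4, 2, 1, 2)]
```

The inversion matters because contour finding treats non-zero pixels as foreground, so dark text has to become white before its outlines can be traced.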
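
The traversal in item 4 can also be separated from the processing, which makes it easy to test on its own. This sketch (the generator name `iter_states` is hypothetical) yields every `.png` and `.json` path in each state directory under `<dataset_path>/app_data/<app>/<state>/`, and demonstrates the walk on a temporary directory:

```python
import os
import tempfile


def iter_states(dataset_path):
    """Yield (app, state, screenshot_paths, tree_paths) for every state directory
    under <dataset_path>/app_data/<app>/<state>/, collecting ALL .png and .json files."""
    app_data_path = os.path.join(dataset_path, "app_data")
    for app in sorted(os.listdir(app_data_path)):
        app_path = os.path.join(app_data_path, app)
        if not os.path.isdir(app_path):
            continue
        for state in sorted(os.listdir(app_path)):
            state_path = os.path.join(app_path, state)
            if not os.path.isdir(state_path):
                continue
            files = sorted(os.listdir(state_path))
            screenshots = [os.path.join(state_path, f) for f in files if f.endswith(".png")]
            trees = [os.path.join(state_path, f) for f in files if f.endswith(".json")]
            yield app, state, screenshots, trees


# Demo on a throwaway directory that mimics the expected layout
with tempfile.TemporaryDirectory() as root:
    state_dir = os.path.join(root, "app_data", "someapp", "1700000000")
    os.makedirs(state_dir)
    for name in ("a.png", "b.png", "tree.json", "notes.txt"):
        open(os.path.join(state_dir, name), "w").close()
    for app, state, screenshots, trees in iter_states(root):
        print(app, state, [os.path.basename(p) for p in screenshots], [os.path.basename(p) for p in trees])
```

Processing then becomes a plain loop, for example `for app, state, screenshots, trees in iter_states('IASA_Champ_Final'):` followed by a call to `process_screenshot_with_ocr` for each screenshot.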



```python
import json
import os

import cv2
import pytesseract

# This code requires the Tesseract-OCR engine. Windows installers are available at:
# https://github.com/UB-Mannheim/tesseract/wiki

# Update the Tesseract-OCR executable path to match your installation
# (not needed if `tesseract` is already on your PATH)
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'  # Update this path


def parse_accessibility_tree(accessibility_tree_path):
    """
    Load an accessibility tree JSON file and return its contents.

    Placeholder: the JSON data is returned unchanged.
    """
    with open(accessibility_tree_path, 'r', encoding='utf-8') as file:
        data = json.load(file)
    # Implement parsing logic based on your requirements
    return data


def process_screenshot_with_ocr(screenshot_path):
    """
    Apply OCR and basic image processing to identify text and potential UI elements in a screenshot.

    Returns the detected text and the image with detected contours drawn on it,
    or (None, None) if the file cannot be read.
    """
    img = cv2.imread(screenshot_path)
    if img is None:
        # cv2.imread returns None instead of raising when a file cannot be read
        print(f"Could not read image: {screenshot_path}")
        return None, None

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Inverse binary threshold: pixels above 150 become 0 (black), all others 255 (white),
    # so dark text on a light background becomes white foreground
    _, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)

    text = pytesseract.image_to_string(thresh)
    print("Detected Text:", text)

    # Outline the outer boundary of each white region (candidate UI elements) in green
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cv2.drawContours(img, contours, -1, (0, 255, 0), 2)

    # To view the annotated image, uncomment the following lines (or save it with cv2.imwrite)
    # cv2.imshow('Processed Image', img)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()
    return text, img


def process_dataset(dataset_path):
    """
    Process each app screen in the dataset: parse its accessibility tree(s), if any,
    then apply OCR and basic element detection to every screenshot.
    Expected layout: <dataset_path>/app_data/<app>/<state>/*.png and *.json
    """
    app_data_path = os.path.join(dataset_path, 'app_data')
    for app_dir in sorted(os.listdir(app_data_path)):
        app_dir_path = os.path.join(app_data_path, app_dir)
        if not os.path.isdir(app_dir_path):
            continue
        for state_dir in sorted(os.listdir(app_dir_path)):
            state_dir_path = os.path.join(app_dir_path, state_dir)
            if not os.path.isdir(state_dir_path):
                continue

            # Collect every matching file; a single variable per type would keep only the last one
            files = sorted(os.listdir(state_dir_path))
            screenshot_paths = [os.path.join(state_dir_path, f) for f in files if f.endswith('.png')]
            accessibility_tree_paths = [os.path.join(state_dir_path, f) for f in files if f.endswith('.json')]

            for accessibility_tree_path in accessibility_tree_paths:
                print(f"Processing accessibility tree: {accessibility_tree_path}")
                tree_data = parse_accessibility_tree(accessibility_tree_path)
                # Implement your logic to process the tree_data

            for screenshot_path in screenshot_paths:
                print(f"Processing screenshot: {screenshot_path}")
                process_screenshot_with_ocr(screenshot_path)


# Example usage: adjust the path to where the IASA_Champ_Final dataset is located
dataset_path = 'IASA_Champ_Final'
process_dataset(dataset_path)
```


```
Processing screenshot: IASA_Champ_Final\app_data\24hourwallpaper\1707228345\24 Hour Wallpaper-1707228347.19.png
Detected Text: General
= @ 7. @ ai 60
General Place &Time Display Downloads Dynamic Desktop About
Shuffle
Shuffle wallpapers: Never 6
Playlist: All Wallpapers @
Include: All Wallpapers 6
Unhide All Wallpapers
Multiple Monitors

Show the same wallpaper on every display
When unchecked you may set different wallpapers on all displays.

Application
Start-up: Open at Login
Show application in: Dock and Menu Bar 6

Graphics Compatibility Mode

This option may resolve graphics issues on
older model Macs with Nvidia GPUs.

Reset All Preferences...

Processing screenshot: IASA_Champ_Final\app_data\24hourwallpaper\1707228389\24 Hour Wallpaper-1707228390.47.png
Detected Text: Browsing 36 of 105 wallpapers

Xx Onion Valley (High Sierras) a ©) Editors' Choice $

White Sands #2

ee
——_—_—

rafting OF aly.
*” - a
co

Yosemite (Lukens Lake}

Pe ee

“Yosemite (Tioga Road)

Show All Wallpapers
```
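
As the output above shows, `image_to_string` returns one flat string and discards where each word appears on screen. `pytesseract` also offers `image_to_data`, which with `output_type=pytesseract.Output.DICT` returns parallel lists keyed by `text`, `conf`, `left`, `top`, `width` and `height` (among others). The helper below, a hypothetical `words_with_boxes`, turns that dict into word/box tuples. It runs on a hand-written dict so no Tesseract install is needed, and the confidence cutoff of 60 is an arbitrary assumption:

```python
def words_with_boxes(data, min_conf=60.0):
    """Convert pytesseract.image_to_data(..., output_type=Output.DICT) output into
    (word, confidence, (left, top, width, height)) tuples.

    Empty entries are dropped, as are words whose confidence is below `min_conf`
    (the default of 60 is an arbitrary assumption).
    """
    words = []
    for i, text in enumerate(data["text"]):
        # float() accepts confidences given as numbers or numeric strings;
        # rows that are not words carry a confidence of -1
        conf = float(data["conf"][i])
        if not str(text).strip() or conf < min_conf:
            continue
        box = (data["left"][i], data["top"][i], data["width"][i], data["height"][i])
        words.append((text, conf, box))
    return words


# Hand-written stand-in for real image_to_data output (values are made up)
sample_data = {
    "text": ["", "Shuffle", "wallpapers:", "@"],
    "conf": [-1, 96.5, 91.0, 12.0],
    "left": [0, 10, 80, 200],
    "top": [0, 5, 5, 5],
    "width": [300, 60, 90, 8],
    "height": [40, 14, 14, 14],
}
for word, conf, box in words_with_boxes(sample_data):
    print(f"{word!r} conf={conf:.1f} box={box}")
```

With Tesseract installed, the real input would come from `pytesseract.image_to_data(thresh, output_type=pytesseract.Output.DICT)` inside `process_screenshot_with_ocr`, and the word boxes could then be compared with the contour bounding boxes.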
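
The OCR output above also mixes real UI labels with noise lines such as `ee`, `co` and `*” - a`. A simple heuristic filter can drop many of them. The rule and thresholds below (at least 3 letters or digits, making up at least 60% of the non-space characters) are assumptions to tune for your data, and `clean_ocr_lines` is a hypothetical name:

```python
def clean_ocr_lines(text, min_alnum=3, min_ratio=0.6):
    """Return the stripped OCR lines that look like real text.

    Heuristic (an assumption; tune it for your data): keep a line if it has at
    least `min_alnum` letters/digits and they make up at least `min_ratio`
    of its non-space characters.
    """
    kept = []
    for line in text.splitlines():
        chars = [c for c in line if not c.isspace()]
        if not chars:
            continue  # skip blank lines
        alnum = sum(c.isalnum() for c in chars)
        if alnum >= min_alnum and alnum / len(chars) >= min_ratio:
            kept.append(line.strip())
    return kept


# A few lines copied from the OCR output above
raw = "\n".join([
    "Shuffle",
    "= @ 7. @ ai 60",
    "ee",
    "rafting OF aly.",
    "*” - a",
    "co",
    "Playlist: All Wallpapers @",
])
print(clean_ocr_lines(raw))  # ['Shuffle', 'rafting OF aly.', 'Playlist: All Wallpapers @']
```

A line like `rafting OF aly.` still passes, so treat this as a first pass rather than a reliable cleaner; cross-checking against the accessibility tree, when one is available, would be a stronger signal.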