This Python script walks a dataset that stores application data as screenshots with, optionally, their corresponding accessibility tree JSON files, and processes each screen it finds. Here's a step-by-step breakdown of what the code does:

1. **Configuration for OCR**: It sets up the path to the Tesseract-OCR executable, which is necessary for the Optical Character Recognition (OCR) part of the script. OCR is used to extract text from images.

2. **Accessibility Tree Parsing**: The `parse_accessibility_tree` function loads an accessibility tree JSON file and hands it to `describe_node`, which recursively prints each node's name, role, description, role description, value, and bounding boxes, indented by depth in the tree. These files contain structured data about the UI elements on the screen, such as buttons, labels, and other interactive elements. Anything beyond printing the tree is left for you to implement based on your specific requirements.

3. **Screenshot Processing with OCR**: The `process_screenshot_with_ocr` function is where the script processes images. It reads a screenshot, converts it to grayscale, applies an inverted binary threshold to make text stand out, and then runs OCR through `pytesseract.image_to_data` to get each detected word together with its bounding box. Every word detected with a confidence above 60 is outlined with a green rectangle on the image and printed along with its coordinates.

4. **Dataset Processing**: The `process_dataset` function is the core of the script, designed to navigate through the dataset's directory structure. It:
   - Iterates through each application's directory inside the dataset's `app_data` folder.
   - For each application, it further iterates through directories that represent different states or screens of the application.
   - Within each state's directory, it looks for a screenshot image (`.png` file) and an accessibility tree JSON file (`.json` file). If a directory holds several files of the same type, only the last one listed is kept.
   - If an accessibility tree JSON file is found, it calls `parse_accessibility_tree` to handle it. This is the step that extracts structured data about the UI elements present in the screenshot.
   - Regardless of whether an accessibility tree is available, it passes the screenshot it found to `process_screenshot_with_ocr` to run OCR and outline the detected text.

5. **Example Usage**: Finally, the script concludes with an example call to `process_dataset`, specifying the path to the dataset. This part of the code needs to be adjusted to reflect the actual path where the `IASA_Champ_Final` dataset is located on your system.



In [1]:
import json
import os
import cv2
import pytesseract
import numpy as np

This code requires Tesseract-OCR to be installed. On Windows, installers are available from the [UB-Mannheim Tesseract wiki](https://github.com/UB-Mannheim/tesseract/wiki).

In [2]:
# Tesseract-OCR executable path as per your installation
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' 
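
If you would rather not hard-code the path, the executable can be looked up first. The following is only a sketch: `find_tesseract` and `WINDOWS_DEFAULT` are hypothetical names introduced here, and it assumes that Tesseract is either on your `PATH` or installed at the location used in the cell above.

```python
import shutil
from pathlib import Path

# Assumption: the same install location as in the cell above.
WINDOWS_DEFAULT = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

def find_tesseract(fallback=WINDOWS_DEFAULT):
    """Return a path to the Tesseract executable, or None if none is found."""
    on_path = shutil.which('tesseract')  # search the directories listed in PATH
    if on_path:
        return on_path
    # Fall back to the given location only if a file actually exists there
    return fallback if Path(fallback).is_file() else None

tesseract_path = find_tesseract()
if tesseract_path is None:
    print("Tesseract not found; install it or set the path manually.")
else:
    print(f"Using Tesseract at: {tesseract_path}")
    # Then point pytesseract at it:
    # pytesseract.pytesseract.tesseract_cmd = tesseract_path
```

The assignment to `tesseract_cmd` is left commented out so the snippet runs even where `pytesseract` is not installed.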

In [3]:
def describe_node(node, indent=0):
    # Basic properties
    description_parts = [
        f"Name: {node.get('name', 'None')}",
        f"Role: {node.get('role', 'None')}",
        f"Description: {node.get('description', 'None')}",
        f"Role Description: {node.get('role_description', 'None')}",
        f"Value: {node.get('value', 'None')}"
    ]

    # bbox and visible_bbox
    bbox = node.get('bbox', [])
    visible_bbox = node.get('visible_bbox', [])
    bbox_description = f"BBox: {bbox}" if bbox else "BBox: None"
    visible_bbox_description = f"Visible BBox: {visible_bbox}" if visible_bbox else "Visible BBox: None"
    
    description_parts.extend([bbox_description, visible_bbox_description])

    # Indentation for hierarchical structure
    indent_str = "    " * indent
    full_description = indent_str + ", ".join(description_parts)

    print(full_description)

    # Recursively describe children
    children = node.get('children', [])
    for child in children:
        describe_node(child, indent + 1)
        
def process_accessibility_tree(tree):
    describe_node(tree)

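def parse_accessibility_tree(accessibility_tree_path):
    # Load the JSON tree, print its description, and return it to the caller
    with open(accessibility_tree_path, 'r', encoding='utf-8') as file:
        data = json.load(file)
    process_accessibility_tree(data)
    return data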
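
Printing the tree is handy for inspection, but for further processing it is often easier to walk the nodes as data. Below is a minimal sketch of that idea. The helpers `flatten_tree` and `find_by_role` are hypothetical, and the sample tree (including its role names) is made up for illustration; it only assumes the same `children` key that `describe_node` relies on.

```python
def flatten_tree(node, depth=0):
    """Yield (depth, node) pairs in depth-first, pre-order traversal."""
    yield depth, node
    # 'children' may be missing or empty on leaf nodes
    for child in node.get('children') or []:
        yield from flatten_tree(child, depth + 1)

def find_by_role(tree, role):
    """Return every node in the tree whose 'role' equals the given value."""
    return [node for _, node in flatten_tree(tree) if node.get('role') == role]

# Made-up sample data; real trees in the dataset may use other role names.
sample_tree = {
    'name': 'Window', 'role': 'AXWindow',
    'children': [
        {'name': 'General', 'role': 'AXButton', 'bbox': [70, 116, 148, 132]},
        {'name': 'Toolbar', 'role': 'AXGroup',
         'children': [{'name': 'About', 'role': 'AXButton'}]},
    ],
}

buttons = find_by_role(sample_tree, 'AXButton')
print([b['name'] for b in buttons])  # → ['General', 'About']
```

Using a generator keeps the traversal separate from whatever you do with each node, so the same walk can feed filtering, counting, or export.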

In [7]:
def process_screenshot_with_ocr(screenshot_path):
    img = cv2.imread(screenshot_path)
    if img is None:  # cv2.imread returns None when the file cannot be read
        print(f"Could not read image: {screenshot_path}")
        return
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV)

    # Use pytesseract.image_to_data() to get bounding box coordinates for each detected text
    data = pytesseract.image_to_data(thresh, output_type=pytesseract.Output.DICT)

    num_boxes = len(data['text'])
    for i in range(num_boxes):
        # 'conf' can be an int or a string such as '-1' or '96.4', so parse it as float
        if float(data['conf'][i]) > 60:  # Confidence threshold to filter weak detections
            (x, y, w, h) = (data['left'][i], data['top'][i], data['width'][i], data['height'][i])
            img = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
            print(f"Detected text: {data['text'][i]}, Coordinates: {(x, y, x + w, y + h)}")

    # Optionally, display the image with bounding boxes
    # cv2.imshow('OCR Results', img)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()
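
The filtering step can also be pulled out into a small function that returns the words instead of printing them, which makes it testable without running Tesseract. This is a sketch under assumptions: `extract_words` is a hypothetical helper, and `fake_data` only imitates the dictionary layout that `image_to_data` returns with `Output.DICT` (the confidence values in it are invented).

```python
def extract_words(data, min_conf=60.0):
    """Return words above the confidence threshold with (x1, y1, x2, y2) boxes."""
    words = []
    columns = zip(data['text'], data['conf'], data['left'],
                  data['top'], data['width'], data['height'])
    for text, conf, left, top, width, height in columns:
        # float() handles ints as well as strings like '-1' or '96.4'
        if float(conf) > min_conf and text.strip():
            words.append({'text': text,
                          'box': (left, top, left + width, top + height)})
    return words

# Invented stand-in for an image_to_data result
fake_data = {
    'text':   ['', 'General', '=', 'Display'],
    'conf':   ['-1', '96.4', 42, 91],
    'left':   [0, 431, 85, 350],
    'top':    [0, 16, 74, 116],
    'width':  [900, 97, 50, 73],
    'height': [600, 20, 38, 20],
}

words = extract_words(fake_data)
for word in words:
    print(word['text'], word['box'])
```

Unlike the cell above, this version also skips entries whose text is empty or whitespace, which is a design choice rather than something the original code does.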

In [8]:

def process_dataset(dataset_path):
    """
    Process each app screen in the dataset by trying to parse its accessibility tree
    and then applying OCR and basic element detection to the screenshot.
    """
    app_data_path = os.path.join(dataset_path, 'app_data')
    for app_dir in os.listdir(app_data_path):
        app_dir_path = os.path.join(app_data_path, app_dir)
        if os.path.isdir(app_dir_path):
            for state_dir in os.listdir(app_dir_path):
                state_dir_path = os.path.join(app_dir_path, state_dir)
                if not os.path.isdir(state_dir_path):
                    continue
                screenshot_path = None
                accessibility_tree_path = None
                for file in os.listdir(state_dir_path):
                    if file.endswith('.png'):
                        screenshot_path = os.path.join(state_dir_path, file)
                    elif file.endswith('.json'):
                        accessibility_tree_path = os.path.join(state_dir_path, file)
                
                if accessibility_tree_path:
                    print(f"Processing accessibility tree: {accessibility_tree_path}")
                    tree_data = parse_accessibility_tree(accessibility_tree_path)
                    # Implement your logic to process the tree_data
                
                if screenshot_path:
                    print(f"Processing screenshot: {screenshot_path}")
                    process_screenshot_with_ocr(screenshot_path)
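
Note that the loop above keeps only one `.png` and one `.json` per state directory. If a state can contain several of either, one option is to collect them all first. The sketch below uses `pathlib` for that; `collect_state_files` is a hypothetical helper, and the demo runs on a throwaway directory that merely mimics the `app_data/<app>/<state>` layout.

```python
import tempfile
from pathlib import Path

def collect_state_files(dataset_path):
    """Map each state directory to its sorted screenshots and JSON files."""
    states = {}
    app_data = Path(dataset_path) / 'app_data'
    # '*/*' matches <app>/<state> entries two levels below app_data
    for state_dir in sorted(app_data.glob('*/*')):
        if not state_dir.is_dir():
            continue
        states[state_dir] = {
            'screenshots': sorted(state_dir.glob('*.png')),
            'trees': sorted(state_dir.glob('*.json')),
        }
    return states

# Demo on a temporary directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / 'app_data' / 'demo_app' / 'state_1'
    state.mkdir(parents=True)
    for name in ('a.png', 'b.png', 'tree.json'):
        (state / name).touch()

    for state_dir, files in collect_state_files(tmp).items():
        print(state_dir.name,
              [p.name for p in files['screenshots']],
              [p.name for p in files['trees']])
```

With the real dataset you would pass its path instead of `tmp` and then call `parse_accessibility_tree` and `process_screenshot_with_ocr` on each collected file.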

In [None]:
# Example usage
dataset_path = 'IASA_Champ_Final'
process_dataset(dataset_path)

Processing screenshot: IASA_Champ_Final\app_data\24hourwallpaper\1707228345\24 Hour Wallpaper-1707228347.19.png
Detected text: General, Coordinates: (431, 16, 528, 36)
Detected text: =, Coordinates: (85, 74, 135, 112)
Detected text: @, Coordinates: (223, 60, 272, 112)
Detected text: @, Coordinates: (503, 66, 538, 111)
Detected text: ai, Coordinates: (660, 63, 720, 109)
Detected text: General, Coordinates: (70, 116, 148, 132)
Detected text: Display, Coordinates: (350, 116, 423, 136)
Detected text: Downloads, Coordinates: (457, 116, 569, 132)
Detected text: Dynamic, Coordinates: (599, 116, 687, 136)
Detected text: Desktop, Coordinates: (696, 116, 780, 136)
Detected text: About, Coordinates: (821, 116, 881, 132)
Detected text: Shuffle, Coordinates: (41, 206, 133, 226)
Detected text: Shuffle, Coordinates: (157, 272, 241, 292)
Detected text: wallpapers:, Coordinates: (250, 273, 385, 297)
Detected text: Never, Coordinates: (422, 275, 491, 294)
Detected text: 6, Coordinates: (644, 270, 676, 3