* Logical Steps to find all possible function call sequences triggered by a user event in Android apps *

Step 1. 
Parsing the Android Manifest File: Identifying all activities and their entry points.

Step 2. 
Parsing the UI Layout Files: Extracting all UI elements and their associated event handlers.

Step 3. 
Parsing the Java Source Code Files: Identifying all method declarations and the method calls made within each one (this time including loops and branches).
                                       
Step 4. 
Building a Call Graph: Building the call sequences by combining the results from the manifest (Step 1), the layouts (Step 2), and the source code (Step 3).
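
As a minimal sketch of what Step 4 can look like, assume the call graph is a plain dictionary that maps each method name to the list of methods it calls. The helper `expand_calls` and the data in `toy_graph` are hypothetical and are not part of the script below. Methods with no entry in the dictionary (framework calls such as `setText`) are treated as leaves, and each result is one root-to-leaf call chain.

```python
def expand_calls(call_graph, method, _path=None):
    """Return every call chain reachable from `method`, depth-first.

    call_graph: dict mapping a method name to the list of methods it calls.
    Methods missing from the dict are leaves. A callee already on the
    current path is skipped, so recursion cannot loop forever.
    """
    path = (_path or []) + [method]
    callees = [c for c in call_graph.get(method, []) if c not in path]
    if not callees:
        return [path]
    chains = []
    for callee in callees:
        chains.extend(expand_calls(call_graph, callee, path))
    return chains


# Hypothetical data for illustration only
toy_graph = {
    "handleText": ["readInput", "showToast"],
    "readInput": ["getText"],
    "showToast": ["makeText", "show"],
}

for chain in expand_calls(toy_graph, "handleText"):
    print(" -> ".join(chain))
```

Starting the expansion from each handler name found in the layout files would give one set of chains per UI element.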

Pre-requisites:
1. The py4j module is required. It lets this Python script communicate with a Java program through a gateway server running in the JVM.

Improvements:
1. Handling Nested directories

2. Handling Multiple Activities in a Single File:

3. Error Handling: If any file parsing fails (due to malformed XML, incorrect paths, or other issues), the code will throw an exception and stop executing.

4. Handling Multiple Layout Files with the Same UI ID:

5. Finding All Method Declarations: The code now captures direct method invocations within the current method and also captures chained or deeply nested method calls.

6. The code now handles branching and loop statements as well.
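
The error handling described in item 3 can be made easier to debug by naming the offending file when parsing stops. The following is only a sketch under that assumption; `parse_xml_or_fail` is a hypothetical helper, not a function from the script below.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_xml_or_fail(path):
    """Parse an XML file; on failure raise an error that names the file."""
    try:
        return ET.parse(path).getroot()
    except (ET.ParseError, OSError) as exc:
        # Re-raise with the file path so the failing file is obvious
        raise RuntimeError(f"Could not parse XML file {path}: {exc}") from exc


# Demonstration with a deliberately malformed layout file
with tempfile.TemporaryDirectory() as tmp:
    bad_file = os.path.join(tmp, "broken.xml")
    with open(bad_file, "w", encoding="utf-8") as fh:
        fh.write("<LinearLayout><Button></LinearLayout>")
    try:
        parse_xml_or_fail(bad_file)
    except RuntimeError as err:
        print(err)
```

Execution still stops at the first bad file, as described above, but the message now includes the path that caused the failure.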

Pre-requisite for the script

In [66]:
import os                            # File system access (we will be reading the manifest file, layout files, and Java files)
import xml.etree.ElementTree as ET   # XML parsing for the manifest and layout files
from pprint import pprint            # Pretty-prints the output instead of the standard print() statement
from py4j.java_gateway import JavaGateway # Lets this Python script interact with Java programs

# Path of the directory. Here, I have used three main paths: Manifest Directory (XML File), Layouts Directory (XML files) and Source Directory (.Java files)
manifest_directory = "../Documents/Final/app/src/main/"
layouts_directory = "../Documents/Final/app/src/main/res/layout/"
source_directory = "../Documents/Final/app/src/main/java/com/example/afinal/"

# Custom print helper that outputs the result in a readable way
def var_dump(name_of_variable, output):
    print(f"[{name_of_variable}]")
    print(type(output))
    pprint(output)
    print("======")

# Recursively gather all files with a specific extension
# @param: directory: Path of the directory
# @param: extension: Which file type need to be filtered from the given directory
def gather_files(directory, extension):
    files = []

    # root: the current directory path being explored
    # _: the list of sub-directory names, which we do not need (os.walk descends into them itself)
    # filenames: the list of file names in the current directory
    for root, _, filenames in os.walk(directory):            # iterates over the directory and its sub-directories
        for filename in filenames:
            if filename.endswith(extension):                 # if the file name matches the given extension
                files.append(os.path.join(root, filename))   # build the full path to the file and add it to files[]
    return files
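
For comparison, the same recursive search can be written with the standard `pathlib` module. This is only an alternative sketch; `gather_files_pathlib` is a hypothetical name and the rest of the script keeps using `gather_files`.

```python
from pathlib import Path


def gather_files_pathlib(directory, extension):
    """Return the paths of all files under `directory` whose names end with `extension`."""
    # rglob() walks the directory tree recursively; is_file() filters out directories
    return sorted(str(p) for p in Path(directory).rglob(f"*{extension}") if p.is_file())
```

Sorting the result makes the order of the returned paths reproducible between runs, which `os.walk` does not guarantee.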
    

Step 1: Parsing the Android manifest File:

In [68]:

# Parses the Android manifest file to identify all activities
def parse_manifest_directory(manifest_directory):
    
    # Collects every .xml file under the directory; files without <activity> elements contribute nothing
    manifest_files = gather_files(manifest_directory, '.xml')
    activities = [] # Empty list to hold the names of the activities found in the manifest file

    for manifest_file in manifest_files:     
        tree = ET.parse(manifest_file)                  # Parse the XML file and return an ElementTree object
        root = tree.getroot()                           # Get the root element of the parsed XML tree; <manifest> is generally the root element
        for activity in root.findall(".//activity"):    # findall() looks for all 'activity' elements anywhere in the tree
            activity_name = activity.get("{http://schemas.android.com/apk/res/android}name")  # Get the activity name (the android: namespace is needed to retrieve it)
            activities.append(activity_name)
    return activities

activities = parse_manifest_directory(manifest_directory)
# Use the custom helper for more readable output than a plain print()
var_dump("activities", activities)


[activities]
<class 'list'>
['.SettingsActivity', '.MainActivity']
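
The names in this output start with a dot, which means they are relative to the application's package. The sketch below shows one way to turn them into fully qualified class names, assuming the manifest declares a `package` attribute. The function `qualified_activity_names` and the sample manifest are hypothetical, and the package value is inferred from the source directory path used earlier. Newer projects may declare the namespace in the Gradle build file instead, in which case `package` is absent and the names are left unchanged.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def qualified_activity_names(manifest_xml):
    """Return activity names, expanding '.Name' to '<package>.Name' when possible."""
    root = ET.fromstring(manifest_xml)
    package = root.get("package", "")  # may be missing in newer projects
    names = []
    for activity in root.iter("activity"):
        name = activity.get(ANDROID_NS + "name")
        if name is None:
            continue  # skip activities without an android:name attribute
        if name.startswith(".") and package:
            name = package + name
        names.append(name)
    return names


# Hypothetical manifest for illustration only
sample_manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.afinal">
  <application>
    <activity android:name=".SettingsActivity" />
    <activity android:name=".MainActivity" />
  </application>
</manifest>"""

print(qualified_activity_names(sample_manifest))
```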


Step 2: Parsing the UI Layout Files and extracting information about any 'onClick' events. Note: For this test, I have only used the 'onClick' UI event; it can be extended to handle other events as well.

In [70]:

# Parse layout files to get handlers
def parse_layouts_directory(layouts_directory):
    layout_files = gather_files(layouts_directory, '.xml')
    handlers = {}

    for layout_file in layout_files:
        tree = ET.parse(layout_file)
        root = tree.getroot()
        for element in root.iter():
            onClick = element.get("{http://schemas.android.com/apk/res/android}onClick") # retrieves the value of the android:onClick attribute, if present
            if onClick:
                element_id = element.get("{http://schemas.android.com/apk/res/android}id", 'unknown') # retrieves the element's android:id attribute, falling back to 'unknown' if it is missing
                handlers[element_id] = onClick # stores the element ID and its onClick handler as a key-value pair in the dictionary
    return handlers

handlers = parse_layouts_directory(layouts_directory)
var_dump("handlers", handlers)


[handlers]
<class 'dict'>
{'@+id/btnCountLoop': 'addNumbers',
 '@+id/btnDisplayText': 'handleText',
 '@+id/btnDoWhile': 'loopDoStatement',
 '@+id/btnForEach': 'forEachStatement',
 '@+id/btnIfStatement': 'displayGreater',
 '@+id/btnSettings': 'launchSettings',
 '@+id/btnSwitch': 'switchStatment',
 '@+id/button': 'goBack'}
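
Because `parse_layouts_directory` keys the dictionary by element ID alone, two layouts that reuse the same ID (or two elements without an ID, which both become 'unknown') overwrite each other. One possible way to address improvement 4 is to key by the pair (layout file, element ID). This is a sketch only: `collect_handlers`, the layout file names, and the XML snippets are hypothetical.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def collect_handlers(layouts):
    """Map (layout name, element ID) to the list of onClick handlers found there.

    layouts: dict mapping a layout file name to its XML text.
    """
    handlers = {}
    for layout_name, xml_text in layouts.items():
        root = ET.fromstring(xml_text)
        for element in root.iter():
            on_click = element.get(ANDROID_NS + "onClick")
            if on_click:
                element_id = element.get(ANDROID_NS + "id", "unknown")
                # A list keeps every handler even if the key repeats
                handlers.setdefault((layout_name, element_id), []).append(on_click)
    return handlers


# Hypothetical layouts that reuse the same button ID
layouts = {
    "activity_main.xml": (
        '<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android">'
        '<Button android:id="@+id/button" android:onClick="launchSettings" />'
        "</LinearLayout>"
    ),
    "activity_settings.xml": (
        '<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android">'
        '<Button android:id="@+id/button" android:onClick="goBack" />'
        "</LinearLayout>"
    ),
}

for key, value in collect_handlers(layouts).items():
    print(key, value)
```

With this keying, both buttons survive in the result even though they share the ID `@+id/button`.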


Step 3: Parsing the Java Source Code Files

In [72]:
# Start the Java gateway: Connection to a JVM
gateway = JavaGateway()

# Parse Java source files
def parse_java_source_code_directory(source_directory):

    source_files = gather_files(source_directory, '.java') # get the .java files from the directory
    call_graph = {} # empty dictionary that will hold the call graph (sequence of events)

    java_parser = gateway.entry_point # retrieves the entry point to the Java code from the Java gateway

    for source_file in source_files: # parse each Java file found in the given directory
        try:
            method_calls_map = java_parser.parseJavaFile(source_file)  # calls parseJavaFile in JavaParserService.java; the method name must match exactly
            
            # Following line is for testing purpose: To check what JavaParser has dumped to this python program
            # var_dump("method_calls_map", method_calls_map)

            call_graph[source_file] = {}

            # Reference only: 
            # key   : onCreate (String) 
            # value : Java object (py4j.java_collections.JavaList); understanding this Java object was the most difficult part for me, and I kept getting the output wrong
            for key, value in method_calls_map.items():
  
                call_graph[source_file][key] = [str(call) for call in value]  # Convert each MethodCall object to a string via its toString method (this was a tough one to solve)
                
        except Exception as e:
            print(f"Error parsing {source_file}: {e}")

    return call_graph

call_graph = parse_java_source_code_directory(source_directory)

# for testing purpose
# var_dump("call_graph", call_graph)

# Create a sequence dictionary combining handlers (UI element IDs and the methods they trigger) with the call graph (method declarations and the method calls within them, including branches and loops)
sequence = {}
for key, value in handlers.items():
    for source_file, methods in call_graph.items():
        if value in methods:
            sequence[key] = {value: methods[value]}

# Print the new dictionary
var_dump("sequence", sequence)

[sequence]
<class 'dict'>
{'@+id/btnCountLoop': {'addNumbers': ['findViewById (METHOD_CALL)',
                                      'setText (METHOD_CALL)',
                                      'valueOf (METHOD_CALL)',
                                      'setText (FOR_STMT)',
                                      'valueOf (FOR_STMT)']},
 '@+id/btnDisplayText': {'handleText': ['findViewById (METHOD_CALL)',
                                        'toString (METHOD_CALL)',
                                        'getText (METHOD_CALL)',
                                        'setText (METHOD_CALL)',
                                        'findViewById (METHOD_CALL)',
                                        'show (METHOD_CALL)',
                                        'makeText (METHOD_CALL)']},
 '@+id/btnDoWhile': {'loopDoStatement': ['findViewById (METHOD_CALL)',
                                         'setText (METHOD_CALL)',
                                         'setText (DO_S