# Using a Global Call Graph

Consider a directed multigraph in which every node represents a single function and every edge represents a function call. This abstraction lets us treat updating the names of functions and classes as a graph identification problem: identifying the correct label for each node.
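As a minimal sketch of this abstraction, a call graph can be modeled with a `networkx.MultiDiGraph`. The method names below are made up for illustration and do not come from any real app; parallel edges stand for several call sites between the same two functions.

```python
import networkx as nx

# Hypothetical method names; each node stands for one function.
cg = nx.MultiDiGraph()
cg.add_edge("MainActivity.onCreate", "Api.login")  # first call site
cg.add_edge("MainActivity.onCreate", "Api.login")  # second call site (parallel edge)
cg.add_edge("Api.login", "Crypto.sign")

# Distinct callees of a function, and the number of calls between two functions.
callees = sorted(set(cg.successors("MainActivity.onCreate")))
n_calls = cg.number_of_edges("MainActivity.onCreate", "Api.login")
```

Node labels are the only place where names appear, so renaming a function changes a label but leaves the structure of the graph untouched.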

There are fixed points to start from, such as the entry points of every application (Activities, Services, Content Providers, ...). Even "external" nodes can be used, such as the Android APIs (Activity classes, ...) or even libraries (if you want to consider them as such). Despite the extremely large number of nodes, the call graph should contain recognizable patterns, since most functions stay the same and so do the edges between them.

This is exactly the problem GraphGuard wants to solve: given labeled graphs, find the nodes whose labels no longer correspond in an updated graph. This assumes the global call graph does not change too much, which seems reasonable considering how little app code actually changes in an update relative to the amount of library code and Android API code.
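To make the idea concrete, here is a deliberately naive sketch of propagating labels outward from fixed points. It is not GraphGuard's actual algorithm: the function `propagate_labels`, the toy graphs, and all node names are hypothetical, and the rule used (accept a match only when exactly one unmatched callee remains on each side) is just one simple choice among many.

```python
import networkx as nx

def propagate_labels(old, new, anchors):
    """Extend a known old->new node mapping along call edges.

    anchors maps nodes of `old` to nodes of `new` whose identity is already
    known (entry points, Android API methods, ...).
    """
    mapping = dict(anchors)
    changed = True
    while changed:
        changed = False
        for o_node, n_node in list(mapping.items()):
            matched_new = set(mapping.values())
            # Callees that are still unmatched on each side.
            o_free = [s for s in old.successors(o_node) if s not in mapping]
            n_free = [s for s in new.successors(n_node) if s not in matched_new]
            # Only accept the unambiguous case: one candidate on each side.
            if len(o_free) == 1 and len(n_free) == 1:
                mapping[o_free[0]] = n_free[0]
                changed = True
    return mapping

# Old version with known names, new version with obfuscated names.
old = nx.DiGraph([("MainActivity.onCreate", "Api.login"),
                  ("Api.login", "Crypto.sign"),
                  ("Api.login", "android.util.Log.d")])
new = nx.DiGraph([("MainActivity.onCreate", "a.b"),
                  ("a.b", "c.d"),
                  ("a.b", "android.util.Log.d")])

# Fixed points: a manifest entry point and an external Android API node.
anchors = {"MainActivity.onCreate": "MainActivity.onCreate",
           "android.util.Log.d": "android.util.Log.d"}
mapping = propagate_labels(old, new, anchors)
```

On a real call graph most nodes have many callees, so a usable matcher would need richer signatures (degrees, descriptors, matched neighbors in both directions) than this single rule.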


At the moment, even generating or loading a graph of this size (let alone displaying and rendering it) is too resource-intensive to keep working on this approach. It is still an idea worth pursuing if `androguard` gains optimizations, which may or may not happen. Its single-threaded APIs and Python's limitations hurt the workflow (pickle serialization to load APK sessions, ...). Visualizing the full graph with Gephi works well enough, whereas `networkx` hits its limits with a graph of this size.
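For the Gephi route, one option is to export the graph to GEXF, a format Gephi opens directly. The sketch below uses a tiny stand-in graph with hypothetical node names and a temporary output path; it only illustrates the export call, not the performance on a full call graph.

```python
import os
import tempfile
import networkx as nx

# Stand-in for the real call graph; node names are hypothetical.
cg = nx.DiGraph()
cg.add_edge("MainActivity.onCreate", "Api.login")
cg.add_edge("Api.login", "Crypto.sign")

# Write a GEXF file that can be opened in Gephi.
gexf_path = os.path.join(tempfile.mkdtemp(), "callgraph.gexf")
nx.write_gexf(cg, gexf_path)

# Reading it back only checks the round trip on this toy graph.
reloaded = nx.read_gexf(gexf_path)
```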

In [1]:
import networkx as nx
import androguard.cli
import sys
import os
from pathlib import Path

In [2]:
def generate_cg(APK,
                show=True,
                output='callgraph.gml',
                verbose=False,
                classname=r'.*',
                methodname=r'.*',
                descriptor=r'.*',
                accessflag=r'.*',
                no_isolated=False):
    """Adapted from the androguard CLI on GitHub; returns the call graph so it does not have to be loaded twice."""
    from androguard.core.androconf import show_logging
    from androguard.core.bytecode import FormatClassToJava
    from androguard.misc import AnalyzeAPK
    import networkx as nx
    import logging
    log = logging.getLogger("androcfg")
    if verbose:
        show_logging(logging.INFO)

    a, d, dx = AnalyzeAPK(APK)

    entry_points = map(FormatClassToJava,
                       a.get_activities() + a.get_providers() +
                       a.get_services() + a.get_receivers())
    entry_points = list(entry_points)

    log.info("Found the following entry points by searching AndroidManifest.xml: "
             "{}".format(entry_points))

    CG = dx.get_call_graph(classname,
                           methodname,
                           descriptor,
                           accessflag,
                           no_isolated,
                           entry_points,
                           )

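    # `_write_gml` is not defined in this notebook, so wrap nx.write_gml directly
    # (stringizer=str allows non-string attribute values to be written).
    write_methods = dict(gml=lambda G, path: nx.write_gml(G, path, stringizer=str),
                         gexf=nx.write_gexf,
                         graphml=nx.write_graphml,
                         net=nx.write_pajek,
                         )
    # write_gpickle / write_yaml were removed from newer networkx releases,
    # so register them only when the installed version still provides them.
    for ext, name in (("gpickle", "write_gpickle"), ("yaml", "write_yaml")):
        if hasattr(nx, name):
            write_methods[ext] = getattr(nx, name)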

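    if show:
        # NOTE: `plot` is not defined in this notebook; it is assumed to be the
        # plotting helper from androguard's CLI code and must be imported or
        # copied before calling this function with show=True.
        plot(CG)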
    else:
        writer = output.rsplit(".", 1)[1]
        if writer in ["bz2", "gz"]:
            writer = output.rsplit(".", 2)[1]
        if writer not in write_methods:
            print("Could not find a method to export files to {}!"
                  .format(writer))
            sys.exit(1)

        write_methods[writer](CG, output)
    return CG

In [3]:
apk_file = '/home/jaqxues/Downloads/com.snapchat.android_11.6.1.66-2125_minAPI19(arm64-v8a)(nodpi)_apkmirror.com.apk'
out_file = f'cg.{Path(apk_file).name}.gml'

# Reuse the cached GML export if it exists; otherwise analyze the APK and write it.
if os.path.exists(out_file):
    CG = nx.read_gml(out_file)
else:
    CG = generate_cg(apk_file, show=False, output=out_file)