# GraphGuard

***Locate Classes in APKs with an updated Obfuscation Mapping***


Processing Steps:
1. String Matcher (Finding Classes and Methods)
  * Count the Strings used in Classes and Methods and look for an exactly matching Counter.
  * Find Classes by identifying Strings that are used only in a single Class.
2. Structure Matcher (Finding Classes)
  * Modifiers of class
  * Modifiers, Parameters, Parameter Types and Return Types of Methods
  * Number and Types of Fields
3. Method Matcher (Finding Methods in matching Classes)
  * Modifiers
  * Return Type, Parameter Types
  * Bytecode Length
  * References to and from

In [None]:
%matplotlib notebook


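# display and HTML are exposed by IPython.display; importing them from IPython.core.display is deprecated
from IPython.display import display, HTML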
display(HTML("<style>div.output_area pre {white-space: pre;}</style>"))

In [None]:
import unittest
from collections import defaultdict, Counter
from os import path

from androguard.core.analysis.analysis import MethodAnalysis, ClassAnalysis, FieldAnalysis
from androguard.core.bytecode import FormatClassToJava
from androguard.misc import AnalyzeAPK
from androguard.session import Save, Session, Load

from formats import *
from decs import *

from matching import matcher, strings, structures, methods

from start import process_files

# Loading Androguard

The following code loads the APK files and starts Androguard.

Multiprocessing should be supported, but the Pipe communication seems to break when transmitting the processed Androguard objects. I suspect the objects are simply too big for Pickle to serialize, or that another component in the transmission chain fails.

In [None]:
AG_SESSION_FILE = "./Androguard.ag"
MULTIPROCESS_FILES = False  # Currently not working due to serialization issues


# Matching Rules
strings.MAX_USAGE_COUNT_STR = 20
strings.UNIQUE_STRINGS_MAJORITY = 2 / 3

methods.MIN_MATCH_POINTS = 2



# APK Files to load
file_paths = (
    "../../../Downloads/com.snapchat.android_10.85.5.74-2067_minAPI19(arm64-v8a)(nodpi)_apkmirror.com.apk",
    "../../../Downloads/com.snapchat.android_10.86.5.61-2069_minAPI19(arm64-v8a)(nodpi)_apkmirror.com.apk"
)

In [None]:
(a, d, dx), (a2, d2, dx2) = process_files(file_paths, MULTIPROCESS_FILES)

# List of Methods

Define the list of methods to find. Each MethodDec requires the full class name.

In [None]:
decs_to_find = (
    MethodDec("rD5", "a", "rD5", "qD5"),
    MethodDec("MSg", "j0", "SGd"),
    MethodDec("x45", "h"),
    MethodDec("GIb", "<init>", skip_params=True)
)

# Processing and Matching

Load the accumulator, an object that manages all candidates produced by the different Matchers and extracts the confirmed matches from them. It also performs inner joins with previously found candidates to narrow them down to the exact (or optimal) match.
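
The inner-join idea can be sketched in a few lines. This is only an illustration under assumptions: `CandidateAccumulator` is a hypothetical stand-in and not the `matcher.Accumulator` API, and the new class names (`sE6`, `tF7`, ...) are made up. Each matcher reports a set of possible new names per old name, and intersecting these sets narrows the candidates until a single one is left.

```python
class CandidateAccumulator:
    """Hypothetical stand-in; not the actual matcher.Accumulator."""

    def __init__(self):
        self.candidates = {}  # old name -> set of possible new names
        self.matching = {}    # old name -> single confirmed new name

    def add_candidates(self, new_candidates):
        for old, found in new_candidates.items():
            if old in self.matching or not found:
                continue
            previous = self.candidates.get(old)
            # Inner join: keep only the names every matcher so far agrees on.
            joined = set(found) if previous is None else previous & set(found)
            if joined:
                # An empty join is ignored, so the previous candidates survive.
                self.candidates[old] = joined
            if len(joined) == 1:
                self.matching[old] = next(iter(joined))


acc = CandidateAccumulator()
acc.add_candidates({"rD5": {"sE6", "tF7", "uG8"}})  # e.g. from a string matcher
acc.add_candidates({"rD5": {"tF7", "xY1"}})         # e.g. from a structure matcher
print(acc.matching)  # {'rD5': 'tF7'}
```

Ignoring an empty intersection is a design choice of this sketch; the real accumulator may handle contradicting matchers differently.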

In [None]:
accumulator = matcher.Accumulator()

Resolving the previously defined MethodDecs. If this fails, a MethodDec contains an error. Make sure the method specified by each MethodDec exists.

In [None]:
resolved_classes = resolve_classes(dx, decs_to_find)
resolved_methods = resolve_methods(decs_to_find, resolved_classes)
decs_ma = dict(zip(decs_to_find, resolved_methods))

# Shared matcher arguments; built here because they need resolved_classes and decs_ma
args = (dx, dx2, resolved_classes, decs_ma)

print("Resolved all Classes and Methods")
if False:
    print("", *map(pretty_format_ma,resolved_methods), sep="\n* ")

## String Matcher

### Exact Counter Match

Extracts the Strings used either directly in the given methods or in the classes that define them, for both the old and the new version. It then compares the resulting Counters of classes and methods and tries to find exact matches between them.
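
As a rough illustration of the exact Counter comparison, here is a minimal sketch that works on plain lists of strings instead of Androguard objects. The function `match_exact_counters` and the new names `sE6` and `tF7` are hypothetical; the real `StringMatcher` may differ in detail.

```python
from collections import Counter


def match_exact_counters(old_strings, new_strings):
    """Map each old name to the new names whose string Counter is identical."""
    new_counters = {name: Counter(strs) for name, strs in new_strings.items()}
    matches = {}
    for old_name, strs in old_strings.items():
        old_counter = Counter(strs)
        if not old_counter:
            continue  # nothing to compare when no strings are used
        found = {name for name, c in new_counters.items() if c == old_counter}
        if found:
            matches[old_name] = found
    return matches


old = {"rD5": ["login", "login", "token"], "x45": []}
new = {"sE6": ["token", "login", "login"], "tF7": ["login", "token"]}
result = match_exact_counters(old, new)
print(result)  # {'rD5': {'sE6'}}
```

`tF7` is rejected because it uses "login" only once: Counters compare both the strings and how often each occurs.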

In [None]:
string_matcher = strings.StringMatcher(*args, accumulator.get_unmatched_ms(decs_to_find))
candidates_cs, candidates_ms = string_matcher.compare_counters()

accumulator.add_candidates(candidates_cs, candidates_ms)

### Unique Strings

Gather all Strings that are used in only a single class ("Unique Strings") for the classes we still need to match. Then try to find the matching class by searching only for these Unique Strings.
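
A minimal sketch of this idea, assuming each class is represented by the set of strings it uses. The helper names and the new class names `aa` and `bb` are hypothetical, and the `majority` parameter only mirrors what `strings.UNIQUE_STRINGS_MAJORITY` might mean; that interpretation is an assumption.

```python
from collections import defaultdict


def unique_strings_per_class(class_strings):
    """Return, per class, the strings that no other class uses."""
    owners = defaultdict(set)
    for cls, strs in class_strings.items():
        for s in strs:
            owners[s].add(cls)
    unique = defaultdict(set)
    for s, classes in owners.items():
        if len(classes) == 1:
            unique[next(iter(classes))].add(s)
    return dict(unique)


def match_by_unique_strings(old_class_strings, new_class_strings, majority=2 / 3):
    """Match classes that share at least `majority` of the old unique strings."""
    old_unique = unique_strings_per_class(old_class_strings)
    new_unique = unique_strings_per_class(new_class_strings)
    matches = {}
    for old_cls, strs in old_unique.items():
        for new_cls, new_strs in new_unique.items():
            if len(strs & new_strs) / len(strs) >= majority:
                matches.setdefault(old_cls, set()).add(new_cls)
    return matches


old = {"MSg": {"camera_open", "shared", "flash_mode"}, "GIb": {"shared"}}
new = {"aa": {"camera_open", "flash_mode", "shared", "zoom"}, "bb": {"shared", "retry"}}
unique_matches = match_by_unique_strings(old, new)
print(unique_matches)  # {'MSg': {'aa'}}
```

`GIb` stays unmatched here because its only string is also used by `MSg`, so it has no Unique Strings to search for.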

In [None]:
candidates_cs = string_matcher.compare_unique_strings(accumulator.get_unmatched_cs(decs_to_find))

accumulator.add_candidates(candidates_cs)

## Structure Matcher

Iterates through every class and checks, for each unmatched class, whether both have a similar "Profile":
* Number of Methods and Fields
* Types of Fields and Descriptors of Methods
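
Such a profile comparison could look roughly like the sketch below. `ClassProfile` and `make_profile` are hypothetical names, and the profile is built from plain strings rather than Androguard's `ClassAnalysis`. The sketch also assumes that field types and descriptors do not reference obfuscated classes; such references would change between versions and would have to be normalised first.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassProfile:
    modifiers: frozenset
    field_types: tuple         # sorted (type, count) pairs
    method_descriptors: tuple  # sorted (descriptor, count) pairs


def make_profile(modifiers, field_types, method_descriptors):
    """Build an order-independent profile of a class."""
    return ClassProfile(
        frozenset(modifiers),
        tuple(sorted(Counter(field_types).items())),
        tuple(sorted(Counter(method_descriptors).items())),
    )


old_profile = make_profile(
    {"public", "final"},
    ["I", "Ljava/lang/String;"],
    ["()V", "(I)Ljava/lang/String;"],
)
new_profiles = {
    "sE6": make_profile({"public", "final"}, ["Ljava/lang/String;", "I"],
                        ["(I)Ljava/lang/String;", "()V"]),
    "tF7": make_profile({"public"}, ["I"], ["()V"]),
}
structure_matches = {name for name, p in new_profiles.items() if p == old_profile}
print(structure_matches)  # {'sE6'}
```

Counting types and descriptors covers both the number and the types of fields and methods in one comparison, and sorting makes the result independent of declaration order.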

In [None]:
structure_matcher = structures.StructureMatcher(*args, accumulator.get_unmatched_ms(decs_to_find))
candidates_cs = structure_matcher.get_exact_structure_matches(accumulator.get_unmatched_cs(decs_to_find))

accumulator.add_candidates(candidates_cs)

## Method Matcher

Uses different weighted criteria to get exact or optimal matches. The criteria are:
* Modifiers
* Return Type and Parameter Types
* Length of Bytecode
* References to and from
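
The weighted comparison can be illustrated as follows. This is a sketch under assumptions: `MethodProfile`, the point values and `best_match` are hypothetical, and `min_points` only mirrors what `methods.MIN_MATCH_POINTS` might mean. The actual weights used by `MethodMatcher` are not shown in this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodProfile:
    modifiers: frozenset
    descriptor: str       # parameter types and return type
    bytecode_length: int
    xref_count: int       # number of references to and from the method


def score(old, new):
    """Add up points for every criterion on which both methods agree."""
    points = 0
    points += 1 if old.modifiers == new.modifiers else 0
    points += 2 if old.descriptor == new.descriptor else 0  # assumed weight
    points += 1 if old.bytecode_length == new.bytecode_length else 0
    points += 1 if old.xref_count == new.xref_count else 0
    return points


def best_match(old, candidates, min_points=2):
    """Return the single best-scoring candidate, or None if ambiguous or too weak."""
    scored = {name: score(old, m) for name, m in candidates.items()}
    best = max(scored.values(), default=0)
    if best < min_points:
        return None
    winners = [name for name, p in scored.items() if p == best]
    return winners[0] if len(winners) == 1 else None


old_method = MethodProfile(frozenset({"public"}), "(I)V", 40, 3)
candidates = {
    "a": MethodProfile(frozenset({"public"}), "(I)V", 42, 3),
    "b": MethodProfile(frozenset({"private"}), "(I)V", 40, 1),
}
print(best_match(old_method, candidates))  # a
```

Candidate `a` scores 4 points and `b` scores 3, so `a` wins even though its bytecode length changed slightly. An exact match would instead require every criterion to agree.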

In [None]:
method_matcher = methods.MethodMatcher(*args, accumulator.get_unmatched_ms(decs_to_find))

# Exact Matches
candidates_ms = method_matcher.try_resolve_ms(accumulator.get_unmatched_cs(decs_to_find), accumulator.matching_cs, True)
accumulator.add_candidates(candidates_ms=candidates_ms)

# Non-Exact Matches by using weights on the criteria
candidates_ms = method_matcher.try_resolve_ms(accumulator.get_unmatched_cs(decs_to_find), accumulator.matching_cs, False)
accumulator.add_candidates(candidates_ms=candidates_ms)

# Results

In [None]:
print(len(accumulator.matching_cs), "/", len(resolved_classes), "Classes were matched")
print(len(accumulator.matching_ms), "/", len(decs_to_find), "Methods were matched")

print()
print("Matching Classes:")
for c1, c2 in accumulator.matching_cs.items():
    print("*", pretty_format_class(c1), "->", pretty_format_class(c2))

print()
print("Matching Methods: ")
for m, ma in accumulator.matching_ms.items():
    print("*", m.pretty_format(), "->", pretty_format_ma(ma))