# Debug Me Analysis

## Setup

### Debug Me File

In [None]:
with open('./debugme_unpacked', 'rb') as f:
    debugme = f.read()

The binary's addresses start at `0x400000`, so we need to subtract this as an offset. As for our search space, it makes sense to avoid the plaintext strings and the null-byte strings: these will XOR together to produce the original plaintext string anyway. Looking through the .data section listing in Ghidra, we can see another high-entropy section of scrambled strings that starts at `0x0055c1c0` and ends at `0x0055c470`, so let's search there.

In [None]:
offset = 0x400000
start_addr = 0x0055c1c0
end_addr = 0x0055c470

start = start_addr - offset
end = end_addr - offset
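
To get a feel for the size of this search space, here is a small sketch of the address arithmetic. It assumes, as above, that a file offset is simply the virtual address minus `0x400000`; the `plain` value is only an illustration of why the plaintext and null regions are left out.

```python
offset = 0x400000
start_addr, end_addr = 0x0055c1c0, 0x0055c470

# Convert virtual addresses to file offsets.
start, end = start_addr - offset, end_addr - offset

size = end - start                # bytes in the scrambled region
pairs = size * (size - 1) // 2    # distinct (istart, jstart) pairs with istart < jstart
print(hex(start), hex(end), size, pairs)

# XORing plaintext with null bytes just returns the plaintext, so those
# regions would only produce hits we already know about.
plain = b"Guess it"
assert bytes(p ^ 0 for p in plain) == plain
```

The region is small enough that a brute-force pairwise search is cheap.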

### Decrypted Text Analysis

During my original searching I used a wordlist and a trie, similar to my approach to the ENIGMA challenge, to search for strings, but it turned out not to be necessary: common ASCII is all you need.

In [None]:
validset = set([ord(c) for c in "!@#$%^&*-_+=(){}[]<>,.?/:;' "])
validset |= set([ord('a')+c for c in range(26)])
validset |= set([ord('A')+c for c in range(26)])
validset |= set([ord('0')+c for c in range(10)])
validset.add(0) # include null-termination of strings
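
As a rough illustration of how this filter is used, the sketch below rebuilds the same alphabet with the standard `string` module and counts how many leading bytes of a buffer pass the filter. The helper name `valid_prefix_len` and the sample byte strings are made up for this example.

```python
import string

# Same alphabet as above: selected punctuation, letters, digits, and the null byte.
punct = "!@#$%^&*-_+=(){}[]<>,.?/:;' "
validset = set(map(ord, punct + string.ascii_letters + string.digits)) | {0}

def valid_prefix_len(data):
    """Return how many leading bytes of `data` are in `validset`."""
    n = 0
    for b in data:
        if b not in validset:
            break
        n += 1
    return n

print(valid_prefix_len(b"Guess it\x00\x00\x93"))  # text plus null padding passes
print(valid_prefix_len(b"\x93\x01abc"))            # stops at the first byte
```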

## String Searching

### Pairwise-XOR Search

We'll search all pairs of addresses in our search space and, for each pair, build the longest possible common-ASCII string by XORing the bytes that follow them. Any string that meets a given length threshold is printed for us to read through and analyze.

In [None]:
threshold = 8

for istart in range(start,end):
    for jstart in range(istart+1,end): # skip jstart == istart, which XORs to all null bytes
        decoded = ''
        for ik in range(end-jstart):
            c = debugme[istart+ik] ^ debugme[jstart+ik]
            if c not in validset:
                break
            decoded += chr(c)
        score = len(decoded)
        if score >= threshold:
            print(f"{hex(istart+offset)}-{hex(istart+offset+score)} {hex(jstart+offset)}-{hex(jstart+offset+score)} {jstart-istart} {score}: {decoded}")
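
To see why this search works, here is a self-contained toy version. It assumes that one region holds a string XORed with the key and another holds null padding XORed with the same key, so that XORing the two regions cancels the key. The `key`, `plain`, and `buf` values are invented, and the valid set is all printable ASCII rather than the narrower set used above.

```python
# Toy data only: `key` and `plain` are made up, not taken from the binary.
valid = set(range(0x20, 0x7f)) | {0}   # printable ASCII plus the null terminator
key = bytes(range(0x80, 0x90))         # 16 arbitrary key bytes
plain = b"hello world" + bytes(5)      # string padded with nulls to 16 bytes
# Encrypted string followed by encrypted null padding (which equals the key).
buf = bytes(p ^ k for p, k in zip(plain, key)) + key

def pairwise_hits(buf, valid, threshold=8):
    """Return (i, j, decoded) for every offset pair whose XOR is a long valid run."""
    hits = []
    for i in range(len(buf)):
        for j in range(i + 1, len(buf)):
            decoded = ''
            for k in range(len(buf) - j):
                c = buf[i + k] ^ buf[j + k]
                if c not in valid:
                    break
                decoded += chr(c)
            if len(decoded) >= threshold:
                hits.append((i, j, decoded))
    return hits

for i, j, decoded in pairwise_hits(buf, valid):
    print(i, j, repr(decoded))
```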
            

What do we see here? Fragments of plaintext strings, but not full sentences. However, some of the hits share an address. Together this implies there must be a single XOR key, but that it is not completely contiguous in memory: some form of scrambling obfuscation is happening.

If we read carefully we'll see what look like the strings we have already seen when running the executable, which are also present in plaintext in the binary. But here they are at different addresses.

- `0x55c244`-`0x55c250` `0x55c304`-`0x55c310` `192` `12`: `"s it and I'l"`
- `0x55c1d1`-`0x55c1e0` `0x55c251`-`0x55c260` `128` `15`: `" give up my sec"`

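Looking at the address ranges, `0x55c244`-`0x55c250` and `0x55c251`-`0x55c260` are almost contiguous too: only the byte at `0x55c250` separates them, which is where the second `l` of `I'll` would sit.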

What if we searched for all the known strings, looking for a common XOR key?

### Pairwise-XOR vs Known-String Hits Search

In [None]:
targetStrings = [
    "I'm thinking of a number between 1 and 3000000.",
    "Guess it and I'll give up my secrets.",
    "I don't think you understand how numbers work...",
    "Nope! Next time, try concentrating harder.",
]

We can perform the same search as before, but this time when we find a decrypted string above the threshold we can also check for it being within the target strings. We shall keep our best alignments and see what we get.

In [None]:
from collections import Counter
import difflib

threshold = 8

hits = Counter()
targetHits = [(0,)]*len(targetStrings)

for istart in range(start,end):
    for jstart in range(istart+1,end):
        decoded = ''
        for ik in range(end-jstart):
            c = debugme[istart+ik] ^ debugme[jstart+ik]
            if c not in validset:
                break
            decoded += chr(c)
        score = len(decoded)
        if score >= threshold:
            hits[istart] += 1
            hits[jstart] += 1
            for itarget,targetString in enumerate(targetStrings):
                seq = difflib.SequenceMatcher(None, decoded, targetString)
                seqMatch = seq.find_longest_match(0, len(decoded), 0, len(targetString))
                if seqMatch.size > targetHits[itarget][0]:
                    targetHits[itarget] = (seqMatch.size, seqMatch, istart+seqMatch.a,jstart+seqMatch.a, decoded[seqMatch.a:])

targetHits
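
For readers unfamiliar with `difflib`, the following sketch shows what `find_longest_match` returns on a small input. The `decoded` string here is invented (a fragment with two junk characters in front); the returned `Match` gives the start in the first sequence (`a`), the start in the second (`b`), and the match length (`size`).

```python
import difflib

target = "Guess it and I'll give up my secrets."
decoded = "zz give up my sec"   # hypothetical decoded fragment with leading junk

seq = difflib.SequenceMatcher(None, decoded, target)
m = seq.find_longest_match(0, len(decoded), 0, len(target))

print(m)                                   # Match(a=..., b=..., size=...)
print(repr(decoded[m.a:m.a + m.size]))     # the matching text
print(repr(target[m.b:m.b + m.size]))      # same text, located in the target
```

The `b` field is what makes it possible to work back from a fragment's address to the start of the whole string.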

The results are very insightful: all the top alignments for each target string are exactly 15 characters long, but even more interesting is they all share the same starting location in the target string, and share 1 address. This confirms a single XOR key, and gives us the ability to calculate where the start of our target strings should be in memory, assuming they are contiguous:

In [None]:
targetStarts = [istart-seqMatch.b if hits[istart] == 1 else jstart-seqMatch.b for l,seqMatch,istart,jstart,decoded in targetHits]
for tstart, tstring in zip(targetStarts,targetStrings):
    print(f"{hex(tstart+offset)}: {tstring}")
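
As a sanity check on this arithmetic, the two fragments listed earlier can be worked by hand. This sketch assumes the `0x55c244` and `0x55c251` sides of those hits are the encrypted string (the other address in each pair being key material), and that the string is contiguous in memory.

```python
target = "Guess it and I'll give up my secrets."

# (address where the fragment was found, fragment text) from the pairwise search
fragments = [
    (0x55c244, "s it and I'l"),
    (0x55c251, " give up my sec"),
]

# Start of the string = fragment address minus the fragment's position in the string.
starts = [addr - target.find(frag) for addr, frag in fragments]
print([hex(s) for s in starts])
```

Both fragments point back to the same start address, which supports the contiguity assumption.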

### Known-String Simplest-Common-Key Search

To search for the scrambled XOR key, we can use dynamic programming. The goal is to cover the key with as few chunks as possible, i.e. to keep it as contiguous as possible.  
For each starting position `ic` in the target strings, we scan every address from `start` to `end` for the longest contiguous run of bytes that decodes all of the `targetString`s from that position onward.
Each valid run is a candidate chunk: it becomes the best way to decode up to its end position if it reaches that position using fewer chunks than any candidate found so far.

In [None]:
length = min(map(len,targetStrings))

# nkeys[i]: fewest key chunks needed to decode the first i characters (-1 = not reached yet)
nkeys = [-1]*(length+1)
nkeys[0] = 0 # decoding zero characters needs zero chunks
solkeys = [[]]*(length+1) # Note that a more efficient method of recovering the final solution is possible
solkeylens = [[]]*(length+1)

for ic in range(length):
    if nkeys[ic] == -1:
        continue # no chunk sequence reaches this position, so nothing can extend from it
    for istart in range(start,end):
        for ik in range(length-ic):
            if any(debugme[tstart+ic+ik] ^ debugme[istart+ik] != ord(targetString[ic+ik]) for tstart,targetString in zip(targetStarts,targetStrings)):
                break
            score = ik+1
            if nkeys[ic+score] == -1 or nkeys[ic+score] > nkeys[ic]+1:
                nkeys[ic+score] = nkeys[ic] + 1
                solkeys[ic+score] = solkeys[ic] + [istart]
                solkeylens[ic+score] = solkeylens[ic] + [score]

key_chunk_addrs = solkeys[-1]
key_chunk_lens = solkeylens[-1]

for kstart, klen in zip(key_chunk_addrs,key_chunk_lens):
    print(f"{hex(kstart+offset)}, {klen}")
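
The same dynamic-programming idea can be shown on a simplified problem: cover a known byte sequence with as few contiguous slices of a pool as possible. This is only a sketch of the technique, not the search above (which checks several encrypted strings at once); `min_chunks`, `pool`, and `target` are hypothetical names and data.

```python
def min_chunks(target, pool):
    """Cover `target` with the fewest contiguous slices of `pool`.

    Returns a list of (pool_index, length) pairs, or None if impossible.
    """
    n = len(target)
    INF = float("inf")
    best = [INF] * (n + 1)    # best[i]: fewest chunks covering target[:i]
    back = [None] * (n + 1)   # back[i]: (previous position, pool index) of last chunk
    best[0] = 0
    for pos in range(n):
        if best[pos] == INF:
            continue          # position not reachable, cannot extend from it
        for p in range(len(pool)):
            k = 0
            while pos + k < n and p + k < len(pool) and pool[p + k] == target[pos + k]:
                k += 1
                if best[pos] + 1 < best[pos + k]:
                    best[pos + k] = best[pos] + 1
                    back[pos + k] = (pos, p)
    if best[n] == INF:
        return None
    # Walk the back-pointers from the end to recover the chunks in order.
    chunks, pos = [], n
    while pos > 0:
        prev, p = back[pos]
        chunks.append((p, pos - prev))
        pos = prev
    return chunks[::-1]

print(min_chunks(b"ABCDEFG", b"DEFxxABCyG"))
```

Storing back-pointers instead of whole solution lists is one way to do the more efficient recovery hinted at in the code comment above.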

Interestingly, we only get 2 long chunks, of lengths 12 and 15; all the others have length 1. I also don't see any pattern between the addresses of the longer chunks. Oh well.

### Simple Key Recovery

Of course the easiest way to get the key is to just take our longest known string and its start, which we now know, and XOR them to get out the key.

In [None]:
maxStart = 0
maxStr = ''
for tstart,targetString in zip(targetStarts,targetStrings):
    if len(targetString) > len(maxStr):
        maxStart = tstart
        maxStr = targetString

xorkey = [debugme[maxStart+ik]^ord(c) for ik,c in enumerate(maxStr)]
print(len(xorkey))
print(xorkey)
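
This is an ordinary known-plaintext attack on XOR: since `cipher = plain ^ key`, it follows that `key = cipher ^ plain`. A minimal sketch with a made-up key (the real key bytes are whatever the cell above prints):

```python
key = bytes([0x17, 0x5a, 0xc3, 0x08, 0x9e, 0x41, 0x77, 0xe2])  # made-up key

def xor_bytes(a, b):
    """XOR two byte strings position by position."""
    return bytes(x ^ y for x, y in zip(a, b))

known = b"Guess it"
cipher = xor_bytes(known, key)

# Known plaintext XOR ciphertext gives back the key...
recovered = xor_bytes(cipher, known)
assert recovered == key

# ...which then decrypts any other string encrypted with it.
other_cipher = xor_bytes(b"Nope! Ne", key)
print(xor_bytes(other_cipher, recovered))
```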

### Unknown String Search

With the key now in hand we can search for other unknown strings hidden in the binary. I'm also going to track when we reach a null byte, i.e. the terminator for C strings, and after that only accept further null bytes as padding. This way we can better see which strings are adjacent.

In [None]:
threshold = 8
for istart in range(start,end):
    decoded = ''
    nullhit = False
    for ik,k in enumerate(xorkey):
        c = debugme[istart+ik] ^ k
        if c not in validset or (nullhit and c != 0):
            break
        nullhit = nullhit or c == 0
        decoded += chr(c)
    score = len(decoded)
    if score >= threshold:
        print(f"{hex(istart+offset)}-{hex(istart+score+offset)}: {decoded}")
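
The effect of the null-terminator check is easiest to see on toy data. The sketch below assumes each string is stored in its own null-padded block and encrypted with the key from its first byte; `encrypt_block`, `decode_at`, the key, and both strings are invented for illustration.

```python
valid = set(range(0x20, 0x7f)) | {0}   # printable ASCII plus null
key = bytes(range(0x80, 0xa0))         # 32 made-up key bytes

def encrypt_block(text, size=16):
    """Null-pad `text` to `size` bytes and XOR it with the start of the key."""
    padded = text + bytes(size - len(text))
    return bytes(p ^ k for p, k in zip(padded, key))

buf = encrypt_block(b"first string") + encrypt_block(b"second one")

def decode_at(buf, pos, key, valid):
    """Decode from `pos`, stopping at invalid bytes or at text after a null."""
    decoded = ''
    nullhit = False
    for ik, k in enumerate(key):
        if pos + ik >= len(buf):
            break
        c = buf[pos + ik] ^ k
        if c not in valid or (nullhit and c != 0):
            break
        nullhit = nullhit or c == 0
        decoded += chr(c)
    return decoded

for pos in (0, 16):
    print(pos, repr(decode_at(buf, pos, key, valid)))
```

Without the `nullhit` condition, decoding from position 0 could run on into the neighbouring block whenever the misaligned bytes happen to look like valid text.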

Amazing! Looks like we found the flag... or at least its printf template.

### Guessing More Of The Key

We have some pieces of text which are cut off, but if we look through our original pairwise-XOR search, we can guess what they are and get more of the key.  
I also noticed that the addresses almost always end in `0x0`, except for the 2 longest strings: this implies the strings are stored in blocks of 16 with null-byte padding. We can use this to pull out the final few bytes of the XOR key.

In [None]:
tstart = 0x55c2b0 - offset
targetString = "Are you trying to influence me? Your Jedi mind tricks are no good here."
remainder = 16 - (len(targetString) % 16)
xorkey = [debugme[tstart+ik] ^ ord(c) for ik,c in enumerate(targetString)]
xorkey = xorkey + [debugme[tstart+len(targetString)+ik] for ik in range(remainder)]
print(len(xorkey))
print(xorkey)
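
The padding arithmetic can be checked on its own. This sketch assumes the 16-byte block layout inferred above; `BLOCK` and `key_byte` are illustrative names, and `key_byte` is an arbitrary value.

```python
BLOCK = 16
target = "Are you trying to influence me? Your Jedi mind tricks are no good here."

pad = BLOCK - len(target) % BLOCK   # null bytes needed to reach the next block boundary
total = len(target) + pad           # number of key bytes recovered
print(len(target), pad, total)

# Padding plaintext is 0, so the stored byte is the key byte itself.
key_byte = 0x5a                     # arbitrary example value
assert 0 ^ key_byte == key_byte
```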

In [None]:
threshold = 8
for istart in range(start,end):
    decoded = ''
    nullhit = False
    for ik,k in enumerate(xorkey):
        c = debugme[istart+ik] ^ k
        if c not in validset or (nullhit and c != 0):
            break
        nullhit = nullhit or c == 0
        decoded += chr(c)
    score = len(decoded)
    if score >= threshold:
        print(f"{hex(istart+offset)}-{hex(istart+score+offset)}: {decoded}")

## Patching The Binary

### Marking The Binary

In [None]:
with open('debugme_unpacked', 'rb') as f:
    data = bytearray(f.read())

xorkey_0 = 23 # xorkey[0] # The first byte of the XOR key
offset = 0x400000
addrs = [
    (0x0055c008, 0x55c210), # "I'm thinking..."
    (0x0055c038, 0x55c240), # "Guess..."
    (0x0055c060, 0x55c270), # "I don't..."
    (0x0055c0f8, 0x55c440), # "Nope..."
]

for debug,xorvl in addrs:
    data[debug-offset] = ord('0')
    data[xorvl-offset] = ord('1') ^ xorkey_0

with open('debugme_marked', 'wb') as f:
    f.write(data)

### Disabling The TracerId

In [None]:
with open('debugme_marked', 'rb') as f:
    data = bytearray(f.read())

xorkey_0 = 23 # xorkey[0] # The first byte of the XOR key
offset = 0x400000

data[0x55c1e0-offset] = ord('_') ^ xorkey_0

with open('debugme_notracerid', 'wb') as f:
    f.write(data)

### Disabling ptrace

In [None]:
with open('debugme_notracerid', 'rb') as f:
    data = bytearray(f.read())

addr = 0x0051b72c
offset = 0x400000

# Overwrite the 2 bytes at addr with single-byte x86 NOPs (0x90)
data[addr-offset:addr-offset+2] = [0x90,0x90]

with open('debugme_noptrace', 'wb') as f:
    f.write(data)

### Guaranteed Win 

In [None]:
with open('debugme_noptrace', 'rb') as f:
    data = bytearray(f.read())

start_addr = 0x00404aa6
end_addr = 0x00404ad8
offset = 0x400000

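new_instr = [
    0x4c, 0x89, 0xe9,                         # mov rcx, r13
    0x48, 0x33, 0x0d, 0xb8, 0x88, 0x1b, 0x00  # xor rcx, qword ptr [rip+0x1b88b8]
]

# Replace the whole span, padding the unused tail with NOPs (0x90)
patch_len = end_addr-start_addr
data[start_addr-offset:start_addr-offset+patch_len] = new_instr + [0x90]*(patch_len - len(new_instr))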

with open('debugme_solution', 'wb') as f:
    f.write(data)
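
The four patching cells share one pattern: translate a virtual address to a file offset, overwrite a span, and pad with NOPs if the replacement is shorter. A possible helper is sketched below; `patch` and its parameters are hypothetical names, and the demonstration runs on a dummy buffer rather than the real binary.

```python
NOP = 0x90  # single-byte x86 NOP

def patch(data, addr, new_bytes, span=None, base=0x400000):
    """Overwrite `span` bytes at virtual address `addr`, padding with NOPs.

    `data` is a bytearray of the file contents; its length is left unchanged.
    """
    span = len(new_bytes) if span is None else span
    if len(new_bytes) > span:
        raise ValueError("replacement is longer than the patched span")
    off = addr - base
    data[off:off + span] = bytes(new_bytes) + bytes([NOP]) * (span - len(new_bytes))
    return data

# Dummy 32-byte "file" to show the effect.
demo = bytearray(32)
patch(demo, 0x400004, [0xAA, 0xBB], span=5)
print(demo[:12].hex())
```

Keeping the replacement the same length as the span it overwrites matters, because any change in file size would shift every later address.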