



#### The Next Generation of Virtualization-based Obfuscators

Tim Blazytko

@mr\_phrazer

synthesis.to

Moritz Schloegel

@m\_u00d8

mschloegel.me

#### **About Us**

- Tim Blazytko
  - Chief Scientist, co-founder of emproof
  - designs software protections for embedded devices
  - trainer for (de)obfuscation and reverse engineering techniques

- Moritz Schloegel
  - last-year PhD student at CISPA Helmholtz Center
  - working with bugs by day (mostly fuzzing)
  - code deobfuscation by night



# Setting the Scene

- VM-based obfuscation
- Attacks on VMs

→ Next-Gen

#### Motivation

Complicate (rather than prevent) reverse engineering attempts.

- intellectual property
- malicious payloads
- Digital Rights Management

# Virtualization-based Obfuscation

```
mov ecx, [esp+4]
xor eax, eax
mov ebx, 1
__secret_ip:
  mov edx, eax
  add edx, ebx
  mov eax, ebx
  mov ebx, edx
  loop __secret_ip
mov eax, ebx
ret
```


#### made-up instruction set



```
mov ecx, [esp+4]
xor eax, eax
mov ebx, 1
__secret_ip:
 push __bytecode
 call vm_entry
mov eax, ebx
ret
```


```
__bytecode:
db 54 68 69 73 20 64 6f
db 65 73 6e 27 74 20 6c
db 6f 6f 6b 20 6c 69 6b
db 65 20 61 6e 79 74 68
db 69 6e 67 20 74 6f 20
db 6d 65 2e de ad be ef
```


#### **Core Components**

- **VM Entry/Exit**: context switch, native context ⇔ virtual context
  - Entry: copy native context (registers, flags) to VM context.
  - Exit: copy VM context back to native context.
  - Mapping from native to virtual registers is often 1:1.
- **VM Dispatcher**: fetch-decode-execute loop
  1. Fetch and decode instruction
  2. Forward virtual instruction pointer
  3. Look up handler for opcode in handler table
  4. Invoke handler
- **Handler Table**: individual VM ISA instruction semantics
  - Table of function pointers indexed by opcode
  - One handler per virtual instruction
  - Each handler decodes operands and updates VM context







```
__vm_dispatcher:
  mov   bl, [rsi]
  inc   rsi
  movzx rax, bl
  jmp   __handler_table[rax * 8]
```

VM Dispatcher

```
rsi - virtual instruction pointer
rbp - VM context
```

```
__handle_vnor:
  mov rcx, [rbp]
  mov rbx, [rbp + 4]
  not rcx
  not rbx
  and rcx, rbx
  mov [rbp + 4], rcx
  pushf
  pop [rbp]
  jmp __vm_dispatcher
```

Handler performing **nor** (with flag side-effects)
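The interplay of entry/exit, dispatcher, and handler table can be sketched as a toy interpreter. This is a minimal illustration and not the design of any real protector: the opcode values, the `VMContext` layout, the 32-bit register width, and the `vexit` handler are all assumptions made for the example.

```python
from dataclasses import dataclass, field

MASK = 0xFFFFFFFF  # assumption: 32-bit virtual registers


@dataclass
class VMContext:
    """Virtual context: bytecode, virtual instruction pointer, registers."""
    bytecode: bytes
    vip: int = 0
    regs: list = field(default_factory=lambda: [0] * 8)
    halted: bool = False


def fetch(ctx):
    """Read one byte at the virtual instruction pointer and advance it."""
    value = ctx.bytecode[ctx.vip]
    ctx.vip += 1
    return value


def handle_vadd(ctx):
    dst, src = fetch(ctx), fetch(ctx)          # handler decodes its operands
    ctx.regs[dst] = (ctx.regs[dst] + ctx.regs[src]) & MASK


def handle_vnor(ctx):
    dst, src = fetch(ctx), fetch(ctx)
    ctx.regs[dst] = ~(ctx.regs[dst] | ctx.regs[src]) & MASK


def handle_vexit(ctx):
    ctx.halted = True


# Handler table indexed by opcode (hypothetical opcode assignment).
HANDLER_TABLE = {0x0A: handle_vadd, 0x0C: handle_vnor, 0xFF: handle_vexit}


def vm_entry(bytecode, native_regs):
    """Entry copies native registers 1:1 into the VM context; exit copies back."""
    ctx = VMContext(bytecode=bytecode, regs=list(native_regs))
    while not ctx.halted:                      # dispatcher loop
        opcode = fetch(ctx)                    # fetch and forward the vIP
        HANDLER_TABLE[opcode](ctx)             # look up and invoke the handler
    return ctx.regs


if __name__ == "__main__":
    program = bytes([0x0A, 1, 2, 0x0C, 1, 1, 0xFF])
    print([hex(r) for r in vm_entry(program, [0, 3, 4, 0, 0, 0, 0, 0])])
```

The dictionary stands in for the table of function pointers; the `while` loop plays the role of `__vm_dispatcher`.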

## Breaking Virtual Machine Obfuscation I

#### How to reconstruct the original code?

1. understand VM architecture/context
2. reverse engineer handler semantics
3. write a disassembler for the bytecode
4. reconstruct VM control flow
5. reconstruct high-level code

Instruction format: opcode, register, register.

Bytecode: `0a 01 02 0b 01 05`

| opcode | register | register | disassembly |
|--------|----------|----------|-------------|
| 0a     | 01       | 02       | add r1, r2  |
| 0b     | 01       | 05       | mul r1, r5  |

VM computes (r1 + r2) \* r5.
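The decoding of this example can be written as a tiny disassembler. The sketch below assumes exactly the fixed three-byte format shown here (opcode, register, register) and only the two opcodes from the example; real VM bytecode formats are usually irregular, so treat `MNEMONICS` and `disassemble` as illustrative names.

```python
# Opcode map taken from the example; any other opcode is reported as unknown.
MNEMONICS = {0x0A: "add", 0x0B: "mul"}


def disassemble(bytecode):
    """Decode fixed-width instructions of the form: opcode, register, register."""
    if len(bytecode) % 3 != 0:
        raise ValueError("bytecode length must be a multiple of 3")
    lines = []
    for offset in range(0, len(bytecode), 3):
        opcode, dst, src = bytecode[offset:offset + 3]
        mnemonic = MNEMONICS.get(opcode, f"unk_{opcode:02x}")
        lines.append(f"{mnemonic} r{dst}, r{src}")
    return lines


if __name__ == "__main__":
    for line in disassemble(bytes.fromhex("0a01020b0105")):
        print(line)
```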

# Virtual Machine Hardening

#### **Hardening Technique #1** – Obfuscating individual VM components.

- Handlers are conceptually simple.
- Apply traditional code obfuscation transformations:
  - Substitution (mov rax, rbx → push rbx; pop rax)
  - Opaque Predicates
  - Junk Code
  - ...

```
mov eax, dword [rbp]
mov ecx, dword [rbp+4]
cmp r11w, r13w
sub rbp, 4
not eax
clc
cmc
cmp rdx, 0x28b105fa
not ecx
cmp r12b, r9b
```

#### **Hardening Technique #2** – Duplicating VM handlers.

- Handler table is typically indexed using one byte (= 256 entries).
- Idea: Duplicate existing handlers to populate full table.
- Use traditional obfuscation techniques to impede code similarity analyses.

**Goal:** Increase workload of reverse engineer.

Handler table before duplication: handle\_vpush, handle\_vadd, handle\_vnor, handle\_vpop.

Handler table after duplication: the same handlers plus obfuscated duplicates such as handle\_vadd', handle\_vadd'', handle\_vnor', and handle\_vnor''.

#### **Hardening Technique #3** – No central VM dispatcher.

- A central VM dispatcher allows an attacker to easily observe VM execution.
- Idea: Instead of branching to the central dispatcher, *inline* it into each handler.

**Goal:** No "single point of failure".

(Themida, VMProtect Demo)





James R. Bell (Digital Equipment Corporation), "Threaded Code":

The concept of "threaded code" is presented as an alternative to machine language code. Hardware and software realizations of it are given. In software it is realized as interpretive code not needing an interpreter. Extensions and optimizations are mentioned.



#### **Hardening Technique #4** – No explicit handler table.

- An *explicit* handler table easily reveals all VM handlers.
- Idea: Instead of querying an explicit handler table, encode the next handler address in the VM instruction itself.

**Goal:** Hide location of handlers that have not been executed yet.

(VMProtect Full, SolidShield)

Paul Klint (Mathematical Centre, P.O. Box 4079, 1009AB Amsterdam, The Netherlands), "Interpretation Techniques":

The relative merits of implementing high level programming languages by means of interpretation or compilation are discussed. The properties and the applicability of interpretation techniques known as classical interpretation, direct threaded code and indirect threaded code are described and compared.

#### **Hardening Technique #5** – Blinding VM bytecode.

- Global analyses on the bytecode possible, easy to patch instructions.
- Idea:
  - Flow-sensitive instruction decoding ("decryption" based on key register).
  - Custom decryption routine per handler, diversification.
  - Patching requires re-encryption of subsequent bytecode.

**Goal:** Hinder global analyses of bytecode and patching.

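Handler without blinding:

$$\begin{aligned}
\text{operand} &\leftarrow [vIP + 0] \\
\text{context} &\leftarrow \text{semantics}(\text{context}, \text{operand}) \\
\text{next\_handler} &\leftarrow [vIP + 4]
\end{aligned}$$

Handler with blinded bytecode:

$$\begin{aligned}
\text{operand} &\leftarrow [vIP + 0] \\
\text{operand} &\leftarrow \text{unmangle}(\text{operand}, \text{key}) \\
\text{key} &\leftarrow \text{unmangle}'(\text{key}, \text{operand}) \\
\text{context} &\leftarrow \text{semantics}(\text{context}, \text{operand}) \\
\text{next\_handler} &\leftarrow [vIP + 4] \\
\text{next\_handler} &\leftarrow \text{unmangle}''(\text{next\_handler}, \text{key}) \\
\text{key} &\leftarrow \text{unmangle}'''(\text{key}, \text{next\_handler}) \\
vIP &\leftarrow vIP + 8 \\
&\text{jmp next\_handler}
\end{aligned}$$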
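The effect of flow-sensitive decoding can be illustrated with a rolling key. In the sketch below, `unmangle` and `update_key` are made-up stand-ins (XOR and a multiply-XOR mix on 32-bit values); real protectors use diversified routines per handler. The point is only that each field decodes correctly only if all previous fields were decoded with the right key.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit bytecode fields and key register


def unmangle(value, key):
    """Hypothetical decryption step: XOR with the current key."""
    return (value ^ key) & MASK


def update_key(key, value):
    """Hypothetical key update that depends on the decoded value."""
    return ((key * 0x9E3779B1) ^ value) & MASK


def blind(instructions, key):
    """Encrypt (operand, next_handler) pairs under a rolling key."""
    blinded = []
    for operand, next_handler in instructions:
        enc_operand = unmangle(operand, key)      # XOR is its own inverse
        key = update_key(key, operand)
        enc_next = unmangle(next_handler, key)
        key = update_key(key, next_handler)
        blinded.append((enc_operand, enc_next))
    return blinded


def execute_flow(blinded, key):
    """Decode in execution order, updating the key the way the handlers would."""
    decoded = []
    for enc_operand, enc_next in blinded:
        operand = unmangle(enc_operand, key)
        key = update_key(key, operand)
        next_handler = unmangle(enc_next, key)
        key = update_key(key, next_handler)
        decoded.append((operand, next_handler))
    return decoded


if __name__ == "__main__":
    program = [(5, 0x401000), (7, 0x401040), (9, 0x401080)]
    blinded = blind(program, key=0x1337)
    print(execute_flow(blinded, key=0x1337) == program)
```

Flipping a single bit in the first encrypted operand changes the key stream, so every later instruction decodes to something else unless the rest of the bytecode is re-encrypted.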

#### Breaking Virtual Machine Obfuscation II

#### How to deal with hardened VMs?

- locate VM entry and bytecode
- simplify handlers with program analysis techniques
- write a *control-flow sensitive disassembler*¹ and reconstruct high-level code

¹ https://synthesis.to/2021/10/21/vm\_based\_obfuscation.html

# Automated Attacks on VMs

# Instruction Removal

```
mov eax, 0xdead
mov eax, 0x1234
not eax
push eax
mov eax, 0x5678
mov ecx, ecx
add eax, 0x1111
add ecx, 0x0
mov edx, eax
pop eax
not eax
ret
```

Dead Code Elimination

```
mov eax, 0x1234
not eax
push eax
mov eax, 0x5678
add eax, 0x1111
mov edx, eax
pop eax
not eax
ret
```

Constant Folding

```
mov eax, 0x1234
not eax
push eax
mov eax, 0x6789
mov edx, eax
pop eax
not eax
ret
```

Constant Propagation

```
mov eax, 0x1234
not eax
push eax
mov edx, 0x6789
pop eax
not eax
ret
```

Peephole Optimization

```
mov eax, 0x1234
mov edx, 0x6789
ret
```
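The net effect of these passes on a straight-line block can be reproduced with a small constant-propagation sketch. It is an assumption-laden simplification rather than a real optimizer: it understands only `mov`, `add` with an immediate, `not`, `push`, `pop`, and `ret`, models the stack as a Python list, and reports which registers end up holding a known constant.

```python
MASK = 0xFFFFFFFF  # 32-bit registers, as in the listing


def summarize(block):
    """Return the registers whose final value is a known constant.

    A register holds either an int constant or a tuple such as ("init", name),
    meaning that its value is not a known constant.
    """
    regs, stack = {}, []

    def read(name):
        return regs.get(name, ("init", name))

    for line in block:
        mnemonic, _, rest = line.partition(" ")
        ops = [op.strip() for op in rest.split(",")] if rest else []
        if mnemonic == "mov":
            src = ops[1]
            regs[ops[0]] = int(src, 16) if src.startswith("0x") else read(src)
        elif mnemonic == "add":                      # immediate operand only
            value, imm = read(ops[0]), int(ops[1], 16)
            if isinstance(value, int):
                regs[ops[0]] = (value + imm) & MASK  # constant folding
            elif imm != 0:
                regs[ops[0]] = ("unknown", ops[0])
        elif mnemonic == "not":
            value = read(ops[0])
            regs[ops[0]] = ~value & MASK if isinstance(value, int) else ("unknown", ops[0])
        elif mnemonic == "push":
            stack.append(read(ops[0]))
        elif mnemonic == "pop":
            regs[ops[0]] = stack.pop()
        elif mnemonic == "ret":
            break
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return {name: value for name, value in regs.items() if isinstance(value, int)}


BLOCK = [
    "mov eax, 0xdead", "mov eax, 0x1234", "not eax", "push eax",
    "mov eax, 0x5678", "mov ecx, ecx", "add eax, 0x1111", "add ecx, 0x0",
    "mov edx, eax", "pop eax", "not eax", "ret",
]

if __name__ == "__main__":
    print({name: hex(value) for name, value in summarize(BLOCK).items()})
```

For the listing above, the summary contains only `eax` and `edx`, which matches the two remaining `mov` instructions.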







#### **SATURN**: Software Deobfuscation Framework Based on LLVM

Peter Garba (Thales, DIS - Cybersecurity, Munich, Germany), Matteo Favaro (Zimperium, Mobile Security, Noale, Italy)

## Symbolic Execution

```
__handle_vnor:
  mov rcx, [rbp]
  mov rbx, [rbp + 4]
  not rcx
  not rbx
  and rcx, rbx
  mov [rbp + 4], rcx
  pushf
  pop [rbp]
  jmp __vm_dispatcher
```

Handler performing nor (with flag side-effects)

$$\begin{aligned}
rcx &\leftarrow [rbp] \\
rbx &\leftarrow [rbp+4] \\
rcx &\leftarrow \neg rcx = \neg [rbp] \\
rbx &\leftarrow \neg rbx = \neg [rbp+4] \\
rcx &\leftarrow rcx \land rbx \\
&= (\neg [rbp]) \land (\neg [rbp+4]) \\
&= [rbp] \downarrow [rbp+4] \\
[rbp+4] &\leftarrow rcx = [rbp] \downarrow [rbp+4] \\
rsp &\leftarrow rsp-4 \\
[rsp] &\leftarrow flags \\
[rbp] &\leftarrow [rsp] = flags \\
rsp &\leftarrow rsp+4
\end{aligned}$$
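The step-by-step derivation can be mimicked with a very small symbolic executor that keeps one formula string per written location. This is a sketch under strong simplifications: it knows only the instructions used by this handler, treats memory operands as opaque names, and models `pushf`/`pop` with a single `stack_top` slot instead of tracking `rsp`.

```python
def symbolic_execute(handler):
    """Map every written location to a formula over the initial state."""
    state = {}

    def read(location):
        # Locations that were never written stand for their initial value.
        return state.get(location, location)

    for line in handler:
        mnemonic, _, rest = line.partition(" ")
        ops = [op.strip() for op in rest.split(",")] if rest else []
        if mnemonic == "mov":
            state[ops[0]] = read(ops[1])
        elif mnemonic == "not":
            state[ops[0]] = f"~{read(ops[0])}"
        elif mnemonic == "and":
            state[ops[0]] = f"({read(ops[0])} & {read(ops[1])})"
        elif mnemonic == "pushf":
            state["stack_top"] = "flags"          # simplified stack model
        elif mnemonic == "pop":
            state[ops[0]] = state.pop("stack_top")
        elif mnemonic == "jmp":
            break                                 # control leaves the handler
        else:
            raise ValueError(f"unsupported instruction: {line}")
    return state


VNOR = [
    "mov rcx, [rbp]", "mov rbx, [rbp + 4]", "not rcx", "not rbx",
    "and rcx, rbx", "mov [rbp + 4], rcx", "pushf", "pop [rbp]",
    "jmp __vm_dispatcher",
]

if __name__ == "__main__":
    for location, formula in symbolic_execute(VNOR).items():
        print(f"{location} <- {formula}")
```

The formula stored for `[rbp + 4]` is the conjunction of both negated inputs, which is the nor of the two stack slots.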

# Program Synthesis

$$f(x,y,z) := (((x \oplus y) + ((x \land y) \cdot 2)) \lor z) + (((x \oplus y) + ((x \land y) \cdot 2)) \land z)$$

We use f as a black-box:

$$(1,1,1) \to 3$$

$$(2,3,1) \to 6$$

$$(0,7,2) \to 9$$

We **learn** a function h that has the same I/O behavior:

$$h(x,y,z) := x + y + z$$
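The black-box view can be tried out directly. The sketch below evaluates f on the three sample inputs and then compares f and h on random inputs; the 64-bit wraparound and the helper name `same_io_behavior` are assumptions for the example. Random sampling gives evidence of equivalence, not a proof.

```python
import random

MASK = 0xFFFFFFFFFFFFFFFF  # assumption: 64-bit wraparound arithmetic


def f(x, y, z):
    """The obfuscated expression, used only through its inputs and outputs."""
    a = ((x ^ y) + ((x & y) * 2)) & MASK
    return ((a | z) + (a & z)) & MASK


def h(x, y, z):
    """The candidate learned from the samples."""
    return (x + y + z) & MASK


def same_io_behavior(lhs, rhs, samples=1000, seed=0):
    """Compare two functions on random 64-bit inputs."""
    rng = random.Random(seed)
    for _ in range(samples):
        args = [rng.getrandbits(64) for _ in range(3)]
        if lhs(*args) != rhs(*args):
            return False
    return True


if __name__ == "__main__":
    print(f(1, 1, 1), f(2, 3, 1), f(0, 7, 2))
    print(same_io_behavior(f, h))
```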

#### VM ISA

- $x + y$
- $x - y$
- $x \wedge y$
- $x \vee y$
- $x \oplus y$

Lookup Table:

| input | output | semantics    |
|-------|--------|--------------|
| (5,3) | 8      | $x + y$      |
| (5,3) | 2      | $x - y$      |
| (5,3) | 1      | $x \wedge y$ |
| (5,3) | 7      | $x \vee y$   |
| (5,3) | 6      | $x \oplus y$ |

- predictable set of handler semantics
- pre-computed lookup tables of I/O samples
- SMT solvers to prove **semantic equivalence**
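A lookup-table attack along these lines can be sketched in a few lines. The candidate set mirrors the VM ISA list, while the sample inputs, the 8-bit width, and the function names are assumptions. A match is only a candidate; the final equivalence proof with an SMT solver is not part of this sketch.

```python
import operator

MASK = 0xFF  # assumption: 8-bit values keep the example small

# Predictable handler semantics, written in Python operator notation.
CANDIDATES = {
    "x + y": operator.add,
    "x - y": operator.sub,
    "x & y": operator.and_,
    "x | y": operator.or_,
    "x ^ y": operator.xor,
}

SAMPLES = [(5, 3), (200, 77), (0, 255), (18, 18)]  # hypothetical sample inputs


def signature(func):
    """Outputs of func on the fixed sample inputs."""
    return tuple(func(x, y) & MASK for x, y in SAMPLES)


# Pre-computed lookup table: I/O signature -> candidate semantics.
LOOKUP_TABLE = {signature(func): name for name, func in CANDIDATES.items()}


def synthesize(black_box):
    """Return the matching candidate, or None if no signature matches."""
    return LOOKUP_TABLE.get(signature(black_box))


if __name__ == "__main__":
    obfuscated_add = lambda x, y: (x ^ y) + 2 * (x & y)
    print(synthesize(obfuscated_add))
```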

## Attack Surface

#### Shortcomings of VMs

- predictable instruction semantics with meaningful mnemonics
  - vulnerable to synthesis-based attacks
  - facilitates writing **disassemblers**
- VM components are **independent** of each other
  - isolated analysis possible
  - obfuscation limited to **local** constructs (e.g., handler level)

#### VM Attack Landscape

# Next-Gen VM-based Obfuscators

# Design Goals



**Design Principle #1** – Complex and target-specific instruction sets.

- handler semantics are based on instruction sequences from the target program
- complex handler semantics
  - introduce diversity
  - provide resilience against synthesis-based attacks
- can be data-flow dependent

**No meaningful instruction mnemonics for VM disassemblers**

- interlocking of handlers & semantics to enforce a cross-handler analysis
  - mixed Boolean-Arithmetic encodings across handlers
  - dataflow-dependent or multi-threaded opaque predicates
  - merged handler semantics
- analysis **effort rises** enormously

**Analysis tools reach their limits**



#### Loki

- academic prototype of next-gen VM
- industry shifts towards novel VM designs
- paper at USENIX Sec'22: "Loki: Hardening Code Obfuscation Against Automated Attacks" https://www.usenix.org/system/files/sec22-schloegel.pdf

Authors: Moritz Schloegel, Tim Blazytko, Moritz Contag, Cornelius Aschermann, Julius Basler, Thorsten Holz, Ali Abbasi (Ruhr-Universität Bochum, Germany)

Instruction format: opcode, register, register.

| bytecode | disassembly  | semantics         |
|----------|--------------|-------------------|
| 0a 01 02 | add r1, r2   | $f(x,y) := x + y$ |
| 0b 01 05 | mul r1, r5   | $g(x,y) := x * y$ |
| a2 03 ?? | shl r3, 0xff |                   |

Extended instruction format: opcode, register, register, constant.

| bytecode    | disassembly  | semantics            |
|-------------|--------------|----------------------|
| 0a 01 02 00 | add r1, r2   | $f(x,y,c) := x + y$  |
| 0b 01 05 00 | mul r1, r5   | $g(x,y,c) := x * y$  |
| a2 03 ?? ff | shl r3, 0xff | $h(x,y,c) := x << c$ |

- handlers can be represented as mathematical functions
- instruction semantics refer to the handler's actual computation

# Can we do better?

# Merging Instruction Semantics

$$f(x,y,c) := x + y \qquad g(x,y,c) := x - y << c$$

$$f(x,y,c,k) := \begin{cases} x + y & \text{if } k == 0 \\ x - y << c & \text{if } k == 1 \end{cases}$$

**Key-dependent instruction semantics**
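As a plain-code reading of the case distinction, a merged handler might look as follows. Two points are assumptions: `x - y << c` is read as `(x - y) << c`, which is how both C and Python parse it, and results wrap at 64 bits.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # assumption: 64-bit wraparound arithmetic


def merged_handler(x, y, c, k):
    """One handler, two semantics; the key k selects which one is computed."""
    if k == 0:
        return (x + y) & MASK
    if k == 1:
        return ((x - y) << c) & MASK
    raise ValueError("no semantics defined for this key")


if __name__ == "__main__":
    print(merged_handler(5, 3, 3, k=0))
    print(merged_handler(5, 3, 3, k=1))
```

An analyst who samples the handler with a single fixed key observes only one of the two behaviors.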









# Polynomial Encodings

$$f(x,y,c,k) := (k == 0) \cdot (x + y) + (k == 1) \cdot (x - y << c)$$

$$f(x,y,c,k) := (n \mod k == 0) \cdot (x + y) + (k^2 == q \mod m) \cdot (x - y << c)$$

$$f(x,y,c,k) := (n \mod k == 0) \cdot (x + y) + pf(k) \cdot (x - y << c)$$

$$pf(k) := ((0xff \land k) \oplus 0xcd) \cdot 0x28cbfbeb9a020a33$$

$$pf(0x1336) = 1 \qquad pf(0xabcd) = 0$$

$$pf(0x1000) = 0x20ab58bbaa53a22ad7 \qquad pf(0xdead) = 0xf4c7e7859c0c3d320$$

**Partial point functions for key selection**

**Point functions subvert I/O sampling**
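The point function can be evaluated directly. The sketch below assumes 64-bit wraparound arithmetic, under which pf(0x1336) evaluates to 1; the values listed for 0x1000 and 0xdead are wider than 64 bits, so they appear to be printed without truncation. The constant `N` is a hypothetical choice for n, picked only so that n mod 0xabcd == 0.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # assumption: 64-bit wraparound arithmetic

N = 0xABCD * 0x10001  # hypothetical n, divisible by the key 0xabcd


def pf(k):
    """Partial point function: 1 and 0 for the valid keys, garbage otherwise."""
    return (((0xFF & k) ^ 0xCD) * 0x28CBFBEB9A020A33) & MASK


def f(x, y, c, k):
    """Branch-free merged handler; k must be non-zero because of n mod k."""
    select_add = int(N % k == 0)
    return (select_add * (x + y) + pf(k) * ((x - y) << c)) & MASK


if __name__ == "__main__":
    print(pf(0x1336), pf(0xABCD))
    print(f(5, 3, 1, 0xABCD), f(5, 3, 1, 0x1336))
    print(hex(f(5, 3, 1, 0x1000)))  # invalid key: neither semantics
```

With an invalid key the coefficient is neither 0 nor 1, so sampled outputs match none of the intended semantics.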







# SMT-hard Key Encodings and Point Functions





$$f(x,y,c,k) := (n_1 \mod k == 0) \cdot (x + y) + pf(k) \cdot (x - y << c)$$

$$f(x,y,c,k) := (n_1 \mod k == 0) \cdot (x + y + (x + x)) + pf(k) \cdot (x - y \cdot (x + y))$$



| instruction    | definition             |
|----------------|------------------------|
| mov edx, eax   | edx.1 := eax           |
| mov ecx, 0x20  | ecx.1 := 0x20          |
| add edx, ecx   | edx.2 := edx.1 + ecx.1 |
| imul edx, 0x10 | edx.3 := edx.2 * 0x10  |

Recursively replace uses by their definitions

$$edx.3 := edx.2 * 0x10 = (edx.1 + ecx.1) * 0x10$$
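The recursive replacement can be sketched with a dictionary of definitions and a regular expression that finds numbered names. The naming scheme `reg.N` follows the example; the helper `expand` is an assumption of this sketch, and it carries the substitution through until only inputs and constants remain.

```python
import re

# Definitions from the example, keyed by the numbered name they define.
DEFINITIONS = {
    "edx.1": "eax",
    "ecx.1": "0x20",
    "edx.2": "edx.1 + ecx.1",
    "edx.3": "edx.2 * 0x10",
}

NUMBERED_NAME = re.compile(r"\b[a-z]+\.\d+\b")


def expand(name, definitions):
    """Recursively replace every use of a numbered name by its definition."""
    def substitute(match):
        inner = expand(match.group(0), definitions)
        # Parenthesize compound expressions to preserve evaluation order.
        return inner if re.fullmatch(r"\w+", inner) else f"({inner})"

    return NUMBERED_NAME.sub(substitute, definitions[name])


if __name__ == "__main__":
    print(expand("edx.3", DEFINITIONS))
```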

# Semantically Complex Operations



$$f(x,y,c,k) := (n_1 \mod k == 0) \cdot (x + y + (x + x)) + pf(k) \cdot (x - y \cdot (x + y))$$

$$f(x,y,c,k) := (n_1 \mod k == 0) \cdot (((x \oplus y) + 2 \cdot (x \wedge y)) + (x \ll 1)) + pf(k) \cdot (x - y \cdot (x + y))$$

**Syntactically complex expressions**

## Mixed Boolean Arithmetic Expressions

$$x - y \cdot (x + y)$$

#### Rewriting rules:

1) 
$$x + y \rightarrow (x \oplus y) + 2 \cdot (x \wedge y)$$

$$2) \quad x \oplus y \to (x \vee y) - (x \wedge y)$$

47) 
$$X \wedge y \rightarrow (\neg X \vee y) - \neg X$$

# Mixed Boolean Arithmetic Expressions

$$x - y \cdot (x + y)$$

#### Rewriting rules:

1) 
$$x + y \rightarrow (x \oplus y) + 2 \cdot (x \wedge y)$$

$$2) \quad x \oplus y \to (x \vee y) - (x \wedge y)$$

$$47) \quad x \wedge y \to (\neg x \vee y) - \neg x$$







#### Rewriting rules:

- 1)  $x + y \rightarrow (x \oplus y) + 2 \cdot (x \wedge y)$ 2)  $x \oplus y \rightarrow (x \vee y) (x \wedge y)$
- $(x \land y) \rightarrow (\neg x \lor y) \neg x$

$$x - y \cdot (x + y)$$

#### Rewriting rules:

1) 
$$x + y \rightarrow (x \oplus y) + 2 \cdot (x \wedge y)$$

$$2) \quad x \oplus y \to (x \vee y) - (x \wedge y)$$

$$47) \quad x \wedge y \to (\neg x \vee y) - \neg x$$

$$x - y \cdot ((x \oplus y) + 2 \cdot (x \wedge y))$$

final expression

# Traditional Approach

$$x - y \cdot (x + y)$$

#### Rewriting rules:

- 1)  $x + y \rightarrow (x \oplus y) + 2 \cdot (x \wedge y)$
- 2)  $x \oplus y \rightarrow (x \lor y) (x \land y)$

$$(47) \quad x \wedge y \to (\neg x \vee y) - \neg x$$

$$x - y \cdot ((x \oplus y) + 2 \cdot (x \wedge y))$$

$$x - y \cdot (x + y)$$

#### Rewriting rules:

1) 
$$x + y \rightarrow (x \oplus y) + 2 \cdot (x \wedge y)$$

2) 
$$x \oplus y \rightarrow (x \lor y) - (x \land y)$$

 $(\neg x \lor y) - \neg x$ 

$$x - y \cdot ((x \oplus y) + 2 \cdot (x \wedge y))$$

$$x - y \cdot (x + y)$$

#### Rewriting rules:

1) 
$$x + y \rightarrow (x \oplus y) + 2 \cdot (x \land y)$$
  
2)  $x \oplus y \rightarrow (x \lor y) - (x \land y)$ 

# Lookup table w/ \*all\* identities

$$x-y\cdot((x\oplus y)+2\cdot(x\wedge y))$$

$$x - y \cdot (x + y)$$

#### Rewriting rules:

1) 
$$x + y \rightarrow (x \oplus y) + 2 \cdot (x \wedge y)$$

$$2) \quad x \oplus y \to (x \vee y) - (x \wedge y)$$

. . .

847,000) 
$$x \wedge y \rightarrow (\neg x \vee y) - \neg x$$

$$x - y \cdot ((x \oplus y) + 2 \cdot (x \wedge y))$$

$$x - y \cdot (x + y)$$

#### Rewriting rules:

$$1) \quad x + y \to (x \oplus y) + 2 \cdot (x \wedge y)$$

$$2) \quad x \oplus y \to (x \vee y) - (x \wedge y)$$

$$...$$

$$847,000) \quad x \wedge y \to (\neg x \vee y) - \neg x$$

$$x - y \cdot ((x \oplus y) + 2 \cdot (x \wedge y))$$

final expression



#### Rewriting rules:

$$1) \quad x + y \to (x \oplus y) + 2 \cdot (x \wedge y)$$

$$2) \quad x \oplus y \to (x \vee y) - (x \wedge y)$$

$$847,000) \quad x \wedge y \to (\neg x \vee y) - \neg x$$

$$x - y \cdot ((x \oplus y) + 2 \cdot (x \wedge y))$$

#### Rewriting rules:

$$1) \quad x + y \to (x \oplus y) + 2 \cdot (x \wedge y)$$

$$2) \quad x \oplus y \to (x \vee y) - (x \wedge y)$$

$$847,000) \quad x \wedge y \to (\neg x \vee y) - \neg x$$

$$x - y \cdot ((x \oplus y) + 2 \cdot (x \wedge y))$$

$$x - y \cdot ((x \oplus y) + 2 \cdot (x \wedge y))$$

#### Rewriting rules:

$$1) \quad x + y \to (x \oplus y) + 2 \cdot (x \wedge y)$$

$$2) \quad x \oplus y \to (x \vee y) - (x \wedge y)$$

$$...$$

$$847,000) \quad x \wedge y \to (\neg x \vee y) - \neg x$$

$$x - y \cdot ((x \oplus y) + 2 \cdot (x \wedge y))$$

$$x - y \cdot ((x \oplus y) + 2 \cdot (x \wedge y))$$

#### Rewriting rules:

$$1) \quad x + y \to (x \oplus y) + 2 \cdot (x \wedge y)$$

$$2) \quad x \oplus y \to (x \vee y) - (x \wedge y)$$

$$...$$

$$847,000) \quad x \wedge y \to (\neg x \vee y) - \neg x$$

$$x - y \cdot (((x \vee y) - (x \wedge y)) + 2 \cdot (x \wedge y))$$

$$x - y \cdot (((x \lor y) - (x \land y)) + 2 \cdot (x \land y))$$

#### Rewriting rules:

$$1) \quad x + y \to (x \oplus y) + 2 \cdot (x \land y)$$

$$2) \quad x \oplus y \to (x \lor y) - (x \land y)$$

$$...$$

$$847,000) \quad x \land y \to (\neg x \lor y) - \neg x$$

$$x - y \cdot (((x \lor y) - ((\neg x \lor y) - \neg x)) + 2 \cdot (x \land y))$$

$$x - y \cdot (((x \lor y) - (x \land y)) + 2 \cdot (x \land y))$$

#### Rewriting rules:

$$1) \quad x + y \to (x \oplus y) + 2 \cdot (x \wedge y)$$

$$2) \quad x \oplus y \to (x \vee y) - (x \wedge y)$$

$$...$$

$$847,000) \quad x \wedge y \to (\neg x \vee y) - \neg x$$

$$x - y \cdot (((x \vee y) - ((\neg x \vee y) - \neg x)) + 2 \cdot (x \wedge y))$$

$$x-y\cdot(((x\vee y)-(x\wedge y))+2\cdot(x\wedge y))$$

#### Rewriting rules:

$$1) \quad x+y\to(x\oplus y)+2\cdot(x\wedge y)$$

$$2) \quad x\oplus y\to(x\vee y)-(x\wedge y)$$

Recursive Rewriting

$$x - y \cdot (((x \lor y) - ((\neg x \lor y) - \neg x)) + 2 \cdot (x \land y))$$
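The recursive rewriting walked through above can be sketched as a small tree rewriter. This is an illustrative sketch, not the tooling behind the slides: expressions are nested tuples, only the three rules shown are implemented, and the `depth` budget, the 8-bit width, and all function names are assumptions made for the example.

```python
from itertools import product

WIDTH = 8                       # assumption: 8-bit words, chosen only to keep checks cheap
MASK = (1 << WIDTH) - 1

def rewrite_node(e):
    """Apply rule 1, 2 or 847,000 to the root of e; return e itself if none matches."""
    if isinstance(e, tuple) and len(e) == 3:
        op, a, b = e
        if op == "+":           # rule 1
            return ("+", ("^", a, b), ("*", 2, ("&", a, b)))
        if op == "^":           # rule 2
            return ("-", ("|", a, b), ("&", a, b))
        if op == "&":           # rule 847,000
            return ("-", ("|", ("~", a), b), ("~", a))
    return e

def rewrite(e, depth):
    """Rewrite e top-down; each applied rule uses up one unit of `depth` for its subtree."""
    if not isinstance(e, tuple):
        return e                # variable name or constant
    if depth > 0:
        new = rewrite_node(e)
        if new is not e:
            e, depth = new, depth - 1
    return (e[0],) + tuple(rewrite(child, depth) for child in e[1:])

def evaluate(e, env):
    """Evaluate an expression tree on WIDTH-bit words."""
    if isinstance(e, int):
        return e & MASK
    if isinstance(e, str):
        return env[e] & MASK
    vals = [evaluate(arg, env) for arg in e[1:]]
    if e[0] == "~":
        return ~vals[0] & MASK
    a, b = vals
    return {"+": a + b, "-": a - b, "*": a * b,
            "&": a & b, "|": a | b, "^": a ^ b}[e[0]] & MASK

def equivalent(e1, e2):
    """Compare two expressions on a grid of sample inputs."""
    return all(evaluate(e1, {"x": x, "y": y}) == evaluate(e2, {"x": x, "y": y})
               for x, y in product(range(0, 1 << WIDTH, 7), repeat=2))

def show(e):
    if not isinstance(e, tuple):
        return str(e)
    if e[0] == "~":
        return "~" + show(e[1])
    return "(" + show(e[1]) + " " + e[0] + " " + show(e[2]) + ")"

EXPR = ("-", "x", ("*", "y", ("+", "x", "y")))    # x - y * (x + y)
for depth in range(4):
    print(depth, show(rewrite(EXPR, depth)), equivalent(EXPR, rewrite(EXPR, depth)))
```

Unlike the slides, this sketch rewrites every matching subterm rather than one occurrence at a time, so the output grows faster. With a large rule table, picking rules at random at each step would additionally make every obfuscated instance syntactically different.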

$$x - y \cdot (x + y)$$

$$expr \equiv h^{-1}(h(expr))$$

$$x-y\cdot(x+y)$$

#### Rewrite as:

$$expr \equiv h^{-1}(h(expr))$$

$$h: a \mapsto 39a + 23$$

$$h^{-1}: a \mapsto 151a + 111$$

$$x-y\cdot(x+y)$$

#### Rewrite as:

$$expr \equiv h^{-1}(h(expr))$$

### Invertible function on 1 byte:

 $h: a \mapsto 39a + 23$  $h^{-1}: a \mapsto 151a + 111$ 

$$x - y \cdot (x + y)$$

#### Rewrite as:

$$expr \equiv h^{-1}(h(expr))$$

### Invertible function on 1 byte:

 $h: a \mapsto 39a + 23$ 

 $h^{-1}: a \mapsto 151a + 111$ 

$$x-y\cdot(x+y)$$

$$x-y\cdot (h^{-1}(h(x+y)))$$

#### Rewrite as:

$$expr \equiv h^{-1}(h(expr))$$

#### Invertible function on 1 byte:

 $h: a \mapsto 39a + 23$  $h^{-1}: a \mapsto 151a + 111$ 

$$x-y\cdot(x+y)$$

$$x-y\cdot (h^{-1}(h(x+y)))$$

#### Rewrite as:

$$expr \equiv h^{-1}(h(expr))$$

$$h: a \mapsto 39a + 23$$

$$h^{-1}: a \mapsto 151a + 111$$

$$\implies expr \equiv h^{-1}(h(expr)) \pmod{2^8}$$

$$x-y\cdot(x+y)$$

$$x-y\cdot (h^{-1}(h(x+y)))$$

$$x - y \cdot (h^{-1}(39 \cdot (x + y) + 23))$$

#### Rewrite as:

$$expr \equiv h^{-1}(h(expr))$$

$$h: a \mapsto 39a + 23$$

$$h^{-1}: a \mapsto 151a + 111$$

$$\implies expr \equiv h^{-1}(h(expr)) \pmod{2^8}$$

$$x-y\cdot(x+y)$$

$$x-y\cdot (h^{-1}(h(x+y)))$$

$$x - y \cdot (h^{-1}(39 \cdot (x + y) + 23))$$

#### Rewrite as:

$$expr \equiv h^{-1}(h(expr))$$

$$h: a \mapsto 39a + 23$$

$$h^{-1}: a \mapsto 151a + 111$$

$$\implies expr \equiv h^{-1}(h(expr)) \pmod{2^8}$$

$$x - y \cdot (x + y)$$

$$x-y\cdot (h^{-1}(h(x+y)))$$

$$x - y \cdot (h^{-1}(39 \cdot (x + y) + 23))$$

$$x - y \cdot (151 \cdot (39 \cdot (x + y) + 23) + 111)$$

#### Rewrite as:

$$expr \equiv h^{-1}(h(expr))$$

#### Invertible function on 1 byte:

 $h: a \mapsto 39a + 23$ 

 $h^{-1}: a \mapsto 151a + 111$ 
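The pair $h$, $h^{-1}$ works because 39 is odd and therefore invertible modulo $2^8$: $39 \cdot 151 \equiv 1$ and $151 \cdot 23 + 111 \equiv 0 \pmod{2^8}$. A minimal sketch that recomputes the inverse and checks the encoded expression; the helper `invert_affine` is a hypothetical name, and the three-argument `pow` with exponent `-1` assumes Python 3.8 or newer:

```python
MOD = 1 << 8                    # one byte, as on the slide

def h(a):
    return (39 * a + 23) % MOD

def h_inv(a):
    return (151 * a + 111) % MOD

def invert_affine(mul, add, mod=MOD):
    """Return (m, c) with m * (mul * a + add) + c == a (mod `mod`); `mul` must be odd."""
    mul_inv = pow(mul, -1, mod)             # modular inverse, Python 3.8+
    return mul_inv, (-mul_inv * add) % mod

def original(x, y):
    return (x - y * (x + y)) % MOD

def encoded(x, y):
    # x - y * (151 * (39 * (x + y) + 23) + 111), the last line of the derivation
    return (x - y * (151 * (39 * (x + y) + 23) + 111)) % MOD

print(invert_affine(39, 23))
print(all(h_inv(h(a)) == a for a in range(MOD)))
print(all(original(x, y) == encoded(x, y) for x in range(MOD) for y in range(MOD)))
```

The same helper yields an inverse for any odd multiplier, so each obfuscated instance can use its own coefficients.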



# Binary Permutation Polynomial Inversion and Application to Obfuscation Techniques

Lucas Barthelemy<sup>abd</sup> lbarthelemy@quarkslab.com

Ninon Eyrolles<sup>a</sup> neyrolles@quarkslab.com

Guenaël Renault<sup>bce</sup> guenael.renault@upmc.fr

Raphaël Roblin<sup>bd</sup> raph.roblin@gmail.com

- <sup>a</sup> Quarkslab, Paris, France
- <sup>b</sup> Sorbonne Universités, UPMC Univ Paris 06, F-75005, Paris, France
- <sup>c</sup> CNRS, UMR 7606, LIP6, F-75005, Paris, France
- <sup>d</sup> UPMC Computer Science Master Department, SFPN Course
- <sup>e</sup> Inria, Paris Center, PolSys Project

# **Syntactically Complex Operations**



Taking it all together

### Loki: Academic Next-Gen VM Prototype

**Design Principle #1** – Complex and target-specific instruction sets.

**Design Principle #2** – Intertwining VM components.

- **merged semantics** to enforce **cross-handler** analysis

**Design Principle #1** – Complex and target-specific instruction sets.

- merged semantics to enforce cross-handler analysis
- polynomial encodings to interlock instruction semantics

**Design Principle #1** – Complex and target-specific instruction sets.

- merged semantics to enforce cross-handler analysis
- polynomial encodings to interlock instruction semantics
- point functions to subvert I/O sampling

**Design Principle #1** – Complex and target-specific instruction sets.

- merged semantics to enforce cross-handler analysis
- polynomial encodings to interlock instruction semantics
- point functions to subvert I/O sampling
- complex, data-flow dependent instruction semantics to thwart program synthesis

**Design Principle #1** – Complex and target-specific instruction sets.

- merged semantics to enforce cross-handler analysis
- polynomial encodings to interlock instruction semantics
- point functions to subvert I/O sampling
- complex, data-flow dependent instruction semantics to thwart program synthesis
- MBAs to thwart symbolic execution
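To illustrate why point functions subvert I/O sampling, here is a toy handler whose semantics depend on a key operand. It is a simplified sketch and not Loki's actual construction: the keys, the two merged operations, and every name are hypothetical.

```python
import random

WIDTH = 64
MASK = (1 << WIDTH) - 1
KEY_ADD = 0x1F3D5B79A2C4E608    # hypothetical key selecting addition
KEY_XOR = 0x0123456789ABCDEF    # hypothetical key selecting xor

def point(k, key):
    """Point function: 1 for the single input `key`, 0 everywhere else."""
    return int(k == key)

def handler(x, y, k):
    """Merged handler: the key operand k decides which semantics take effect."""
    return (point(k, KEY_ADD) * (x + y) + point(k, KEY_XOR) * (x ^ y)) & MASK

def sample_outputs(n, rng):
    """Black-box I/O sampling with uniformly random operands."""
    return {handler(rng.getrandbits(WIDTH), rng.getrandbits(WIDTH), rng.getrandbits(WIDTH))
            for _ in range(n)}

rng = random.Random(0)
print(sample_outputs(10_000, rng))      # random keys practically never hit a point
print(handler(2, 3, KEY_ADD), handler(2, 3, KEY_XOR))
```

With random keys the handler looks like the constant zero function, so an attacker sampling input/output pairs learns nothing unless the correct key is first recovered from the bytecode. In a real protection the comparison itself would also be hidden, for example behind MBAs.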

## Impact on Deobfuscation

# Verging on the Limits

## Challenges in Code Deobfuscation

**Design Principle #1** – Complex and target-specific instruction sets.

- synthesis-based attacks are no longer feasible
- no **meaningful instruction mnemonics** for disassemblers

vadd vs. vneg\_vadd\_vmul\_vxor\_vpush

## Challenges in Code Deobfuscation

**Design Principle #2** – Intertwining VM components.

- shift towards **global analysis**; larger analysis scope required
- analysis **effort rises enormously**: limitations of binary analysis techniques & tools

# What needs to be done?

## Better Analysis Tools

- better support for interprocedural & multi-threaded analysis
- improve tooling for large instruction sequences (performance and memory footprint)
- advances in binary lifting

Yes, these are hard problems.

## Selection of Analysis Windows

- identification of relevant sources and sinks
- strategies to isolate and simplify (partial) data flows
- automated exploration of control and data flows (CFG/DFG construction)

#### Advances in MBA Deobfuscation

- simplification of large polynomial MBAs
- improvements on synthesis-based approaches to reach higher semantic depths
- strategies to synthesize constants

 $(x \oplus 0xf5692443e29a24c2) \cdot 0x3886553866f35c17$ 

## Research Catches Up

#### Linear MBAs are dead

## Efficient Deobfuscation of Linear Mixed Boolean-Arithmetic Expressions

Benjamin Reichenwallner & Peter Meerwald-Stadler Denuvo GmbH Salzburg, Austria

## Hex-Rays Decompiler Plugin



https://github.com/HexRaysSA/goomba

#### Novel Methods for Complex MBAs

SECRET CLUB

## Improving MBA Deobfuscation using Equality Saturation



## However ...

## Open Challenges

- analysis tools still insufficient
- selection of analysis windows remains challenging
- low impact of MBA deobfuscation in practice

## Open Challenges

- analysis tools still insufficient
- selection of analysis windows remains challenging
- low impact of MBA deobfuscation in practice

**Deobfuscation still not feasible**

# Conclusion

## **Takeaways**

1. current VMs can be broken in a (semi-)automated fashion
2. industry shifts to novel VM designs
3. code deobfuscation research has to catch up despite recent advancements

Next-gen VMs will shape the landscape of modern obfuscation in the next years.

### Summary

- virtualization-based obfuscation
- attacks on VMs (instruction removal, symbolic execution, program synthesis)
- next-gen VMs and their impact on deobfuscation
- recent advancements in MBA deobfuscation