Reverse engineering tool for virtualization wrappers




The Virtual Deobfuscator was developed as part of the DARPA Cyber Fast Track program. The goal was to create a tool that could remove virtual machine (VM) based protections from malware. I developed a prototype version that looks very promising. Virtual machine protections are a relatively new form of obfuscation. They work by translating sections of a binary's original machine code into bytecode for a custom VM. This transformation is destructive: the original code is lost. The VM itself is embedded in the protected binary and is used at runtime to interpret the instructions that were converted to bytecode. The goal of the Virtual Deobfuscator is to analyze a runtrace and filter out the VM processing instructions, leaving a reverse engineer with a bytecode version of the original binary. It doesn't need to be tailored to the particular VM being analyzed, and so far it has worked on every VM interpreter I have tested it on.
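To make the idea of VM-based protection concrete, here is a minimal sketch of a toy stack-based interpreter. It is not modeled on any real protector, and the opcode names (`LOAD`, `PUSHI`, `ADD`, `STORE`) are invented for illustration. It shows how a single x86 instruction such as `add eax, 2` can turn into several simpler bytecodes, each of which sends the interpreter back through the same dispatch code.

```python
def run(bytecode, regs):
    """Interpret a list of (opcode, operand) tuples against a register dict.

    Toy VM for illustration only; real protection VMs are far more complex.
    """
    stack = []
    pc = 0
    while pc < len(bytecode):          # the dispatch loop runs once per bytecode
        op, arg = bytecode[pc]
        if op == "LOAD":               # push a register's value
            stack.append(regs[arg])
        elif op == "PUSHI":            # push an immediate value
            stack.append(arg)
        elif op == "ADD":              # pop two values, push their 32-bit sum
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)
        elif op == "STORE":            # pop the top of stack into a register
            regs[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return regs


# One original instruction, "add eax, 2", becomes four bytecodes.
ADD_EAX_2 = [("LOAD", "eax"), ("PUSHI", 2), ("ADD", None), ("STORE", "eax")]

regs = run(ADD_EAX_2, {"eax": 40})
print(regs["eax"])  # 42
```

Every bytecode executes the same loop-and-dispatch instructions in the native code, which is why a runtrace of a protected binary is dominated by repeated instruction sequences.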

Quick Start

First make an output directory; call it output and create it in the main VirtualDeobfuscator source directory (neither of these is a requirement). Change into it and run the following command:

python ../ -i ../example/olly_loop_eax.txt -d 1 -t verify.txt

Three files will be generated: vd.xml, vd_IR.txt, and verify.txt. The first one is the converted trace database; this is what gets used for clustering. Perform clustering with the command:

python ../ -c -d 1

This command will generate a lot of files. The one you really care about is called final_assembly.txt. You can find out more about the contents of this file, as well as more details about the Virtual Deobfuscator, in doc/WhitePaper.docx.

How it works

The Virtual Deobfuscator is based on pattern matching. It analyzes a runtrace and matches repeated patterns of instructions, grouping them into units called clusters. This process continues recursively until no more instructions or clusters can be grouped into larger clusters. The remaining unclustered instructions contain the interpreted bytecodes; they are the instructions actually executed by the VM as it processed bytecodes. Since protection VMs generally use RISC-based architectures, their instruction sets are simpler. This means that most instructions from the original program are represented by multiple bytecodes. The post-clustering instruction trace therefore contains many more instructions than the original binary did. To clean it up, I run the instructions through a peephole optimizer to remove redundant instructions and get something closer to the original.
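The recursive grouping described above can be sketched in a few lines of Python. This is a simplified stand-in, closer to byte-pair encoding than to the Virtual Deobfuscator's actual matcher: it repeatedly replaces the most frequent adjacent pair of trace entries with a new cluster token until no pair repeats at least `min_count` times. The function names, the `CLUSTER_n` naming, and the sample trace are all hypothetical.

```python
from collections import Counter


def cluster(trace, min_count=2):
    """Recursively merge repeated adjacent pairs into cluster tokens.

    Simplified byte-pair-encoding-style stand-in, not the tool's matcher.
    Returns (reduced_sequence, clusters), where clusters maps each cluster
    name to the pair of tokens it replaced.
    """
    seq = list(trace)
    clusters = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break  # nothing repeats often enough, so stop recursing
        name = f"CLUSTER_{len(clusters)}"
        clusters[name] = pair
        merged, i = [], 0
        while i < len(seq):  # replace non-overlapping occurrences, left to right
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                merged.append(name)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return seq, clusters


def unclustered(seq, clusters):
    """Top-level entries that never became part of any cluster."""
    return [tok for tok in seq if tok not in clusters]


# Hypothetical trace: a three-instruction dispatcher repeats, while the
# instructions between dispatches differ from one iteration to the next.
DISPATCH = ["lodsb", "movzx eax, al", "jmp [edx+eax*4]"]
trace = (DISPATCH + ["add ebx, 1"] + DISPATCH + ["xor ebx, ecx"]
         + DISPATCH + ["mov [edi], ebx"])

seq, clusters = cluster(trace)
print(unclustered(seq, clusters))  # ['add ebx, 1', 'xor ebx, ecx', 'mov [edi], ebx']
```

The repeated dispatcher collapses into nested clusters, and only the entries that never repeat are left at the top level. A real trace would be far noisier, so a practical matcher would likely key on addresses rather than instruction text.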


The Virtual Deobfuscator's parser can handle traces from three popular debugging tools: WinDbg, OllyDbg, and Immunity Debugger. It can easily be extended to work with traces generated by other tools. The parser converts traces to a normalized XML format for later processing, so tool developers can also modify their tools' output to produce that format directly. There's no DTD for our XML format, but it's extremely simple.
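As an illustration of what normalizing a trace involves, the sketch below converts text lines into XML using the standard library's `xml.etree.ElementTree`. Both the input line format (an 8-digit hex address followed by the instruction) and the element and attribute names (`trace`, `insn`, `seq`, `addr`, `asm`) are assumptions made for this example; they are not the tool's actual schema or any debugger's exact trace layout.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical line format for illustration: an 8-digit hex address,
# whitespace, then the instruction text. Real debugger traces differ.
LINE_RE = re.compile(r"^\s*([0-9A-Fa-f]{8})\s+(.+?)\s*$")


def trace_to_xml(lines):
    """Normalize raw trace lines into an XML string (hypothetical schema)."""
    root = ET.Element("trace")
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip column headers, blank lines, and unparseable text
        addr, asm = m.groups()
        # seq records execution order; len(root) counts elements added so far
        ET.SubElement(root, "insn", seq=str(len(root)), addr=addr.upper(), asm=asm)
    return ET.tostring(root, encoding="unicode")


raw = [
    "Address   Command",
    "00401000  mov eax, 1",
    "00401005  add eax, 2",
    "",
]
print(trace_to_xml(raw))
```

Supporting another debugger would then mean writing a new line pattern that feeds the same normalized structure, which is what keeps the later clustering stage independent of the trace source.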

Binary repackaging

The repackaging step uses the output of the clustering process to create a binary fragment containing the original x86 program code without the VM. This allows for further analysis in disassemblers such as IDA Pro. I generate the binary by assembling the “sections” of assembly code created by the Virtual Deobfuscator. This code is stored in the file final_assembly_nasm.asm. I assemble it using the Netwide Assembler (NASM).
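The general idea of repackaging can be sketched as follows: concatenate the recovered sections into one flat NASM source and assemble it with NASM's raw `bin` output format. The tool writes its own `final_assembly_nasm.asm`, so the section layout, the `sec_N` labels, and the default origin address below are hypothetical choices for illustration.

```python
import os
import shutil
import subprocess
import tempfile


def build_nasm_source(sections, origin=0x401000):
    """Join lists of instruction strings into one flat NASM source.

    `sections`, the sec_N labels, and `origin` are hypothetical inputs.
    """
    out = ["BITS 32", f"ORG {origin:#x}", ""]
    for i, body in enumerate(sections):
        out.append(f"sec_{i}:")
        out.extend(f"    {insn}" for insn in body)
        out.append("")
    return "\n".join(out)


def nasm_command(asm_path, bin_path):
    # -f bin emits raw machine code with no object-file headers
    return ["nasm", "-f", "bin", asm_path, "-o", bin_path]


def assemble(asm_path, bin_path):
    subprocess.run(nasm_command(asm_path, bin_path), check=True)


if __name__ == "__main__":
    src = build_nasm_source([["mov eax, 1", "add eax, 2"], ["ret"]])
    print(src)
    if shutil.which("nasm"):  # only assemble when NASM is on PATH
        with tempfile.TemporaryDirectory() as d:
            asm = os.path.join(d, "fragment.asm")
            with open(asm, "w") as f:
                f.write(src)
            assemble(asm, os.path.join(d, "fragment.bin"))
```

A flat binary has no headers, so when loading the fragment into a disassembler the base address has to be supplied by hand, matching whatever `ORG` value was used.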

Peephole optimization

Once the runtrace has been reduced to just the bytecode instructions and packaged as a binary, we run the code through a peephole optimizer, implemented as an IDA Pro Python script. This takes care of any remaining redundancy in the code (remember, the bytecodes are for a RISC machine, so there will be redundancy compared to the original CISC instructions). This step also helps remove simple obfuscations.
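The IDA Pro script itself isn't reproduced here. As a standalone illustration of peephole optimization (plain Python, no IDA API), the sketch below works on a list of instruction strings and applies three rules chosen for this example: drop `nop`, drop `mov r, r` on the same 32-bit register, and cancel a `push r` immediately followed by `pop r`. It assumes 32-bit code and treats the stack slot written by the cancelled `push` as dead; the rules and names are assumptions, not the tool's rule set.

```python
# Rules here are illustrative assumptions, not the tool's actual rule set.
REGS32 = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def _parse(insn):
    """Split 'mnemonic op1, op2' into (mnemonic, [operands]), lower-cased."""
    parts = insn.strip().lower().split(None, 1)
    mnem = parts[0] if parts else ""
    ops = [o.strip() for o in parts[1].split(",")] if len(parts) > 1 else []
    return mnem, ops


def _one_pass(code):
    out = []
    for insn in code:
        mnem, ops = _parse(insn)
        if mnem == "nop":
            continue
        # mov r, r is a no-op in 32-bit code (no flags affected)
        if mnem == "mov" and len(ops) == 2 and ops[0] == ops[1] and ops[0] in REGS32:
            continue
        # push r immediately followed by pop r leaves registers unchanged
        if mnem == "pop" and len(ops) == 1 and ops[0] in REGS32 and out:
            prev_mnem, prev_ops = _parse(out[-1])
            if prev_mnem == "push" and prev_ops == ops:
                out.pop()
                continue
        out.append(insn)
    return out


def peephole(code):
    """Apply the rules repeatedly until the instruction list stops changing."""
    while True:
        new = _one_pass(code)
        if new == code:
            return new
        code = new


code = ["push eax", "push ebx", "mov ecx, ecx", "pop ebx", "pop eax",
        "mov eax, 1", "nop", "add eax, ebx"]
print(peephole(code))  # ['mov eax, 1', 'add eax, ebx']
```

Because kept instructions accumulate on `out` like a stack, nested `push`/`pop` pairs unwind in a single pass once the instructions between them have been removed.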


Jason Raber
HexEffect, LLC