binsequencer

[+] INTRO [+]

Binsequencer is intended to scan a corpus of similar malware (family/campaign/like-tools) and build a YARA rule that will detect similar sections of code.

Specifically, each file is analyzed and its data is abstracted into sequences of x86 instruction mnemonics (instruction sets). These sets are then used in a sliding window to find commonality across the entire sample corpus. Upon finding an acceptable match, the application tries several techniques to create a YARA match, moving from most specific to least specific. In the least specific technique, it converts the matched instruction sets into a series of x86 opcodes, surrounded by wildcards, for use in a YARA rule.
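
The sliding-window comparison described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names `windows` and `common_windows` are hypothetical, and it assumes each sample has already been reduced to a list of mnemonics and that commonality is measured against the non-gold samples only.

```python
from typing import List, Sequence, Set, Tuple


def windows(mnemonics: Sequence[str], length: int) -> Set[Tuple[str, ...]]:
    """Every run of `length` consecutive mnemonics, as hashable tuples."""
    return {
        tuple(mnemonics[i:i + length])
        for i in range(len(mnemonics) - length + 1)
    }


def common_windows(
    gold: Sequence[str],
    others: Sequence[Sequence[str]],
    length: int,
    commonality: int = 100,
) -> List[Tuple[int, str]]:
    """Slide a window over the gold sample and keep the windows that
    appear in at least `commonality` percent of the other samples."""
    other_sets = [windows(sample, length) for sample in others]
    required = len(other_sets) * commonality / 100.0
    hits = []
    for start in range(len(gold) - length + 1):
        window = tuple(gold[start:start + length])
        count = sum(1 for seen in other_sets if window in seen)
        if count >= required:
            # Report the offset in the gold sample and the pipe-joined set.
            hits.append((start, "|".join(window)))
    return hits


# Toy mnemonic lists, made up for illustration.
gold = ["push", "mov", "xor", "call", "ret"]
others = [
    ["nop", "mov", "xor", "call", "pop"],
    ["mov", "xor", "call"],
]
print(common_windows(gold, others, length=3))
```

Comparing tuples rather than joined strings avoids accidental partial matches across mnemonic boundaries.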

The minimum length of the instruction set is adjustable (-l); the default of 25 has proven to be fairly reliable while testing samples. If you go too low, you'll start matching more samples that may be unrelated. You can also choose how many matches you want to use for your YARA rule (-m), and the application will attempt to find that many unique instruction sets. Additionally, while the script is intended to be run on x86 PE files, you can instruct it to run on non-PE files (JAR/PDF/etc.) with -n, or on individual files such as shellcode. Results may vary significantly if opcode matching fails, since the bytes may not actually be opcodes - YMMV.
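
The block sizes tried in the "Zeroing in" output of the basic_usage example (1907, 954, 478, 240, 359, ...) look like a bisection over window length, bounded below by the minimum length. A minimal sketch of that idea follows, assuming 100% commonality and plain lists of mnemonics; `has_common_window` and `longest_common_length` are hypothetical names, not taken from binsequencer.py. Bisection is valid here because any sample that contains a long shared run also contains every shorter run inside it.

```python
from typing import Sequence


def has_common_window(
    gold: Sequence[str], others: Sequence[Sequence[str]], length: int
) -> bool:
    """True if some run of `length` mnemonics in gold occurs in every other sample."""
    other_sets = [
        {tuple(o[i:i + length]) for i in range(len(o) - length + 1)}
        for o in others
    ]
    return any(
        all(tuple(gold[i:i + length]) in seen for seen in other_sets)
        for i in range(len(gold) - length + 1)
    )


def longest_common_length(
    gold: Sequence[str], others: Sequence[Sequence[str]], min_length: int = 25
) -> int:
    """Bisect on window length; return the longest shared length, or 0
    if nothing of at least `min_length` is shared."""
    low, high = min_length, len(gold)
    best = 0
    while low <= high:
        mid = (low + high) // 2
        if has_common_window(gold, others, mid):
            best = mid          # a match: try something longer
            low = mid + 1
        else:
            high = mid - 1      # no match: try something shorter
    return best


# Toy data: single letters stand in for mnemonics.
gold = list("abcdefgh")
others = [list("xxcdefyy"), list("cdefgzz")]
print(longest_common_length(gold, others, min_length=1))
```

Here `min_length` plays the role of -l: raising it above the longest shared run makes the search return nothing rather than a short, possibly coincidental match.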

Note that a match does not imply maliciousness, nor does it imply relevance to your samples (it could be shared code, a similar programming style, a common compiler, or just the same bytes).

usage: binsequencer.py [-h] [-c <integer_percent>] [-m <integer>]
                       [-l <integer>] [-v] [-a {x86,x64}] [-g <file>] [-d]
                       [-Q] [-n] [-o] [-s]
                       ...

Sequence a set of binaries to identify commonalities in code structure.

positional arguments:
  file

optional arguments:
  -h, --help            show this help message and exit
  -c <integer_percent>, --commonality <integer_percent>
                        Commonality percentage the sets criteria for matches,
                        default 100
  -m <integer>, --matches <integer>
                        Set the minimum number of matches to find, default 1
  -l <integer>, --length <integer>
                        Set the minimum length of the instruction set, default
                        25
  -v, --verbose         Prints data while processing, use only for debugging
  -a {x86,x64}, --arch {x86,x64}
                        Select code architecture of samples, default x86
  -g <file>, --gold <file>
                        Override gold selection
  -d, --default         Accept default prompt values
  -Q, --quiet           Disable output except for YARA rule
  -n, --nonpe           Process non-PE files (eg PCAP/JAR/PDF/DOC)
  -o, --opcode          Use only the opcode matching technique
  -s, --strings         Include strings in YARA for matched hashes


[+] EXAMPLES [+]

basic_usage
$ python binsequencer.py APT1_Malware/

[+] Extracting instructions and generating sets

	[-]APT1_Malware/0050e14f8e6bca0b2b99708f0659e38f407debec5ab7afc71de48fb104508a60
		.text - 2978 instructions extracted
	[-]APT1_Malware/04a23b3cb2d6361df66ca94a470ffa1017a8e5cd3255ce342219765d7d4619bc
		.text - 2980 instructions extracted
<TRUNCATED>
	[-]APT1_Malware/f737829e9ad9a025945ad9ce803641677ae0fe3abf43b1984a7c8ab994923178
		.text - 4574 instructions extracted
	[-]APT1_Malware/fc2751ff381d75154c76da7a42211509f7cc3fd4b50956e36e53b4f7653534d5
		.text - 1907 instructions extracted

[+] Golden hash (1907 instructions) - APT1_Malware/64a373487c4cc2b8b60687ecc01150b546b18be7069981c5fe5d48075190f1ff

[+] Zeroing in longest mnemonic instruction set in .text

	[-] Matches - 0     Block Size - 1907  Time - 0.00 seconds
	[-] Matches - 0     Block Size - 954   Time - 0.06 seconds
	[-] Matches - 0     Block Size - 478   Time - 0.07 seconds
	[-] Matches - 120   Block Size - 240   Time - 0.11 seconds
	[-] Matches - 1     Block Size - 359   Time - 0.07 seconds
	[-] Matches - 0     Block Size - 418   Time - 0.07 seconds
	[-] Matches - 0     Block Size - 389   Time - 0.07 seconds
	[-] Matches - 0     Block Size - 375   Time - 0.07 seconds
	[-] Matches - 0     Block Size - 368   Time - 0.07 seconds
	[-] Matches - 0     Block Size - 365   Time - 0.07 seconds
	[-] Matches - 0     Block Size - 363   Time - 0.07 seconds
	[-] Matches - 0     Block Size - 362   Time - 0.07 seconds
	[-] Matches - 0     Block Size - 361   Time - 0.07 seconds
	[-] Matches - 0     Block Size - 360   Time - 0.07 seconds
	[-] Matches - 1     Block Size - 359   Time - 0.07 seconds
	[-] Matches - 0     Block Size - 360   Time - 0.08 seconds

	[-] Moving 1 instruction sets to review with a length of 359

    [*] Do you want to display matched instruction set? [Y/N] y

	push|push|push|mov|mov|mov|xor|cdq|idiv|cmp|jne|mov|jmp|cmp|jne|lea|jmp|cmp|lea|je|mov|lea|mov|imul|mov|mov|shr|add|dec|cmp|jge|pop|xor|pop|pop|ret|push|lea|push|push|mov|mov|call|mov|mov|mov|mov|mov|add|shr|rep movsd|mov|and|rep movsb|mov|mov|cmp|mov|jl|mov|mul|shr|lea|sub|mov|mov|add|mov|add|sar|and|mov|mov|mov|mov|mov|and|sar|shl|and|or|mov|mov|mov|mov|sar|and|and|shl|or|mov|mov|mov|and|dec|mov|mov|jne|mov|cmp|jne|mov|mov|lea|sar|and|mov|mov|mov|mov|and|sar|shl|and|or|mov|mov|mov|and|mov|mov|mov|jmp|cmp|jne|mov|mov|sar|and|mov|mov|mov|and|shl|mov|mov|mov|mov|mov|add|push|mov|call|add|mov|pop|pop|pop|pop|pop|ret|nop|nop|nop|nop|nop|nop|nop|nop|nop|nop|push|push|mov|push|mov|or|xor|repne scasb|not|dec|mov|mov|and|jns|dec|or|inc|je|pop|pop|xor|pop|ret|push|push|xor|call|add|test|je|mov|jmp|push|push|call|add|test|je|mov|lea|cdq|and|add|mov|mov|sar|sub|cmp|jge|pop|pop|xor|pop|ret|sub|push|mov|push|xor|xor|call|mov|mov|add|cmp|jl|mov|shr|mov|neg|lea|mov|movsx|push|push|call|mov|sub|shl|mov|movsx|push|push|call|mov|sub|mov|sar|or|shl|mov|mov|movsx|push|push|call|mov|sub|mov|sar|or|shl|mov|mov|movsx|push|push|call|mov|sub|or|mov|mov|add|add|add|dec|mov|jne|mov|cmp|jne|movsx|push|push|call|mov|sub|shl|mov|movsx|push|push|call|mov|sub|mov|sar|or|shl|mov|mov|movsx|push|push|call|mov|sub|sar|or|add|mov|add|jmp|cmp|jne|movsx|push|push|call|mov|sub|shl|mov|movsx|push|push|call|mov|sub|sar|or|add|mov|inc|mov|mov|mov|mov|shr|rep movsd|mov|push|and|rep movsb|call|add|mov|pop|pop|pop|pop|ret|nop|nop|nop|nop|nop

    [*] Do you want to disassemble the underlying bytes? [Y/N] y

	0x10001000:	push       ecx                                      | 51
	0x10001001:	push       ebx                                      | 53
<TRUNCATED>
	0x10001409:	nop                                                 | 90
	0x1000140a:	nop                                                 | 90

    [*] Do you want to display the raw byte blob? [Y/N] y

	

    [*] Do you want to keep this set? [Y/N] y

[+] Keeping 1 mnemonic set using 100 % commonality out of 48 hashes

	[-] Length - 359   Section - .text

[+] Printing offsets of type: longest

	[-] Gold matches

	----------v SET rule0 v----------
	push|push|push|mov|mov|mov|xor|cdq|idiv|cmp|jne|mov|jmp|cmp|jne|lea|jmp|cmp|lea|je|mov|lea|mov|imul|mov|mov|shr|add|dec|cmp|jge|pop|xor|pop|pop|ret|push|lea|push|push|mov|mov|call|mov|mov|mov|mov|mov|add|shr|rep movsd|mov|and|rep movsb|mov|mov|cmp|mov|jl|mov|mul|shr|lea|sub|mov|mov|add|mov|add|sar|and|mov|mov|mov|mov|mov|and|sar|shl|and|or|mov|mov|mov|mov|sar|and|and|shl|or|mov|mov|mov|and|dec|mov|mov|jne|mov|cmp|jne|mov|mov|lea|sar|and|mov|mov|mov|mov|and|sar|shl|and|or|mov|mov|mov|and|mov|mov|mov|jmp|cmp|jne|mov|mov|sar|and|mov|mov|mov|and|shl|mov|mov|mov|mov|mov|add|push|mov|call|add|mov|pop|pop|pop|pop|pop|ret|nop|nop|nop|nop|nop|nop|nop|nop|nop|nop|push|push|mov|push|mov|or|xor|repne scasb|not|dec|mov|mov|and|jns|dec|or|inc|je|pop|pop|xor|pop|ret|push|push|xor|call|add|test|je|mov|jmp|push|push|call|add|test|je|mov|lea|cdq|and|add|mov|mov|sar|sub|cmp|jge|pop|pop|xor|pop|ret|sub|push|mov|push|xor|xor|call|mov|mov|add|cmp|jl|mov|shr|mov|neg|lea|mov|movsx|push|push|call|mov|sub|shl|mov|movsx|push|push|call|mov|sub|mov|sar|or|shl|mov|mov|movsx|push|push|call|mov|sub|mov|sar|or|shl|mov|mov|movsx|push|push|call|mov|sub|or|mov|mov|add|add|add|dec|mov|jne|mov|cmp|jne|movsx|push|push|call|mov|sub|shl|mov|movsx|push|push|call|mov|sub|mov|sar|or|shl|mov|mov|movsx|push|push|call|mov|sub|sar|or|add|mov|add|jmp|cmp|jne|movsx|push|push|call|mov|sub|shl|mov|movsx|push|push|call|mov|sub|sar|or|add|mov|inc|mov|mov|mov|mov|shr|rep movsd|mov|push|and|rep movsb|call|add|mov|pop|pop|pop|pop|ret|nop|nop|nop|nop|nop
	----------^ SET rule0 ^-----------

		APT1_Malware/64a373487c4cc2b8b60687ecc01150b546b18be7069981c5fe5d48075190f1ff                        0x10001000 - 0x1000140a in .text

	[-] Remaining matches

	----------v SET rule0 v----------
		APT1_Malware/0050e14f8e6bca0b2b99708f0659e38f407debec5ab7afc71de48fb104508a60                        0x10001000 - 0x1000140a in .text
		APT1_Malware/04a23b3cb2d6361df66ca94a470ffa1017a8e5cd3255ce342219765d7d4619bc                        0x10001000 - 0x1000140a in .text
<TRUNCATED>
		APT1_Malware/f737829e9ad9a025945ad9ce803641677ae0fe3abf43b1984a7c8ab994923178                        0x10001700 - 0x10001aff in .text
		APT1_Malware/fc2751ff381d75154c76da7a42211509f7cc3fd4b50956e36e53b4f7653534d5                        0x10001000 - 0x1000140a in .text
	----------^ SET rule0 ^-----------

[+] Generating YARA rule for matches off of bytes from gold - APT1_Malware/64a373487c4cc2b8b60687ecc01150b546b18be7069981c5fe5d48075190f1ff

    [*] Do you want to try and morph rule0 for accuracy and attempt to make it VT Retro friendly [Y/N] y

[+] Check 01 - Checking for exact byte match

[+] Check 02 - Checking for optimal opcode match

[+] Check 03 - Dynamically morphing YARA rule0
	[*] Dynamic morphing succeeded

    [*] Do you want to include matched sample names in rule meta? [Y/N] y

    [*] Do you want to include matched byte sequence in rule comments? [Y/N] y

[+] Completed YARA rules

/*

SAMPLES:

APT1_Malware/e9d191e5a9565068627795d74eb6605f3878b6c5655955f72f69dffa5076e495
APT1_Malware/f48db6b5d9d34ead2dc736cd7f8af15b7b6fb3e39fe0baf5eac52e1e3967795c
<TRUNCATED>
APT1_Malware/4f0532e15ced95a1cebc13dd268dcbe7c609d4da237d9e46916678f288d3d9c6
APT1_Malware/383f0d2cbf8914c3ecb23ea82bff38e1c048980806e37d75e3539362d105675c

BYTES:



INFO:

binsequencer.py APT1_Malware/
Match SUCCESS for morphing

*/

rule rule0
    {
        meta:
            description = "Autogenerated by Binsequencer v.1.0.4 from APT1_Malware/64a373487c4cc2b8b60687ecc01150b546b18be7069981c5fe5d48075190f1ff"
            author      = ""
            date        = "2017-11-28"

        strings:
            $rule0_bytes = { 5153568B??????B9????????8B??33??99F7??3B??75??8B??EB??83????75??8D????EB??83????8D????74??8B??????8D????????????B8????????F7??8B??????8B??C1????03??493B??7D??5E33??5B59C3558D????575289??????89??????E8????????8B??8B??????8B??8B??8B??83????C1????F3A58B??83????F3A48B??????8B??????83????C6??????0F8C????????B8????????F7??D1??8D????2B??89??????8B??????83????8A????83????C1????83????89??????8A??????????88??????8A??????8A??????83????C1????C1????83????0B??8A??????????88??????8A??????8A??????C1????83????83????C1????0B??8A??????????88??????8A??????83????4A8A??????????88??????75??8B??????83????75??8B??????8A????8D??????C1????83????8A??????????88????8A????8A??83????C1????C1????83????0B??8A??????????88??????8A??83????8A????????????88??????C6????????EB??83????75??8B??????8A????C1????83????8A??????????88????8A????83????C1????8A??????????88??????B0??88??????88??????83????55C6??????E8????????83????8B??5F5D5E5B59C39090909090909090909053568B??????578B??83????33??F2AEF7??498B??8B??25????????79??4883????4074??5F5E33??5BC368????????5633??(E8|FF)[4-6]83????85??74??BB????????EB??6A??56(E8|FF)[4-6]83????85??74??BB????????8D????9983????03??8B??8B??????C1????2B??3B??7D??5F5E33??5BC32B??5589??????5033??33??E8????????8B??8B??????83????83????0F8C????????8B??C1????89??????F7??8D????89??????0FBE????5068????????(E8|FF)[4-6]B9????????2A??C0????88????0FBE??????5251(E8|FF)[4-6]8A????2D????????8B??C1????0A??C0????88????88??????0FBE??????5268????????(E8|FF)[4-6]8A??????2D????????8B??C1????0A??C0????88??????88??????0FBE??????5268????????(E8|FF)[4-6]8A??????2D????????0A??8B??????88??????83????83????83????4889??????0F85????????8B??????83????75??0FBE????5068????????(E8|FF)[4-6]B9????????2A??C0????88????0FBE??????5251(E8|FF)[4-6]8A????2D????????8B??C1????0A??C0????88????88??????0FBE??????5268????????(E8|FF)[4-6]8A??????2D????????C1????0A??83????88??????83????EB??83????75??0FBE????5068????????(E8|FF)[4-6]B9????????2A??C0????88????0FBE??????5251(E8|FF)[4-6]8A????2D????????C1????0A??83????88????438B??????8B??8B??8B??C1????F3A58B??5583????F3A4E8????????83????8B??5D5F5E5BC39090909090 }

        condition:
            all of them
}
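
The rule above opens with `5153568B??????B9????????`, which lines up with the first disassembled instructions (`push ecx | 51`, `push ebx | 53`, ...): the opcode byte is kept and the operand bytes are replaced with wildcards. A minimal sketch of that conversion is shown below. The helper names are hypothetical, the opcode length is supplied by hand rather than taken from a disassembler, and the operand bytes of the two `mov` instructions are made up, since the real ones are not shown in the output. The `(E8|FF)[4-6]` alternation the tool emits for calls is not reproduced here.

```python
from typing import Iterable, Tuple


def wildcard_instruction(raw: bytes, opcode_len: int = 1) -> str:
    """Keep the first `opcode_len` bytes, replace every remaining byte with '??'."""
    opcode = raw[:opcode_len].hex().upper()
    return opcode + "??" * (len(raw) - opcode_len)


def to_yara_hex(instructions: Iterable[Tuple[bytes, int]]) -> str:
    """Concatenate wildcarded instructions into a YARA hex string body."""
    body = "".join(wildcard_instruction(raw, n) for raw, n in instructions)
    return "{ " + body + " }"


instructions = [
    (bytes.fromhex("51"), 1),          # push ecx
    (bytes.fromhex("53"), 1),          # push ebx
    (bytes.fromhex("56"), 1),          # push esi
    (bytes.fromhex("8B442410"), 1),    # mov (operand bytes are hypothetical)
    (bytes.fromhex("B900010000"), 1),  # mov (operand bytes are hypothetical)
]
print(to_yara_hex(instructions))
```

Because only opcodes survive, register and immediate differences between samples no longer break the match, at the cost of a looser rule.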

non_pe
$ python binsequencer.py -n -d -s ShellCode/cobaltstrike.bin

[+] Extracting instructions and generating sets

	[-]ShellCode/cobaltstrike.bin
		nonpefile - 524 instructions extracted

[+] Golden hash (524 instructions) - ShellCode/cobaltstrike.bin

[+] Zeroing in longest mnemonic instruction set in nonpefile

	[-] Moving 1 instruction sets to review with a length of 524

[+] Keeping 1 mnemonic set using 100 % commonality out of 1 hashes

	[-] Length - 524   Section - nonpefile

[+] Printing offsets of type: longest

	[-] Gold matches

	----------v SET rule0 v----------
	jmp|int3|int3|int3|dec|mov|dec|mov|push|dec|sub|dec|mov|mov|dec|mov|dec|mov|dec|mov|dec|test|je|inc|movups|dec|arpl|xor|dec|mov|movdqu|inc|mov|test|je|dec|mov|dec|shr|inc|movzx|inc|test|je|dec|mov|inc|mov|movsx|ror|cmp|jl|add|add|dec|inc|dec|sub|jne|dec|lea|xor|inc|mov|dec|add|inc|cmp|jbe|mov|inc|xor|dec|add|dec|lea|movsx|dec|inc|inc|ror|inc|add|cmp|jne|inc|lea|cmp|je|inc|inc|cmp|jb|jmp|inc|mov|add|dec|add|movzx|inc|mov|dec|add|mov|dec|add|jmp|xor|dec|mov|dec|mov|dec|add|pop|ret|int3|int3|int3|inc|mov|dec|mov|mov|push|push|push|push|inc|push|inc|push|inc|push|inc|push|dec|sub|dec|mov|inc|mov|mov|inc|mov|call|mov|dec|mov|call|mov|dec|mov|call|mov|dec|mov|call|dec|arpl|xor|dec|add|dec|mov|inc|mov|dec|mov|inc|lea|mov|call|inc|mov|dec|mov|dec|mov|inc|mov|dec|test|je|dec|mov|dec|sub|mov|mov|dec|add|dec|sub|jne|inc|movzx|movzx|dec|test|je|dec|lea|dec|add|mov|dec|sub|inc|mov|dec|add|inc|mov|dec|add|dec|test|je|inc|mov|dec|add|mov|dec|add|dec|sub|jne|dec|add|dec|test|jne|mov|dec|add|mov|test|je|dec|mov|mov|dec|add|inc|call|inc|mov|dec|mov|inc|mov|dec|add|dec|add|jmp|dec|cmp|jge|dec|arpl|inc|movzx|inc|mov|inc|mov|inc|mov|dec|sub|dec|add|mov|dec|add|jmp|dec|mov|dec|mov|dec|add|dec|add|call|dec|mov|dec|add|dec|add|dec|cmp|jne|mov|dec|add|test|jne|inc|mov|dec|mov|inc|mov|dec|mov|inc|mov|dec|sub|cmp|inc|lea|je|inc|mov|dec|add|inc|mov|test|je|mov|inc|mov|dec|lea|inc|mov|dec|add|dec|sub|dec|shr|je|inc|movzx|dec|sub|movzx|shr|cmp|jne|dec|and|dec|add|jmp|cmp|jne|dec|and|inc|add|jmp|cmp|jne|dec|and|dec|mov|dec|shr|add|jmp|inc|cmp|jne|dec|and|inc|add|dec|add|dec|test|jne|inc|mov|dec|add|inc|mov|test|jne|mov|inc|xor|xor|dec|or|dec|add|inc|call|dec|mov|mov|dec|mov|call|inc|test|je|cmp|je|mov|dec|add|inc|mov|inc|test|je|cmp|je|inc|mov|xor|inc|mov|dec|add|dec|add|inc|test|je|inc|mov|dec|add|xor|inc|movsx|dec|add|ror|add|inc|cmp|jne|inc|cmp|je|add|dec|add|dec|add|inc|cmp|jb|jmp|inc|movzx|cmp|je|mov|dec|mov|shl|dec|cwde|dec|add|inc|mov|inc|mov|dec|add|inc|call|dec|mov|dec|add|inc|pop|inc|pop|inc|pop|inc|pop|pop|pop|pop|pop|ret|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|int3|push|dec|mov|dec|and|dec|sub|call|dec|mov|pop|ret
	----------^ SET rule0 ^-----------

		ShellCode/cobaltstrike.bin                                                                           0x10000000 - 0x10000435 in nonpefile

	[-] Remaining matches

	----------v SET rule0 v----------
	----------^ SET rule0 ^-----------

[+] Generating YARA rule for matches off of bytes from gold - ShellCode/cobaltstrike.bin

[+] Check 01 - Checking for exact byte match
	[*] Exact byte match found across all samples

[+] Completed YARA rules

/*

SAMPLES:

ShellCode/cobaltstrike.bin

BYTES:



INFO:

binsequencer.py -n -d -s ShellCode/cobaltstrike.bin
Match SUCCESS for morphing

*/

rule rule0
    {
        meta:
            description = "Autogenerated by Binsequencer v.1.0.4 from ShellCode/cobaltstrike.bin"
            author      = ""
            date        = "2017-11-28"

        strings:
            $rule0_bytes = {}

            $string_0 = { 41584963403C33 } // AXIc@<3
            $string_1 = { 5C242048 } // \$ H
            $string_2 = { 74242848 } // t$(H
            $string_3 = { 4C24204C } // L$ L
            $string_4 = { 53555657415441554156415748 } // SUVWATAUAVAWH
            $string_5 = { 4863753C33 } // Hcu<3
            $string_6 = { 44242041 } // D$ A
            $string_7 = { 7D29496344243C41 } // })IcD$<A
            $string_8 = { 7C242044 } // |$ D
            $string_9 = { 4C2B5630 } // L+V0
            $string_10 = { 5E284533 } // ^(E3
            $string_11 = { 38415F415E415D415C5F5E5D5B } // 8A_A^A]A\_^][

        condition:
            all of them
}
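
The `$string_N` entries above pair a hex-encoded byte run with its printable form in a comment. A minimal sketch of producing lines in that shape is shown below, assuming strings are simply runs of four or more printable ASCII bytes. How binsequencer.py actually selects strings is not stated here, and `yara_strings` is a hypothetical helper name.

```python
import re
from typing import List


def yara_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Find printable ASCII runs and format each as a YARA hex string
    with the decoded text in a trailing comment."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    lines = []
    for index, match in enumerate(pattern.finditer(data)):
        run = match.group()
        hex_bytes = run.hex().upper()
        text = run.decode("ascii")
        lines.append(f"$string_{index} = {{ {hex_bytes} }} // {text}")
    return lines


# Made-up buffer containing two of the printable runs seen in the rule above.
data = b"\x00\x01AXIc@<3\xff\xfe\\$ H\x90"
for line in yara_strings(data):
    print(line)
```

Short printable runs like these are often just instruction bytes that happen to fall in the ASCII range, so they are best treated as supporting evidence rather than standalone indicators.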

[+] CHANGE LOG [+]

v1.1.0 - 13MAY2020

  • Added prefix try/catch when iterating over sections with bytes that don't map to x86

v1.0.9 - 19DEC2019

  • Non-PE files will now disable SKIPDATA in Capstone when run to prevent disassembly issues

v1.0.8 - 05SEP2019

  • Converted script to Python3.

v1.0.7 - 09JUL2019

  • Added some handling for YARA compilation issues - should remove bits of the rule till match occurs again

v1.0.4 - 30NOV2017

  • Added ability to run on non-PE files.
  • Added ability to run on single, individual files. Included option to force opcode technique since single files would always byte match.
  • Cleaned up code significantly.
  • Added "string" matching as well.
  • Removed list as input so you specify a directory or file now.
  • Modified how some of dynamic morphing works to be more precise - added INC, DEC, POP, and JMP as additional variants.

v1.0.3 - 04OCT2017

  • Added two new methods for the byte matching - exact byte and same-length bytes.

v1.0.2 - 10JUN2016

  • Full null-byte matches will now automatically be blacklisted.
  • Minimum match will not exit now if there are kept matches.

v1.0.1 - 31MAY2016

  • Added support for x64 code architecture. It will still work with x86 but the assembly won't be accurate. -a
  • Added ability to accept default values for prompts. -d
  • Added ability to override gold hash selection. -g
  • Fleshed out multi-section support and logic.
  • Added logic to hunt multiple matches across sections.
  • Modified the way it identifies sections for analysis (uses header information along with execute permission bit).

v1.0.0 - 20MAY2016

  • Initial release.

About

BinSequencer is a script designed to find a common pattern of bytes within a set of samples and generate a YARA rule from the identified pattern.
