StringSifter is a machine learning tool that automatically ranks strings based on their relevance for malware analysis.
Quick Links
- Technical Blogpost - Learning to Rank Strings Output for Speedier Malware Analysis
- Announcement Blogpost - Open Sourcing StringSifter
- DerbyCon Talk - StringSifter: Learning to Rank Strings Output for Speedier Malware Analysis
StringSifter requires Python version 3.6 or newer. Run the following commands to get the code, run unit tests, and use the tool:
Use pip to get running immediately:
pip install stringsifter
Alternatively, you can install an editable version locally using:
git clone https://github.com/fireeye/stringsifter.git
cd stringsifter
pip install -e .
Running Unit Tests
To run unit tests from the StringSifter installation directory:
Running from the Command Line
The pip install -e <repo> command installs two runnable scripts, flarestrings and rank_strings, into your Python environment.
flarestrings mimics features of GNU binutils' strings, and rank_strings accepts piped input, for example:
flarestrings <my_sample> | rank_strings
rank_strings supports a number of command line arguments. The positional argument
input_strings specifies a file of strings to rank. The optional arguments are:
|--scores (-s)|Include the rank scores in the output|
|--limit (-l)|Limit output to the top limit ranked strings|
|--min-score (-m)|Limit output to strings with score >= min-score|
|--batch (-b)|Specify a folder of strings outputs for batch processing|
Ranked strings are written to standard output unless the --batch option is specified, in which case ranked outputs are written to files.
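The effect of the --limit and --min-score options can be illustrated with a small Python sketch. This is not StringSifter's code: the function name filter_ranked, the (score, string) tuple layout, and the order in which the two filters are applied are all assumptions made for illustration, and the scores are invented.

```python
from typing import List, Optional, Tuple

ScoredString = Tuple[float, str]


def filter_ranked(
    scored: List[ScoredString],
    limit: Optional[int] = None,
    min_score: Optional[float] = None,
) -> List[ScoredString]:
    """Sort by score (highest first), then apply the optional filters.

    Hypothetical helper: mirrors the documented meaning of --limit and
    --min-score, not the tool's actual implementation.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    if min_score is not None:
        # Keep only strings whose score is >= min_score.
        ranked = [pair for pair in ranked if pair[0] >= min_score]
    if limit is not None:
        # Keep only the top `limit` strings.
        ranked = ranked[:limit]
    return ranked


# Invented scores, for illustration only.
example = [(1.5, "!This program"), (7.2, "http://example.test/c2"), (3.0, "cmd.exe")]
top_two = filter_ranked(example, limit=2)
```

Applying the score threshold before the limit means a limit never hides how many strings passed the threshold; the real tool may combine the options differently.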
flarestrings supports an option -n (or --min-len) to print sequences of characters that are at least min-len characters long, instead of the default 4. For example:
flarestrings -n 8 <my_sample> | rank_strings
will print and rank only strings of length 8 or greater.
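When the same pipeline needs to be driven from a script, it can be wrapped with Python's subprocess module. The sketch below is only an illustration: build_commands and rank_sample are hypothetical helper names, and it assumes that flarestrings and rank_strings are on the PATH and that --limit and --min-score each take a value as the next argument.

```python
import subprocess
from typing import List, Optional, Tuple


def build_commands(
    sample: str,
    min_len: Optional[int] = None,
    limit: Optional[int] = None,
    min_score: Optional[float] = None,
    scores: bool = False,
) -> Tuple[List[str], List[str]]:
    """Build argument lists for `flarestrings ... | rank_strings ...`."""
    extract = ["flarestrings"]
    if min_len is not None:
        extract += ["-n", str(min_len)]
    extract.append(sample)

    rank = ["rank_strings"]
    if scores:
        rank.append("--scores")
    if limit is not None:
        rank += ["--limit", str(limit)]
    if min_score is not None:
        rank += ["--min-score", str(min_score)]
    return extract, rank


def rank_sample(sample: str, **options) -> str:
    """Run the two commands as a pipe and return the ranked output."""
    extract, rank = build_commands(sample, **options)
    producer = subprocess.Popen(extract, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        rank,
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    producer.stdout.close()  # let the producer see a closed pipe if the consumer exits
    output, _ = consumer.communicate()
    producer.wait()
    return output
```

For example, rank_sample("<my_sample>", min_len=8) would correspond to the command shown above, with the sample path filled in.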
Running from a Docker container
- After cloning the repo, build the container. From the package's top level directory:
docker build -t stringsifter -f docker/Dockerfile .
- Run the container using the -v flag to expose a host directory to the container:
docker run -v <my_malware>:/samples -it stringsifter
where <my_malware> is a host directory containing samples for analysis, for example:
docker run -v $HOME/malware/binaries:/samples -it stringsifter
- At the container prompt:
flarestrings /samples/<my_sample> | rank_strings <options>
All command line arguments are supported in the containerized script.
Running on FLOSS Output
StringSifter can be applied to arbitrary lists of strings, making it useful for practitioners looking to glean insights from alternative intelligence-gathering sources such as live memory dumps, sandbox runs, or binaries that contain obfuscated strings. For example, FireEye Labs Obfuscated Strings Solver (FLOSS) extracts printable strings just as Strings does, but additionally reveals obfuscated strings that have been encoded, packed, or manually constructed on the stack. It can be used as an in-line replacement for Strings, meaning that StringSifter can be similarly invoked on FLOSS output using the following command:
$PY2_VENV/bin/floss -q <options> <my_sample> | rank_strings <options>
- The -q argument suppresses headers and formatting to show only extracted strings. To learn more about additional FLOSS options, please see its Usage Docs.
- FLOSS requires Python 2, while StringSifter requires Python 3. In the example command, at least one of floss or rank_strings must include a relative path referencing a Python virtual environment.
Notes on running strings
This distribution includes the
flarestrings program to ensure predictable output across platforms. If you choose to run your system's installed
strings, note that its options are not consistent across versions and platforms:
Most Linux distributions include the
strings program from GNU Binutils. To extract both "wide" and "narrow" strings the program must be run twice, piping to an output file:
strings <my_sample> > strs.txt       # narrow strings
strings -el <my_sample> >> strs.txt  # wide strings. note the ">>"
Some versions of BSD strings packaged with macOS do not support wide strings. Also note that the -a option to strings, which scans the whole file, may be disabled in the default configuration. Without -a, informative strings may be lost. We recommend installing GNU Binutils via Homebrew or MacPorts to get a version of strings that supports wide characters. Use care to invoke the correct version of strings.
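To make the "narrow" versus "wide" distinction concrete, here is a minimal Python sketch that pulls both kinds of strings out of a byte buffer in one pass each. It is not the flarestrings implementation: the function name extract_strings, the restriction to printable ASCII, and the treatment of wide strings as ASCII characters each followed by a zero byte (UTF-16LE) are simplifying assumptions.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return narrow strings, then wide strings, of at least min_len characters."""
    # Narrow: runs of printable ASCII bytes (0x20 through 0x7e).
    narrow = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Wide: runs of printable ASCII characters, each followed by a zero byte.
    wide = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)

    found = [m.group().decode("ascii") for m in narrow.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide.finditer(data)]
    return found


# A small synthetic buffer: one narrow string, one too-short run, one wide string.
buffer = b"\x00\x01hello world\x02ab\x03" + "wide text".encode("utf-16-le") + b"\xff"
strings_found = extract_strings(buffer)
```

Raising min_len works like the -n option described earlier: with min_len=10 the nine-character wide string in this buffer is dropped.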
This version of StringSifter was trained using Strings outputs from sampled malware binaries associated with the first EMBER dataset. Ordinal labels were generated using weak supervision procedures, and supervised learning is performed by Gradient Boosted Decision Trees with a learning-to-rank objective function. See Quick Links for further technical details. Please note that neither labeled data nor training code is currently available, though we may reconsider this approach in future releases.
We use GitHub Issues for posting bugs and feature requests.
- Thanks to the FireEye Data Science (FDS) and FireEye Labs Reverse Engineering (FLARE) teams for review and feedback.
- StringSifter was designed and developed by Philip Tully (FDS), Matthew Haigh (FLARE), Jay Gibble (FLARE), and Michael Sikorski (FLARE).
- The StringSifter logo was designed by Josh Langner (FLARE).
- flarestrings is derived from the excellent tool FLOSS.