A tool for automatically generating static HELIX Components from library functions.
Writing HELIX Components can be time-consuming and before there exists a critical mass of useful and well-labeled HELIX Components, HELIX is limited in its dataset generation capabilities. This tool automatically parses symbols or exports for a given library and generates HELIX Components from each exported function, making use of link-time optimization for quick and easy program slicing on the target library. This makes generating a large number of HELIX Components fairly easy and improves HELIX's dataset generation capabilities.
Note: the generated Components can only be used for static analysis dataset generation - the library functions identified are not parameterized and are called with no arguments so the generated binaries will most likely crash.
Currently this supports C libraries.
Install Blind HELIX with pip, clone this repository and then (from the root of the repository) run:
pip install .
Blind HELIX includes a number of different parsers for different types of
libraries. As a basic example, to parse the zlib
library on Linux, run:
blind-helix parse generic-linux-library zlib \
--path /usr/lib/x86_64-linux-gnu/libz.a \
zlib.bhlx
This will parse the zlib
library, identify exported functions, build
Components for each export, test that the generated Components build
successfully (discarding any that fail to build), and finally save the
generated components to a file named zlib.bhlx
. This file can then be used to
load all of the verified Components which may then be used in HELIX scripts.
import blind_helix
with open("zlib.bhlx", "r") as f:
components = blind_helix.load(f)
# use `components` as necessary...
The generic-linux-library
parser requires that you specify the path to the
library file. To automatically find the library file, Blind HELIX is integrated
with Microsoft's VCPKG through the
vcpkg-linux-library
parser. To parse zlib
from VCPKG on Linux, simply run:
blind-helix parse vcpkg-linux-library zlib zlib.bhlx
Note: this requires that the zlib
library is already installed with VCPKG.
See the VCPKG documentation for information on how to install individual
libraries.
It's possible to parse many libraries in parallel using the parse-many
command. This command lets you specify all of the libraries you want to parse
by name, control the number of parallel workers for parsing, and captures parse
logging output. For example, to parse a collection of libraries from VCPKG:
blind-helix parse-many vcpkg-linux-library output/ \
zlib crc32c protobuf-c cpuinfo mbedtls jansson openjpg
Pull requests and GitHub issues are welcome.
DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited.
© 2021 Massachusetts Institute of Technology
- Subject to FAR 52.227-11 – Patent Rights – Ownership by the Contractor (May 2014)
- SPDX-License-Identifier: MIT
This material is based upon work supported by the Department of Defense under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Department of Defense.