We present XTRIDE, an improved n-gram-based approach (cf. STRIDE) to type recovery for binaries that focuses on practicality:
highly optimized throughput and actionable confidence scores allow for deployment in automated pipelines.
When compared to the state of the art in struct recovery, our method achieves comparable performance while being between 70 and 2300× faster.
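At its core, an n-gram approach of this kind maps short token contexts observed in training data to counts of the types seen in those contexts, and ranks candidate types by relative frequency at inference time. The following is a minimal, self-contained sketch of that idea; all names and the scoring scheme are illustrative and do not reflect XTRIDE's actual implementation or on-disk database format:

```rust
use std::collections::HashMap;

/// Maps an n-gram of context tokens to counts of observed types.
/// Illustrative only; real databases are built per n and persisted.
struct NgramDb {
    n: usize,
    counts: HashMap<Vec<String>, HashMap<String, u32>>,
}

impl NgramDb {
    fn new(n: usize) -> Self {
        NgramDb { n, counts: HashMap::new() }
    }

    /// Record that type `ty` was observed for the given token context.
    fn train(&mut self, tokens: &[&str], ty: &str) {
        assert_eq!(tokens.len(), self.n);
        let key: Vec<String> = tokens.iter().map(|s| s.to_string()).collect();
        *self
            .counts
            .entry(key)
            .or_default()
            .entry(ty.to_string())
            .or_insert(0) += 1;
    }

    /// Return candidate types with a relative-frequency score in [0, 1],
    /// best-first. Unseen contexts yield no prediction.
    fn predict(&self, tokens: &[&str]) -> Vec<(String, f64)> {
        let key: Vec<String> = tokens.iter().map(|s| s.to_string()).collect();
        let Some(tys) = self.counts.get(&key) else { return vec![] };
        let total: u32 = tys.values().sum();
        let mut out: Vec<(String, f64)> = tys
            .iter()
            .map(|(t, c)| (t.clone(), *c as f64 / total as f64))
            .collect();
        out.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        out
    }
}
```

Such lookups are trivially parallelizable and require no model inference at runtime, which is where the throughput advantage over learned struct-recovery approaches comes from.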
The CLI tool in ./bin requires an installed HDF5 library, version 1.8.4 or later (as per the crate's docs).
Building with the latest version fails on macOS; we recommend installing HDF5 v1.10, e.g., with

```
brew install hdf5@1.10
```
1. Create a tokenized dataset of the required form (see Dataset Preparation).
2. Create the dataset splits:
   ```
   cargo run --release -- create-dataset -i ../new_dataset/ -o ./
   ```
3. Build the vocabulary:
   ```
   cargo run --release -- build-vocab ./xtride_plus_train.jsonl xtride_plus.vocab -t type
   ```
4. Build the n-gram databases for n = {2, 4, 8, 12, 48} (specify in bin/src/db_creation.rs):
   ```
   cargo run --release -- build-all-dbs -t type -k 5 --flanking -o xtride_plus_dbs/ ./xtride_plus_train.jsonl xtride_plus.vocab
   ```
5. Evaluate on the test set split:
   ```
   cargo run --release -- evaluate --threshold-sweep ./xtride_plus_test.jsonl xtride_plus.vocab ./out_xtride.json --flanking --db-dir ./xtride_plus_dbs
   ```
Use `recover` to run best-effort type recovery on a single decompiled function listing (plain-text input).

- one function per file
- decompiler-style symbol names are recommended (e.g., `var*`, `param*`, `stack*`, `iVar*`, `sub_*`)
- predictions are only as good as the alignment between your input style and the training data distribution

```
cargo run --release -- recover ./decompiled_function.c \
    --vocab ./xtride_plus.vocab \
    --db-dir ./xtride_plus_dbs \
    --flanking \
    --top-k 5
```

- `--fn-vocab <path>`: explicit function vocabulary path (if omitted, `recover` tries `<vocab_stem>.fn.vocab`)
- `--strip`: enable legacy full-strip mode (DIRT/STRIDE backwards compatibility, use with caution)
- `--threshold <float>`: hide predictions below the score cutoff (`1.0` disables filtering)
- `--top-k <int>`: number of candidates shown per symbol (default: `5`)
The presented scores are confidence-style ranking scores from the model pipeline. They are useful for relative ranking and filtering, not calibrated probabilities. The summary reports detected symbols, filtered symbols, and symbols with no model output.
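The kind of post-filtering the summary describes can be sketched as follows; the type and function names here are hypothetical and the threshold semantics are simplified relative to the actual CLI:

```rust
/// One recovered symbol with its ranked (type, score) candidates.
struct Prediction {
    symbol: String,
    candidates: Vec<(String, f64)>, // sorted best-first
}

/// Illustrative post-filter: keep at most `top_k` candidates per symbol,
/// drop candidates scoring below `min_score`, and count symbols that end
/// up with no surviving candidates as filtered out.
fn filter_predictions(
    preds: Vec<Prediction>,
    top_k: usize,
    min_score: f64,
) -> (Vec<Prediction>, usize) {
    let mut filtered_out = 0;
    let kept: Vec<Prediction> = preds
        .into_iter()
        .filter_map(|mut p| {
            p.candidates.truncate(top_k);
            p.candidates.retain(|(_, s)| *s >= min_score);
            if p.candidates.is_empty() {
                filtered_out += 1;
                None
            } else {
                Some(p)
            }
        })
        .collect();
    (kept, filtered_out)
}
```

Because the scores are only rank-consistent rather than calibrated, thresholds like `min_score` should be tuned empirically per deployment (e.g., via the `--threshold-sweep` evaluation) instead of being interpreted as probabilities.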
We include the preprocessed data needed for replication in the ./data directory.
The JSONL files can be directly used to extract a vocabulary and train the model (steps 3 and onwards, choose the 16-db configuration in bin/src/db_creation.rs).
While the training dataset includes a large amount of data from a wide variety of binaries, we want to reiterate that the generalizability of n-gram-based approaches is limited.
We always recommend adding domain-specific samples to the dataset, depending on where you plan to employ the model.
The provided dataset contains samples that are
- stripped
- ELF binaries
- collected from Ghidra
Trying to run inference on samples that diverge from this distribution will most likely result in unusable predictions.
Further information on how to extract data for new datasets or to retrain and evaluate on the DIRT dataset is included in the Dataset Preparation Docs.
The retyper module showcases a reference implementation for a deep integration of the XTRIDE type recovery system with a decompiler.
The functionality is gated behind a feature flag and can be activated with `cargo build --features retyper`.
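For readers unfamiliar with cargo feature gates, such a flag is declared in the crate's Cargo.toml; the following is a hypothetical sketch of what that declaration typically looks like, not the repository's actual manifest:

```toml
[features]
# Hypothetical: gates the decompiler-integration code paths.
retyper = []
```

Code behind the gate is then annotated with `#[cfg(feature = "retyper")]` and compiled only when the feature is enabled.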
We make use of Binarly's BIAS framework for program analysis, which was published as part of VulHunt. The framework features an expressive typing system that integrates seamlessly with the fork of the Ghidra decompiler backend used to lift internally recovered representations to pseudo C. We extended this fork and its FFI with interfaces that allow direct modification of variable types in the decompiler. This enables the direct application of inferred types within the decompiler context, including propagation of field types and similar.
| Before: | After: |
| --- | --- |
| ![]() | ![]() |
For further information and examples check out our blog post.
In general, any decompiler integration requires a translation layer from text-based predictions (from the vocabulary) into a tool-specific representation.
The format used in DIRT is expressive enough to allow for this, but requires recursive resolution of types (e.g., in structs) and manual computation of offsets and sizes (all necessary information is there, including padding annotations).
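To illustrate the kind of recursive size/offset computation such a translation layer has to perform, here is a hedged sketch using simple C-like alignment rules; the type model is invented for this example and is not the DIRT serialization itself:

```rust
/// Simplified type model for layout computation (illustrative only; the
/// DIRT format carries equivalent information, including padding).
enum Ty {
    Prim { size: usize, align: usize },
    Struct(Vec<Ty>),
}

/// Round `off` up to the next multiple of `align`.
fn align_up(off: usize, align: usize) -> usize {
    (off + align - 1) / align * align
}

/// Recursively compute (size, alignment) of a type under typical
/// C struct-layout rules: each field is aligned to its own alignment,
/// and the struct size is padded to a multiple of its largest alignment.
fn layout(ty: &Ty) -> (usize, usize) {
    match ty {
        Ty::Prim { size, align } => (*size, *align),
        Ty::Struct(fields) => {
            let mut off = 0;
            let mut max_align = 1;
            for f in fields {
                let (sz, al) = layout(f);
                off = align_up(off, al) + sz;
                max_align = max_align.max(al);
            }
            (align_up(off, max_align), max_align)
        }
    }
}

/// Byte offset of each field within a struct.
fn field_offsets(fields: &[Ty]) -> Vec<usize> {
    let mut off = 0;
    let mut out = Vec::new();
    for f in fields {
        let (sz, al) = layout(f);
        off = align_up(off, al);
        out.push(off);
        off += sz;
    }
    out
}
```

A real translation layer would additionally honor the format's explicit padding annotations and target-ABI specifics rather than recomputing layout from first principles.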
For the retyper module, the types in the vocab (and thus, in the training dataset) are required to be serialized BIAS types.
We are currently not planning on publishing a full pipeline for data extraction and dataset creation and thus deem this a reference implementation rather than a full PoC.
If you use the code, techniques or results provided with this repository and the corresponding paper, please cite our work as follows:
@inproceedings{Seidel_Practical_Type_Inference_2026,
author = {Seidel, Lukas and Thomas, Sam L. and Rieck, Konrad},
title = {{Practical Type Inference: High-Throughput Recovery of Real-World Structures and Function Signatures}},
booktitle = {The 16th ACM Conference on Data and Application Security and Privacy},
month = jun,
year = {2026},
url = {https://arxiv.org/abs/2603.08225},
}

