cross-arch-instr-model.github.io

Thank you for looking at our work! The programs included here were created for the following paper:

"A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis"

Kimberly Redmond, Lannan Luo, and Qiang Zeng

The NDSS Workshop on Binary Analysis Research (BAR), 2019.

############################

The trained cross-architecture instruction embedding model used in our paper is included in the output/ directory. Please remember to unzip the four output files before running the programs.

Our embeddings were trained with Bivec, which is based on Word2Vec. You can find it here:

https://github.com/lmthang/bivec

############################

ABOUT THESE PROGRAMS

All file paths and instruction selections are hard-coded into these programs. For your convenience, they are listed in variables near the top; feel free to modify them for your use.

./senvec.py

Returns ROC plots and AUC scores for cross-architecture basic block similarity tests. A basic block embedding is calculated by summing the embeddings of the instructions within the block.

Similarity between two blocks is measured with cosine similarity.
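
As an illustration of the summing and cosine steps described above, here is a minimal sketch. It is not taken from senvec.py: the function names, the toy instruction tokens, the 2-dimensional vectors, and the choice to skip instructions missing from the vocabulary are all assumptions made for this example.

```python
import math


def block_embedding(instructions, embeddings):
    """Sum the embedding vectors of all instructions in one basic block."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for instr in instructions:
        vec = embeddings.get(instr)
        if vec is None:
            continue  # assumption: out-of-vocabulary instructions are skipped
        for i, value in enumerate(vec):
            total[i] += value
    return total


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; the real model uses its own tokens and dimensions.
toy_embeddings = {
    "x86:mov": [1.0, 0.0],
    "x86:add": [0.0, 1.0],
    "arm:ldr": [0.9, 0.1],
    "arm:add": [0.1, 0.9],
}

x86_block = block_embedding(["x86:mov", "x86:add"], toy_embeddings)
arm_block = block_embedding(["arm:ldr", "arm:add"], toy_embeddings)
block_score = cosine_similarity(x86_block, arm_block)
print(block_score)
```

In this toy case both blocks sum to nearly the same vector, so the score is close to 1.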

./tsne2.py

Returns two t-SNE figures of the same vector space: 1) an unlabeled figure displaying all instructions; 2) a labeled figure displaying selected instructions.

./instr_sim.py

Returns two ROC plots and AUC scores for instruction-level similarity tests. Instructions are evaluated in pairs, in two settings: 1) mono-architecture; 2) cross-architecture.

The similarity metric used is cosine similarity.
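
To make the AUC score concrete, the sketch below computes it directly from labeled pair scores as the fraction of (similar, dissimilar) pairs in which the similar pair receives the higher cosine score. This is a pure-Python illustration under assumed inputs; the function name `roc_auc` and the toy labels and scores are hypothetical, and instr_sim.py may compute the value differently.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive pair outscores a negative pair.

    labels: 1 for a similar pair, 0 for a dissimilar pair.
    scores: cosine similarity of each pair. Ties count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical pair scores, for illustration only.
pair_labels = [1, 1, 0, 0]
pair_scores = [0.95, 0.40, 0.60, 0.10]
auc = roc_auc(pair_labels, pair_scores)
print(auc)
```

The nested loop is quadratic in the number of pairs, which is fine for a small check but slow for large test sets.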

./query.py

Given an instruction, returns the instructions most similar to it according to cosine similarity: the top 6 from its own architecture (#1 is the query instruction itself, leaving 5 others) and the top 5 from the other architecture.
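
The ranking described above can be sketched as follows. This is an assumption-laden illustration rather than the code in query.py: the helper names, the per-architecture dictionaries, and the toy tokens and vectors are hypothetical, and the toy example uses smaller cutoffs than 6 and 5 because its vocabulary is tiny.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_vec, vocab, k):
    """Return the names of the k vocabulary entries closest to query_vec."""
    scored = [(cosine(query_vec, vec), name) for name, vec in vocab.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy vocabularies, one per architecture.
x86_vocab = {
    "x86:mov": [1.0, 0.0],
    "x86:add": [0.0, 1.0],
    "x86:lea": [0.8, 0.2],
}
arm_vocab = {
    "arm:ldr": [0.9, 0.1],
    "arm:add": [0.1, 0.9],
}

query = x86_vocab["x86:mov"]
same_arch = most_similar(query, x86_vocab, k=2)   # first hit is the query itself
cross_arch = most_similar(query, arm_vocab, k=1)
print(same_arch, cross_arch)
```

Because the query instruction belongs to its own architecture's vocabulary, it always ranks first there, which is why one extra result is requested on that side.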
