This repository contains the code we developed, along with cover files generated by our approach. It also includes the real development files used in our experiments. Since no public implementation of the English Shellcode method is available, we implemented it ourselves to the best of our ability and provide it in the repository as well.
The code in this repository is divided into three parts:
- The prompts: the prompts used in our experiments.
- The encoder: scripts that use an LLM to embed the payload into the cover data.
- The decoder: several decoder implementations, including those described in the paper as well as customized versions that handle padding. The decoders work with a block size of 2 by default.
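The actual block encoding is defined in the paper and implemented by the scripts in the decoder directory. Purely as an illustration of what block-wise decoding with block size 2 looks like, here is a toy Python decoder for a hypothetical scheme in which each cover word contributes one nibble (its length mod 16) and each block of two words yields one payload byte. This is not the paper's scheme:

```python
def toy_decode(cover_text: str, block_size: int = 2) -> bytes:
    """Toy block decoder: each block of `block_size` cover words yields
    one payload byte.

    Hypothetical scheme for illustration only: each word contributes a
    nibble (its length mod 16); the nibbles of a block are concatenated
    into a byte. Trailing words that do not fill a block are ignored.
    """
    words = cover_text.split()
    out = bytearray()
    for i in range(0, len(words) - block_size + 1, block_size):
        byte = 0
        for w in words[i:i + block_size]:
            byte = (byte << 4) | (len(w) % 16)
        out.append(byte)
    return bytes(out)
```

For example, the cover text "word a" decodes to the single byte 0x41 under this toy scheme, since the word lengths 4 and 1 form the nibbles of one byte.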
If you only want to encode the payload, you can use the encoder component and skip the padding step. If you want to reproduce the full encoding pipeline as defined in the paper, you should use the customized decoders (which are more readable) provided in the decoder subdirectory.
As defined in apptainer.def, you need the following dependencies:
- apptainer
- python:3.10-slim
- torch==2.6.0
- accelerate>=0.26.0
- xformers==0.0.29.post2
- vllm
- pandas
- transformers
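For orientation, a container definition matching the list above might look like the following sketch; the actual apptainer.def shipped in the repository is authoritative:

```
Bootstrap: docker
From: python:3.10-slim

%post
    # Versions copied from the dependency list above
    pip install torch==2.6.0 "accelerate>=0.26.0" \
        xformers==0.0.29.post2 vllm pandas transformers
```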
From the 'encoder' directory, run:
./run_all.sh
This produces the cover file with a natural ending point.
From the 'encoder' directory, run:
./run_encoder.sh
This produces the cover file without a natural ending point.
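The ending point matters for decoding: when the cover file has no natural ending point, the decoded stream can carry trailing padding that must be stripped. A minimal sketch, assuming (purely hypothetically) that a NUL sentinel marks the payload's end; the repository's customized decoders implement the actual padding handling:

```python
def strip_padding(payload: bytes, sentinel: bytes = b"\x00") -> bytes:
    """Toy post-processing step: truncate the decoded stream at the
    first sentinel byte. The sentinel is an assumption made for this
    illustration, not the scheme used by the paper's decoders."""
    end = payload.find(sentinel)
    return payload if end == -1 else payload[:end]
```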
Several decoders are provided in Shell, Python, and Perl, along with a customized decoder for padded cover files. You can run
cat poem.txt | ./decoder/decoder.sh | xxd
to verify the correctness of the payload encoded in the poem example shown in Figure 1 of the paper.
More details can be found in the decoder directory.