This repository contains the code we developed, along with cover files generated by our approach. It also includes the real development files used in our experiments. Since no public implementation of the English Shellcode method is available, we implemented it ourselves to the best of our ability and provide it in the repository as well.
The code in this repository is divided into three parts:
- The prompts: the prompts used in our experiments.
- The encoder: scripts that use an LLM to embed the payload into the cover data.
- The decoder: several decoder implementations, including those described in the paper as well as customized versions that handle padding. The decoders work with a block size of 2 by default.
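The actual block encoding is defined in the paper and implemented by the scripts in the decoder directory. Purely as an illustration of what block-wise decoding with block size 2 looks like, here is a toy Python decoder for a hypothetical scheme in which each cover word contributes one nibble (its length mod 16) and each block of two words yields one payload byte. This is not the paper's scheme:

```python
def toy_decode(cover_text: str, block_size: int = 2) -> bytes:
    """Toy block decoder: each block of `block_size` cover words yields
    one payload byte.

    Hypothetical scheme for illustration only: each word contributes a
    nibble (its length mod 16); the nibbles of a block are concatenated
    into a byte. Trailing words that do not fill a block are ignored.
    """
    words = cover_text.split()
    out = bytearray()
    for i in range(0, len(words) - block_size + 1, block_size):
        byte = 0
        for w in words[i:i + block_size]:
            byte = (byte << 4) | (len(w) % 16)
        out.append(byte)
    return bytes(out)
```

For example, the cover text "word a" decodes to the single byte 0x41 under this toy scheme, since the word lengths 4 and 1 form the nibbles of one byte.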
If you only want to encode the payload, you can use the encoder component and skip the padding step. If you want to reproduce the full encoding pipeline as defined in the paper, you should use the customized decoders (which are more readable) provided in the decoder subdirectory.
As defined in apptainer.def, you need the following dependencies:
- apptainer
- python:3.10-slim
- torch==2.6.0
- accelerate>=0.26.0
- xformers==0.0.29.post2
- vllm
- pandas
- transformers
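For orientation, a container definition matching the list above might look like the following sketch; the actual apptainer.def shipped in the repository is authoritative:

```
Bootstrap: docker
From: python:3.10-slim

%post
    # Versions copied from the dependency list above
    pip install torch==2.6.0 "accelerate>=0.26.0" \
        xformers==0.0.29.post2 vllm pandas transformers
```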
From the 'encoder' directory, run:
./run_all.sh
This produces the cover file with a natural ending point.
From the 'encoder' directory, run:
./run_encoder.sh
This produces the cover file without a natural ending point.
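The ending point matters for decoding: when the cover file has no natural ending point, the decoded stream can carry trailing padding that must be stripped. A minimal sketch, assuming (purely hypothetically) that a NUL sentinel marks the payload's end; the repository's customized decoders implement the actual padding handling:

```python
def strip_padding(payload: bytes, sentinel: bytes = b"\x00") -> bytes:
    """Toy post-processing step: truncate the decoded stream at the
    first sentinel byte. The sentinel is an assumption made for this
    illustration, not the scheme used by the paper's decoders."""
    end = payload.find(sentinel)
    return payload if end == -1 else payload[:end]
```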
Several decoders are provided in Shell, Python, and Perl, along with a customized decoder for padded cover files. You can run
cat poem.txt | ./decoder/decoder.sh | xxd
to verify the correctness of the payload encoded in the poem example shown in Figure 1 of the paper.
More details can be found in the decoder directory.