# Detect the payload size for LSB steganography

This notebook presents several methods for detecting the payload size.
The payload size is the size of the data that was hidden in the stego image
using LSB steganography.

Three methods are presented; which one applies depends on the information that is available:
- Known message attack: If the original message is known, the payload size can be detected by locating the message in the extracted payload. This is the most reliable method, but it may take a while when a large number of LSBs is used.
- Known stego image attack: If the stego image is known, the payload size can be estimated by observing how the file size changes and correlating the payload size with the file size.
- Statistical attack: Using RS analysis, the payload size can be estimated by analyzing the distribution of the pixel values.
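
The idea behind the known message attack can be sketched in a few lines. This is a minimal illustration, not the extraction code used in this notebook: it assumes the payload is written to the least significant bit of every value of the flattened pixel array, with the most significant message bit first. The helpers `embed_lsb1`, `extract_lsb1` and `find_payload` are hypothetical names introduced only for this example.

```python
import numpy as np


def embed_lsb1(pixels: np.ndarray, payload: bytes) -> np.ndarray:
    """Write the payload bits into the least significant bit of each value (assumed layout)."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    flat = pixels.ravel().copy()
    if bits.size > flat.size:
        raise ValueError("payload does not fit into the cover")
    # Clear the lowest bit of the first values, then set it to the payload bit
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(pixels.shape)


def extract_lsb1(pixels: np.ndarray) -> bytes:
    """Collect the least significant bit of every value and pack the bits into bytes."""
    return np.packbits(pixels.ravel() & 1).tobytes()


def find_payload(pixels: np.ndarray, known_message: bytes):
    """Return (byte offset, length) of the known message in the extracted bits, or None."""
    offset = extract_lsb1(pixels).find(known_message)
    return None if offset == -1 else (offset, len(known_message))


# Hypothetical cover image and a payload with a two-byte header in front of the message
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(16, 16, 3), dtype=np.uint8)
message = b"secret"
stego = embed_lsb1(cover, b"\x00\x01" + message)

location = find_payload(stego, message)
print(location)
```

Note that `bytes.find` only matches on byte boundaries. If an embedding method starts the message at a bit offset that is not a multiple of eight, the search would have to be repeated for each of the eight possible bit shifts.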

## Initialization

First, we need to import the extraction functions from [Extract LSBs](./extract-lsbs.ipynb).

In [7]:
from tqdm.notebook import tqdm
from pathlib import Path

In [8]:
%run extract-lsbs.ipynb

In [9]:
async def for_each_image(func, take=None, filter_methods=None):
    """Helper function to iterate over all stego images and apply a function to them."""
    filter_methods = (lambda x: True) if filter_methods is None else filter_methods

    for method, stego_images in stego_images_by_method.items():
        if not filter_methods(method):
            continue

        for channels in ['RGB', 'RGBA', 'A', 'R', 'G', 'B']:  # Sorted by most likely occurrence; other combinations are possible
            for bits in [1, 2, 4]:  # Other bit counts (e.g. 3) are possible but increase the computation time
                for direction in ['msb', 'lsb']:
                    yield await func(stego_images[:take] if take is not None else stego_images, (method, channels, bits, direction))

## Known message attack

In [10]:
EMBEDDED_MESSAGES_DIR = Path('./data/embedded_messages')


async def save_embedded_messages(base_dir=EMBEDDED_MESSAGES_DIR):
    async def handler(stego_images, path_parts):
        method, channels, bits, direction = path_parts
        sub_dir = base_dir / method / channels / f'ls{bits}b' / direction
        if sub_dir.exists():
            return

        sub_dir.mkdir(parents=True, exist_ok=True)
        async for stego_img, msg in extract_messages(stego_images, bits, direction, method):
            img_name = stego_img.stem
            msg_file = sub_dir / f'{img_name}.txt'
            if msg_file.exists():
                continue

            msg_file.write_bytes(msg.tobytes())

    _ = [_ async for _ in for_each_image(handler)]

In [11]:
MESSAGE_DIR = Path('../datasets/StegoAppDB_stegos_20240309-030352/message_dictionary')


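def find_nth_substring(haystack, needle, n):
    """Return the index of the n-th (1-based) occurrence of needle, or -1 if there is none."""
    if n < 1:
        # No occurrence requested: return -1 so that callers adding 1 start at index 0
        return -1
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start + len(needle))
        n -= 1
    return start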


async def get_original_message(stego_img):
    img_row = info_file[info_file['image_filename'] == stego_img.name]
    msg_name = img_row['message_dictionary'].values[0]
    starting_line_index = img_row['message_starting_index'].values[0]
    msg_len = img_row['message_length'].values[0]
    full_msg = (MESSAGE_DIR / msg_name).read_text()
    start_index = find_nth_substring(full_msg, '\n', starting_line_index - 1) + 1
    return full_msg[start_index:start_index + msg_len].encode('utf-8')


async def get_embedded_message(stego_img, bits, direction, channels):
    return extract_message(stego_img, bits, direction, channels=channels).tobytes()


async def detect_used_method_and_bits(stego_images, path_parts):
    method, channels, bits, direction = path_parts
    results = []
    for stego_img in tqdm(stego_images, desc=f'Cycling through {method} {channels} {bits}-LSB {direction.upper()}'):
        original_msg = await get_original_message(stego_img)
        extracted_msg = await get_embedded_message(stego_img, bits, direction, channels)
        index = extracted_msg.find(original_msg)
        if index != -1:
            results.append((method, bits, direction, index))

    rate = len(results) / len(stego_images)
    results = set(results)
    if len(results) == 1:
        return rate, results.pop()
    elif len(results) > 1:
        return rate, results
    else:
        return rate, None


#detected_used_method_and_bits = await detect_used_method_and_bits(
#    stego_images_by_method['PocketStego'][:10],
#    ('PocketStego', 'B', 1, 'MSB')
#)
detected_used_method_and_bits = [
    (rate, values) 
    async for rate, values in for_each_image(detect_used_method_and_bits, take=10, filter_methods=lambda m: m == 'PocketStego') 
    if values is not None
]
detected_used_method_and_bits

[]

### Overlap the payloads

To detect a signature, we overlap the messages extracted from the stego images with a bitwise AND operation.
This naive approach can only detect a leading signature in the payloads.
Detecting a signature at the end of the payloads requires the payload length,
which can be estimated with, e.g., RS analysis.

After collecting the messages, we combine them with a bitwise AND and
strip all surrounding zeros to find the signature.
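
The overlap step can be illustrated on a few hand-made payloads. This is a small sketch under the assumption that every payload starts with the same signature bytes; the function `overlap_payloads` and the example payloads are hypothetical and independent of the extraction helpers used in the next cell.

```python
from functools import reduce

import numpy as np


def overlap_payloads(payloads):
    """AND all payloads byte by byte and strip the surrounding zero bytes."""
    arrays = [np.frombuffer(p, dtype=np.uint8) for p in payloads]
    # Only the part covered by every payload can be compared
    shortest = min(a.size for a in arrays)
    combined = reduce(np.bitwise_and, (a[:shortest] for a in arrays))
    return np.trim_zeros(combined).tobytes()


# Three payloads sharing the leading signature b"SG"; the bodies differ
payloads = [
    b"SG" + b"\x0f\xaa",
    b"SG" + b"\xf0\x55\x01",
    b"SG" + b"\x33\xcc",
]
signature = overlap_payloads(payloads)
print(signature)
```

The bodies here are chosen so that their bits cancel out completely. With real payloads, bits that happen to be set in every message survive the AND, so the result becomes cleaner as more images are combined. A signature that itself begins or ends with zero bytes would be shortened by the trimming step.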

In [12]:
def _overlap_message(acc, msg):
    if acc.shape != msg.shape:
        # Cut the larger array down to the shape of the smaller one before combining
        if acc.size < msg.size:
            msg = np.resize(msg, acc.shape)
        else:
            acc = np.resize(acc, msg.shape)
    return np.bitwise_and(acc, msg)


async def extract_leading_sig(stego_images, bits: int = 1, direction='msb', embedding_method=None):
    messages = (msg async for msg in extract_messages(stego_images, bits, direction, embedding_method))
    reduced_msg = await afn.reduce(_overlap_message, messages)
    return np.trim_zeros(reduced_msg.ravel()).tobytes()


leading_signatures = {}
for method, stego_images in stego_images_by_method.items():
    leading_signatures[method] = {
        'MSB': {}, 'LSB': {}
    }
    for bits in [1, 2, 4]:
        leading_signatures[method]['MSB'][bits] = await extract_leading_sig(stego_images, bits, 'msb', method)
        leading_signatures[method]['LSB'][bits] = await extract_leading_sig(stego_images, bits, 'lsb', method)

leading_signatures

TypeError: argument of type 'NoneType' is not iterable