# What is Steganography?
Steganography is the practice of hiding a file, message, image, or video within another file, message, image, or video. The word is derived from the Greek words "steganos" (meaning hidden or covered) and "graphe" (meaning writing). Its goal is to keep the hidden information, and ideally its very existence, away from unwanted eyes. In ancient times, steganography was mostly done physically.

The oldest documented case of steganography dates to around 500 BC, when Histiaeus, the ruler of Miletus, tattooed a message on the shaved head of one of his slaves and let the hair grow back. He then sent the slave to Aristagoras, his son-in-law, who shaved the slave’s head again and revealed the message.

In the centuries that followed, more modern forms of steganography were invented, such as invisible inks. Today, steganography has moved to the digital world.

Hackers often use it to hide secret messages or data within media files such as images, videos, or audio files. Steganography has many legitimate uses, such as watermarking, but malware authors have also been found to use it to obscure the transmission of malicious code.

# What is an Image?
In the illustration below, imagine that the picture on the left is only 5x5 pixels in size, so it consists of 25 pixels in total. In reality, we would barely see a picture this small, but it is a good size for illustration purposes. As you can see, the image has a grid, and this grid can be used to access each pixel. On the right, we see how the Python library OpenCV (cv2) stores this particular picture, namely as a matrix with a shape of [5,5,3]. The last dimension (3) holds the three colour channels red, green, and blue. If we now access one particular pixel, for instance at location [4,4], we receive the pixel values [24,23,34]. In the illustration, the first value is the intensity of red, the second the intensity of green, and the last the intensity of blue. (An array loaded with `cv2.imread` holds the same three channels in blue, green, red order.) Combined, the three intensities yield the colour shown at that pixel location. Each value ranges from 0 to 255.

![image_rgb.png](image_rgb.png)
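
The indexing described above can be tried without an image file. The sketch below is illustrative only: nested Python lists stand in for the NumPy array that OpenCV returns, and every pixel except the one at [4][4] is an arbitrary placeholder (black).

```python
# A 5x5 image with 3 colour values per pixel, modelled as nested lists.
# Assumption: all pixels are black except the one discussed in the text.
HEIGHT, WIDTH, CHANNELS = 5, 5, 3
image = [[[0, 0, 0] for _ in range(WIDTH)] for _ in range(HEIGHT)]
image[4][4] = [24, 23, 34]  # bottom-right pixel; indices start at 0

# Height, width, and number of channels, like `image.shape` on a NumPy array
shape = (len(image), len(image[0]), len(image[0][0]))
total_pixels = shape[0] * shape[1]

print(shape)         # (5, 5, 3)
print(total_pixels)  # 25
print(image[4][4])   # [24, 23, 34]
```

With a real image loaded through `cv2.imread`, the same lookup is written `image[4, 4]` and the dimensions come from `image.shape`.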



# What is the Least Significant Bit?
Least Significant Bit (LSB) is a technique in which the last bit of each colour value of a pixel is replaced with one bit of the data to hide. This method only works with image formats that store pixel values losslessly: the file may be compressed, but the compression does not lose or modify any pixel data. A lossy format such as JPEG changes pixel values when the image is saved, which would destroy the hidden bits. PNG, TIFF, and BMP are examples of lossless image file formats.

As you may already know, an image consists of many pixels, each containing three values (Red, Green, and Blue); these values range from 0 to 255. In other words, they are 8-bit values. For example, the value 225 is 11100001 in binary.
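
As a quick check of this arithmetic, the following sketch prints the 8-bit form of a value and overwrites its last bit. The helper name `set_lsb` is made up for this illustration.

```python
def set_lsb(value, bit):
    """Return `value` (0-255) with its least significant bit replaced by `bit`."""
    # 0b11111110 clears the last bit; the OR then writes the new bit into it
    return (value & 0b11111110) | bit

print(format(225, "08b"))  # 11100001
print(set_lsb(225, 0))     # 224
print(set_lsb(225, 1))     # 225
print(set_lsb(12, 1))      # 13
```

Replacing the last bit moves a value by at most 1, which is why the change is not visible to the eye.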

# ASCII Table
https://www.asciitable.com/
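
Each character of a message has a numeric code in this table, and that code is what gets written into the image bit by bit. A small sketch of the conversion in both directions follows; the function names `text_to_bits` and `bits_to_text` are illustrative, and the text is assumed to be plain ASCII.

```python
def text_to_bits(text):
    """Concatenate the 8-bit code of every character."""
    return "".join(format(ord(ch), "08b") for ch in text)

def bits_to_text(bits):
    """Read the bit string 8 bits at a time and map each group back to a character."""
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

bits = text_to_bits("Hi")
print(ord("H"), ord("i"))  # 72 105
print(bits)                # 0100100001101001
print(bits_to_text(bits))  # Hi
```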
# Original
```
[[(225, 12, 99), (155, 2, 50), (99, 51, 15), (15, 55, 22)],
[(155, 61, 87), (63, 30, 17), (1, 55, 19), (99, 81, 66)],
[(219, 77, 91), (69, 39, 50), (18, 200, 33), (25, 54, 190)]]
```

```
0110100 0110101 --> the 14 bits to hide (7-bit ASCII codes of the characters "4" and "5")
```
```
225 --> 1110 0001 --> 1110 0000 --> 224
12  --> 0000 1100 --> 0000 1101 --> 13
99  --> 0110 0011 --> 0110 0011 --> 99
155 --> 1001 1011 --> 1001 1010 --> 154
2   --> 0000 0010 --> 0000 0011 --> 3
50  --> 0011 0010 --> 0011 0010 --> 50
```
# Encoded (1 bit per value)
```
[[(224, 13, 99), (154, 3, 50), (98, 50, 15), (15, 54, 23)],
[(154, 61, 87), (63, 30, 17), (1, 55, 19), (99, 81, 66)],
[(219, 77, 91), (69, 39, 50), (18, 200, 33), (25, 54, 190)]]
```
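
The step from the original matrix to the encoded one can be reproduced in a few lines. This is a minimal sketch on plain Python lists rather than an OpenCV array; the function name `embed` is an assumption made for the example, and the bit string is the 14-bit sequence shown earlier with the space removed.

```python
original = [
    [(225, 12, 99), (155, 2, 50), (99, 51, 15), (15, 55, 22)],
    [(155, 61, 87), (63, 30, 17), (1, 55, 19), (99, 81, 66)],
    [(219, 77, 91), (69, 39, 50), (18, 200, 33), (25, 54, 190)],
]
bits = "01101000110101"  # 0110100 0110101 without the space

def embed(pixels, bits):
    """Write one bit into the last bit of each value, left to right, row by row."""
    remaining = iter(bits)
    encoded = []
    for row in pixels:
        new_row = []
        for pixel in row:
            new_pixel = []
            for value in pixel:
                bit = next(remaining, None)
                # values after the last message bit are left untouched
                new_pixel.append(value if bit is None else (value & 0b11111110) | int(bit))
            new_row.append(tuple(new_pixel))
        encoded.append(new_row)
    return encoded

encoded = embed(original, bits)
for row in encoded:
    print(row)
```

Only the first 14 values are candidates for change, and a value stays the same whenever its last bit already equals the message bit, as with 99 and 50 in the first two pixels.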




In [10]:
import cv2
import numpy as np

In [11]:
def to_bin(data):
    """Convert `data` to binary format as string"""
    if isinstance(data, str):
        return ''.join([ format(ord(i), "08b") for i in data ])
    elif isinstance(data, bytes):
        return ''.join([ format(i, "08b") for i in data ])
    elif isinstance(data, np.ndarray):
        return [ format(i, "08b") for i in data ]
    elif isinstance(data, int) or isinstance(data, np.uint8):
        return format(data, "08b")
    else:
        raise TypeError("Type not supported.")

def to_binary_array(image):
    """Return the least significant bit of every channel value as one bit string."""
    # `image & 1` keeps only the last bit of each value. Flattening a
    # (height, width, 3) array walks it row by row, pixel by pixel and
    # channel by channel, which is the order encode() writes the bits in.
    lsb = (image & 1).flatten()
    return ''.join(lsb.astype(str))

In [12]:
def encode(image_name, secret_data):
    # read the image
    image = cv2.imread(image_name)
    if image is None:
        raise FileNotFoundError(f"[!] Could not read image: {image_name}")
    # maximum bytes to encode (one bit per colour value)
    n_bytes = image.shape[0] * image.shape[1] * 3 // 8
    print("[*] Maximum bytes to encode:", n_bytes)
    # add stopping criteria first, so the capacity check includes it
    secret_data += "====="
    if len(secret_data) > n_bytes:
        raise ValueError("[!] Insufficient bytes, need bigger image or less data.")
    print("[*] Encoding data...")
    data_index = 0
    # convert data to binary
    binary_secret_data = to_bin(secret_data)
    # size of data to hide
    data_len = len(binary_secret_data)
    for row in image:
        for pixel in row:
            # convert the three channel values to binary format
            # (cv2.imread returns them in blue, green, red order)
            b, g, r = to_bin(pixel)
            # modify the least significant bit only if there is still data to store
            if data_index < data_len:
                # least significant bit of the first channel
                pixel[0] = int(b[:-1] + binary_secret_data[data_index], 2)
                data_index += 1
            if data_index < data_len:
                # least significant bit of the second channel
                pixel[1] = int(g[:-1] + binary_secret_data[data_index], 2)
                data_index += 1
            if data_index < data_len:
                # least significant bit of the third channel
                pixel[2] = int(r[:-1] + binary_secret_data[data_index], 2)
                data_index += 1
            # once all data is encoded, stop scanning the remaining pixels
            if data_index >= data_len:
                return image
    return image

In [13]:
def decode(image_name):
    print("[+] Decoding...")
    # read the image
    image = cv2.imread(image_name)
    # collect the least significant bit of every channel value
    binary_data = to_binary_array(image)
    # split by 8-bits
    print("split by 8-bits")
    all_bytes = [ binary_data[i: i+8] for i in range(0, len(binary_data), 8) ]
    # convert from bits to characters
    decoded_data = ""
    for byte in all_bytes:
        decoded_data += chr(int(byte, 2))
        if decoded_data[-5:] == "=====":
            break
    return decoded_data[:-5]

In [14]:
input_image = "desk.png"

In [21]:
image = cv2.imread(input_image)
height, width, channels = image.shape  # Get image dimensions
total_pixels = height * width
print(f"Total pixels: {total_pixels}")

Total pixels: 1917100


In [22]:
image.shape 

(1009, 1900, 3)

In [23]:
input_image = "desk.png"
output_image = "desk_encoded.png"
secret_data = "This is a top secret message."
# encode the data into the image
encoded_image = encode(image_name=input_image, secret_data=secret_data)
# save the output image (encoded image)
cv2.imwrite(output_image, encoded_image)


[*] Maximum bytes to encode: 718912
[*] Encoding data...


True

In [24]:
# decode the secret data from the image
decoded_data = decode(output_image)
print("[+] Decoded data:", decoded_data)

[+] Decoding...
split by 8-bits
[+] Decoded data: This is a top secret message.
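
The capacity figure printed by `encode` follows directly from the image shape. The sketch below redoes that arithmetic for the shape and message shown above, with no image file needed; the variable names are chosen for this example.

```python
import math

height, width, channels = 1009, 1900, 3  # shape printed above
message = "This is a top secret message."
delimiter = "====="

# one hidden bit per colour value, eight bits per byte
capacity_bytes = height * width * channels // 8
# every character of the message and the delimiter takes 8 bits
payload_bits = (len(message) + len(delimiter)) * 8
# each pixel carries three bits, one per channel
pixels_touched = math.ceil(payload_bits / channels)

print(capacity_bytes)  # 718912
print(payload_bits)    # 272
print(pixels_touched)  # 91
```

The message occupies only the first 91 of roughly 1.9 million pixels, so the rest of the image is left untouched.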


# Encoding Files

In [14]:
import cv2
import numpy as np

def encode(image_name, secret_data, n_bits=2):
    image = cv2.imread(image_name)
    if image is None:
        raise FileNotFoundError(f"[!] Could not read image: {image_name}")
    n_bytes = image.shape[0] * image.shape[1] * 3 * n_bits // 8

    if isinstance(secret_data, str):
        secret_data += "====="
        binary_secret_data = ''.join(format(ord(char), '08b') for char in secret_data)
    elif isinstance(secret_data, bytes):
        secret_data += b"====="
        binary_secret_data = ''.join(format(byte, '08b') for byte in secret_data)
    else:
        raise TypeError("Secret data must be a string or bytes.")

    data_len = len(binary_secret_data)
    if data_len > n_bytes * 8:
        raise ValueError(f"[!] Insufficient capacity: payload needs {data_len} bits, image holds {n_bytes * 8}.")

    print(f"[*] Maximum bits to encode: {n_bytes * 8}")
    print(f"[*] Data size (bits): {data_len}")
    print("[*] Encoding data...")

    # Pad with zeros up to the full capacity (data_len <= n_bytes * 8 is guaranteed
    # by the check above), so every bit plane that is used gets filled completely
    padding_needed = n_bytes * 8 - data_len
    binary_secret_data += '0' * padding_needed
    data_len = len(binary_secret_data)

    data_index = 0
    for bit in range(1, n_bits + 1):
        for c in range(3):
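            channel = image[:, :, c]  # view into image, so writes change it in place
            mask = 1 << (bit - 1)
            # Keep every bit except the one being written. 0xFF ^ mask stays inside
            # the uint8 range; ~mask is a negative Python int, which NumPy 2.x may
            # reject when combined with a uint8 array.
            lsb_mask = 0xFF ^ mask

            bits_to_embed_count = min(channel.size, data_len - data_index)

            # Take the next slice of payload bits as uint8 values (0 or 1)
            bits_to_embed = np.array(
                [int(binary_secret_data[i]) for i in range(data_index, data_index + bits_to_embed_count)],
                dtype=np.uint8,
            )

            # Pad with zeros to the channel size, then reshape to the channel's layout
            bits_to_embed = np.pad(bits_to_embed, (0, channel.size - bits_to_embed_count), 'constant').reshape(channel.shape)

            channel[:] = (channel & lsb_mask) | (bits_to_embed << (bit - 1))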
            data_index += bits_to_embed_count

            if data_index >= data_len:
                break
        if data_index >= data_len:
            break

    return image


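def decode(image_name, n_bits=2, in_bytes=False):
    # n_bits must match the value used in encode(); the default mirrors encode()
    print("[+] Decoding...")
    image = cv2.imread(image_name)
    if image is None:
        raise FileNotFoundError(f"[!] Could not read image: {image_name}")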
    binary_data = ""

    for bit in range(1, n_bits + 1):
        for c in range(3):
            channel = image[:, :, c]
            extracted_bits = (channel >> (bit - 1)) & 1
            binary_data += "".join(extracted_bits.flatten().astype(str))

    all_bytes = [binary_data[i:i + 8] for i in range(0, len(binary_data), 8)]

    if in_bytes:
        decoded_data = bytearray()
        for byte in all_bytes:
            decoded_data.append(int(byte, 2))
            if decoded_data[-5:] == b"=====":
                break
    else:
        decoded_data = ""
        for byte in all_bytes:
            decoded_data += chr(int(byte, 2))
            if decoded_data[-5:] == "=====":
                break
    return decoded_data[:-5]
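
The masking step is easier to follow on a single value. The sketch below is a simplified scalar version of the idea, not the array code itself; `write_bit` and `read_bit` are names invented for this illustration.

```python
def write_bit(value, position, bit):
    """Replace the bit at `position` (0 = least significant) of an 8-bit value."""
    mask = 1 << position
    # 0xFF ^ mask keeps every bit except the target one
    return (value & (0xFF ^ mask)) | (bit << position)

def read_bit(value, position):
    """Extract the bit at `position` of an 8-bit value."""
    return (value >> position) & 1

value = 0b11100001               # 225
value = write_bit(value, 0, 0)   # first bit plane
value = write_bit(value, 1, 1)   # second bit plane
print(value, format(value, "08b"))             # 226 11100010
print(read_bit(value, 0), read_bit(value, 1))  # 0 1
```

With `n_bits=2` a value can move by at most 3, so capacity doubles in exchange for a slightly larger, still small, change per value.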

In [15]:
with open("sample.pdf", "rb") as f:
    secret_data = f.read()

In [16]:
secret_data[:20]

b'%PDF-1.7\r\n%\xb5\xb5\xb5\xb5\r\n1 0'

In [17]:
output_image = "desk1_with_file.png"
input_image = "desk1.png"

In [18]:
encoded_image = encode(image_name=input_image, secret_data=secret_data, n_bits=2)
cv2.imwrite(output_image, encoded_image)
print("[+] Saved encoded image.")

[*] Maximum bits to encode: 11282688
[*] Data size (bits): 489272
[*] Encoding data...
[+] Saved encoded image.


In [19]:
pdf = decode(output_image, n_bits=2, in_bytes=True)

[+] Decoding...


In [20]:
pdf[:20]

bytearray(b'%PDF-1.7\r\n%\xb5\xb5\xb5\xb5\r\n1 0')

In [21]:
with open("pdf_decode.pdf", "wb") as f:
    f.write(pdf)
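
One limitation of the `=====` stop marker is that the same five bytes can occur inside a binary file, in which case decoding stops early. A common alternative is to store the payload length in a fixed-size header. The sketch below shows only that framing on raw bytes, separate from the image code; `frame` and `unframe` are hypothetical helper names, and the 4-byte big-endian header is an assumption made for this example.

```python
import struct

# Assumed layout: 4-byte unsigned payload length, big-endian, then the payload
HEADER = struct.Struct(">I")

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload

def unframe(stream: bytes) -> bytes:
    """Read the length header, then return exactly that many payload bytes."""
    (length,) = HEADER.unpack_from(stream, 0)
    start = HEADER.size
    if length > len(stream) - start:
        raise ValueError("Header announces more bytes than the stream contains.")
    return stream[start:start + length]

payload = b"%PDF-1.7 data with ===== inside"
# trailing zero bytes mimic the unused capacity read back from the image
stream = frame(payload) + b"\x00" * 16
print(unframe(stream) == payload)  # True
```

The framed bytes would be passed to `encode` in place of the raw file contents, and the decoder would read the header first instead of scanning for a marker.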