**DeapSECURE module 5: Cryptography for Privacy-Preserving Computation (part A: Data Protection)**

# Session 1 Solution: Encoding - Representing Data on Computers

This notebook provides complete solutions for the hands-on learning activities of the Cryptography module, Episode 3.
It covers encoding, data representation, and fundamental concepts for cryptographic operations.

## 1. Setup and Imports

In [2]:
# Standard library imports
import json
import re
import os
import time

# Third-party imports
import numpy

## 2. Integers, Bits, and Hexadecimal Numbers

Understanding binary representation and hexadecimal notation is fundamental to cryptography.

In [3]:
# Demonstrate binary to hexadecimal conversion
print("Binary to Hexadecimal Conversion:")
print("8-bits        hexadecimal   decimal value")
for i in [0, 1, 2, 3, 254, 255]:
    print(f"{i:08b}   =>     {hex(i)}    =>     {i}")

Binary to Hexadecimal Conversion:
8-bits        hexadecimal   decimal value
00000000   =>     0x0    =>     0
00000001   =>     0x1    =>     1
00000010   =>     0x2    =>     2
00000011   =>     0x3    =>     3
11111110   =>     0xfe    =>     254
11111111   =>     0xff    =>     255
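
The table above goes from integers to their binary and hexadecimal text forms. As a minimal sketch of the reverse direction, the snippet below parses digit strings back into integers with `int(text, base)` and uses fixed-width format specifiers so that every byte prints as two hex digits. The variable names are illustrative and not part of the original notebook.

```python
# Parse textual binary and hexadecimal digits back into integers.
n_from_bin = int("11111110", 2)   # base-2 digits
n_from_hex = int("fe", 16)        # base-16 digits
print(n_from_bin, n_from_hex)

# Format with fixed widths: 8 binary digits, 2 hex digits, 3 decimal digits.
for i in [0, 1, 2, 3, 254, 255]:
    print(f"{i:08b}  =>  0x{i:02x}  =>  {i:3d}")

# One hex digit corresponds to exactly four bits (a nibble),
# so one byte is always two hex digits.
high_nibble = 0xfe >> 4      # upper four bits
low_nibble = 0xfe & 0x0f     # lower four bits
print(f"{high_nibble:04b} {low_nibble:04b}")
```

Unlike `hex(i)`, the `02x` format keeps a leading zero (`0x01` rather than `0x1`), which makes the columns line up.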


## 3. Strings and Bytes

Encryption operates on byte-level representations of data. Understanding string-to-bytes conversion is essential.

In [4]:
# Create a "Hello world" string
S = "Hello world"
print(f"String: {S}")
print(f"Length: {len(S)} characters")

String: Hello world
Length: 11 characters


In [5]:
# Create a string with non-Latin characters
S_ZH = "早上好, friend!"
print(f"String with Chinese: {S_ZH}")
print(f"Length: {len(S_ZH)} characters")

String with Chinese: 早上好, friend!
Length: 12 characters


In [6]:
# Encode strings to bytes using UTF-8
B = bytes(S, encoding="utf-8")
print(f"Encoded 'Hello world': {B}")
print(f"Length in bytes: {len(B)}")

# Alternative encoding method
B_alt = S.encode("utf-8")
print(f"Same result: {B == B_alt}")

Encoded 'Hello world': b'Hello world'
Length in bytes: 11
Same result: True


In [7]:
# Encode non-Latin string
B_ZH = S_ZH.encode("utf-8")
print(f"Encoded Chinese string: {B_ZH}")
print(f"String length: {len(S_ZH)} characters")
print(f"Byte length: {len(B_ZH)} bytes")
print("\nNote: each of these Chinese characters takes 3 bytes in UTF-8")

Encoded Chinese string: b'\xe6\x97\xa9\xe4\xb8\x8a\xe5\xa5\xbd, friend!'
String length: 12 characters
Byte length: 18 bytes

Note: each of these Chinese characters takes 3 bytes in UTF-8


In [8]:
# Decode bytes back to string
decoded = B_ZH.decode()
print(f"Decoded: {decoded}")
print(f"Matches original: {decoded == S_ZH}")

Decoded: 早上好, friend!
Matches original: True
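
The difference between character count and byte count seen above comes from UTF-8 being a variable-width encoding. The following sketch, with sample characters chosen only for illustration, measures the encoded width of a few characters and shows that decoding with a codec that cannot represent the data raises an error.

```python
# UTF-8 uses 1 to 4 bytes per character depending on the code point.
samples = ["A", "\u00e9", "早", "\U0001F600"]   # Latin, accented Latin, CJK, emoji
widths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, n in widths.items():
    print(f"U+{ord(ch):04X} {ch!r}: {n} byte(s)")

# Bytes only become text again when decoded with the matching codec.
data = "早上好".encode("utf-8")
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(f"Decodable as ASCII: {ascii_ok}")
print(f"Decodable as UTF-8: {data.decode('utf-8') == '早上好'}")
```

Passing the encoding explicitly to `decode()` (rather than relying on the default) documents which codec the bytes are expected to be in.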


In [9]:
# ASCII character codes
print("ASCII Character Codes:")
print(f"'A' = {ord('A')} = {hex(ord('A'))}")
print(f"'a' = {ord('a')} = {hex(ord('a'))}")
print(f"'0' = {ord('0')} = {hex(ord('0'))}")

ASCII Character Codes:
'A' = 65 = 0x41
'a' = 97 = 0x61
'0' = 48 = 0x30


## 4. Hexadecimal Representation

Hexadecimal representation provides a compact way to display binary data.

In [10]:
# Convert bytes to hexadecimal
hex_string = B.hex()
print(f"'Hello world' in hex: {hex_string}")
print(f"Byte length: {len(B)}")
print(f"Hex string length: {len(hex_string)}")
print(f"Note: Hex string is exactly 2x the byte length")

'Hello world' in hex: 48656c6c6f20776f726c64
Byte length: 11
Hex string length: 22
Note: Hex string is exactly 2x the byte length


In [11]:
# Verify hex encoding
print(f"\nHex to ASCII mapping:")
print(f"48 (hex) = {int('48', 16)} (decimal) = '{chr(int('48', 16))}' (ASCII)")
print(f"65 (hex) = {int('65', 16)} (decimal) = '{chr(int('65', 16))}' (ASCII)")


Hex to ASCII mapping:
48 (hex) = 72 (decimal) = 'H' (ASCII)
65 (hex) = 101 (decimal) = 'e' (ASCII)


In [12]:
# Create bytes from hex string
S2 = bytes.fromhex("476f6f64206d6f726e696e67")
print(f"From hex string: {S2}")

# Create bytes from array of integers
S3 = bytes([0x47, 0x6f, 0x6f, 0x64, 0x20, 0x6d, 0x6f, 0x72, 0x6e, 0x69, 0x6e, 0x67])
print(f"From integer array: {S3}")
print(f"Both methods produce same result: {S2 == S3}")

From hex string: b'Good morning'
From integer array: b'Good morning'
Both methods produce same result: True
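
The equivalence between the hex string and the integer array follows from how Python models `bytes`: as an immutable sequence of integers from 0 to 255. A short sketch, using illustrative variable names, makes that visible and shows the optional separator argument of `bytes.hex()` (available from Python 3.8).

```python
data = bytes.fromhex("476f6f64206d6f726e696e67")

# Indexing or iterating a bytes object yields integers, not characters.
first = data[0]
as_ints = list(data[:4])
print(first, as_ints)

# A separator makes long hex dumps easier to read (Python 3.8+).
spaced = data.hex(" ")
print(spaced)

# bytes.fromhex() skips ASCII whitespace, so the spaced form round-trips.
print(bytes.fromhex(spaced) == data)
```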


## 5. Encoding and Decoding Integers

In [13]:
def encode_int(C, minlength=16):
    """Encodes an arbitrarily long non-negative integer into a bytes object
    (big-endian). The minimum length is by default 16 bytes (128 bits)."""
    if C < 0:
        raise ValueError("encode_int only supports non-negative integers")
    C_hex = hex(C)[2:]  # Remove '0x' prefix
    if len(C_hex) % 2:
        C_hex = '0' + C_hex  # Ensure even number of hex digits
    C_bytes = bytes.fromhex(C_hex)
    if len(C_bytes) < minlength:
        # Pad the left side with NULLs
        C_bytes = C_bytes.rjust(minlength, b'\x00')
    return C_bytes

In [14]:
def decode_int(B):
    """Decodes a bytes object into a long integer.
    This is the inverse of the encode_int function."""
    return int(B.hex(), 16)

In [15]:
# Example: Encode and decode an integer
A = 0x2A3749
print(f"Original integer (decimal): {A}")
print(f"Original integer (hex): {hex(A)}")

A_bytes = encode_int(A)
print(f"\nEncoded to bytes: {A_bytes}")
print(f"Hex representation: {A_bytes.hex()}")
print(f"Length: {len(A_bytes)} bytes")

Original integer (decimal): 2766665
Original integer (hex): 0x2a3749

Encoded to bytes: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00*7I'
Hex representation: 000000000000000000000000002a3749
Length: 16 bytes


In [16]:
# Decode back to integer
A_decoded = decode_int(A_bytes)
print(f"Decoded integer: {A_decoded}")
print(f"Matches original: {A_decoded == A}")

Decoded integer: 2766665
Matches original: True


In [17]:
# Alternative using Python's built-in to_bytes method
A_bytes_alt = A.to_bytes(16, 'big')
print(f"Using to_bytes: {A_bytes_alt}")
print(f"Same result: {A_bytes == A_bytes_alt}")

Using to_bytes: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00*7I'
Same result: True
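
Python also provides `int.from_bytes` as the built-in counterpart of `to_bytes`. The sketch below, with variable names chosen for illustration, computes the smallest byte length that can hold an integer, round-trips the value, and shows two behaviors worth knowing: the byte order argument changes the result, and `to_bytes` raises `OverflowError` when the requested length is too small, whereas `encode_int` above simply returns more bytes.

```python
A = 0x2A3749

# Smallest number of bytes that can hold A (at least 1, so 0 still encodes).
min_len = max(1, (A.bit_length() + 7) // 8)
A_min = A.to_bytes(min_len, "big")
print(min_len, A_min.hex())

# int.from_bytes reverses to_bytes when the same byte order is used.
A_back = int.from_bytes(A.to_bytes(16, "big"), "big")
print(A_back == A)

# Little-endian stores the least significant byte first.
A_little = A.to_bytes(4, "little")
print(A_little.hex())

# A length that is too small is an error, not a silent truncation.
try:
    A.to_bytes(2, "big")
    overflowed = False
except OverflowError:
    overflowed = True
print(f"2 bytes too small: {overflowed}")
```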


## 6. Strings as Long Integers

Strings can be represented as very large integers through their byte representation.

In [18]:
# Convert string to integer
B = b'Hello world'
B_hex = B.hex()
B_int = int(B_hex, 16)

print(f"String: {B}")
print(f"Hex representation: {B_hex}")
print(f"Integer representation: {B_int}")
print(f"\nConclusion: The same data can be represented as:")
print(f"  - Unicode string: 'Hello world'")
print(f"  - Byte string: {B}")
print(f"  - Hex string: {B_hex}")
print(f"  - Long integer: {B_int}")

String: b'Hello world'
Hex representation: 48656c6c6f20776f726c64
Integer representation: 87521618088882671231069284

Conclusion: The same data can be represented as:
  - Unicode string: 'Hello world'
  - Byte string: b'Hello world'
  - Hex string: 48656c6c6f20776f726c64
  - Long integer: 87521618088882671231069284
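
The conversion also works in the opposite direction. As a minimal sketch (variable names are illustrative), the integer can be turned back into bytes and then into text, provided the byte length is known or can be derived from the integer's bit length.

```python
B = b"Hello world"
B_int = int.from_bytes(B, "big")

# Recover the byte length from the integer, then rebuild bytes and text.
n_bytes = (B_int.bit_length() + 7) // 8
B_back = B_int.to_bytes(n_bytes, "big")
S_back = B_back.decode("utf-8")
print(n_bytes, B_back, S_back)

# Caveat: leading NULL bytes do not change the integer value,
# so they cannot be recovered from the integer alone.
same_value = int.from_bytes(b"\x00Hi", "big") == int.from_bytes(b"Hi", "big")
print(f"b'\\x00Hi' and b'Hi' map to the same integer: {same_value}")
```

The caveat is why fixed-length encodings (such as the 16-byte minimum in `encode_int`) are common when integers stand in for byte strings.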


## 7. Padding and Unpadding

AES is a block cipher that operates on 16-byte blocks. Padding extends a message so that its length is a multiple of 16 bytes.

In [19]:
def leftpad16(B):
    """Pad a bytes object on the left with NULL bytes so that
    its length is a multiple of 16 bytes."""
    remainder = len(B) % 16  # bytes beyond the last full 16-byte block
    if remainder > 0:
        return (b'\x00' * (16 - remainder)) + B
    else:
        return B

In [20]:
# Example: Padding messages
msg = b'ODU is great'
msg_pad = leftpad16(msg)

print(f"Original message: {msg}")
print(f"Original length: {len(msg)} bytes")
print(f"\nPadded message: {msg_pad}")
print(f"Padded length: {len(msg_pad)} bytes")

Original message: b'ODU is great'
Original length: 12 bytes

Padded message: b'\x00\x00\x00\x00ODU is great'
Padded length: 16 bytes


In [21]:
# Example with longer message
msg2 = b'We need padding because AES is a block cipher'
msg2_pad = leftpad16(msg2)

print(f"Original message: {msg2}")
print(f"Original length: {len(msg2)} bytes")
print(f"Padded length: {len(msg2_pad)} bytes")
print(f"\nNote: {len(msg2_pad)} is a multiple of 16")

Original message: b'We need padding because AES is a block cipher'
Original length: 45 bytes
Padded length: 48 bytes

Note: 48 is a multiple of 16


In [22]:
# Unpadding: remove leading NULLs. This only recovers the original
# message if the message itself does not start with a NULL byte.
msg_unpadded = msg_pad.lstrip(b'\x00')
print(f"Unpadded message: {msg_unpadded}")
print(f"Matches original: {msg_unpadded == msg}")

msg2_unpadded = msg2_pad.lstrip(b'\x00')
print(f"\nUnpadded message 2: {msg2_unpadded}")
print(f"Matches original: {msg2_unpadded == msg2}")

Unpadded message: b'ODU is great'
Matches original: True

Unpadded message 2: b'We need padding because AES is a block cipher'
Matches original: True
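
Stripping NULL bytes works for the text messages above, but it cannot distinguish padding from data when a message itself begins with NULL bytes. For comparison only, here is a sketch of a different scheme, PKCS#7-style padding, which is not the method used in this notebook. The function names `pkcs7_pad` and `pkcs7_unpad` are hypothetical helpers written for this illustration.

```python
BLOCK = 16

def pkcs7_pad(data: bytes, block: int = BLOCK) -> bytes:
    """Append n bytes, each of value n, so the length is a multiple of block."""
    n = block - (len(data) % block)   # 1..block; a full block if already aligned
    return data + bytes([n]) * n

def pkcs7_unpad(padded: bytes, block: int = BLOCK) -> bytes:
    """Remove the padding added by pkcs7_pad, validating it first."""
    if not padded or len(padded) % block:
        raise ValueError("invalid padded length")
    n = padded[-1]                    # the last byte states the pad length
    if not 1 <= n <= block or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]

tricky = b"\x00\x00starts with NULLs"

# Left NULL padding followed by lstrip also removes the message's own NULLs.
null_padded = b"\x00" * (-len(tricky) % 16) + tricky
null_stripped = null_padded.lstrip(b"\x00")
print(f"NULL strip round-trips: {null_stripped == tricky}")

# The length-tagged padding round-trips the same message.
restored = pkcs7_unpad(pkcs7_pad(tricky))
print(f"PKCS#7-style round-trips: {restored == tricky}")
```

Because the final byte records how much padding was added, the unpadding step never has to guess where the message ends. The cost is that an already aligned message grows by one full block.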


## 8. Summary: Data Representation Conversions

This notebook demonstrated the fundamental conversions needed for cryptographic operations.

In [23]:
# Comprehensive example showing all conversions
original_text = "Crypto"

print("=" * 60)
print("COMPREHENSIVE DATA REPRESENTATION EXAMPLE")
print("=" * 60)

# 1. String
print(f"\n1. Unicode String: '{original_text}'")

# 2. Bytes
bytes_repr = original_text.encode('utf-8')
print(f"2. Bytes: {bytes_repr}")

# 3. Hexadecimal
hex_repr = bytes_repr.hex()
print(f"3. Hexadecimal: {hex_repr}")

# 4. Integer
int_repr = int(hex_repr, 16)
print(f"4. Integer: {int_repr}")

# 5. Padded bytes
padded = leftpad16(bytes_repr)
print(f"5. Padded (16-byte block): {padded}")
print(f"   Length: {len(padded)} bytes")

print("\n" + "=" * 60)
print("All representations encode the same information!")
print("=" * 60)

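============================================================
COMPREHENSIVE DATA REPRESENTATION EXAMPLE
============================================================

1. Unicode String: 'Crypto'
2. Bytes: b'Crypto'
3. Hexadecimal: 43727970746f
4. Integer: 74158942745711
5. Padded (16-byte block): b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00Crypto'
   Length: 16 bytes

============================================================
All representations encode the same information!
============================================================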
