# 🖼️ BMP Files & Binary Data Management in Python

A guide to binary file handling in Python, covering string prefixes, the `struct` module, and the BMP file structure.

---

## Table of Contents

1. [String Prefixes in Python](#1-string-prefixes-in-python)
2. [Reading Binary Files](#2-reading-binary-files)
3. [The struct Module](#3-the-struct-module)
4. [BMP File Structure](#4-bmp-file-structure)
5. [Reading BMP Files](#5-reading-bmp-files)
6. [Writing BMP Files](#6-writing-bmp-files)
7. [Practical Examples: Image Manipulation](#7-practical-examples-image-manipulation)

---

## 1. String Prefixes in Python

Python has several string prefixes, each with a specific purpose:

| Prefix | Name | Purpose |
|--------|------|---------|  
| `""` | Regular string | Normal text |
| `r""` | Raw string | No escape processing |
| `b""` | Bytes literal | Raw binary data |
| `f""` | F-string | Formatted string with variables |
| `u""` | Unicode string | Explicit Unicode (default in Python 3) |

### 1.1 Regular Strings `""`

Backslashes trigger escape sequences:

In [1]:
# Common escape sequences
print("Hello\nWorld")   # \n = newline
print("Tab\there")      # \t = tab
print("Quote: \"Hi\"") # \" = double quote

Hello
World
Tab	here
Quote: "Hi"


**Escape Sequences Reference:**

| Escape | Meaning |
|--------|---------|  
| `\n` | Newline |
| `\t` | Tab |
| `\\` | Literal backslash |
| `\'` | Single quote |
| `\"` | Double quote |

### 1.2 Raw Strings `r""`

Backslashes are treated as **literal characters**, with no escape processing:

In [2]:
# Problem with regular strings for Windows paths
path_broken = "C:\new_folder\test"
print("Regular string (broken):")
print(path_broken)  # \n and \t get interpreted!

print()

# Raw string solution
path_correct = r"C:\new_folder\test"
print("Raw string (correct):")
print(path_correct)  # Backslashes preserved

Regular string (broken):
C:
ew_folder	est

Raw string (correct):
C:\new_folder\test


In [3]:
# Common use: Regular expressions
import re

# Without raw string: need to escape backslashes
pattern_messy = "\\d{3}-\\d{4}"  # Messy!

# With raw string: clean and readable
pattern_clean = r"\d{3}-\d{4}"   # Much better!

phone = "555-1234"
print(f"Pattern matches phone: {bool(re.match(pattern_clean, phone))}")

Pattern matches phone: True


### 1.3 Bytes Literal `b""`

Creates a `bytes` object instead of a `str`, holding raw binary data:

In [4]:
# String (text)
text = "WAVE"
print(f"Type: {type(text)}")
print(f"Value: {text}")
print(f"Index [0]: {text[0]} (character)")

print()

# Bytes (binary)
data = b"WAVE"
print(f"Type: {type(data)}")
print(f"Value: {data}")
print(f"Index [0]: {data[0]} (ASCII code for 'W')")

Type: <class 'str'>
Value: WAVE
Index [0]: W (character)

Type: <class 'bytes'>
Value: b'WAVE'
Index [0]: 87 (ASCII code for 'W')


In [5]:
# CRITICAL: bytes != str
print(f"b'WAVE' == 'WAVE': {b'WAVE' == 'WAVE'}")   # Always False!
print(f"b'WAVE' == b'WAVE': {b'WAVE' == b'WAVE'}") # True

b'WAVE' == 'WAVE': False
b'WAVE' == b'WAVE': True


**Key Differences:**

| Feature | `str` | `bytes` |
|---------|-------|---------|
| Type | `<class 'str'>` | `<class 'bytes'>` |
| Content | Unicode characters | Raw bytes (0-255) |
| Indexing | Returns character | Returns integer |
| Use case | Text | Binary files, network data |
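
Because the two types never compare equal, converting between them is always an explicit step. A minimal sketch, assuming UTF-8 as the encoding (the variable names are illustrative only):

```python
# Converting between str and bytes is always explicit.
text = "WAVE"

# str -> bytes: choose an encoding (UTF-8 is assumed here)
encoded = text.encode("utf-8")
print(encoded)            # b'WAVE'

# bytes -> str: decode with the same encoding
decoded = encoded.decode("utf-8")
print(decoded == text)    # True

# Non-ASCII characters may take more than one byte in UTF-8
accented = "é"
print(len(accented), len(accented.encode("utf-8")))  # 1 2
```

For pure ASCII text such as file signatures, the character count and the byte count are the same, which is why `b"WAVE"` and `"WAVE"` look so similar.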

### 1.4 F-Strings `f""`

Embed variables and expressions directly in strings:

In [6]:
name = "Manuel"
age = 30
x, y = 10, 5

# Without f-string (old way)
message_old = "Hello, " + name + "! You are " + str(age) + " years old."

# With f-string: cleaner!
message_new = f"Hello, {name}! You are {age} years old."

print(message_new)
print(f"{x} + {y} = {x + y}")       # Expressions allowed
print(f"Uppercase: {name.upper()}") # Method calls allowed

Hello, Manuel! You are 30 years old.
10 + 5 = 15
Uppercase: MANUEL


### 1.5 Combining Prefixes

In [7]:
# Raw f-string: useful for regex with variables
search_term = "Manuel"
pattern = rf"\b{search_term}\b"  # Word boundary around name
print(f"Pattern: {pattern}")

# Note: fb"" (f-string + bytes) is NOT allowed

Pattern: \bManuel\b


### 📋 String Prefixes Quick Reference

| Prefix | Type | Escapes? | Variables? | Use Case |
|--------|------|----------|------------|----------|
| `""` | str | Yes | No | Normal text |
| `r""` | str | No | No | Regex, Windows paths |
| `b""` | bytes | Yes | No | Binary files, network data |
| `f""` | str | Yes | Yes | String formatting |
| `rf""` | str | No | Yes | Regex with variables |

---

## 2. Reading Binary Files

Binary mode (`"rb"`) reads raw bytes instead of text:

In [8]:
# Creating a sample binary file for demonstration
with open("sample.bin", "wb") as f:
    # Write 16 bytes: RIFF marker, 4 data bytes, WAVE marker, 4 more data bytes
    f.write(b"RIFF")
    f.write(b"\x00\x01\x02\x03")  # Hex bytes
    f.write(b"WAVE")
    f.write(b"\xff\xfe\xfd\xfc")  # More hex bytes

print("Created sample.bin")

Created sample.bin


In [9]:
# Reading binary file
with open("sample.bin", "rb") as f:
    # Read all content
    content = f.read()
    print(f"Type: {type(content)}")
    print(f"Content: {content}")
    print(f"Length: {len(content)} bytes")

Type: <class 'bytes'>
Content: b'RIFF\x00\x01\x02\x03WAVE\xff\xfe\xfd\xfc'
Length: 16 bytes


In [10]:
# Reading specific number of bytes
with open("sample.bin", "rb") as f:
    first_4 = f.read(4)    # Read 4 bytes
    next_4 = f.read(4)     # Read next 4 bytes
    marker = f.read(4)     # Read next 4 bytes
    
    print(f"First 4 bytes: {first_4}")
    print(f"Next 4 bytes: {next_4}")
    print(f"Marker: {marker}")

First 4 bytes: b'RIFF'
Next 4 bytes: b'\x00\x01\x02\x03'
Marker: b'WAVE'


In [12]:
# Slicing bytes (like strings)
with open("sample.bin", "rb") as f:
    content = f.read()
    
    print(f"Bytes 0-3: {content[0:4]}")
    print(f"Bytes 8-11: {content[8:12]}")
    print(f"Single byte slice [0:1]: {content[0:1]}")
    print(f"Single byte index [0]: {content[0]} (integer)")
    print(f"Character: {chr(content[0])}")
# =============================================================================
# BYTES INDEXING VS SLICING
# =============================================================================
#
# data = b"RIFF"
#
# | Operation    | Result    | Type   | Why                          |
# |--------------|-----------|--------|------------------------------|
# | data[0:4]    | b'RIFF'   | bytes  | Slice → subsequence          |
# | data[0:1]    | b'R'      | bytes  | Slice → subsequence (len 1)  |
# | data[0]      | 82        | int    | Index → single byte value    |
# | chr(data[0]) | 'R'       | str    | Convert int to character     |
#
# =============================================================================

Bytes 0-3: b'RIFF'
Bytes 8-11: b'WAVE'
Single byte slice [0:1]: b'R'
Single byte index [0]: 82 (integer)
Character: R
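
Sequential `read()` calls are not the only way to reach a given offset. File objects also support `seek()` and `tell()`, which is useful when a header says where the data of interest starts. A small sketch, using `io.BytesIO` as a stand-in for a file opened with `"rb"` so that it runs without any file on disk:

```python
import io

# io.BytesIO stands in for a binary file so the sketch is self-contained.
f = io.BytesIO(b"RIFF\x00\x01\x02\x03WAVE\xff\xfe\xfd\xfc")

f.seek(8)                 # Jump to absolute byte offset 8
marker = f.read(4)        # Read 4 bytes starting there
position = f.tell()       # Current offset after the read

print(marker)             # b'WAVE'
print(position)           # 12

f.seek(-4, io.SEEK_END)   # Position 4 bytes before the end
tail = f.read()
print(tail.hex())         # fffefdfc
```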


---

## 3. The struct Module

The `struct` module converts between Python values and packed C-style binary data, which Python represents as `bytes` objects.

### 3.1 Format Characters

| Format | C Type | Python Type | Size |
|--------|--------|-------------|------|
| `b` | signed char | int | 1 byte |
| `B` | unsigned char | int | 1 byte |
| `h` | short | int | 2 bytes |
| `H` | unsigned short | int | 2 bytes |
| `i` | int | int | 4 bytes |
| `I` | unsigned int | int | 4 bytes |
| `f` | float | float | 4 bytes |
| `d` | double | float | 8 bytes |
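
The sizes in this table are `struct`'s standard sizes, which apply whenever the format string begins with a byte-order prefix such as `<`. As a quick check, `struct.calcsize` reports how many bytes a format string occupies. The formats below are illustrative choices, not taken from any particular file:

```python
import struct

# With an explicit byte-order prefix, sizes are fixed and no padding is added.
print(struct.calcsize("<B"))    # 1
print(struct.calcsize("<H"))    # 2
print(struct.calcsize("<I"))    # 4
print(struct.calcsize("<d"))    # 8
print(struct.calcsize("<BI"))   # 5 (1 + 4, no alignment gap)

# Native mode ("@", the default when no prefix is given) may insert
# alignment padding, so this result is platform-dependent.
native_size = struct.calcsize("@BI")
print(native_size)
```

Using an explicit prefix for file formats keeps the layout identical on every machine.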

### 3.2 Byte Order Prefixes

| Prefix | Meaning | Example |
|--------|---------|---------|  
| `<` | Little-endian | `0x1234` stored as `34 12` |
| `>` | Big-endian | `0x1234` stored as `12 34` |
| `=` | Native byte order (standard sizes) | Depends on the host CPU |

In [None]:
import struct

# Demonstrate endianness
value = 0x1234  # 4660 in decimal

# Pack as little-endian (least significant byte first)
little = struct.pack('<H', value)
print(f"Little-endian: {little.hex()} (34 12)")

# Pack as big-endian (most significant byte first)
big = struct.pack('>H', value)
print(f"Big-endian:    {big.hex()} (12 34)")

# =============================================================================
# SIGNED VS UNSIGNED
# =============================================================================
#
# | Format | Type            | Range                        |
# |--------|-----------------|------------------------------|
# | B      | unsigned byte   | 0 to 255                     |
# | b      | signed byte     | -128 to 127                  |
# | H      | unsigned short  | 0 to 65,535                  |
# | h      | signed short    | -32,768 to 32,767            |
# | I      | unsigned int    | 0 to 4,294,967,295           |
# | i      | signed int      | -2,147,483,648 to 2,147,483,647 |
#
# Use signed when values can be negative (like BMP height)
# Use unsigned when values are always positive (like file size)
#
# =============================================================================
# BYTE ORDER (ENDIANNESS)
# =============================================================================
#
# | Prefix | Name          | Byte Order        | Used By           |
# |--------|---------------|-------------------|-------------------|
# | <      | Little-endian | Least sig. first  | Intel, BMP, WAV   |
# | >      | Big-endian    | Most sig. first   | Network, Java     |
# | =      | Native        | System default    | Local processing  |
#
# BMP files: Always use '<' (little-endian)
# WAV files: Always use '<' (little-endian)
# Network:   Usually use '>' (big-endian)
#
# =============================================================================

Little-endian: 3412 (34 12)
Big-endian:    1234 (12 34)


### 3.3 struct.pack(): Python → Bytes

In [14]:
import struct

# Pack a single unsigned short (2 bytes)
value = 1000
packed = struct.pack('<H', value)
print(f"Value: {value}")
print(f"Packed: {packed}")
print(f"Hex: {packed.hex()}")
print(f"Size: {len(packed)} bytes")

Value: 1000
Packed: b'\xe8\x03'
Hex: e803
Size: 2 bytes


In [15]:
# Pack multiple values
width = 800
height = 600

# Two signed 32-bit integers
packed = struct.pack('<ii', width, height)
print(f"Width: {width}, Height: {height}")
print(f"Packed: {packed}")
print(f"Size: {len(packed)} bytes (4 + 4)")

Width: 800, Height: 600
Packed: b' \x03\x00\x00X\x02\x00\x00'
Size: 8 bytes (4 + 4)


In [16]:
# Pack RGB values (3 unsigned bytes)
r, g, b = 255, 128, 64

packed = struct.pack('<BBB', b, g, r)  # Note: BMP stores BGR!
print(f"RGB: ({r}, {g}, {b})")
print(f"Packed as BGR: {packed}")
print(f"Hex: {packed.hex()}")

RGB: (255, 128, 64)
Packed as BGR: b'@\x80\xff'
Hex: 4080ff


### 3.4 struct.unpack(): Bytes → Python

In [17]:
import struct

# Unpack bytes to integers
data = b'\xe8\x03'  # 1000 in little-endian

result = struct.unpack('<H', data)
print(f"Data: {data}")
print(f"Unpacked: {result}")      # Returns tuple!
print(f"Value: {result[0]}")       # Get first element

Data: b'\xe8\x03'
Unpacked: (1000,)
Value: 1000


In [18]:
# Unpack multiple values
data = b'\x20\x03\x00\x00\x58\x02\x00\x00'  # 800 and 600 as 32-bit ints

width, height = struct.unpack('<ii', data)
print(f"Data: {data.hex()}")
print(f"Width: {width}")
print(f"Height: {height}")

Data: 2003000058020000
Width: 800
Height: 600


In [19]:
# Unpack BGR pixel
pixel_data = b'\x40\x80\xff'  # Blue=64, Green=128, Red=255

b, g, r = struct.unpack('<BBB', pixel_data)
print(f"Pixel data: {pixel_data.hex()}")
print(f"Blue: {b}")
print(f"Green: {g}")
print(f"Red: {r}")

Pixel data: 4080ff
Blue: 64
Green: 128
Red: 255


### 3.5 Signed vs Unsigned

BMP uses signed integers for height (negative = top-down):

In [20]:
import struct

# Same bytes, different interpretation
data = b'\xff\xff\xff\xff'  # All 1s in binary

# As unsigned 32-bit
unsigned = struct.unpack('<I', data)[0]
print(f"Unsigned: {unsigned}")  # 4294967295

# As signed 32-bit
signed = struct.unpack('<i', data)[0]
print(f"Signed: {signed}")      # -1 (two's complement)

Unsigned: 4294967295
Signed: -1


In [21]:
# Why BMP height can be negative
positive_height = struct.pack('<i', 600)
negative_height = struct.pack('<i', -600)

print(f"Height 600:  {positive_height.hex()}")
print(f"Height -600: {negative_height.hex()}")

# Negative height = rows stored top-to-bottom (not the default)

Height 600:  58020000
Height -600: a8fdffff
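
Choosing the wrong signedness is not silent when packing: `struct.pack` raises `struct.error` if a value falls outside the range of its format character. A short sketch, where `fits` is a hypothetical helper written for this illustration:

```python
import struct


def fits(fmt: str, value: int) -> bool:
    """Return True if `value` can be packed with the single-value format `fmt`."""
    try:
        struct.pack(fmt, value)
    except struct.error:
        return False
    return True


print(fits("<B", 255))    # True  (largest unsigned byte)
print(fits("<B", 256))    # False (one past the unsigned byte range)
print(fits("<H", -1))     # False (negative value in an unsigned field)
print(fits("<h", -1))     # True  (signed short accepts negatives)
print(fits("<i", -600))   # True  (a top-down BMP height)
```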


### 📋 struct Module Quick Reference

```python
import struct

# PACKING (Python → bytes)
struct.pack('<H', 1000)           # One unsigned short
struct.pack('<ii', 800, 600)      # Two signed ints
struct.pack('<BBB', 64, 128, 255) # Three unsigned bytes

# UNPACKING (bytes → Python)
struct.unpack('<H', data)         # Returns tuple: (1000,)
struct.unpack('<ii', data)        # Returns tuple: (800, 600)
struct.unpack('<BBB', data)       # Returns tuple: (64, 128, 255)

# Always use [0] for single values:
value = struct.unpack('<H', data)[0]
```

---

## 4. BMP File Structure

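A 24-bit uncompressed BMP file has three main sections (indexed-color BMPs with 8 bits per pixel or fewer also carry a color table after the DIB header):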

```
┌─────────────────────────────────────┐
│  BMP Header (14 bytes)              │  ← File info, pixel data offset
├─────────────────────────────────────┤
│  DIB Header (40+ bytes)             │  ← Image info (width, height, etc.)
├─────────────────────────────────────┤
│  Pixel Data (variable)              │  ← Actual image pixels
└─────────────────────────────────────┘
```

### 4.1 BMP Header (14 bytes)

| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| 0-1 | 2 | Signature | `"BM"` magic number |
| 2-5 | 4 | File Size | Total file size in bytes |
| 6-9 | 4 | Reserved | Usually zeros |
| 10-13 | 4 | Pixel Offset | Where pixel data starts |
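
The four fields in this table can be decoded with a single `struct.unpack` call. The sketch below builds a hypothetical 14-byte header in memory (file size 310, pixel offset 54) rather than reading a real file. It uses the `2s` format, which reads a fixed-length 2-byte `bytes` value, and splits the 4 reserved bytes into two 16-bit fields:

```python
import struct

# Hypothetical header: "BM", file size 310, two reserved shorts, pixel offset 54
raw = b"BM" + struct.pack("<IHHI", 310, 0, 0, 54)
print(len(raw))       # 14

# 2s = 2-byte bytes value, I = unsigned 32-bit, H = unsigned 16-bit
signature, file_size, reserved1, reserved2, pixel_offset = struct.unpack("<2sIHHI", raw)

print(signature)      # b'BM'
print(file_size)      # 310
print(pixel_offset)   # 54
```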

### 4.2 DIB Header (40 bytes for BITMAPINFOHEADER)

| Offset | Size | Field | Description |
|--------|------|-------|-------------|
| 0-3 | 4 | Header Size | Size of DIB header (40) |
| 4-7 | 4 | Width | Image width in pixels |
| 8-11 | 4 | Height | Image height (negative = top-down) |
| 12-13 | 2 | Color Planes | Always 1 |
| 14-15 | 2 | Bits/Pixel | 24 for RGB (8 bits each) |
| 16-19 | 4 | Compression | 0 = uncompressed |
| 20-23 | 4 | Image Size | Size of pixel data |
| 24-27 | 4 | X Pixels/Meter | Horizontal resolution |
| 28-31 | 4 | Y Pixels/Meter | Vertical resolution |
| 32-35 | 4 | Colors Used | Number of colors (0 = all) |
| 36-39 | 4 | Important Colors | Important colors (0 = all) |
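
The whole 40-byte layout maps onto one format string. This is a sketch under the assumption that the header is a plain BITMAPINFOHEADER laid out exactly as in the table; the sample values describe a hypothetical 10x8, 24-bit image and `DIB_FORMAT` is a name chosen for this example:

```python
import struct

# One format character per table row; signed 'i' for width, height, and resolution
DIB_FORMAT = "<IiiHHIIiiII"
print(struct.calcsize(DIB_FORMAT))   # 40

# Hypothetical header for a 10x8, 24-bit, uncompressed image
raw = struct.pack(DIB_FORMAT, 40, 10, 8, 1, 24, 0, 256, 2835, 2835, 0, 0)

(header_size, width, height, planes, bpp, compression,
 image_size, x_ppm, y_ppm, colors_used, colors_important) = struct.unpack(DIB_FORMAT, raw)

print(width, height, bpp)            # 10 8 24
print(image_size)                    # 256
```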

### 4.3 Pixel Data

**Key points:**
- In a 24-bit BMP, each pixel is 3 bytes: **Blue, Green, Red** (BGR, not RGB!)
- Rows are padded to multiples of **4 bytes**
- Rows are stored **bottom to top** (unless height is negative)

In [22]:
# Understanding row padding
def calculate_padding(width: int, bytes_per_pixel: int = 3) -> int:
    """
    Calculate padding bytes needed for BMP row alignment.
    
    BMP rows must be multiples of 4 bytes.
    """
    row_bytes = width * bytes_per_pixel
    padding = (4 - (row_bytes % 4)) % 4
    return padding

# Examples
for width in [10, 11, 12, 13]:
    row_bytes = width * 3
    padding = calculate_padding(width)
    total = row_bytes + padding
    print(f"Width: {width:2d} | Row bytes: {row_bytes:2d} | Padding: {padding} | Total: {total:2d} (÷4 = {total/4})")

Width: 10 | Row bytes: 30 | Padding: 2 | Total: 32 (÷4 = 8.0)
Width: 11 | Row bytes: 33 | Padding: 3 | Total: 36 (÷4 = 9.0)
Width: 12 | Row bytes: 36 | Padding: 0 | Total: 36 (÷4 = 9.0)
Width: 13 | Row bytes: 39 | Padding: 1 | Total: 40 (÷4 = 10.0)
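
The same totals can be computed in one step by rounding the row's bit count up to the next multiple of 32 bits. This form also works for bit depths other than 24. A minimal sketch, where `row_size` is a name chosen for this example:

```python
def row_size(width: int, bits_per_pixel: int = 24) -> int:
    """Padded row size in bytes: round the row's bit count up to a multiple of 32."""
    return ((width * bits_per_pixel + 31) // 32) * 4


for width in [10, 11, 12, 13]:
    print(f"Width: {width:2d} | Padded row: {row_size(width)} bytes")

# An 8-bit image that is 5 pixels wide needs 5 bytes, padded to 8
print(row_size(5, bits_per_pixel=8))   # 8
```

For 24-bit images the results are 32, 36, 36, and 40 bytes, matching the row bytes plus padding computed above.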


### 4.4 Visual Example: 3x2 BMP

```
BMP File (3-pixel wide, 2-pixel tall image):
┌──────────────────────────────────────────────┐
│ BM [file info] [pixel offset = 54]           │  ← 14 bytes (BMP header)
├──────────────────────────────────────────────┤
│ [40] [width=3] [height=2] [bpp=24] [...]     │  ← 40 bytes (DIB header)
├──────────────────────────────────────────────┤
│ B G R │ B G R │ B G R │ PAD PAD PAD │        │  ← Row 0 (bottom row)
│ B G R │ B G R │ B G R │ PAD PAD PAD │        │  ← Row 1 (top row)
└──────────────────────────────────────────────┘

Row bytes: 3 pixels × 3 bytes = 9 bytes
Padding: (4 - 9 % 4) % 4 = 3 bytes
Total per row: 9 + 3 = 12 bytes (divisible by 4 ✓)
```
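
Putting the layout rules together, the byte offset of any pixel can be computed directly. The sketch below assumes a bottom-up (positive height) 24-bit image with the pixel data starting at offset 54; `pixel_file_offset` is a hypothetical helper, and `y_from_top` counts rows the way an image viewer would:

```python
def pixel_file_offset(x: int, y_from_top: int, width: int, height: int,
                      pixel_offset: int = 54) -> int:
    """Byte offset of pixel (x, y_from_top) in a bottom-up 24-bit BMP."""
    row_size = (width * 3 + 3) // 4 * 4      # Row bytes rounded up to a multiple of 4
    row_in_file = height - 1 - y_from_top    # The bottom row is stored first
    return pixel_offset + row_in_file * row_size + x * 3


# 3x2 image from the diagram: 12 bytes per stored row
print(pixel_file_offset(0, 1, 3, 2))   # 54 (bottom-left pixel, first in the file)
print(pixel_file_offset(0, 0, 3, 2))   # 66 (top-left pixel, start of the second stored row)
print(pixel_file_offset(2, 0, 3, 2))   # 72 (top-right pixel)
```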

---

## 5. Reading BMP Files

In [23]:
import struct
from typing import Final

# Constants
BMP_HEADER_SIZE: Final[int] = 14
DIB_HEADER_SIZE: Final[int] = 40
BMP_SIGNATURE: Final[bytes] = b"BM"


def read_bmp(filename: str) -> tuple[int, int, list, bytes]:
    """
    Read a 24-bit uncompressed BMP file.
    
    Parameters
    ----------
    filename : str
        Path to the BMP file.
    
    Returns
    -------
    tuple
        (width, height, pixels, full_header)
        - width: Image width in pixels
        - height: Image height in pixels (negative for top-down files)
        - pixels: 2D list of [B, G, R] values, rows in file order
        - full_header: Original headers for writing back
    """
    with open(filename, "rb") as f:
        # =====================================================
        # STEP 1: Read and validate BMP Header (14 bytes)
        # =====================================================
        bmp_header = f.read(BMP_HEADER_SIZE)
        
        # Check for "BM" signature
        if bmp_header[0:2] != BMP_SIGNATURE:
            raise ValueError("Not a valid BMP file. Must start with 'BM'.")
        
        # Extract pixel data offset (bytes 10-13)
        # '<I' = little-endian unsigned 32-bit integer
        pixel_offset = struct.unpack('<I', bmp_header[10:14])[0]
        
        # =====================================================
        # STEP 2: Read DIB Header
        # =====================================================
        dib_header_size = pixel_offset - BMP_HEADER_SIZE
        dib_header = f.read(dib_header_size)
        
        # Extract dimensions
        # '<i' = little-endian SIGNED 32-bit (height can be negative)
        width = struct.unpack('<i', dib_header[4:8])[0]
        height = struct.unpack('<i', dib_header[8:12])[0]
        
        # Extract bits per pixel (bytes 14-15 in DIB)
        # '<H' = little-endian unsigned 16-bit
        bpp = struct.unpack('<H', dib_header[14:16])[0]
        
        if bpp != 24:
            raise ValueError(f"Only 24-bit BMPs supported. Got {bpp}-bit.")
        
        # =====================================================
        # STEP 3: Calculate row padding
        # =====================================================
        # BMP rows must be multiples of 4 bytes
        padding = (4 - (width * 3) % 4) % 4
        
        # =====================================================
        # STEP 4: Read pixel data
        # =====================================================
        pixels = []
        
        for _ in range(abs(height)):
            row = []
            for _ in range(width):
                # Read 3 bytes as BGR
                # '<BBB' = three unsigned bytes
                b, g, r = struct.unpack('<BBB', f.read(3))
                row.append([b, g, r])
            
            # Skip padding bytes at end of row
            f.read(padding)
            pixels.append(row)
    
    # Return headers for easy writing later
    full_header = bmp_header + dib_header
    return width, height, pixels, full_header

In [24]:
# Let's create a simple test BMP file first
def create_test_bmp(filename: str, width: int, height: int) -> None:
    """
    Create a simple test BMP with a gradient pattern.
    """
    import struct
    
    padding = (4 - (width * 3) % 4) % 4
    row_size = width * 3 + padding
    pixel_data_size = row_size * height
    file_size = 54 + pixel_data_size  # 14 + 40 + pixel data
    
    with open(filename, "wb") as f:
        # BMP Header (14 bytes)
        f.write(b"BM")                              # Signature
        f.write(struct.pack('<I', file_size))       # File size
        f.write(struct.pack('<HH', 0, 0))           # Reserved
        f.write(struct.pack('<I', 54))              # Pixel offset
        
        # DIB Header (40 bytes)
        f.write(struct.pack('<I', 40))              # DIB header size
        f.write(struct.pack('<i', width))           # Width
        f.write(struct.pack('<i', height))          # Height
        f.write(struct.pack('<H', 1))               # Color planes
        f.write(struct.pack('<H', 24))              # Bits per pixel
        f.write(struct.pack('<I', 0))               # Compression
        f.write(struct.pack('<I', pixel_data_size)) # Image size
        f.write(struct.pack('<i', 2835))            # X pixels/meter
        f.write(struct.pack('<i', 2835))            # Y pixels/meter
        f.write(struct.pack('<I', 0))               # Colors used
        f.write(struct.pack('<I', 0))               # Important colors
        
        # Pixel data (bottom to top)
        for y in range(height):
            for x in range(width):
                # Create a gradient pattern
                r = int((x / width) * 255)
                g = int((y / height) * 255)
                b = 128
                f.write(struct.pack('<BBB', b, g, r))  # BGR order!
            
            # Write padding
            f.write(b'\x00' * padding)
    
    print(f"Created {filename} ({width}x{height})")


# Create test file
create_test_bmp("test_image.bmp", 10, 8)

Created test_image.bmp (10x8)


In [25]:
# Now read it back
width, height, pixels, header = read_bmp("test_image.bmp")

print(f"Width: {width}")
print(f"Height: {height}")
print(f"Header size: {len(header)} bytes")
print(f"Pixel rows: {len(pixels)}")
print(f"Pixels per row: {len(pixels[0])}")
print(f"\nFirst pixel (BGR): {pixels[0][0]}")
print(f"Last pixel (BGR): {pixels[-1][-1]}")

Width: 10
Height: 8
Header size: 54 bytes
Pixel rows: 8
Pixels per row: 10

First pixel (BGR): [128, 0, 0]
Last pixel (BGR): [128, 223, 229]
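
Note that `read_bmp` returns rows in the order they appear in the file, so for a positive height `pixels[0]` is the bottom row of the picture. If top-down order is more convenient, the rows can be reordered after reading. A minimal sketch with a hypothetical `rows_top_down` helper, shown on placeholder rows instead of real pixel data:

```python
def rows_top_down(pixels: list, height: int) -> list:
    """Return rows ordered top-to-bottom regardless of how the file stored them."""
    # Positive height: file order is bottom-up, so reverse the rows.
    # Negative height: file order is already top-down, so just copy the list.
    return pixels[::-1] if height > 0 else list(pixels)


stored_bottom_up = [["bottom"], ["middle"], ["top"]]
print(rows_top_down(stored_bottom_up, 3))
# [['top'], ['middle'], ['bottom']]

stored_top_down = [["top"], ["middle"], ["bottom"]]
print(rows_top_down(stored_top_down, -3))
# [['top'], ['middle'], ['bottom']]
```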


---

## 6. Writing BMP Files

In [27]:
import struct


def write_bmp(
    filename: str,
    width: int,
    height: int,
    pixels: list,
    header: bytes,
) -> None:
    """
    Write a 24-bit BMP file.
    
    Parameters
    ----------
    filename : str
        Output file path.
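    width : int
        Image width. Used only to compute row padding.
    height : int
        Image height. Not used when writing; the height stored in
        `header` is what ends up in the file.
    pixels : list
        2D list of [B, G, R] pixel values.
    header : bytes
        Original BMP + DIB header, written unchanged.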
    """
    padding = (4 - (width * 3) % 4) % 4
    
    with open(filename, "wb") as f:
        # Write original header
        f.write(header)
        
        # Write pixel data
        for row in pixels:
            for pixel in row:
                b, g, r = pixel
                f.write(struct.pack('<BBB', b, g, r))
            
            # Write padding
            f.write(b'\x00' * padding)
    
    print(f"Wrote {filename}")
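
`write_bmp` reuses an existing header, so it can only produce images with the same dimensions as the file that was read. When the size changes, or there is no source file at all, a fresh header is needed. The sketch below builds one under the same assumptions as `create_test_bmp` (24-bit, uncompressed, 54-byte header, 2835 pixels per meter); `build_bmp_header` is a hypothetical helper, not part of the notebook:

```python
import struct


def build_bmp_header(width: int, height: int) -> bytes:
    """Build the 54-byte BMP + DIB header for a 24-bit uncompressed image."""
    row_size = (width * 3 + 3) // 4 * 4           # Row bytes padded to a multiple of 4
    pixel_data_size = row_size * abs(height)      # abs() allows top-down (negative) heights
    return struct.pack(
        "<2sIHHIIiiHHIIiiII",
        b"BM", 54 + pixel_data_size, 0, 0, 54,    # BMP header (14 bytes)
        40, width, height, 1, 24, 0,              # DIB: size, dimensions, planes, bpp, compression
        pixel_data_size, 2835, 2835, 0, 0,        # DIB: image size, resolution, color counts
    )


header = build_bmp_header(10, 8)
print(len(header))                                # 54
print(struct.unpack("<I", header[2:6])[0])        # 310 (file size)
print(struct.unpack("<ii", header[18:26]))        # (10, 8)
```

The returned bytes can be passed as the `header` argument of `write_bmp`, provided the pixel list has matching dimensions.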

---

## 7. Practical Examples: Image Manipulation

### 7.1 Grayscale Conversion

In [28]:
def grayscale(pixels: list) -> list:
    """
    Convert image to grayscale.
    
    Uses luminosity formula: 0.299*R + 0.587*G + 0.114*B
    """
    new_pixels = []
    
    for row in pixels:
        new_row = []
        for pixel in row:
            b, g, r = pixel
            # Luminosity formula (human eye is most sensitive to green)
            gray = int(0.299 * r + 0.587 * g + 0.114 * b)
            new_row.append([gray, gray, gray])
        new_pixels.append(new_row)
    
    return new_pixels


# Test it
width, height, pixels, header = read_bmp("test_image.bmp")
gray_pixels = grayscale(pixels)
write_bmp("test_grayscale.bmp", width, height, gray_pixels, header)

print(f"Original first pixel: {pixels[0][0]}")
print(f"Grayscale first pixel: {gray_pixels[0][0]}")

Wrote test_grayscale.bmp
Original first pixel: [128, 0, 0]
Grayscale first pixel: [14, 14, 14]


### 7.2 Invert Colors (Negative)

In [None]:
def invert(pixels: list) -> list:
    """
    Invert all pixel colors (create negative image).
    """
    new_pixels = []
    
    for row in pixels:
        new_row = []
        for pixel in row:
            b, g, r = pixel
            # Invert: 255 - value
            new_row.append([255 - b, 255 - g, 255 - r])
        new_pixels.append(new_row)
    
    return new_pixels


# Test it
width, height, pixels, header = read_bmp("test_image.bmp")
inverted = invert(pixels)
write_bmp("test_inverted.bmp", width, height, inverted, header)

print(f"Original first pixel: {pixels[0][0]}")
print(f"Inverted first pixel: {inverted[0][0]}")

### 7.3 Sepia Tone

In [None]:
def sepia(pixels: list) -> list:
    """
    Apply sepia tone filter.
    
    Sepia formula:
        newR = 0.393*R + 0.769*G + 0.189*B
        newG = 0.349*R + 0.686*G + 0.168*B  
        newB = 0.272*R + 0.534*G + 0.131*B
    """
    new_pixels = []
    
    for row in pixels:
        new_row = []
        for pixel in row:
            b, g, r = pixel
            
            # Apply sepia formula
            new_r = int(0.393 * r + 0.769 * g + 0.189 * b)
            new_g = int(0.349 * r + 0.686 * g + 0.168 * b)
            new_b = int(0.272 * r + 0.534 * g + 0.131 * b)
            
            # Clamp to 0-255
            new_r = min(255, new_r)
            new_g = min(255, new_g)
            new_b = min(255, new_b)
            
            new_row.append([new_b, new_g, new_r])
        new_pixels.append(new_row)
    
    return new_pixels


# Test it
width, height, pixels, header = read_bmp("test_image.bmp")
sepia_pixels = sepia(pixels)
write_bmp("test_sepia.bmp", width, height, sepia_pixels, header)

print(f"Original first pixel: {pixels[0][0]}")
print(f"Sepia first pixel: {sepia_pixels[0][0]}")
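
The sepia weights are all positive, so clamping at 255 is enough for that filter. A filter that can also push values below zero needs a lower bound as well. A minimal sketch with a hypothetical `clamp` helper and an illustrative `adjust_brightness` filter (neither is part of the notebook):

```python
def clamp(value: float) -> int:
    """Clamp to the valid 0-255 byte range and convert to int."""
    return max(0, min(255, int(value)))


def adjust_brightness(pixels: list, delta: int) -> list:
    """Add `delta` to every channel, clamping each result to 0-255."""
    return [[[clamp(channel + delta) for channel in pixel] for pixel in row]
            for row in pixels]


sample = [[[128, 0, 250]]]              # One row with one [B, G, R] pixel
print(adjust_brightness(sample, 20))    # [[[148, 20, 255]]]
print(adjust_brightness(sample, -20))   # [[[108, 0, 230]]]
```

Without the clamp, `struct.pack('<BBB', ...)` would raise `struct.error` for any channel outside 0-255 when the image is written.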

### 7.4 Horizontal Flip

In [29]:
def flip_horizontal(pixels: list) -> list:
    """
    Flip image horizontally (mirror).
    """
    new_pixels = []
    
    for row in pixels:
        # Reverse each row
        new_pixels.append(row[::-1])
    
    return new_pixels


# Test it
width, height, pixels, header = read_bmp("test_image.bmp")
flipped = flip_horizontal(pixels)
write_bmp("test_flipped.bmp", width, height, flipped, header)

print(f"Original first row, first pixel: {pixels[0][0]}")
print(f"Original first row, last pixel: {pixels[0][-1]}")
print(f"Flipped first row, first pixel: {flipped[0][0]}")

Wrote test_flipped.bmp
Original first row, first pixel: [128, 0, 0]
Original first row, last pixel: [128, 0, 229]
Flipped first row, first pixel: [128, 0, 229]
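
A vertical flip follows the same idea, reversing the order of the rows instead of the pixels within each row. A minimal sketch, with `flip_vertical` as an illustrative name and a tiny hand-made pixel grid in place of a real image:

```python
def flip_vertical(pixels: list) -> list:
    """Flip image vertically by reversing the row order."""
    # Copy each row so the result does not share row lists with the input
    return [list(row) for row in pixels[::-1]]


rows = [
    [[1, 1, 1], [2, 2, 2]],   # First stored row
    [[3, 3, 3], [4, 4, 4]],   # Second stored row
]
print(flip_vertical(rows))
# [[[3, 3, 3], [4, 4, 4]], [[1, 1, 1], [2, 2, 2]]]
```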


### 7.5 Blur Filter (Box Blur)

In [30]:
def blur(pixels: list) -> list:
    """
    Apply 3x3 box blur filter.
    
    Each pixel becomes the average of itself and its 8 neighbors.
    """
    height = len(pixels)
    width = len(pixels[0])
    
    # Allocate a separate output grid so every average is computed from the original pixels
    new_pixels = [[None for _ in range(width)] for _ in range(height)]
    
    for y in range(height):
        for x in range(width):
            total_b, total_g, total_r = 0, 0, 0
            count = 0
            
            # Check all 9 positions in 3x3 grid
            for dy in [-1, 0, 1]:
                for dx in [-1, 0, 1]:
                    ny, nx = y + dy, x + dx
                    
                    # Check bounds
                    if 0 <= ny < height and 0 <= nx < width:
                        b, g, r = pixels[ny][nx]
                        total_b += b
                        total_g += g
                        total_r += r
                        count += 1
            
            # Average
            new_pixels[y][x] = [
                round(total_b / count),
                round(total_g / count),
                round(total_r / count),
            ]
    
    return new_pixels


# Test it
width, height, pixels, header = read_bmp("test_image.bmp")
blurred = blur(pixels)
write_bmp("test_blurred.bmp", width, height, blurred, header)

print(f"Original center pixel: {pixels[4][5]}")
print(f"Blurred center pixel: {blurred[4][5]}")

Wrote test_blurred.bmp
Original center pixel: [128, 127, 127]
Blurred center pixel: [128, 127, 127]


---

## 📋 Quick Reference Summary

### String Prefixes
```python
"text"      # Regular string (escapes processed)
r"text"     # Raw string (backslashes literal)
b"text"     # Bytes literal (binary data)
f"text"     # F-string (variable interpolation)
```

### struct Module
```python
import struct

# Pack: Python → bytes
struct.pack('<H', 1000)           # Unsigned short (2 bytes)
struct.pack('<i', -500)           # Signed int (4 bytes)
struct.pack('<BBB', 64, 128, 255) # Three bytes

# Unpack: bytes → Python
value = struct.unpack('<H', data)[0]   # [0] for single value
width, height = struct.unpack('<ii', data)  # Multiple values
```

### Format Characters
| Char | Type | Size |
|------|------|------|
| `B` | unsigned byte | 1 |
| `b` | signed byte | 1 |
| `H` | unsigned short | 2 |
| `h` | signed short | 2 |
| `I` | unsigned int | 4 |
| `i` | signed int | 4 |

### BMP File Structure
```
BMP Header (14 bytes)  → Signature, file size, pixel offset
DIB Header (40 bytes)  → Width, height, bits/pixel
Pixel Data (variable)  → BGR order, rows padded to 4 bytes, bottom-to-top
```

In [None]:
# Cleanup test files
import os

test_files = [
    "sample.bin",
    "test_image.bmp",
    "test_grayscale.bmp",
    "test_inverted.bmp",
    "test_sepia.bmp",
    "test_flipped.bmp",
    "test_blurred.bmp",
]

for f in test_files:
    if os.path.exists(f):
        os.remove(f)
        print(f"Removed {f}")

print("\nCleanup complete!")