# Extract mean RGB features (Sky vs Ground)

This notebook scans a folder of webcam images, removes fixed-height ribbons from the top and bottom of each image, splits the remaining rows into **sky** and **ground** regions, and exports the mean RGB value of each region to a CSV file.


## Imports

Libraries used:
- `pathlib`: safer filesystem paths
- `PIL` + `numpy`: image loading and pixel math
- `re` + `datetime`: parse timestamps from filenames
- `csv`: write tabular output


In [None]:
import re
from pathlib import Path
from PIL import Image
import numpy as np
import csv
from datetime import datetime

## Helper functions

- Load each image as an RGB NumPy array of shape `(H, W, 3)`.
- Crop by row indices to remove overlays/borders.
- Compute mean RGB values for any region.

In [None]:
def load_rgb_array(p: Path) -> np.ndarray:
    """Load image and return an (H, W, 3) RGB array."""
    # The context manager closes the file handle as soon as the pixels are read
    with Image.open(p) as img:
        return np.array(img.convert("RGB"))


def crop_by_rows(arr: np.ndarray, y0: int, y1: int) -> np.ndarray:
    """Crop vertically: keep rows [y0:y1), all columns, all channels."""
    return arr[y0:y1, :, :]


def mean_rgb(arr: np.ndarray):
    """
    Mean of R,G,B channels for a region.
    Returns NaNs for empty regions (prevents errors and preserves row count logic).
    """
    if arr.size == 0:
        return (np.nan, np.nan, np.nan)

    r = arr[:, :, 0].mean()
    g = arr[:, :, 1].mean()
    b = arr[:, :, 2].mean()
    return (float(r), float(g), float(b))
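
As a quick sanity check, the per-channel mean can be exercised on a synthetic array with known values. The sketch below is an illustration only: `mean_rgb_vectorized` is a hypothetical name (not part of the notebook) for an equivalent variant that averages over rows and columns in a single NumPy call, and the 4×5 test image is made up.

```python
import numpy as np


def mean_rgb_vectorized(arr: np.ndarray):
    """Hypothetical variant of mean_rgb: average over rows and columns at once."""
    if arr.size == 0:
        # Same convention as mean_rgb: empty regions yield NaNs
        return (float("nan"), float("nan"), float("nan"))
    r, g, b = arr.mean(axis=(0, 1))
    return (float(r), float(g), float(b))


# Synthetic 4x5 image with constant channels: R=10, G=20, B=30
demo = np.zeros((4, 5, 3), dtype=np.uint8)
demo[..., 0] = 10
demo[..., 1] = 20
demo[..., 2] = 30

demo_means = mean_rgb_vectorized(demo)         # (10.0, 20.0, 30.0)
empty_means = mean_rgb_vectorized(demo[0:0])   # zero rows, so all NaN
print(demo_means)
print(empty_means)
```

Both variants should agree on any non-empty `(H, W, 3)` array; the single-call form is mainly a matter of taste.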

## Parameters and paths

### I/O
- Input folder: `Images/` (scanned recursively)
- Output file: `Processing Outputs/image_feature.csv`

### Cropping logic (row indices)
- `Y_START`: remove top ribbon (overlay)
- `Y_END`: remove bottom ribbon
- `SPLIT_Y`: split the cropped image (row indices are relative to the cropped image, not the original) into:
  - rows `[0:SPLIT_Y]` = sky
  - rows `[SPLIT_Y:]` = ground
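
The row arithmetic can be checked on a synthetic array whose pixels store their own original row index. This is a minimal sketch using the parameter values from this notebook; the 1000×4 image size is an assumption chosen only for illustration.

```python
import numpy as np

Y_START, Y_END, SPLIT_Y = 30, 960, 430  # values used in this notebook

# Assumed dimensions for illustration: 1000 rows, 4 columns
H, W = 1000, 4
row_ids = np.arange(H).reshape(H, 1, 1)
arr = np.broadcast_to(row_ids, (H, W, 3))  # every pixel holds its row index

cropped = arr[Y_START:Y_END]   # original rows 30..959
sky = cropped[:SPLIT_Y]        # first 430 rows of the cropped image
ground = cropped[SPLIT_Y:]     # remaining rows of the cropped image

print(cropped.shape[0], sky.shape[0], ground.shape[0])  # 930 430 500
print(int(sky[0, 0, 0]), int(sky[-1, 0, 0]))            # 30 459
print(int(ground[0, 0, 0]), int(ground[-1, 0, 0]))      # 460 959
```

Because `SPLIT_Y` is applied after cropping, the sky/ground boundary sits at original row `Y_START + SPLIT_Y` (460 with these values), not at row 430.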

### Filenames
Only files whose stem matches `YYYYMMDD-HHMM` and whose extension is `.jpg` or `.jpeg` are processed; all other files are skipped.
Example: `20250103-0915.jpg`
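
The naming rule can be sketched as a small parser. `parse_stem` is a hypothetical helper introduced here for illustration; it combines the notebook's regular expression with `datetime.strptime` and returns `None` for stems that have the right shape but are not valid timestamps.

```python
import re
from datetime import datetime
from typing import Optional

# Same pattern as in the parameters cell: 8 digits, hyphen, 4 digits
name_re = re.compile(r"^(?P<date>\d{8})-(?P<hour>\d{4})$")


def parse_stem(stem: str) -> Optional[datetime]:
    """Hypothetical helper: return the timestamp encoded in a stem, or None."""
    if not name_re.fullmatch(stem):
        return None
    try:
        return datetime.strptime(stem, "%Y%m%d-%H%M")
    except ValueError:
        # Right shape, but not a real date/time (e.g. 30 February)
        return None


ok = parse_stem("20250103-0915")           # datetime(2025, 1, 3, 9, 15)
bad_pattern = parse_stem("snapshot_01")    # None: wrong shape
bad_date = parse_stem("20250230-0915")     # None: 30 February does not exist
print(ok, bad_pattern, bad_date)
```

The digit-count pattern alone accepts impossible dates, so the `strptime` step is what actually validates the timestamp.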

In [None]:
root = Path("Images")
output_csv = Path("Processing Outputs/image_feature.csv")

Y_START = 30
Y_END = 960
SPLIT_Y = 430

IMG_EXTS = {".jpg", ".jpeg", ".JPG", ".JPEG"}

# Enforce: 8 digits, hyphen, 4 digits (e.g., 20250103-0915)
name_re = re.compile(r"^(?P<date>\d{8})-(?P<hour>\d{4})$")

## Feature extraction loop

For each valid image:
1. Validate extension and filename pattern.
2. Parse `date` and `hour` from the filename.
3. Load image → check it is large enough for cropping.
4. Crop rows `[Y_START:Y_END]`.
5. Split into sky/ground by `SPLIT_Y`.
6. Compute mean RGB for each region.
7. Append one output row (one row per image).

In [None]:
rows = []

for p in root.rglob("*"):
    # Filter to expected image extensions only (case-insensitive, so ".Jpg" also passes)
    if not (p.is_file() and p.suffix.lower() in IMG_EXTS):
        continue

    # Validate naming convention (timestamp in stem)
    stem = p.stem
    if not name_re.match(stem):
        continue

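    # Parse datetime from filename; skip stems that match the digit pattern
    # but are not real timestamps (e.g. 20250230-0915)
    try:
        dt = datetime.strptime(stem, "%Y%m%d-%H%M")
    except ValueError:
        continue
    date_csv = dt.strftime("%Y-%m-%d")
    time_csv = dt.strftime("%H:%M")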

    # Load image and validate dimensions/channels
    arr = load_rgb_array(p)
    H, W, C = arr.shape
    if H < Y_END or W < 1 or C != 3:
        # Skip images that cannot be safely cropped or are not RGB-like
        continue

    # Crop out ribbons
    cropped = crop_by_rows(arr, Y_START, Y_END)

    # Split into sky and ground regions
    sky = cropped[0:SPLIT_Y, :, :]
    ground = cropped[SPLIT_Y:, :, :]

    # Compute features
    r_s_m, g_s_m, b_s_m = mean_rgb(sky)
    r_g_m, g_g_m, b_g_m = mean_rgb(ground)

    # Store row
    rows.append({
        "date": date_csv,
        "hour": time_csv,
        "R_S_M": r_s_m,
        "G_S_M": g_s_m,
        "B_S_M": b_s_m,
        "R_G_M": r_g_m,
        "G_G_M": g_g_m,
        "B_G_M": b_g_m,
    })


## Export to CSV

- Creates the output folder if missing.
- Writes a header + one row per processed image.

In [None]:
output_csv.parent.mkdir(parents=True, exist_ok=True)

with output_csv.open("w", newline="") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["date", "hour", "R_S_M", "G_S_M", "B_S_M", "R_G_M", "G_G_M", "B_G_M"]
    )
    writer.writeheader()
    writer.writerows(rows)

print(f"Done. Wrote {len(rows)} rows to {output_csv}")
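
To confirm the export format, the file can be read back with `csv.DictReader`. The sketch below is self-contained and uses an in-memory buffer instead of the real output file; the two sample rows are made-up values for illustration, not results from any image.

```python
import csv
import io
import math

fieldnames = ["date", "hour", "R_S_M", "G_S_M", "B_S_M", "R_G_M", "G_G_M", "B_G_M"]

# Made-up rows: one normal, one with NaN ground features (empty ground region)
nan = float("nan")
sample_rows = [
    {"date": "2025-01-03", "hour": "09:15", "R_S_M": 101.5, "G_S_M": 120.25,
     "B_S_M": 140.0, "R_G_M": 80.0, "G_G_M": 90.5, "B_G_M": 70.75},
    {"date": "2025-01-03", "hour": "09:30", "R_S_M": 99.0, "G_S_M": 118.0,
     "B_S_M": 138.0, "R_G_M": nan, "G_G_M": nan, "B_G_M": nan},
]

# Write to an in-memory buffer the same way the export cell writes to disk
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(sample_rows)

# Read back: every value arrives as a string and must be converted explicitly
buf.seek(0)
loaded = list(csv.DictReader(buf))
first_sky_red = float(loaded[0]["R_S_M"])
second_ground_red = float(loaded[1]["R_G_M"])
print(first_sky_red, math.isnan(second_ground_red))
```

NaN features are written as the text `nan`, which `float()` parses back to NaN, so empty regions survive a round trip through the CSV.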