Preflight order in preprocess_image deviates from IEP-0004 normative order

## Summary

`iscc_sci.preprocess_image` applies `trim_border` and the resize **before** `remove_transparency`, which deviates from the normative image-normalization order in [IEP-0004](https://github.com/iscc/iscc-ieps/blob/main/ieps/iep-0004.md) (Content-Code Image) and from `iscc-sdk`'s `image_normalize`. The two libraries should agree on the preflight steps they have in common; today they don't. This issue proposes a one-function fix in `code_semantic_image.py` that aligns the order without requiring model retraining.

## Current behavior

`iscc_sci/code_semantic_image.py`, `preprocess_image` (lines ~127–138):

```python
def preprocess_image(image):
    with sci.metrics(name="Image preprocessing time {seconds:.4f} seconds"):
        image = ImageOps.exif_transpose(image)
        image = trim_border(image)                       # ← trim BEFORE fill
        image = image.resize((512, 512), Resampling.BILINEAR)
        image = remove_transparency(image)               # ← fill AFTER resize
        image = np.array(image, dtype=np.float32)
        image /= 255.0
        mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
        std  = np.array([0.5, 0.5, 0.5], dtype=np.float32)
        image = (image - mean) / std
        image = np.expand_dims(np.transpose(image, (2, 0, 1)), axis=0)
    return image.astype(np.float32)
```

Order: **`exif → trim → resize → fill → normalize`**.

## Expected behavior (per IEP-0004)

IEP-0004 §Processing documents the normative Image-Code preflight order:

1. EXIF Transpose
2. **Alpha Compositing** ("Add white background to image if it contains alpha transparency.")
3. **Border Trimming** ("Crop uniformly colored borders if applicable.")
4. Grayscale Conversion *(Content-Code only — not relevant here)*
5. Resize *(Content-Code 32×32, Semantic-Code 512×512)*

`iscc_sdk/image.py:image_normalize` implements steps 1–3 in that order:

```python
# iscc-sdk/iscc_sdk/image.py:47-56
if idk.sdk_opts.image_exif_transpose:
    img = image_exif_transpose(img)
if idk.sdk_opts.image_fill_transparency:
    img = image_fill_transparency(img)
if idk.sdk_opts.image_trim_border:
    img = image_trim_border(img)
```

`iscc-sci` should match steps 1–3.

## Why this matters

1. **Deterministic trim reference.** `trim_border` uses `img.getpixel((0,0))` as the reference color. For a fully transparent corner pixel, PIL exposes whatever the encoder wrote into the RGB channels. Those values have no visible meaning, and different PNG encoders write different RGB values under `alpha=0`. Running `remove_transparency` first normalizes transparent pixels to `(255, 255, 255)`, so the trim reference is a fixed, encoder-independent color. Today, two encoders that disagree on the hidden RGB values can produce different Semantic-Codes for the same visible image.

2. **No alpha-edge resize halos.** Bilinear interpolation across an RGBA edge with straight (non-premultiplied) alpha pulls in arbitrary RGB values from under transparent pixels, producing visible color halos. Filling first means resize operates on opaque RGB and cannot produce halos.

3. **Alignment with the standard.** IEP-0004 is the normative spec for Image-Code preprocessing. `iscc-sdk` follows it; `iscc-sci` should too for the steps both pipelines share. A shared, IEP-0004-compliant prefix simplifies downstream tooling (`iscc-gen`, `iscc_sdk.code_iscc(experimental=True)`) that wants to run one preflight feeding both pipelines.
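The first point can be illustrated with a toy model in plain Python. This is only a sketch: `fill_transparency`, `trim`, and `canvas` are hypothetical, simplified stand-ins written for this example (images are nested lists of pixel tuples), not the `iscc-sci` or Pillow implementations. The only behavior assumed from the real `trim_border` is that it takes the top-left pixel as its reference color.

```python
# Toy model: an image is a list of rows of pixel tuples. These helpers are
# simplified stand-ins, not the iscc-sci or Pillow implementations.

def fill_transparency(img):
    """Composite RGBA pixels onto white and return RGB pixels."""
    def over_white(px):
        r, g, b, a = px
        return tuple((c * a + 255 * (255 - a) + 127) // 255 for c in (r, g, b))
    return [[over_white(px) for px in row] for row in img]


def trim(img):
    """Crop to the bounding box of pixels that differ from the top-left pixel."""
    ref = img[0][0]
    hits = [(x, y) for y, row in enumerate(img) for x, px in enumerate(row) if px != ref]
    if not hits:
        return img
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return [row[min(xs):max(xs) + 1] for row in img[min(ys):max(ys) + 1]]


def canvas(hidden_rgb):
    """4x4 RGBA canvas: opaque 2x2 red centre, fully transparent elsewhere.

    hidden_rgb(x, y) supplies the invisible RGB stored under alpha=0.
    """
    red = (200, 0, 0, 255)
    return [
        [red if x in (1, 2) and y in (1, 2) else (*hidden_rgb(x, y), 0) for x in range(4)]
        for y in range(4)
    ]


# Same visible image, different invisible RGB under the transparent pixels.
clean = canvas(lambda x, y: (0, 0, 0))
stale = canvas(lambda x, y: (0, 0, 0) if (x, y) == (0, 0) else (255, 255, 255))

trim_first = [fill_transparency(trim(img)) for img in (clean, stale)]
fill_first = [trim(fill_transparency(img)) for img in (clean, stale)]

print(trim_first[0] == trim_first[1])  # trim-before-fill: the two outputs disagree
print(fill_first[0] == fill_first[1])  # fill-before-trim: the two outputs agree
```

In this toy, `stale` mimics an editor that left old color data under erased pixels. Trimming first compares against a hidden corner value and keeps the whole canvas, while filling first reduces both variants to the same 2×2 red crop.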

## Why retraining is **not** required

The ONNX model is robust to preprocessing-order changes by construction:

- The ISC21-derived training distribution uses standard heavy augmentation (crops, rotations, color jitter). `trim_border` and `remove_transparency` are inference-time normalizations layered on top — they were never part of the training distribution, regardless of order.
- For opaque inputs (the vast majority — photographs, JPEGs, opaque PNGs), the tensor at the model input is **bit-identical** before and after the swap.
- For inputs that carry alpha, the tensors differ slightly: resize now operates on filled RGB, and when a uniform-colored border is also present the trim bounding box can change. These differences are expected to fall well inside the model's invariance envelope, so embedding-space similarity behavior should be unchanged on those inputs in practical use.

What does change: the exact embedding-bit output for inputs that carry alpha, with or without a uniform border (see Output-byte impact). Per the project README, this kind of breakage is sanctioned pre-1.0:

> All releases with version numbers below v1.0.0 may break backward compatibility and produce incompatible Semantic Image-Codes.

## Proposed patch

```diff
 def preprocess_image(image):
     """Preprocess image for inference."""
     with sci.metrics(name="Image preprocessing time {seconds:.4f} seconds"):
         image = ImageOps.exif_transpose(image)
-        image = trim_border(image)
-        image = image.resize((512, 512), Resampling.BILINEAR)
         image = remove_transparency(image)
+        image = trim_border(image)
+        image = image.resize((512, 512), Resampling.BILINEAR)
         image = np.array(image, dtype=np.float32)
         image /= 255.0
         mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
         std = np.array([0.5, 0.5, 0.5], dtype=np.float32)
         image = (image - mean) / std
         image = np.expand_dims(np.transpose(image, (2, 0, 1)), axis=0)
     return image.astype(np.float32)
```

`remove_transparency` is already a no-op for `mode == "RGB"` inputs (early-return at the top of the function), so the new ordering is safe for images that don't carry alpha.

## Output-byte impact

| Input class | Tensor diff vs. current | Embedding diff | Code diff |
|---|---|---|---|
| Opaque (RGB, no alpha) | none | none | none |
| Alpha, no uniform border | resize quality (no halos) | small | possible — usually 0–2 bits |
| Alpha + uniform border | trim bbox + resize | small–moderate | possible — usually 0–4 bits |

Backward compatibility is not a constraint here — the project is pre-1.0 and explicitly allows incompatible Semantic-Codes between releases.
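When regenerating fixtures, the "Code diff" column could be checked by counting differing bits between the code bytes produced before and after the patch. A minimal sketch, assuming both codes are available as equal-length raw byte strings; `hamming_bits` and the sample digests are made up for illustration and are not part of `iscc-sci`.

```python
def hamming_bits(a: bytes, b: bytes) -> int:
    """Return the number of differing bits between two equal-length digests."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


# Made-up 64-bit digests standing in for a code body before/after the patch.
before = bytes.fromhex("a1b2c3d4e5f60718")
after = bytes.fromhex("a1b2c3d4e5f6071b")

print(hamming_bits(before, after))
```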

## Acceptance criteria

- `preprocess_image` matches IEP-0004 step order for steps 1–3 (`exif_transpose` → `remove_transparency` → `trim_border`).
- Existing tests pass; regenerate fixtures for any alpha inputs whose codes change.

## References

- IEP-0004 (normative): https://github.com/iscc/iscc-ieps/blob/main/ieps/iep-0004.md
- `iscc-sdk/iscc_sdk/image.py:37-67` (`image_normalize` — reference order)
- `iscc-sci/iscc_sci/code_semantic_image.py:123-151` (`preprocess_image` — current order)
- iscc-sci README pre-1.0 compatibility notice

