fix(indexing): gate image indexing on pixel dimensions, not file bytes #269

Merged
lstein merged 6 commits into master from lstein/fix/min-image-dimensions on May 22, 2026

@lstein (Owner) commented May 22, 2026

Summary

  • Replaces the silent 100 KB file-size filter in the indexer with a per-album pixel-dimension gate (default 256×256). The old byte-count heuristic was silently dropping ~25% of a real photo library — highly-compressed JPEGs, web images, phone shots — with no log line and no UI surface.
  • Surfaces the new min_image_dimension field in the Album Manager's Edit Album form ("Exclude thumbnails and other images below this size (px)"), placed between Image Folder(s) and Encoder. Round-trips through /update_album/ + /available_albums/.
  • Brightens the Add New Album panel background from #333 to #505050 so it no longer blends into the non-editing album cards behind it.
  • Documents the new gate (and how to tune it) in docs/user-guide/albums.md.

Why the dimension gate

A user-supplied folder of 100 images was producing an index of 75. Every missing file was under 100 KB on disk but well above 256×256 in pixels — i.e. real photos, not thumbnails. File bytes are a bad proxy for "is this a thumbnail" because compression and content both confound it; pixel dimensions are the question we actually mean.

The new gate reads only the image header via Image.open(...).size (a few-KB read), so the scan-phase overhead is small compared to CLIP encoding. A summary log line now reports how many images were skipped per scan, so future drops are visible instead of silent.
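
The gate described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `read_image_size`, `passes_dimension_gate`, and `filter_by_dimension` are hypothetical, and counting unreadable files as skipped is an assumption. It relies on Pillow's `Image.open` being lazy, so reading `.size` parses the header without decoding pixel data.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger(__name__)

Size = Tuple[int, int]  # (width, height) in pixels


def read_image_size(path: Path) -> Size:
    """Return (width, height) by reading only the image header with Pillow."""
    # Imported lazily so the pure gate logic below has no hard dependency.
    from PIL import Image

    # Image.open identifies the file and parses its header; pixel data
    # is not decoded until it is actually needed.
    with Image.open(path) as img:
        return img.size


def passes_dimension_gate(size: Size, min_dimension: int = 256) -> bool:
    """Keep an image only if width AND height are >= the threshold."""
    width, height = size
    return width >= min_dimension and height >= min_dimension


def filter_by_dimension(
    paths: Iterable[Path],
    min_dimension: int = 256,
    size_reader: Callable[[Path], Size] = read_image_size,
) -> Tuple[List[Path], int]:
    """Return (kept_paths, skipped_count) and log a one-line summary."""
    kept: List[Path] = []
    skipped = 0
    for path in paths:
        try:
            size = size_reader(path)
        except OSError:
            # Assumption: unreadable or non-image files count as skipped.
            skipped += 1
            continue
        if passes_dimension_gate(size, min_dimension):
            kept.append(path)
        else:
            skipped += 1
    logger.info(
        "Skipped %d of %d images below %dx%d px",
        skipped, len(kept) + skipped, min_dimension, min_dimension,
    )
    return kept, skipped
```

Passing the size reader as a parameter keeps the threshold logic testable without real image files; in normal use the default Pillow-based reader would be used.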

Behind the UI

  • New per-album field Album.min_image_dimension: int = 256 (Pydantic ge=1).
  • Embeddings.min_image_dimension mirrors it; get_image_files_from_directory gates each candidate via a _passes_dimension_gate helper.
  • Embeddings.minimum_image_size (the old ClassVar) is gone. Tests that monkeypatched it down to 10 KB are simplified — bundled fixtures are 384×512+, well above 256.
  • create_album helper, /update_album/ body, and /available_albums/ listing all thread the new field.
  • Update Index applies a changed threshold both ways (adds previously-rejected images, removes now-ineligible ones).
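
The "both ways" behaviour of Update Index in the last bullet can be pictured as a set reconciliation. The sketch below is an assumption about the shape of that logic, not the PR's implementation; `reconcile_index` is a hypothetical helper.

```python
from typing import Iterable, Set, Tuple


def reconcile_index(
    indexed: Iterable[str], eligible: Iterable[str]
) -> Tuple[Set[str], Set[str]]:
    """Compare what is indexed with what currently passes the gate.

    Returns (to_add, to_remove):
      to_add    - files that pass the gate now but are not yet indexed
      to_remove - indexed files that no longer pass the gate
    """
    indexed_set, eligible_set = set(indexed), set(eligible)
    to_add = eligible_set - indexed_set
    to_remove = indexed_set - eligible_set
    return to_add, to_remove
```

Lowering the threshold grows the eligible set, so previously rejected images land in `to_add`; raising it shrinks the eligible set, so now-ineligible images land in `to_remove`.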

Test plan

  • ruff check photomap tests clean
  • npm run lint + npm run format:check clean
  • pytest tests/backend: 281 passed (includes new test_min_image_dimension_filters_small_images + test_min_image_dimension_round_trips)
  • npm test: 293 passed
  • mkdocs build --strict clean
  • Verified rendered HTML via FastAPI TestClient: new input present in Edit Album template, label exact, ordering image-paths → min-dimension → encoder
  • Verified on real user library /home/lstein/Downloads: 99/99 images now indexed (was 75/99 before)
  • Visual smoke test in browser — Open Album Management, click Edit on an album; new field should appear with 256 prefilled. The Add Album panel should be visibly brighter than the surrounding cards.

🤖 Generated with Claude Code

lstein and others added 5 commits May 22, 2026 14:49
The previous indexer dropped any file under 100 KB on disk as a presumed
thumbnail. On a real user library this silently excluded ~25% of normal
photos (highly-compressed JPEGs, downloaded web images, phone shots) with
no log line — the only symptom was a smaller-than-expected index.

Replace the byte-size gate with a pixel-dimension gate:
- New per-album ``min_image_dimension`` in config.yaml (default 256).
  An image is kept only if width AND height are >= the threshold.
- ``Embeddings`` reads each candidate file's header via ``Image.open(...).size``;
  header reads are a few KB so the scan overhead is small relative to
  CLIP encoding time.
- A summary log line now reports how many images were skipped per scan,
  so silent drops are visible.

Tests: ``build_index`` no longer needs the ``minimum_image_size``
monkeypatch (bundled fixtures are all 384x512+, well above 256). Removed
the now-pointless monkeypatches from the search tests too. Added a
focused boundary test for the new gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a "Skipping Small Images" subsection under Indexing Albums covering:
- What the filter does and the 256-pixel default
- How to change ``min_image_dimension`` per album in config.yaml
- Practical reference values
- How an Update Index pass applies the new threshold both ways
  (adding now-eligible images, removing now-ineligible ones)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a number input ("Exclude thumbnails and other images below this
size (px)") to the Edit Album form, sitting between Image Folder(s) and
Encoder. Defaults to 256, the backend Album default. The value is
loaded from the album record on edit-form open and posted back through
/update_album/ on save. The album-level pixel gate is now reachable
without hand-editing config.yaml.

The Album listing serializer (``/available_albums/``) and the
``update_album`` endpoint were not yet round-tripping the field; both
now thread it through. ``create_album`` helper gains the optional kwarg
in the same shape as the other per-album controls.

Also brightens the "Add New Album" panel's background from #333 to
#505050 to match ``.album-card.editing`` — the old colour was identical
to the non-edited album cards behind it, making the Add panel hard to
distinguish from the surrounding list.

New test exercises the full round-trip (default surfaces as 256,
explicit updates persist, ``ge=1`` validator rejects zero without
clobbering prior state).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
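
The round-trip behaviour described in the commit above (default of 256, ``ge=1`` rejecting zero, prior state left intact on failure) can be illustrated with a small Pydantic sketch. `AlbumSketch` and `validated_update` are hypothetical stand-ins and do not reproduce the real `Album` model or the `/update_album/` handler; the point is only that validating a candidate record before assigning it leaves the old record untouched when validation fails.

```python
from pydantic import BaseModel, Field, ValidationError


class AlbumSketch(BaseModel):
    """Hypothetical stand-in for the per-album record."""

    name: str
    # Default 256; ge=1 makes zero and negative values fail validation.
    min_image_dimension: int = Field(default=256, ge=1)


def validated_update(current: AlbumSketch, new_min_dimension: int) -> AlbumSketch:
    """Build and validate a replacement record.

    Raises ValidationError on bad input. Because the candidate is built
    before anything is assigned, the caller's existing record is unchanged
    when validation fails.
    """
    return AlbumSketch(name=current.name, min_image_dimension=new_min_dimension)
```

A caller would assign the result only on success, for example `album = validated_update(album, 512)`, and catch `ValidationError` to report the rejected value while keeping the previous record.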

The pixel-dimension input was inheriting the .form-group full-width
default and stretching across the row. Four digits + spinner controls
fit comfortably in 6em; override the width just for that input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lstein lstein enabled auto-merge (squash) May 22, 2026 22:37
@lstein lstein merged commit 9443cf2 into master May 22, 2026
6 checks passed