fix(indexing): gate image indexing on pixel dimensions, not file bytes #269
Merged
The previous indexer dropped any file under 100 KB on disk as a presumed thumbnail. On a real user library this silently excluded ~25% of normal photos (highly compressed JPEGs, downloaded web images, phone shots) with no log line: the only symptom was a smaller-than-expected index.

Replace the byte-size gate with a pixel-dimension gate:

- New per-album ``min_image_dimension`` in config.yaml (default 256). An image is kept only if width AND height are >= the threshold.
- ``Embeddings`` reads each candidate file's header via ``Image.open(...).size``; header reads are a few KB, so the scan overhead is small relative to CLIP encoding time.
- A summary log line now reports how many images were skipped per scan, so silent drops are visible.

Tests: ``build_index`` no longer needs the ``minimum_image_size`` monkeypatch (bundled fixtures are all 384x512+, well above 256). Removed the now-pointless monkeypatches from the search tests too. Added a focused boundary test for the new gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
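The gate this commit describes can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `read_size`, `passes_dimension_gate`, and `filter_images` are hypothetical names, and Pillow is assumed to be installed for the default header reader. Pillow's `Image.open` is lazy, so reading `.size` parses the header without decoding pixel data.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable

logger = logging.getLogger(__name__)

# Hypothetical alias: a callable mapping a file path to its (width, height) in pixels.
SizeReader = Callable[[Path], tuple[int, int]]


def read_size(path: Path) -> tuple[int, int]:
    """Return (width, height) by parsing only the image header (assumes Pillow)."""
    from PIL import Image  # imported lazily so the gate logic has no hard dependency

    with Image.open(path) as img:  # lazy open: pixel data is not decoded here
        return img.size


def passes_dimension_gate(size: tuple[int, int], min_dimension: int) -> bool:
    """Keep an image only if BOTH width and height meet the threshold."""
    width, height = size
    return width >= min_dimension and height >= min_dimension


def filter_images(
    candidates: Iterable[Path],
    min_dimension: int = 256,
    size_reader: SizeReader = read_size,
) -> list[Path]:
    """Return the candidates that pass the gate and log how many were skipped."""
    kept: list[Path] = []
    skipped = 0
    for path in candidates:
        try:
            size = size_reader(path)
        except OSError:
            # Assumption of this sketch: unreadable files count as skipped
            # instead of aborting the whole scan.
            skipped += 1
            continue
        if passes_dimension_gate(size, min_dimension):
            kept.append(path)
        else:
            skipped += 1
    # One summary line per scan, so dropped images are visible instead of silent.
    logger.info(
        "Skipped %d of %d images below %dpx", skipped, skipped + len(kept), min_dimension
    )
    return kept
```

With the default threshold, a 256×256 image passes while a 300×255 image is skipped because its height falls one pixel short. The `size_reader` parameter exists only so the gate can be exercised without real image files.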
Adds a "Skipping Small Images" subsection under Indexing Albums covering:

- What the filter does and the 256-pixel default
- How to change ``min_image_dimension`` per album in config.yaml
- Practical reference values
- How an Update Index pass applies the new threshold both ways (adding now-eligible images, removing now-ineligible ones)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
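Applying a changed threshold "both ways" amounts to a set difference between what is indexed now and what is eligible under the new threshold. A minimal sketch, assuming the index can be summarized as a set of paths and each file's pixel size is already known; `plan_index_update` is a hypothetical name, not a PhotoMap function.

```python
def plan_index_update(
    indexed: set[str],
    sizes: dict[str, tuple[int, int]],
    min_dimension: int,
) -> tuple[set[str], set[str]]:
    """Return (to_add, to_remove) for an index update under a new threshold.

    `indexed` holds paths already in the index; `sizes` maps every file
    currently on disk to its (width, height) in pixels.
    """
    eligible = {
        path
        for path, (width, height) in sizes.items()
        if width >= min_dimension and height >= min_dimension
    }
    to_add = eligible - indexed     # newly eligible, e.g. after lowering the threshold
    to_remove = indexed - eligible  # now ineligible (or, in this sketch, gone from disk)
    return to_add, to_remove
```

Lowering the threshold from 256 to 128 would add a 200×200 image that was previously skipped; raising it to 512 would remove an already-indexed 400×300 image.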
Adds a number input ("Exclude thumbnails and other images below this
size (px)") to the Edit Album form, sitting between Image Folder(s) and
Encoder. Defaults to 256, the backend Album default. The value is
loaded from the album record on edit-form open and posted back through
/update_album/ on save. The album-level pixel gate is now reachable without
hand-editing config.yaml.
The Album listing serializer (``/available_albums/``) and the
``update_album`` endpoint were not yet round-tripping the field; both
now thread it through. ``create_album`` helper gains the optional kwarg
in the same shape as the other per-album controls.
Also brightens the "Add New Album" panel's background from #333 to
#505050 to match ``.album-card.editing`` — the old colour was identical
to the non-edited album cards behind it, making the Add panel hard to
distinguish from the surrounding list.
New test exercises the full round-trip (default surfaces as 256,
explicit updates persist, ``ge=1`` validator rejects zero without
clobbering prior state).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
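The round-trip contract that test checks (a default of 256, explicit updates persist, zero rejected without touching the stored value) can be illustrated without the web stack. This sketch swaps in a stdlib dataclass with a hand-written `>= 1` check as a stand-in for the Pydantic model's `ge=1`, and an in-memory dict in place of config.yaml; `AlbumSettings` and `update_album_setting` are hypothetical names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AlbumSettings:
    """Stdlib stand-in for the Pydantic Album model (hypothetical)."""

    key: str
    min_image_dimension: int = 256  # same default as the backend Album model

    def __post_init__(self) -> None:
        # Mirrors Pydantic's ge=1 constraint: reject zero and negative values.
        if self.min_image_dimension < 1:
            raise ValueError("min_image_dimension must be >= 1")


def update_album_setting(
    store: dict[str, AlbumSettings], key: str, min_image_dimension: int
) -> AlbumSettings:
    """Validate first, then persist, so a rejected update leaves prior state intact."""
    # replace() builds a new instance through __init__, so __post_init__ validates it.
    updated = replace(store[key], min_image_dimension=min_image_dimension)
    store[key] = updated  # only reached if validation succeeded
    return updated
```

Building the new value before assigning it is what keeps a rejected update from clobbering the stored one; mutating in place and validating afterwards would not give that guarantee.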
The pixel-dimension input was inheriting the .form-group full-width default and stretching across the row. Four digits + spinner controls fit comfortably in 6em; override the width just for that input.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `min_image_dimension` field in the Album Manager's Edit Album form ("Exclude thumbnails and other images below this size (px)"), placed between Image Folder(s) and Encoder. Round-trips through `/update_album/` + `/available_albums/`.
- "Add New Album" panel background `#333` → `#505050` so it no longer blends into the non-editing album cards behind it.
- New "Skipping Small Images" subsection in `docs/user-guide/albums.md`.

Why the dimension gate
A user-supplied folder of 100 images was producing an index of 75. Every missing file was under 100 KB on disk but well above 256×256 in pixels — i.e. real photos, not thumbnails. File bytes are a bad proxy for "is this a thumbnail" because compression and content both confound it; pixel dimensions are the question we actually mean.
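The two rules disagree exactly on files like these. The records below are illustrative values, not measurements from that folder, and `byte_gate`/`pixel_gate` are hypothetical names for the old and new rules; "100 KB" is taken as 100 × 1024 bytes, which is an assumption.

```python
from typing import NamedTuple


class ImageInfo(NamedTuple):
    name: str
    file_bytes: int  # size on disk
    width: int       # pixels
    height: int      # pixels


def byte_gate(img: ImageInfo, min_bytes: int = 100 * 1024) -> bool:
    """Old rule: treat anything under 100 KB on disk as a thumbnail."""
    return img.file_bytes >= min_bytes


def pixel_gate(img: ImageInfo, min_dimension: int = 256) -> bool:
    """New rule: keep the image only if width AND height meet the threshold."""
    return img.width >= min_dimension and img.height >= min_dimension


# Illustrative records (hypothetical values, not data from the PR).
library = [
    ImageInfo("compressed_photo.jpg", 80 * 1024, 1600, 1200),  # real photo, small file
    ImageInfo("thumbnail.jpg", 6 * 1024, 160, 120),            # genuine thumbnail
    ImageInfo("large_photo.jpg", 2 * 1024 * 1024, 4000, 3000),
]

# Files the byte gate drops even though the pixel gate would keep them.
wrongly_dropped = [img.name for img in library if not byte_gate(img) and pixel_gate(img)]
```

Both rules agree on the genuine thumbnail and the large photo; only the heavily compressed full-size photo is misclassified by the byte rule, which is the failure mode described above.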
The new gate reads only the image header via `Image.open(...).size` (a few-KB read), so the scan-phase overhead is small compared to CLIP encoding. A summary log line now reports how many images were skipped per scan, so future drops are visible instead of silent.

Behind the UI
- `Album.min_image_dimension: int = 256` (Pydantic `ge=1`).
- `Embeddings.min_image_dimension` mirrors it; `get_image_files_from_directory` gates each candidate via a `_passes_dimension_gate` helper.
- `Embeddings.minimum_image_size` (the old `ClassVar`) is gone. Tests that monkeypatched it down to 10 KB are simplified, since bundled fixtures are 384×512+, well above 256.
- `create_album` helper, `/update_album/` body, and `/available_albums/` listing all thread the new field.

Test plan
- `ruff check photomap tests` clean
- `npm run lint` + `npm run format:check` clean
- `pytest tests/backend`: 281 passed (includes new `test_min_image_dimension_filters_small_images` + `test_min_image_dimension_round_trips`)
- `npm test`: 293 passed
- `mkdocs build --strict` clean
- `TestClient`: new input present in Edit Album template, label exact, ordering image-paths → min-dimension → encoder
- `/home/lstein/Downloads`: 99/99 images now indexed (was 75/99 before)

🤖 Generated with Claude Code