Data Browser UI — visually inspect & spot-check the catalog by mrjunos · Pull Request #11 · mrjunos/almendra

mrjunos · 2026-05-26T15:23:30Z

Stacked on #10 (base data-curation). Merge order: #9 → #10 → this.

What

A new Streamlit Data page to browse the catalog and manually check the data — the visual tool requested before/alongside Step 3's labeling work.

Filters: source · split · primary defect · provenance · quality (good / not-good) · label-trust bucket.
Gallery: paginated thumbnails with class + ext_id captions; not-good beans flagged ⚠️.
Detail: per-bean — all image views, every defect (class / primary / label_source / trust), and the lot provenance (species, variety, process, farm, altitude, dates…).
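As a rough illustration of how the filters and the paginated gallery could map onto a single read query, here is a minimal sketch using the standard-library `sqlite3` module. The `beans` table, its column names, and the `list_beans` / `make_demo_catalog` helpers are assumptions made up for this example; the real catalog schema and the actual functions in `almendra.db.queries` are not shown in this PR.

```python
import sqlite3

# Hypothetical filter columns; the real catalog schema is not shown in this PR.
FILTERABLE = ("source", "split", "primary_defect", "quality")


def list_beans(conn, filters=None, page=1, page_size=24):
    """Return (rows, total) for one gallery page matching the given filters."""
    # Drop filters left unset (None) so they do not restrict the result.
    filters = {k: v for k, v in (filters or {}).items() if v is not None}
    unknown = set(filters) - set(FILTERABLE)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    # Column names come from the whitelist above; values are bound parameters.
    where = " AND ".join(f"{col} = ?" for col in filters)
    clause = f" WHERE {where}" if where else ""
    params = list(filters.values())
    total = conn.execute(f"SELECT COUNT(*) FROM beans{clause}", params).fetchone()[0]
    offset = (max(page, 1) - 1) * page_size
    rows = conn.execute(
        f"SELECT ext_id, primary_defect, quality FROM beans{clause} "
        "ORDER BY ext_id LIMIT ? OFFSET ?",
        params + [page_size, offset],
    ).fetchall()
    return rows, total


def make_demo_catalog():
    """Build a tiny in-memory catalog with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE beans (ext_id TEXT PRIMARY KEY, source TEXT, split TEXT, "
        "primary_defect TEXT, quality TEXT)"
    )
    conn.executemany(
        "INSERT INTO beans VALUES (?, ?, ?, ?, ?)",
        [
            ("b001", "lab", "train", "none", "good"),
            ("b002", "lab", "train", "broken", "not-good"),
            ("b003", "farm", "val", "insect", "not-good"),
            ("b004", "farm", "train", "none", "good"),
            ("b005", "lab", "val", "broken", "not-good"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_catalog()
    rows, total = list_beans(conn, {"quality": "not-good"}, page=1, page_size=2)
    print(f"Showing {len(rows)} of {total}")
```

Returning the total alongside the page rows lets the UI render a "Showing N of M" caption without a second round trip through the query layer.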

Read queries live in almendra.db.queries (Streamlit-free → unit-tested). The page degrades gracefully if the catalog extra or data/catalog.db is missing.
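One way the graceful-degradation check could look is sketched below: a Streamlit-free helper that reports whether the optional dependency and the database file are both present, so the page can show a notice instead of crashing. The `catalog_status` name, its signature, and the `required_module` argument are hypothetical; which packages the `catalog` extra actually installs is not stated in this PR.

```python
import importlib.util
from pathlib import Path


def catalog_status(db_path, required_module):
    """Return (ok, message); message explains why the browser cannot load.

    `required_module` is a top-level module name that the optional extra is
    assumed to provide (hypothetical; pass whatever the extra really installs).
    """
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(required_module) is None:
        return False, f"Optional dependency '{required_module}' is not installed."
    if not Path(db_path).is_file():
        return False, f"Catalog database not found at {db_path}."
    return True, ""


if __name__ == "__main__":
    ok, message = catalog_status("data/catalog.db", "sqlite3")
    print("catalog available" if ok else message)
```

Keeping this check free of Streamlit imports means it can be unit-tested the same way as the read queries; the page itself would only need to display the returned message.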

Also fixes CI: installs --extra catalog so the catalog/curation/browse tests run (they were importorskip-skipped before).

Screenshot

Gallery on the real 1507-bean catalog (filters across the top, ⚠️ on duplicate beans, per-bean detail below):

(captured locally via the UI on the real catalog)

Verification

uv run pytest -m "not e2e" → 82 passed; ruff + format clean.
New tests/test_browse_queries.py: filters, pagination, bean detail.
Smoke test renders the page in ES + EN.

New Streamlit "Data" page to browse the catalog and manually check the data:

- Filters: source, split, primary defect, provenance, quality (good/not-good), and label-trust bucket.
- Thumbnail gallery (paginated) with class + ext_id captions; not-good beans flagged ⚠️.
- Per-bean detail: all views, every defect (class/primary/label_source/trust), and the lot provenance (species, variety, process, farm, altitude, dates…).

Read queries live in `almendra.db.queries` (Streamlit-free, unit-tested). The page degrades gracefully if the `catalog` extra or the DB file is absent.

Also: CI now installs `--extra catalog` so the catalog/curation/browse tests actually run (they were importorskip-skipped before).

Tests: tests/test_browse_queries.py (filters, pagination, detail) + browse added to the UI smoke set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The E2E sandbox now builds its catalog from the fixture manifest (harness `build_catalog` runs `almendra db migrate`), and the flow navigates to the new Data page after Predict, asserting that the browser renders with beans ("Data browser" + "Showing N of M"). This also exercises `db migrate` end-to-end. Passes in ~34s; the recording covers the added step.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Reconcile stack: bring Curation (#10) + Data Browser (#11) to main

mrjunos and others added 2 commits May 26, 2026 10:23

mrjunos merged commit 0739dd5 into data-curation May 26, 2026
2 checks passed

mrjunos deleted the data-step3 branch May 26, 2026 15:36

mrjunos mentioned this pull request May 26, 2026

Reconcile stack: bring Curation (#10) + Data Browser (#11) to main #12

Merged

mrjunos added a commit that referenced this pull request May 26, 2026

Merge pull request #12 from mrjunos/data-curation

0f84542

Reconcile stack: bring Curation (#10) + Data Browser (#11) to main

mrjunos mentioned this pull request May 28, 2026

Step 3A: classification ingester + coffee-adik & arabica-washed adapters #15

Merged

mrjunos merged 2 commits into data-curation from data-step3