Data Browser UI — visually inspect & spot-check the catalog #11

Merged: mrjunos merged 2 commits into data-curation from data-step3 on May 26, 2026
@mrjunos (Owner) commented May 26, 2026

Stacked on #10 (base data-curation). Merge order: #9 → #10 → this.

What

A new Streamlit Data page to browse the catalog and manually check the data — the visual tool requested before/alongside Step 3's labeling work.

  • Filters: source · split · primary defect · provenance · quality (good / not-good) · label-trust bucket.
  • Gallery: paginated thumbnails with class + ext_id captions; not-good beans flagged ⚠️.
  • Detail: per-bean — all image views, every defect (class / primary / label_source / trust), and the lot provenance (species, variety, process, farm, altitude, dates…).
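As an illustration of how the filters and pagination above might be served by a plain read function, here is a minimal sketch using `sqlite3`. The `beans` table, its columns, and the `list_beans` name are all hypothetical; the real catalog schema and query API are not shown in this PR.

```python
import sqlite3

# Hypothetical filterable columns; the real catalog schema is not shown in this PR.
FILTERABLE = ("source", "split", "primary_defect", "is_good")


def list_beans(conn, filters=None, page=0, page_size=24):
    """Return (rows, total) for one gallery page, applying equality filters."""
    # Drop filters the user left unset (None); keep falsy values such as 0.
    filters = {k: v for k, v in (filters or {}).items() if v is not None}
    unknown = set(filters) - set(FILTERABLE)
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")

    # Column names come from the whitelist above; values are bound parameters.
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    params = list(filters.values())

    total = conn.execute(
        f"SELECT COUNT(*) FROM beans WHERE {where}", params
    ).fetchone()[0]
    rows = conn.execute(
        f"SELECT ext_id, primary_defect, is_good FROM beans WHERE {where} "
        "ORDER BY ext_id LIMIT ? OFFSET ?",
        params + [page_size, page * page_size],
    ).fetchall()
    return rows, total
```

Returning the total alongside the page lets the caller render a "Showing N of M" style caption without a second round trip through the UI layer.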

Read queries live in almendra.db.queries (Streamlit-free → unit-tested). The page degrades gracefully if the catalog extra or data/catalog.db is missing.
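One way the graceful degradation could work is a small pre-flight check that the page calls before touching the database. This is a sketch under assumptions: `catalog_problem` is a hypothetical helper, and `sqlalchemy` is only a stand-in for whatever module the `catalog` extra actually provides.

```python
from importlib.util import find_spec
from pathlib import Path


def catalog_problem(db_path="data/catalog.db", required_module="sqlalchemy"):
    """Return a user-facing message if the catalog is unusable, else None.

    `required_module` is a stand-in (assumption) for a top-level module that
    only the `catalog` extra installs.
    """
    if find_spec(required_module) is None:
        return (
            "Install the 'catalog' extra to browse the catalog "
            f"(missing module: {required_module})."
        )
    if not Path(db_path).is_file():
        return f"No catalog found at {db_path}."
    return None
```

Because the check is Streamlit-free, it can be unit-tested directly; the page itself would only need to display the returned message and stop rendering when it is not `None`.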

Also fixes CI: installs --extra catalog so the catalog/curation/browse tests run (they were importorskip-skipped before).
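For context on why those tests were silently skipped, here is a minimal sketch of the `pytest.importorskip` pattern. The module name is a hypothetical stand-in; the PR does not list which modules the `catalog` extra provides.

```python
import pytest

# Hypothetical stand-in for a module that only the `catalog` extra provides.
CATALOG_MODULE = "sqlalchemy"


def test_catalog_roundtrip():
    # Skips (rather than fails) when the optional dependency is absent, so a CI
    # job without the extra stays green while exercising none of this test.
    pytest.importorskip(CATALOG_MODULE)
    # ... assertions against the catalog would follow here ...
```

A skipped test is reported as neither passed nor failed, which is why installing the extra in CI is needed for these tests to count.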

Screenshot

Gallery on the real 1507-bean catalog (filters across the top, ⚠️ on duplicate beans, per-bean detail below):

(captured locally via the UI on the real catalog)

Verification

  • uv run pytest -m "not e2e" → 82 passed; ruff + format clean.
  • New tests/test_browse_queries.py: filters, pagination, bean detail.
  • Smoke test renders the page in ES + EN.

Next

Step 3 proper — new dataset adapters (coffee-adik ⭐, arabica-washed) + the multi-label model migration — comes in a follow-up branch stacked on this.

🤖 Generated with Claude Code

mrjunos and others added 2 commits May 26, 2026 10:23

New Streamlit "Data" page to browse the catalog and manually check the data:
- Filters: source, split, primary defect, provenance, quality (good/not-good),
  and label-trust bucket.
- Thumbnail gallery (paginated) with class + ext_id captions; not-good beans
  flagged ⚠️.
- Per-bean detail: all views, every defect (class/primary/label_source/trust),
  and the lot provenance (species, variety, process, farm, altitude, dates…).

Read queries live in `almendra.db.queries` (Streamlit-free, unit-tested). The
page degrades gracefully if the `catalog` extra or the DB file is absent.

Also: CI now installs `--extra catalog` so the catalog/curation/browse tests
actually run (they were importorskip-skipped before).

Tests: tests/test_browse_queries.py (filters, pagination, detail) + browse added
to the UI smoke set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The E2E sandbox now builds its catalog from the fixture manifest (harness
`build_catalog` runs `almendra db migrate`), and the flow navigates to the new
Data page after Predict, asserting the browser renders with beans ("Data
browser" + "Showing N of M"). Exercises `db migrate` end-to-end too.

Passes in ~34s; recording covers the added step.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mrjunos mrjunos merged commit 0739dd5 into data-curation May 26, 2026
2 checks passed
@mrjunos mrjunos deleted the data-step3 branch May 26, 2026 15:36
mrjunos added a commit that referenced this pull request May 26, 2026
Reconcile stack: bring Curation (#10) + Data Browser (#11) to main