Add Sidemantic webapp builder skill#143
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 934bc192f5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| def _execute_sample(layer: Any, sql: str, *, sample_rows: int) -> dict[str, Any]: | ||
| result = layer.adapter.execute(sql) | ||
| columns = [desc[0] for desc in result.description] | ||
| rows = result.fetchmany(sample_rows) |
There was a problem hiding this comment.
Use adapter-compatible fetching for samples
When inspect_layer.py --execute is used with --connection for Postgres, BigQuery, Snowflake, Databricks, Spark, ClickHouse, or ADBC, this call raises AttributeError because those result wrappers expose fetchone()/fetchall() but not fetchmany() (I checked sidemantic/db/*.py). That makes every executed query look like an execution failure even though the SQL may be valid; only the raw DuckDB cursor supports this method. Fetch up to sample_rows via the adapter-neutral API or add fetchmany() to the result wrappers.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 645b879b79
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const response = await fetch("data/app-spec.json"); | ||
| if (!response.ok) throw new Error(`Failed to load app spec: ${response.status}`); | ||
| const spec = await response.json(); | ||
| const candidate = spec.app_candidates?.[0]; |
There was a problem hiding this comment.
Respect --model when choosing app candidate
The scaffold command accepts --model and validates that specific candidate in scaffold_static_app.py, but the generated app always loads spec.app_candidates?.[0]. When the chosen model is not the first entry, the dashboard renders metrics/leaderboard for the wrong model, so --model silently produces incorrect output instead of the requested candidate.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e3de7b253c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return report | ||
|
|
||
| spec = _load_json(spec_path) | ||
| candidate = _first_candidate(spec) |
There was a problem hiding this comment.
Select the scaffolded model when verifying app specs
Use the scaffold’s selected model instead of always validating app_candidates[0]: verify_static_app.py currently ignores index.html's data-model and computes checks against the first candidate only, so --model dashboards can be marked failed (or passed) based on a different model’s query results. Fresh evidence after the earlier app.js fix: this verifier still calls _first_candidate(spec) unconditionally, which I reproduced by making the first candidate unexecuted and the selected candidate executed.
Useful? React with 👍 / 👎.
| const before = await snapshot(page); | ||
| await reset.first().click({ timeout }); | ||
| const after = await snapshot(page); | ||
| await expectChange("Clicking reset", before, after, ["metricText", "previewText", "filterCount", "selectedMetricCount"]); |
There was a problem hiding this comment.
Assert reset against selected metric identity, not count
The reset smoke check can false-fail because it only looks for changes in selectedMetricCount; switching from one selected metric to another keeps that count at 1, so a valid reset that restores the default metric may be reported as no-op when text/filter counts stay the same. This affects dashboards where reset primarily changes which metric is selected rather than row counts.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e04d47fa8f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| data_dir = output_dir / "data" | ||
| data_dir.mkdir(parents=True, exist_ok=True) | ||
|
|
||
| shutil.copyfile(spec_path, data_dir / "app-spec.json") |
There was a problem hiding this comment.
Redact connection URL before copying app spec
The scaffold copies the inspector JSON verbatim into a public data/app-spec.json, but inspect_layer.py includes the raw connection string in that payload. If users run inspect_layer.py --connection ... (for Postgres/Snowflake/etc.), credentials or tokens embedded in the URL are published to the browser and can be exfiltrated. Sanitize/remove sensitive fields (at least connection) before writing the static app artifact.
Useful? React with 👍 / 👎.
| checks["leaderboard_executed"] = bool(leaderboard.get("result", {}).get("columns")) and bool( | ||
| leaderboard.get("result", {}).get("sample_rows") |
There was a problem hiding this comment.
Accept empty sample_rows as a valid executed query
These checks treat execution as failed whenever sample_rows is empty, even if the query executed successfully and returned zero rows (e.g., an empty source table or restrictive filters). That creates false negatives in verify_static_app.py, causing valid dashboards to fail verification solely due to data cardinality. The executed check should key off result presence/schema (and optionally sample_row_count), not truthiness of sample_rows.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c988cea8c1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| preferred_leaderboard_dimensions or all_leaderboard_dimensions | ||
| ) | ||
| if leaderboard_dimension: | ||
| requested = [ | ||
| dim |
There was a problem hiding this comment.
Honor explicit --leaderboard-dimension before id filtering
When --leaderboard-dimension is provided, matching is done against candidate_leaderboard_dimensions, but that list is built from preferred_leaderboard_dimensions first, which excludes identifier-like fields. If a model has any preferred dimensions, an explicitly requested id-like dimension (for example user_id) is never considered and the flag is silently ignored, so the generated default leaderboard can target a different dimension than the one the user requested.
Useful? React with 👍 / 👎.
|
|
||
| try { | ||
| await page.goto(args.url, { waitUntil: "load", timeout: args.timeout }); | ||
| await page.locator('[data-testid="metric-totals"]').waitFor({ timeout: args.timeout }); |
There was a problem hiding this comment.
Wait for rendered metric cards before interaction checks
The smoke test waits only for the metric-totals container, which already exists in the static template before app-spec.json is fetched and rendered. On slower loads, the script can snapshot an uninitialized dashboard, skip filter/leaderboard/metric checks because no interactive elements are present yet, and still return ok: true, so broken interaction wiring can pass verification nondeterministically.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 20acbc6495
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const filter = await clickFirstFilterRemove(page, args.timeout); | ||
| const leaderboard = await clickLeaderboardRow(page, args.timeout); | ||
| const metric = await clickMetricCard(page, args.timeout); | ||
| const reset = await clickReset(page, args.timeout); |
There was a problem hiding this comment.
Fail verification when required interactions are skipped
The interaction smoke test records filter, leaderboard, metric, and reset results but never fails when those helpers return { skipped: true } (for example, no filter pills, no clickable leaderboard rows, fewer than two metric cards, or no reset control). In those cases the script still prints ok: true, so non-interactive dashboards can pass verification even though the test is supposed to prove real interaction wiring.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36d62d1d32
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| async function clickMetricCard(page, timeout) { | ||
| const metrics = page.locator('[data-testid="metric-totals"] button[data-metric]'); | ||
| if ((await metrics.count()) < 2) return { skipped: true, reason: "fewer than two interactive metric cards" }; |
There was a problem hiding this comment.
Allow single-metric dashboards in interaction smoke test
Do not hard-fail metric interaction when a dashboard only has one metric card. clickMetricCard returns { skipped: true } whenever fewer than two metric buttons exist, and main() then unconditionally treats skipped interactions as failures via assertRequiredInteraction, so valid single-metric dashboards fail verification even though there is no alternative metric to select. This creates false negatives for models where the inspected candidate legitimately exposes just one metric.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2e5d7e867c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| await expectChange("Clicking a leaderboard row", before, after, [ | ||
| "metricText", | ||
| "previewText", | ||
| "filterCount", | ||
| "selectedRowCount", |
There was a problem hiding this comment.
Require data/filter changes on leaderboard row click
Tighten the leaderboard interaction assertion so it cannot pass on selection styling alone. In clickLeaderboardRow, expectChange allows selectedRowCount as a sufficient change, so an implementation that only toggles data-selected (without applying filters or recomputing data) can satisfy this step and still contribute to an overall passing smoke test when other controls happen to change state. This weakens the script’s stated goal of verifying real interaction wiring for leaderboard filters.
Useful? React with 👍 / 👎.
| const max = Math.max(...data.map((row) => Number(row[valueKey]) || 0), 1); | ||
| const slot = (width - padX * 2) / Math.max(data.length, 1); |
There was a problem hiding this comment.
Handle negative values in static column chart scaling
The current scaling assumes non-negative metrics by forcing max >= 1. When chart data contains negative values (for example net-loss or delta metrics), downstream height calculation becomes negative, producing invalid/empty SVG bars and a misleading chart. Use a zero baseline or min/max domain that supports negatives instead of clamping max to 1.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 39075b71f2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const rows = page.locator('[data-testid="leaderboard-rows"] button[data-dimension]'); | ||
| if ((await rows.count()) === 0) return { skipped: true, reason: "no interactive leaderboard rows" }; | ||
| const before = await snapshot(page); | ||
| await rows.first().click({ timeout }); |
There was a problem hiding this comment.
Click a non-selected leaderboard row during smoke test
clickLeaderboardRow always clicks rows.first(), so dashboards that initialize with the first leaderboard row already selected will show no state/data delta and fail expectChange even when leaderboard filtering is wired correctly. This creates false negatives for valid apps with default row selection; the verifier should choose a row that is not currently selected (or otherwise different from the active selection) before asserting changes.
Useful? React with 👍 / 👎.
| } | ||
| if ((await metrics.count()) < 2) return { skipped: true, reason: "fewer than two interactive metric cards" }; | ||
| const before = await snapshot(page); | ||
| await metrics.nth(1).click({ timeout }); |
There was a problem hiding this comment.
Click a non-selected metric card during smoke test
clickMetricCard always clicks metrics.nth(1), which can be a no-op when the second metric is already selected (for example, when apps restore prior selection or use a custom default). In that case expectChange fails even though metric switching is functional, causing false verification failures. Select a metric card that is not currently marked data-selected="true" instead of relying on a fixed index.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8f2481feb6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| await rows.nth(rowIndex).click({ timeout }); | ||
| const after = await snapshot(page); |
There was a problem hiding this comment.
Wait for async dashboard updates before asserting changes
The interaction smoke test snapshots state immediately after each click (for example in clickLeaderboardRow), so dashboards that update via async fetch/render can be marked failed even when interactions are wired correctly. In any app where filter/metric changes are applied after a promise resolves, after will often match before and expectChange throws a false negative. Add a post-click wait/poll for one of the tracked fields to change (or network/DOM idle) before taking the after snapshot.
Useful? React with 👍 / 👎.
| const max = Math.max(...rows.map((row) => Number(row[metricKey]) || 0), 1); | ||
|
|
There was a problem hiding this comment.
Support negative metrics in leaderboard bar scaling
Leaderboard bar width is computed as metricValue / max with max clamped to at least 1, which breaks when metric values are negative (e.g., net-loss deltas): widths become negative or otherwise invalid, so bars disappear or render misleadingly. This affects any dashboard ranking on a metric that can drop below zero; use a min/max domain (or absolute scaling with a zero baseline) instead of assuming non-negative values.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3c8efe1fa8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| renderMetricCards(totalsEl, queries.metric_totals); | ||
| renderLeaderboard(leaderboardEl, queries.dimension_leaderboard, { | ||
| titleEl: leaderboardTitleEl, | ||
| subtitleEl: leaderboardSubtitleEl, | ||
| }); |
There was a problem hiding this comment.
Generate interactive leaderboard wiring in scaffolded app
The scaffolded app.js always renders dimension_leaderboard in non-interactive mode, so rows are emitted as static <div>s and no selection callback is wired. In verify_static_interactions.mjs, clickLeaderboardRow() requires button[data-dimension] rows and is treated as a required interaction, so every scaffolded dashboard will fail the smoke test before any real interaction checks can run. This breaks the documented verify workflow for generated static apps unless users hand-edit the scaffold first.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: abe590c194
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| renderMetricCards(totalsEl, queries.metric_totals, { | ||
| selectedMetric: state.selectedMetric, |
There was a problem hiding this comment.
Recompute metric totals when filters change
This render path always feeds renderMetricCards the original metric_totals query, so after a user selects or removes a filter, the active filter state is reflected in pills/preview but not in the headline metrics. In scaffolded dashboards, that produces inconsistent analytics (filtered detail rows with unfiltered KPI totals), which is misleading for any workflow that relies on the metric cards as the primary numbers.
Useful? React with 👍 / 👎.
|
|
||
| leaderboard_dimension = (leaderboard.get("dimensions") or [""])[0] | ||
| dimension_type = _dimension_type(spec, model_name, leaderboard_dimension) | ||
| checks["leaderboard_non_id"] = _is_non_id_dimension(spec, model_name, leaderboard_dimension) |
There was a problem hiding this comment.
Allow explicit id-like leaderboard dimensions in verification
The verifier hard-fails any leaderboard dimension that looks identifier-like, but the inspector intentionally supports explicit --leaderboard-dimension overrides (including id-like fields) when users request them. As a result, a valid app generated from an explicit override can fail verification solely because of this unconditional check, creating a false negative in the documented inspect→scaffold→verify workflow.
Useful? React with 👍 / 👎.
Adds a Sidemantic webapp builder skill with a component-first workflow for building analytics apps from semantic models.
The skill includes copyable static and React/Tailwind components, semantic model inspection and app-spec generation, a static dashboard scaffold, static verification, and a Playwright interaction smoke test so dashboards must prove real filter, leaderboard, metric, reset, and chart-bounds behavior.
Validated locally with the repository lint/format/test suite and the bundled static app and interaction verifiers.