feat: map field opt-in, ARRAY JOIN SQL fixes, field selection continuity by acmeguy · Pull Request #35 · smartdataHQ/synmetrix

acmeguy · 2026-03-31T08:58:52Z

Summary

- Map-expanded, nested (ARRAY JOIN), and AI-generated fields default to unchecked (opt-in) in change preview
- ARRAY JOIN cube SQL uses explicit SELECT (no SELECT *) to prevent Array/scalar column ambiguity in ClickHouse subqueries
- After field exclusion, ARRAY JOIN SQL is pruned to only include surviving columns (performance)
- Non-ARRAY-JOINed nested groups (e.g. location.*) restored with FILTER_PARAMS lookup-index service
- Removed paired filtered count measures (count_dimensions_* etc.) and pre-aggregation granularity (Cube.js v1.6)
- Skip LLM toggle, required fields (rewrite rules + filters), AI metrics empty-selection bug fix
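
The opt-in defaults in the first summary item could be decided from the source tag that `diffModels.js` attaches to each field. This is a minimal sketch, assuming a hypothetical `defaultChecked` helper and a `required` set of field names; neither name is taken from the PR.

```javascript
// Sources that the summary says default to unchecked (opt-in).
const OPT_IN_SOURCES = new Set(["map", "nested", "ai"]);

// Hypothetical helper: `field` is { name, source }, `required` is a Set of
// field names that must stay selected (rewrite-rule and filter dimensions).
function defaultChecked(field, required = new Set()) {
  if (required.has(field.name)) return true;
  return !OPT_IN_SOURCES.has(field.source);
}
```

Regular columns carry no opt-in source tag, so they fall through to checked.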

Changes

| File | Change |
| --- | --- |
| `smartGenerate.js` | `skip_llm` param, `required_fields` in response, ARRAY JOIN SQL pruning after exclusion, summary recount, nested column preservation in `selected_columns` filter |
| `cubeBuilder.js` | Explicit SELECT builder, non-AJ nested group support, AJ group column exclusion, paired counts removed, `granularity` removed |
| `diffModels.js` | Source tagging (`map`, `nested`, `ai`) on diff field entries |
| `fieldProcessors.js` | Backtick-quote dotted names in `generateSqlExpression` |
| `yamlGenerator.js` | Remove `granularity`/`partition_granularity` from pre-agg JS output |
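
The explicit SELECT builder and the post-exclusion pruning listed for `cubeBuilder.js` and `smartGenerate.js` might look roughly like the sketch below. The function name `buildExplicitSelect` and its parameter shape are assumptions for illustration, not the PR's actual code.

```javascript
// Hypothetical sketch: build "SELECT <explicit list> FROM t ARRAY JOIN ..."
// and keep only columns present in `keep` (when a keep-set is supplied).
function buildExplicitSelect({ table, scalarColumns, arrayJoinColumns, keep }) {
  const alias = (name) => name.replace(/\./g, "_");
  const wanted = (name) => !keep || keep.has(name);

  const joined = arrayJoinColumns.filter(wanted);
  const selectList = [
    ...scalarColumns.filter(wanted).map((name) => `\`${name}\``),
    ...joined.map(alias), // aliases are projected by name, never via SELECT *
  ];
  const joinClause = joined.length
    ? `\nARRAY JOIN ${joined.map((n) => `\`${n}\` AS ${alias(n)}`).join(", ")}`
    : "";

  return `SELECT\n  ${selectList.join(",\n  ")}\nFROM ${table}${joinClause}`;
}
```

Passing the set of surviving column names as `keep` drops excluded columns from both the SELECT list and the ARRAY JOIN clause.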

Test plan

- Fresh generation on `semantic_events` with `commerce.products` ARRAY JOIN + `entry_type = Line Item` filter
- Verify only selected nested fields appear in final model SQL
- Verify `location.*` fields appear as FILTER_PARAMS dimensions
- Verify map fields default unchecked, regular columns default checked
- Verify "Skip LLM" toggle disables AI enrichment and advisory passes
- Verify empty AI metrics selection produces model with no AI metrics
- Run a query against the generated model and confirm there are no Array comparison errors

🤖 Generated with Claude Code

Adds a new POST endpoint that detects nested (GROUPED) column structures in a ClickHouse table and returns discriminator columns with their distinct values, enabling the frontend to show filter options in the Smart Generate dialog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ion, cleanup

- Use cubejs.options.driverFactory({ securityContext }) instead of cubejs.driverFactory()
- Add SAFE_IDENTIFIER regex validation on schema/table params to prevent SQL injection
- Add driver.release() cleanup in catch block
- Use { code, message } error response shape matching other routes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
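
Identifier validation of the kind this commit mentions could be sketched as follows. The exact `SAFE_IDENTIFIER` pattern used in the PR is not shown, so the regex, the helper names, and the error code here are assumptions.

```javascript
// Assumed pattern: letters, digits and underscores, not starting with a digit.
const SAFE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Hypothetical helper: throws an error carrying { code, message } fields.
function assertSafeIdentifier(value, label) {
  if (typeof value !== "string" || !SAFE_IDENTIFIER.test(value)) {
    const err = new Error(`Invalid ${label}`);
    err.code = "invalid_identifier";
    throw err;
  }
  return value;
}

// Only validated parts are ever interpolated into SQL.
function qualifiedTable(schema, table) {
  return `${assertSafeIdentifier(schema, "schema")}.${assertSafeIdentifier(table, "table")}`;
}
```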

… cube names Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Restore AS alias clause in legacy ARRAY JOIN path SQL with partition WHERE
- Use ClickHouse-standard doubled single quotes ('') instead of backslash escaping
- Remove redundant template literal wrapping in arrayJoinGroups map
- Add warning when groupColumns is empty but arrayJoinGroups were requested

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
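
Quote doubling for filter values can be illustrated with a small helper. `quoteLiteral` is a hypothetical name. The sketch also doubles backslashes, which is an addition beyond what the commit describes, because ClickHouse string literals treat a backslash as an escape character.

```javascript
// Hypothetical helper: render a value as a ClickHouse string literal.
function quoteLiteral(value) {
  const escaped = String(value)
    .replace(/\\/g, "\\\\") // assumption: keep literal backslashes intact
    .replace(/'/g, "''");   // doubled single quotes, as in the commit
  return `'${escaped}'`;
}
```

For example, `quoteLiteral("Line Item")` yields `'Line Item'`, ready for a `WHERE` comparison.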

Insert LLM polishing step after AI enrichment and before final JS code generation. The polisher rewrites cube definitions per modeling principles while preserving original SQL. Polish results are included in all response payloads (dry-run, no-changes, and apply). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ndpoint Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…selected Without this, profiling ran against the base table and reported empty columns for nested array sub-columns. Now the profiler uses LEFT ARRAY JOIN so column stats reflect the expanded array-joined rows. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ClickHouse Nested columns (stored as parallel arrays with dotted names) require enumerating each sub-column in the ARRAY JOIN clause:

    ARRAY JOIN `parent.child1` AS child1_alias, `parent.child2` AS child2_alias

Previously used `ARRAY JOIN parent` which is invalid for this column type. Fixes both profiler (for accurate column stats on expanded rows) and cubeBuilder (for correct cube SQL generation). Non-array-join profiling path is unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace dots with underscores in the full column name (e.g. commerce.products.entry_type → commerce_products_entry_type) for both the ARRAY JOIN alias and the nested WHERE filter clause. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
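
The alias rule above (dots to underscores over the full column name) and the per-sub-column ARRAY JOIN enumeration can be sketched together. `aliasFor` and `buildArrayJoinClause` are illustrative names, not necessarily the ones used in the profiler or cubeBuilder.

```javascript
// commerce.products.entry_type -> commerce_products_entry_type
function aliasFor(columnName) {
  return columnName.replace(/\./g, "_");
}

// Hypothetical builder: one "`dotted.name` AS alias" entry per sub-column.
function buildArrayJoinClause(columnNames, { left = false } = {}) {
  if (columnNames.length === 0) return "";
  const parts = columnNames.map((name) => `\`${name}\` AS ${aliasFor(name)}`);
  return `${left ? "LEFT " : ""}ARRAY JOIN ${parts.join(", ")}`;
}
```

A nested filter would then reference the alias, for instance `WHERE commerce_products_entry_type = 'Line Item'`, so the join and the filter stay consistent.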

The frontend sends nestedFilters in the profile-table POST body but the route wasn't extracting or passing them to the profiler function. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Profiler: filter ARRAY JOIN to columns where rawType starts with `Array(`; scalar dotted columns (e.g. commerce.details Nullable(String)) excluded
- Profiler + CubeBuilder: use full column name with dots→underscores as alias (e.g. commerce.products.entry_type → commerce_products_entry_type)
- CubeBuilder: dimension/measure SQL uses the aliased column name
- WHERE clauses use the aliased names consistently

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
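
The raw-type filter in the first bullet amounts to a prefix check on the column type. A minimal sketch, assuming columns are objects with `name` and `rawType` fields and a hypothetical `pickArrayJoinColumns` helper:

```javascript
// Keep only columns under `groupPrefix` whose ClickHouse type is an Array.
// Scalar dotted columns such as Nullable(String) are left out of the join.
function pickArrayJoinColumns(columns, groupPrefix) {
  return columns.filter(
    (col) => col.name.startsWith(`${groupPrefix}.`) && col.rawType.startsWith("Array(")
  );
}
```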

Query rewrite rules (e.g. partition scoping by team properties) are now loaded and translated to raw SQL filters before profiling. This ensures the profiler respects the same row-level access controls as the Cube.js query layer. Applied in both profileTable and smartGenerate routes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…alidate After the LLM returns polished cubes, generates JS and runs validateModelSyntax. If validation fails, sends errors back to the LLM for correction, up to 2 cycles. Also mounts first-principles path and checks multiple principle file locations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
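
The validate-and-correct cycle described here can be sketched as a bounded loop. The dependencies are injected so the example stays self-contained: `generateJs`, `validate`, and `requestFix` are stand-ins for the real generator, `validateModelSyntax`, and the LLM call, and their signatures are assumptions.

```javascript
const MAX_FIX_CYCLES = 2; // the commit states "up to 2 cycles"

// Hypothetical orchestration: validate, ask for a fix, validate again.
async function polishWithValidation(cubes, { generateJs, validate, requestFix }) {
  let current = cubes;
  let cycles = 0;
  let errors = validate(generateJs(current));

  while (errors.length > 0 && cycles < MAX_FIX_CYCLES) {
    current = await requestFix(current, errors); // errors go back to the LLM
    cycles += 1;
    errors = validate(generateJs(current));
  }

  return { cubes: current, valid: errors.length === 0, cycles, errors };
}
```

The caller decides what to do when `valid` is still false after the last cycle, for example falling back to the unpolished model.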

Zod schemas are now built inside an async getSchemas() function that imports zod dynamically, avoiding the undefined 'z' at module load time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
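
A lazy schema loader along these lines might look like the sketch below. The schema shape is invented for illustration, and the injectable `loadZod` parameter is an assumption added so the loader can be exercised without the real package.

```javascript
let schemasPromise = null;

// Build schemas on first use; later calls reuse the same promise.
function getSchemas(loadZod = () => import("zod")) {
  if (!schemasPromise) {
    schemasPromise = loadZod().then((mod) => {
      const z = mod.z ?? mod.default ?? mod;
      // Illustrative schema only; the PR's actual schemas are not shown.
      return {
        description: z.object({ name: z.string(), description: z.string() }),
      };
    });
  }
  return schemasPromise;
}
```

Because nothing references `z` at module scope, importing the module cannot fail on an undefined `z`. Note that this sketch also caches a failed load; a production version may want to reset the cache on rejection.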

…for Zod 4 compat zodResponseFormat fails with z.any() as a record value type in Zod 4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… compat OpenAI structured output requires every field to have an explicit type. Replaced z.any().nullable() for rollingWindow, timeShift, refresh_key, and meta with fully typed schemas. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Preview shows the raw generated model for fast feedback. Polishing runs only when the user clicks Apply Changes, avoiding timeouts during preview. Also increased polisher timeout to 180s for large models. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Plan 1: Single-line fix for lcFrom missing arrayJoinClause (4 broken queries)
Plan 2: 6-task plan for principle-compliant cubeBuilder heuristics (titles, meta, paired counts, format, public:false, drill members, pre-aggregations)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ries Single-line fix: the lcFrom variable (used by 4 downstream queries for Map numeric stats, Map string stats, and LC value probe) was missing the arrayJoinClause. All nested filter WHERE conditions referenced aliased column names that only exist after ARRAY JOIN. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- titleFromName: snake_case → Title Case on all fields and cubes
- Partition-first dimension ordering
- Complete meta block: grain, grain_description, time_dimension, time_zone, refresh_cadence
- Paired filtered counts for LC dimensions (max 10 values)
- Drill members on primary count measure
- Format inference: currency/percent by column name pattern
- public: false on plumbing fields (GIDs, write_key, etc.)
- Default pre-aggregations: daily + monthly rollups with ClickHouse indexes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
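
Two of these heuristics are simple enough to sketch. `titleFromName` follows the behavior named in the commit; the name patterns in `inferFormat` are guesses, since the PR does not list which column names map to which format.

```javascript
// snake_case -> Title Case
function titleFromName(name) {
  return name
    .split("_")
    .filter(Boolean)
    .map((word) => word[0].toUpperCase() + word.slice(1))
    .join(" ");
}

// Assumed patterns: whole snake_case tokens that hint at a format.
function inferFormat(columnName) {
  if (/(^|_)(price|amount|revenue|cost)(_|$)/.test(columnName)) return "currency";
  if (/(^|_)(rate|pct|percent|ratio)(_|$)/.test(columnName)) return "percent";
  return undefined;
}
```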

…c, drill_members in yamlGenerator Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

6-task plan: fix Hasura timeout, create modelAdvisor with 4 focused micro-prompts (descriptions, segments, metrics, pre-aggregations), integrate into pipeline, update frontend, delete old polisher, full end-to-end testing including Cube.js compiler validation and Explore page query verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…r debug logging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…enerate pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rays Cube.js expects pre_aggregations as named keys with indexes as nested named objects. The yamlGenerator was using JSON.stringify which produced arrays with 'name' fields — invalid Cube.js syntax. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
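
The array-to-named-keys conversion could be done before serialization, roughly as follows. `keyByName` and `toPreAggregationsObject` are hypothetical helper names; the input shape (arrays of objects with a `name` field) is taken from the commit's description of the faulty output.

```javascript
// [{ name: "a", x: 1 }] -> { a: { x: 1 } }
function keyByName(items) {
  const out = {};
  for (const { name, ...rest } of items) out[name] = rest;
  return out;
}

// Pre-aggregations become named keys; their indexes become nested named keys.
function toPreAggregationsObject(preAggregations) {
  return keyByName(
    preAggregations.map((preAgg) =>
      preAgg.indexes ? { ...preAgg, indexes: keyByName(preAgg.indexes) } : preAgg
    )
  );
}
```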

Previously buildCubes always emitted the raw base-table cube AND the array-joined cube. When the user selects an array join with filters, only the filtered array-joined cube should be produced — one cube, one file, one intent. The raw cube is still built internally (for field processing and as a base for the array join cube's inherited dimensions) but is not emitted. All heuristics (partition-first, grain/meta, drill members, format inference, public:false, pre-aggregations) are now applied to the array-joined cube when it's the sole output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When a new model is merged with an existing one, FILTER_PARAMS expressions from the old model may reference the previous cube name. This replaces all FILTER_PARAMS.old_cube_name references with the actual cube name from the current generation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
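
A reference rewrite like this can be done with an escaped, anchored pattern. This is a sketch under the assumption that references always take the form `FILTER_PARAMS.<cube>.<member>`; `retargetFilterParams` is an illustrative name.

```javascript
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace FILTER_PARAMS.<oldCube>. with FILTER_PARAMS.<newCube>. everywhere.
function retargetFilterParams(sql, oldCubeName, newCubeName) {
  const pattern = new RegExp(`FILTER_PARAMS\\.${escapeRegExp(oldCubeName)}\\.`, "g");
  // A replacer function avoids "$" in the new name being read as a group reference.
  return sql.replace(pattern, () => `FILTER_PARAMS.${newCubeName}.`);
}
```

Requiring the trailing dot keeps a cube named `old_events` from matching references to `old_events_v2`.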

…models

When nestedFilters are active:

1. Force mergeStrategy='replace': FILTER_PARAMS from old model are incompatible with ARRAY JOIN (indexOf on scalar columns)
2. Use the cube name for the file name: ensures file name matches cube name for Cube.js resolution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1. SQL now uses newlines + indentation for readability in model editor
2. Removed count_distinct_approx from pre-agg filters and advisor schema (not supported by ClickHouse driver)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

FILTER_PARAMS dimensions use indexOf on array columns which become scalars after ARRAY JOIN. These dimensions cause runtime ClickHouse errors and must be excluded from the array-joined cube. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ions When FILTER_PARAMS dimensions are stripped from the array-joined cube, paired count measures that reference those dimensions must also be removed, and drill_members lists must be cleaned. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When the user deselects columns in the profile preview, the filter column (e.g. commerce.products.entry_type) might be removed from the columns Map. But the WHERE clause still references it. Ensure filter columns are always in the ARRAY JOIN regardless of selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
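
The guarantee described here is a set union of the user's selection and the filter columns. A minimal sketch, assuming filters are objects with a `column` field and using the hypothetical name `arrayJoinColumnsFor`:

```javascript
// Selected columns plus every column a nested filter references.
function arrayJoinColumnsFor(selectedColumns, nestedFilters) {
  const names = new Set(selectedColumns);
  for (const filter of nestedFilters) names.add(filter.column);
  return [...names];
}
```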

… preview Backend: smartGenerate strips excluded dimensions/measures/segments from cubes before generating JS. excluded_fields flows through Hasura action → RPC handler → CubeJS route. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When user deselects fields in change preview, all references must be cleaned: drill_members, paired counts, pre-aggregation measures/dimensions, and derived metrics that reference excluded fields via {name} syntax. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1. ARRAY JOIN SQL now uses SELECT *, alias1, alias2... instead of just SELECT *. ClickHouse doesn't project ARRAY JOIN aliases into outer subquery scope with SELECT * alone.
2. Segments that reference excluded dimensions via {CUBE}.field_name are now stripped during field exclusion cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Runs cleanup in a loop until stable — each pass may remove fields that other fields depend on. Checks both {name} and {CUBE}.name reference patterns. Handles cascading dependencies (metric A references metric B which references excluded field C). Also adds debug logging for excluded_fields receipt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
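
The run-until-stable cleanup is a fixed-point iteration over field references. The sketch below assumes each field is `{ name, sql }` and treats any `{name}` or `{CUBE}.name` token as a reference; `referencedNames` and `pruneExcluded` are illustrative names.

```javascript
// Collect names referenced as {name} or {CUBE}.name inside a SQL expression.
function referencedNames(sql) {
  const names = new Set();
  const pattern = /\{CUBE\}\.([A-Za-z_][A-Za-z0-9_]*)|\{([A-Za-z_][A-Za-z0-9_]*)\}/g;
  for (const match of sql.matchAll(pattern)) {
    const name = match[1] ?? match[2];
    if (name !== "CUBE") names.add(name);
  }
  return names;
}

// Remove excluded fields, then keep removing dependents until nothing changes.
function pruneExcluded(fields, excludedNames) {
  const removed = new Set(excludedNames);
  let remaining = fields.filter((field) => !removed.has(field.name));
  let changed = true;
  while (changed) {
    changed = false;
    remaining = remaining.filter((field) => {
      const broken = [...referencedNames(field.sql)].some((n) => removed.has(n));
      if (broken) {
        removed.add(field.name);
        changed = true;
      }
      return !broken;
    });
  }
  return remaining;
}
```

With metric A referencing metric B, and B referencing excluded field C, the first pass drops B and the second pass drops A.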

Smart Generation improvements:

- Map-expanded fields default to unchecked (opt-in) in change preview
- ARRAY JOIN nested fields default to unchecked (opt-in) in change preview
- AI-generated metrics default to unchecked (opt-in) in change preview
- Count measure and rewrite-rule dimensions always selected
- Source tagging in diffModels (map, nested, ai) for frontend selection logic
- Skip LLM toggle support (skip_llm parameter)
- Required fields (rewrite rules + filter dims) passed to frontend

ARRAY JOIN SQL generation:

- Replace SELECT * with explicit column list to prevent Array/scalar ambiguity
- ARRAY JOIN alias names projected in SELECT for Cube.js subquery visibility
- Non-AJ nested groups (location.*) excluded from SELECT (no corresponding dims)
- After excluded_fields, prune ARRAY JOIN SQL to only surviving columns
- Recompute summary counts after field exclusion

Field continuity fixes:

- Non-AJ nested groups (location.*) pass through processColumns despite no profiling
- FILTER_PARAMS dimensions for non-AJ groups preserved in AJ cube (indexOf still valid)
- AJ group FILTER_PARAMS dimensions correctly excluded (indexOf breaks on scalars)
- Backtick-quote dotted column names in NestedFieldProcessor SQL
- AI metrics empty selection sends empty array (not undefined) to prevent include-all
- SELECT pruning uses exact alias name tracking (not regex heuristic)

Removed:

- Paired filtered count measures (count_dimensions_* etc.)
- granularity/partition_granularity from pre-aggregations (Cube.js v1.6 compat)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
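
The empty-array versus undefined distinction for AI metrics can be made explicit on the receiving side. A minimal sketch, assuming a hypothetical `filterAiMetrics` helper where `undefined` means "no selection was sent":

```javascript
// undefined -> caller sent no selection, keep everything (include-all)
// []        -> caller deselected everything, keep nothing
function filterAiMetrics(metrics, selectedNames) {
  if (selectedNames === undefined) return metrics;
  const keep = new Set(selectedNames);
  return metrics.filter((metric) => keep.has(metric.name));
}
```

Sending `[]` rather than omitting the field is what prevents an empty selection from being read as include-all.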

acmeguy and others added 30 commits March 31, 2026 08:57

feat: enhance cubeBuilder with nested filter support and auto-derived…

2eecef5

… cube names Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: thread nestedFilters through smart-generate pipeline and profiler

faaf830

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: add default param for nestedFilters in buildWhereClause

26b3707

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add NestedFilterInput types to Hasura actions and RPC handler

b4adbf1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: return raw_type and value_type in discoverNested discriminators

c808cfc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add modelPolisher LLM module for cube-principles compliance

5be570d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: mount cube-principles.md for modelPolisher in dev environment

1a55821

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add polish field to SmartGenOutput

dd47242

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: use arrayJoin for nested column value lookups in column-values e…

08f1d6a

…ndpoint Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: pass nestedFilters from profile-table route to profiler

51846dc

The frontend sends nestedFilters in the profile-table POST body but the route wasn't extracting or passing them to the profiler function. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: lazy-load Zod schemas to avoid module-level z reference error

9e0d561

Zod schemas are now built inside an async getSchemas() function that imports zod dynamically, avoiding the undefined 'z' at module load time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: replace z.record(z.any()) with z.record(z.string(), z.string()) …

ab9dfde

…for Zod 4 compat zodResponseFormat fails with z.any() as a record value type in Zod 4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

wip: smart gen improvements - plans, specs, principles copy

ae209f8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: serialize titles, descriptions, pre-aggregations, format, publi…

4d5d3f4

…c, drill_members in yamlGenerator Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: remove debug logging from profiler nested filter code

85a952d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

acmeguy and others added 18 commits March 31, 2026 08:57

fix: increase smart_gen_dataschemas timeout to 300s, add nested filte…

dc973d9

…r debug logging Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add micro-prompt modelAdvisor replacing monolithic polisher

2607b30

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: replace monolithic polisher with micro-prompt advisor in smartG…

d802136

…enerate pipeline Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: remove old monolithic modelPolisher, replaced by modelAdvisor

0a9138e

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

acmeguy mentioned this pull request Mar 31, 2026

feat: map field opt-in UI, skip LLM toggle, start over button smartdataHQ/client-v2#30

Merged

4 tasks

Valdegg approved these changes Mar 31, 2026

acmeguy merged commit c9897e9 into main Mar 31, 2026
3 checks passed

acmeguy merged 48 commits into main from feat/map-field-opt-in-and-array-join-fixes
