Conversation
Tier 1 (critical parsing): - Dot-method aggregation (field.sum(), `number`.sum()) - Source inheritance via SQIDContext (Model.extends) - rename: statements -> Dimension(name=new, sql=old) - ?? null coalescing -> COALESCE - Apply-pick (field ? pick 'X' when < 5) -> proper CASE - count(field) -> count_distinct per Malloy semantics - Triple-quote string extraction ordering bug fix - Connection name preservation in metadata + export Tier 2 (structural completeness): - SQArrowContext pipeline sources (->) - SQRefinedQueryContext old + syntax with best-effort extraction - Multi-condition join on -> composite keys in metadata - accept:/except: field visibility handler - Inline source definitions in joins -> separate models - compose() sources -> first source processed Tier 3 (expression correctness): - ~ / !~ regex -> REGEXP_MATCHES() - ! type assertion stripped (func!type(args) -> func(args)) - & and-tree -> AND with base field expansion - | or-tree -> IN (...) for value matching - @ date literals -> DATE 'YYYY-MM-DD' - now -> CURRENT_TIMESTAMP - .granularity on aggregated measures preserved in metadata - Filtered measure refs inherit base aggregation - Unified _transform_malloy_expr applied to all expressions
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 80d593a895
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
sidemantic/adapters/malloy.py
Outdated
| r"(\S+)\s+~\s+r'([^']*)'", | ||
| r"REGEXP_MATCHES(\1, '\2')", | ||
| expr, | ||
| ) |
There was a problem hiding this comment.
Skip regex rewrite inside pick/when clauses
The ~ rewrite runs on the raw expression and matches when ~ r'...' lines inside pick blocks, so \1 becomes when and the SQL is corrupted to REGEXP_MATCHES(when, ...). This breaks real apply-pick dimensions (for example the faang dimension pattern) and prevents valid CASE generation downstream. Restrict this substitution to true binary expressions (or run it after pick-to-CASE transformation).
Useful? React with 👍 / 👎.
| if isinstance(stmt, MalloyParser.DefExplorePrimaryKeyContext): | ||
| field_name = stmt.fieldName() | ||
| if field_name: | ||
| self.current_primary_key = self._get_text(field_name) | ||
| elif isinstance(stmt, MalloyParser.DefExploreDimension_stubContext): |
There was a problem hiding this comment.
Parse refined
+ query statements using query contexts
This loop receives QueryStatementContext nodes, but the code checks them against DefExplore* classes, so the branches never fire. In practice, old base + { ... } refinements silently drop their contents (e.g., where: clauses are not turned into segments), leaving derived sources incomplete. You need to inspect stmt.whereStatement(), stmt.queryJoinStatement(), etc., on QueryStatementContext instead of relying on these isinstance checks.
Useful? React with 👍 / 👎.
Tier 4 (export/roundtrip): - Export segments as source-level where: clauses - Export renames as rename: (detect simple identifier dimensions) - Export full join on conditions from relationship metadata - join_cross: already handled in prior commit Tier 5 (metadata/annotations): - Non-description tags (# line_chart, # percent, etc.) stored in dimension/measure/model metadata["tags"] - #@ persist annotations stored in Model.metadata["persist"] - timezone: statements stored in Model.metadata["timezone"] - Standalone # annotations in extend blocks stored as model tags - declare: field declarations processed as dimensions in + syntax - _parse_annotations_full returns both description and tag list
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf9e0dd7eb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
sidemantic/adapters/malloy.py
Outdated
|
|
||
| # ~ regex match with r'' literal: expr ~ r'pattern' -> REGEXP_MATCHES(expr, 'pattern') | ||
| expr = re.sub( | ||
| r"(\S+)\s+~\s+r'([^']*)'", |
There was a problem hiding this comment.
Capture full regex LHS before rewriting
The ~ rewrite uses ("\\S+") for the left-hand side, so any valid expression with spaces/parentheses is truncated and rewritten into invalid SQL. For example, (first || ' ' || last) ~ r'x' is transformed into (first || ' ' || REGEXP_MATCHES(last), 'x'), which both breaks parsing and changes semantics. Fresh evidence beyond the existing pick-thread issue: this occurs even outside pick/when on ordinary concatenation expressions.
Useful? React with 👍 / 👎.
| saved = ( | ||
| self.current_model_name, | ||
| self.current_table, | ||
| self.current_sql, | ||
| self.current_primary_key, |
There was a problem hiding this comment.
Preserve metadata state across inline join extraction
Inline join extraction snapshots only the core current_* fields, but not metadata accumulators like _timezone and _model_tags that can be mutated while parsing the inline source's extend { ... }. If an inline joined source includes timezone: or standalone # annotations, those values leak back into the outer source and end up on the parent model's metadata, producing incorrect timezone/tags on unrelated models.
Useful? React with 👍 / 👎.
P1: Regex rewrite used \S+ for LHS, truncating expressions with spaces/parens like (first || ' ' || last) ~ r'x'. Changed to .+? to capture the full left-hand expression. P2: Inline join source extraction didn't save/restore _timezone, _model_tags, _accept_fields, _except_fields. If the inline source had timezone: or # annotations, they leaked into the parent model.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e500074c35
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
sidemantic/adapters/malloy.py
Outdated
| parts = re.split(r"\s*\?\?\s*", expr) | ||
| if len(parts) > 1: | ||
| return f"COALESCE({', '.join(p.strip() for p in parts)})" |
There was a problem hiding this comment.
Parse
?? only at top expression depth
_transform_null_coalesce currently splits on every ?? token and wraps the entire string in one COALESCE(...), which breaks nested/embedded uses. For example, concat(a ?? b, c) becomes COALESCE(concat(a, b, c)), and (a ?? b) + c becomes COALESCE((a, b) + c), so valid Malloy expressions are rewritten into changed or invalid SQL. Because this helper runs for dimensions, measures, and segments, this can corrupt many expression contexts.
Useful? React with 👍 / 👎.
| metadata = {} | ||
| if self.current_connection: | ||
| metadata["connection"] = self.current_connection | ||
| inline_model = Model( |
There was a problem hiding this comment.
Preserve parsed metadata on inline join models
Inline source extraction parses timezone: and standalone # annotations into visitor state, but when building the inline Model only connection is copied into metadata. This drops metadata that would be preserved for top-level sources, so inline models can silently lose timezone/tags during parse and then round-trip/export with incomplete semantics.
Useful? React with 👍 / 👎.
P1: _transform_null_coalesce now tracks parenthesis depth and only splits on ?? at the top level. concat(a ?? b, c) and (a ?? b) + c are no longer corrupted. P2: Inline join model creation now includes _timezone and _model_tags in metadata, matching the behavior of top-level source creation.
Summary
Comprehensive expansion of the Malloy adapter's parsing and expression handling, addressing 23 gaps identified by auditing all 21 fixture files against the compatibility doc. Tier 1 fixes critical parsing (dot-method aggregation, source inheritance, rename statements, null coalescing, apply-pick, count(field) semantics, connection preservation). Tier 2 adds structural completeness (pipeline sources, old + syntax, multi-condition joins, accept/except, inline join sources, compose). Tier 3 fixes expression correctness (regex operators, type assertions, &/| operators, date literals, now keyword, granularity on aggregated measures, filtered measure references).
All 81 Malloy tests pass (4 former xfails now passing), 3137 full suite passes, 0 regressions. Compatibility doc updated to reflect new status across all sections.