Skip to content

Expand Malloy adapter coverage#112

Merged
nicosuave merged 4 commits intomainfrom
nicosuave/malloy-coverage
Mar 28, 2026
Merged

Expand Malloy adapter coverage#112
nicosuave merged 4 commits intomainfrom
nicosuave/malloy-coverage

Conversation

@nicosuave
Copy link
Copy Markdown
Member

Summary

Comprehensive expansion of the Malloy adapter's parsing and expression handling, addressing 23 gaps identified by auditing all 21 fixture files against the compatibility doc. Tier 1 fixes critical parsing (dot-method aggregation, source inheritance, rename statements, null coalescing, apply-pick, count(field) semantics, connection preservation). Tier 2 adds structural completeness (pipeline sources, old + syntax, multi-condition joins, accept/except, inline join sources, compose). Tier 3 fixes expression correctness (regex operators, type assertions, &/| operators, date literals, now keyword, granularity on aggregated measures, filtered measure references).

All 81 Malloy tests pass (4 former xfails now passing), 3137 full suite passes, 0 regressions. Compatibility doc updated to reflect new status across all sections.

Tier 1 (critical parsing):
- Dot-method aggregation (field.sum(), `number`.sum())
- Source inheritance via SQIDContext (Model.extends)
- rename: statements -> Dimension(name=new, sql=old)
- ?? null coalescing -> COALESCE
- Apply-pick (field ? pick 'X' when < 5) -> proper CASE
- count(field) -> count_distinct per Malloy semantics
- Triple-quote string extraction ordering bug fix
- Connection name preservation in metadata + export

Tier 2 (structural completeness):
- SQArrowContext pipeline sources (->)
- SQRefinedQueryContext old + syntax with best-effort extraction
- Multi-condition join on -> composite keys in metadata
- accept:/except: field visibility handler
- Inline source definitions in joins -> separate models
- compose() sources -> first source processed

Tier 3 (expression correctness):
- ~ / !~ regex -> REGEXP_MATCHES()
- ! type assertion stripped (func!type(args) -> func(args))
- & and-tree -> AND with base field expansion
- | or-tree -> IN (...) for value matching
- @ date literals -> DATE 'YYYY-MM-DD'
- now -> CURRENT_TIMESTAMP
- .granularity on aggregated measures preserved in metadata
- Filtered measure refs inherit base aggregation
- Unified _transform_malloy_expr applied to all expressions
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80d593a895

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +327 to +330
r"(\S+)\s+~\s+r'([^']*)'",
r"REGEXP_MATCHES(\1, '\2')",
expr,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip regex rewrite inside pick/when clauses

The ~ rewrite runs on the raw expression and matches when ~ r'...' lines inside pick blocks, so \1 becomes when and the SQL is corrupted to REGEXP_MATCHES(when, ...). This breaks real apply-pick dimensions (for example the faang dimension pattern) and prevents valid CASE generation downstream. Restrict this substitution to true binary expressions (or run it after pick-to-CASE transformation).

Useful? React with 👍 / 👎.

Comment on lines +698 to +702
if isinstance(stmt, MalloyParser.DefExplorePrimaryKeyContext):
field_name = stmt.fieldName()
if field_name:
self.current_primary_key = self._get_text(field_name)
elif isinstance(stmt, MalloyParser.DefExploreDimension_stubContext):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse refined + query statements using query contexts

This loop receives QueryStatementContext nodes, but the code checks them against DefExplore* classes, so the branches never fire. In practice, old base + { ... } refinements silently drop their contents (e.g., where: clauses are not turned into segments), leaving derived sources incomplete. You need to inspect stmt.whereStatement(), stmt.queryJoinStatement(), etc., on QueryStatementContext instead of relying on these isinstance checks.

Useful? React with 👍 / 👎.

Tier 4 (export/roundtrip):
- Export segments as source-level where: clauses
- Export renames as rename: (detect simple identifier dimensions)
- Export full join on conditions from relationship metadata
- join_cross: already handled in prior commit

Tier 5 (metadata/annotations):
- Non-description tags (# line_chart, # percent, etc.) stored in
  dimension/measure/model metadata["tags"]
- #@ persist annotations stored in Model.metadata["persist"]
- timezone: statements stored in Model.metadata["timezone"]
- Standalone # annotations in extend blocks stored as model tags
- declare: field declarations processed as dimensions in + syntax
- _parse_annotations_full returns both description and tag list
@nicosuave nicosuave changed the title Expand Malloy adapter coverage: Tiers 1-3 Expand Malloy adapter coverage Mar 28, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf9e0dd7eb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


# ~ regex match with r'' literal: expr ~ r'pattern' -> REGEXP_MATCHES(expr, 'pattern')
expr = re.sub(
r"(\S+)\s+~\s+r'([^']*)'",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Capture full regex LHS before rewriting

The ~ rewrite uses ("\\S+") for the left-hand side, so any valid expression with spaces/parentheses is truncated and rewritten into invalid SQL. For example, (first || ' ' || last) ~ r'x' is transformed into (first || ' ' || REGEXP_MATCHES(last), 'x'), which both breaks parsing and changes semantics. Fresh evidence beyond the existing pick-thread issue: this occurs even outside pick/when on ordinary concatenation expressions.

Useful? React with 👍 / 👎.

Comment on lines +1120 to +1124
saved = (
self.current_model_name,
self.current_table,
self.current_sql,
self.current_primary_key,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve metadata state across inline join extraction

Inline join extraction snapshots only the core current_* fields, but not metadata accumulators like _timezone and _model_tags that can be mutated while parsing the inline source's extend { ... }. If an inline joined source includes timezone: or standalone # annotations, those values leak back into the outer source and end up on the parent model's metadata, producing incorrect timezone/tags on unrelated models.

Useful? React with 👍 / 👎.

P1: Regex rewrite used \S+ for LHS, truncating expressions with
spaces/parens like (first || ' ' || last) ~ r'x'. Changed to .+?
to capture the full left-hand expression.

P2: Inline join source extraction didn't save/restore _timezone,
_model_tags, _accept_fields, _except_fields. If the inline source
had timezone: or # annotations, they leaked into the parent model.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e500074c35

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +379 to +381
parts = re.split(r"\s*\?\?\s*", expr)
if len(parts) > 1:
return f"COALESCE({', '.join(p.strip() for p in parts)})"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Parse ?? only at top expression depth

_transform_null_coalesce currently splits on every ?? token and wraps the entire string in one COALESCE(...), which breaks nested/embedded uses. For example, concat(a ?? b, c) becomes COALESCE(concat(a, b, c)), and (a ?? b) + c becomes COALESCE((a, b) + c), so valid Malloy expressions are rewritten into changed or invalid SQL. Because this helper runs for dimensions, measures, and segments, this can corrupt many expression contexts.

Useful? React with 👍 / 👎.

Comment on lines +1160 to +1163
metadata = {}
if self.current_connection:
metadata["connection"] = self.current_connection
inline_model = Model(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve parsed metadata on inline join models

Inline source extraction parses timezone: and standalone # annotations into visitor state, but when building the inline Model only connection is copied into metadata. This drops metadata that would be preserved for top-level sources, so inline models can silently lose timezone/tags during parse and then round-trip/export with incomplete semantics.

Useful? React with 👍 / 👎.

P1: _transform_null_coalesce now tracks parenthesis depth and only
splits on ?? at the top level. concat(a ?? b, c) and (a ?? b) + c
are no longer corrupted.

P2: Inline join model creation now includes _timezone and _model_tags
in metadata, matching the behavior of top-level source creation.
@nicosuave nicosuave merged commit 9776a34 into main Mar 28, 2026
15 checks passed
@nicosuave nicosuave deleted the nicosuave/malloy-coverage branch March 28, 2026 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant