Skip to content

Fix BSL adapter: compound expressions, joins, filters, time dims#113

Merged
nicosuave merged 12 commits intomainfrom
nicosuave/fix-bsl-adapter
Mar 30, 2026
Merged

Fix BSL adapter: compound expressions, joins, filters, time dims#113
nicosuave merged 12 commits intomainfrom
nicosuave/fix-bsl-adapter

Conversation

@nicosuave
Copy link
Copy Markdown
Member

Summary

  • Extend BSL expression parser to handle compound arithmetic (_.a - _.b).sum() and boolean comparison (_.col == "val").sum() expressions that previously fell through to fallback
  • Add support for silently-ignored BSL YAML features: with: join syntax, model-level primary_key/time_dimension/smallest_time_grain, model-level filter: expressions, and calculated_measures: section
  • Store original BSL expressions in metric/dimension metadata for lossless roundtrip instead of fragile regex reconstruction
  • Add AST-based bsl_filter_to_sql for proper filter conversion (handles &/| as AND/OR)
  • Add second/minute to Model.default_grain Literal for consistency with Dimension.granularity

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 805d491b65

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +186 to +190
if isinstance(node.op, ast.BitAnd):
left = _filter_node_to_sql(node.left)
right = _filter_node_to_sql(node.right)
if left and right:
return f"{left} AND {right}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve boolean grouping when translating filter operators

The &/| translation drops grouping because it concatenates recursive results without parentheses, so mixed-precedence filters can change meaning. For example, (_.a == 1) & ((_.b == 2) | (_.c == 3)) is emitted as a = 1 AND b = 2 OR c = 3, which SQL evaluates as (a = 1 AND b = 2) OR c = 3 instead of a = 1 AND (b = 2 OR c = 3). This can silently broaden or narrow model-level filters and produce incorrect query results.

Useful? React with 👍 / 👎.

bsl_expr = f"_.{metric.name}"
else:
# Cross-format conversion: reconstruct BSL from SQL + aggregation
bsl_expr = _sql_to_bsl_expr(metric.sql or metric.name, metric.agg)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Export COUNT(*) metrics without substituting metric name

Using metric.sql or metric.name here regresses count metric export: a metric defined as agg="count", sql=None now serializes through _sql_to_bsl_expr as _.<metric_name>.count() instead of _.count(). That changes semantics from row count to counting a column (or references a non-existent column), so cross-format exports to BSL produce incorrect measures for a very common metric pattern.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48928c5eca

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if left is None or right is None or op is None:
return None

return f"{left} {op} {right}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve grouping in nested arithmetic conversion

When converting compound BSL measures, _binop_to_sql always emits left op right without wrapping child BinOps, so explicit grouping is lost. For example, (_.a - (_.b + _.c)).sum() is converted to a - b + c, and ((_.a - _.b) * _.c).sum() becomes a - b * c; both change metric semantics and can return incorrect aggregates whenever nested arithmetic is used.

Useful? React with 👍 / 👎.

Comment on lines +482 to +483
fk = rel.foreign_key if isinstance(rel.foreign_key, str) else rel.foreign_key[0]
join_def["with"] = f"_.{fk}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid truncating composite foreign keys in join export

The new with: export path drops all but the first foreign-key column when rel.foreign_key is a list and rel.primary_key is unset. That silently rewrites composite joins into single-column joins, which can broaden matches and produce incorrect query results for models that rely on multi-column relationship keys.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 28d7017003

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

filter_expr = model_def.get("filter")
if filter_expr:
filter_str = str(filter_expr)
segments.append(Segment(name="_default_filter", sql=bsl_filter_to_sql(filter_str)))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Apply imported model filters automatically during queries

This change parses BSL filter: into a segment (_default_filter) but never wires it into default query execution, so imported models no longer enforce their model-level filter unless callers manually pass segments=["<model>._default_filter"]. In this codebase, SQLGenerator.generate_sql initializes segments to an empty list and only applies filters from explicitly requested segments, so a normal query against a filtered BSL model returns unfiltered results. That diverges from BSL semantics (model filter is intended to apply to all queries) and can silently broaden result sets.

Useful? React with 👍 / 👎.

except SyntaxError:
pass
# Fallback for unparseable expressions
return re.sub(r"_\.(\w+)", r"\1", str(expr))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Translate negated BSL filters to SQL instead of fallback text

Negated filter expressions are not handled by _filter_node_to_sql, so expressions like ~(_.origin == 'LAX') fall through to the regex fallback and are emitted as ~(origin == 'LAX'). That leaves Python operators (~, ==) in the generated SQL fragment, which can fail parsing or produce incorrect semantics depending on dialect. Since BSL/Ibis commonly uses ~ for logical NOT, this breaks a realistic class of model-level filters.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ba596732b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 196 to +197
table=table,
sql=model_sql,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve model-level filters when exporting to other adapters

When a BSL model includes filter:, _parse_model sets sql but also keeps table, and many exporters prioritize table over sql (for example, sidemantic/adapters/cube.py _export_cube uses if model.table before elif model.sql). In that common conversion path the baked filter is dropped, so a filtered BSL model is exported as an unfiltered base table and downstream queries return broader results than the source model defines.

Useful? React with 👍 / 👎.

Comment on lines +255 to +256
# Fallback for unparseable expressions
return re.sub(r"_\.(\w+)", r"\1", str(expr))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Translate membership filter calls instead of regex fallback

bsl_filter_to_sql does not handle filter method calls such as _.col.isin([...]), so _filter_node_to_sql returns None and the fallback leaves Python syntax (col.isin([...])) in the generated SQL fragment. Because _parse_model injects that output directly into WHERE, models using common Ibis membership filters now produce invalid SQL at query time instead of a valid IN (...) predicate.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: db4e983992

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

the adapter stores and reuses the original expression instead.
"""
# Simple column + aggregation (the common case for cross-format)
if not any(f" {op} " in sql for op in ["+", "-", "*", "/", "%", "=", "!=", "<", ">", "<="]):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include >= in SQL-to-BSL operator detection

_sql_to_bsl_expr uses a spaced-operator allowlist to decide whether an expression is compound, but that list omits >=. For inputs like a >= 5 with an aggregation, this path treats the whole string as a simple column and emits malformed BSL like _.a >= 5.sum(), which breaks exported metrics in cross-format conversions.

Useful? React with 👍 / 👎.

Comment on lines +317 to +321
parts = re.split(r"(\s*(?:[+\-*/%]|[<>!=]=?)\s*)", sql)
bsl_parts = []
for part in parts:
stripped = part.strip()
if re.match(r"^[a-zA-Z_]\w*(\.\w+)*$", stripped):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Rewrite SQL '=' to Python '==' in emitted BSL

The compound conversion branch preserves SQL comparison tokens verbatim, so expressions such as status = 'X' are emitted as (_.status = 'X').sum(). BSL expressions are parsed as Python in this adapter, and = inside an expression is a syntax error, so exported comparison measures cannot be parsed back correctly.

Useful? React with 👍 / 👎.

except SyntaxError:
pass
# Fallback for unparseable expressions
return re.sub(r"_\.(\w+)", r"\1", str(expr))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail on untranslatable filters instead of raw fallback

When filter AST translation fails, bsl_filter_to_sql falls back to a regex replacement that only strips _. and returns the rest unchanged, which can leave Python syntax (for example ==, method calls, or None) in the SQL fragment. Because _parse_model injects this directly into WHERE, affected models fail at query time instead of producing a clear parse-time error.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6457b1d391

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

filter_str = str(filter_expr)
filter_sql = bsl_filter_to_sql(filter_str)
model_sql = f"SELECT * FROM {table} WHERE {filter_sql}"
model_table = None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve table mapping when applying model filters

Setting model_table = None for every model that has filter turns a table-backed model into a SQL-backed model, but Migrator only indexes models by model.table in its table_to_model map. As a result, filtered BSL models become invisible to migrator analysis and queries against the physical table (for example flights_tbl) are reported as missing/unrewritable even though the model exists. Keep the physical table available (or map from metadata['bsl_table']) so adding model-level filter support doesn’t break migrator coverage workflows.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2013f6362f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +123 to +127
if isinstance(node, ast.Constant):
if isinstance(node.value, str):
escaped = node.value.replace("'", "''")
return f"'{escaped}'"
if isinstance(node.value, (int, float)):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Support signed numeric literals in filter translation

bsl_filter_to_sql cannot translate comparisons against negative numbers because Python parses -5 as ast.UnaryOp, while _expr_to_sql only accepts ast.Constant literals here. A filter like _.delta > -5 therefore yields None on the right-hand side, and _parse_model will raise ValueError when importing the model filter. This makes valid BSL filters with signed thresholds fail at parse time.

Useful? React with 👍 / 👎.

nicosuave and others added 9 commits March 29, 2026 23:18
…ensions

Extend the BSL expression parser to handle compound arithmetic
(_.a - _.b).sum() and boolean comparison (_.col == "val").sum()
expressions that previously fell through to the fallback.

Add support for BSL YAML features that were silently ignored:
- with: join syntax (shorthand for left_on/right_on)
- Model-level primary_key, time_dimension, smallest_time_grain
- Model-level filter expressions (parsed via AST, stored as Segment)
- calculated_measures section

Store original BSL expressions in metadata for lossless roundtrip
instead of fragile regex reconstruction. Add bsl_filter_to_sql with
proper AST-based conversion. Add second/minute to Model.default_grain
for consistency with Dimension.granularity.
Wrap filter operands in parens when translating & / | to AND / OR
so mixed-precedence filters preserve grouping.

Handle COUNT(*) metrics (agg="count", sql=None) in cross-format
export path to emit _.count() instead of _.name.count().
Wrap nested BinOp children in parens in _binop_to_sql so
(_.a - (_.b + _.c)).sum() produces "a - (b + c)" not "a - b + c".

Only emit with: join shorthand for single-string foreign keys,
fall back to left_on/right_on for list-typed composite keys.
Instead of creating a Segment that callers must explicitly apply,
bake BSL filter: expressions into model.sql as a filtered subquery
(SELECT * FROM table WHERE filter). This matches BSL semantics
where model filters always apply to all queries.

Add ~ (ast.Invert) handling in _filter_node_to_sql so negated
filter expressions like ~(_.origin == 'LAX') produce NOT (origin = 'LAX')
instead of falling through to the regex fallback.
Clear model.table when filter is baked into model.sql so exporters
(Cube, MetricFlow) that prefer table over sql don't silently drop
the filter. Store original table name in metadata for BSL roundtrip.

Add isin/notin/between/isnull/notnull method call handling to
_filter_node_to_sql so _.col.isin(['a','b']) produces
col IN ('a', 'b') instead of falling through to regex fallback.
Add >= to _sql_to_bsl_expr operator detection so expressions like
a >= 5 are recognized as compound and wrapped correctly.

Convert SQL = to Python == and <> to != in _sql_to_bsl_expr output
so emitted BSL expressions are valid Python syntax.

Replace silent regex fallback in bsl_filter_to_sql with ValueError
so untranslatable filter expressions fail at parse time instead of
producing invalid SQL at query time.
Restore model.table on filtered BSL models so migrator's
table_to_model index stays intact. Swap export priority in Cube,
MetricFlow, and LookML adapters to check model.sql before
model.table, ensuring filtered subqueries are preserved in
cross-format conversion.
Python parses -5 as ast.UnaryOp(USub, Constant(5)), not as a negative
Constant. _expr_to_sql only handled Constant, so filters like
_.delta > -5 produced None on the RHS and failed at parse time.
Handle UnaryOp with USub to emit the negated value.
@nicosuave nicosuave force-pushed the nicosuave/fix-bsl-adapter branch from 68931bd to cc0b94b Compare March 30, 2026 06:18
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19c1a5ab7c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

None, description="Default time dimension for metrics (auto-included in queries)"
)
default_grain: Literal["hour", "day", "week", "month", "quarter", "year"] | None = Field(
default_grain: Literal["second", "minute", "hour", "day", "week", "month", "quarter", "year"] | None = Field(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Align granularity validation with second/minute support

Adding "second" and "minute" to Model.default_grain makes these grains importable (including from BSL), but query validation still rejects them because validate_query only allows hour/day/week/month/quarter/year (sidemantic/validation.py). As a result, explicit queries like model.time_dim__second or model.time_dim__minute now fail with QueryValidationError even though the model/schema accepts them, leaving the newly added grains unusable in normal query APIs.

Useful? React with 👍 / 👎.

Comment on lines 198 to +199
table=table,
sql=model_sql,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid dropping filtered BSL models in non-updated exporters

When a BSL model has filter, this code now keeps both table and sql (table=table, sql=model_sql). Only a subset of exporters were updated to prefer sql; exporters that still do if model.table ... elif model.sql (for example sidemantic/adapters/hex.py lines 339-342) will emit the raw table and discard the baked filter subquery, silently broadening results after BSL→other-format conversion.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0832f2348

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +448 to +450
if model.sql:
result["sql"] = model.sql
elif model.table:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep model when exporting SQL-backed MetricFlow models

Exporting model.sql in preference to model.table here breaks MetricFlow roundtrips for models that now carry both fields (for example BSL models with a model-level filter). MetricFlowAdapter._parse_semantic_model only reconstructs the source from the model key and does not read sql, so re-importing this export yields a model with no usable source (table=None, sql=None) and downstream compiled queries emit invalid FROM None. Please continue emitting a model reference (or update parse to consume sql) when both are present.

Useful? React with 👍 / 👎.

@nicosuave nicosuave merged commit dde1c51 into main Mar 30, 2026
15 checks passed
@nicosuave nicosuave deleted the nicosuave/fix-bsl-adapter branch March 30, 2026 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant