Add Snowflake Cortex Semantic Model adapter#30
Conversation
Implement bidirectional adapter for Snowflake Cortex Semantic Model format: Parsing: - Tables with base_table references (database.schema.table) - Dimensions (categorical, numeric, boolean) - Time dimensions with granularity support - Facts with aggregations (sum, count, avg, min, max, median) - Complex metrics with derived expressions - Relationships between tables - Filters parsed as segments with column qualification Export: - Proper Snowflake YAML structure with name, tables, relationships - Dimension type mapping to Snowflake data types (TEXT, NUMBER, BOOLEAN, TIMESTAMP) - Time dimensions separated from regular dimensions - Simple metrics as facts, complex metrics with expr - Segments exported as filters Tests (60 tests): - Parsing: basic models, dimensions, metrics, relationships, filters - Roundtrip: parse -> export -> parse preserves semantics - Conversion: cross-format with Cube and MetricFlow - Query: SQL generation with joins and segments - Kitchen sink: integration tests with real DuckDB database
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 66e5d08ffc
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return Metric( | ||
| name=name, | ||
| type="derived", | ||
| sql=expr, | ||
| description=metric_def.get("description"), |
There was a problem hiding this comment.
Rewrite complex metrics to reference model columns
When a Snowflake metric expression contains multiple aggregates, _parse_metric returns a derived metric with the raw expr string. The SQL generator treats derived metrics with inline aggregates as already‑valid SQL and does not rewrite column references. That means expressions like SUM(quantity * unit_price) / COUNT(DISTINCT order_id) (from the sample Snowflake models) will be emitted as‑is, but quantity/unit_price are not selected into the model CTE (only quantity_raw/unit_price_raw exist), so queries that include these metrics will fail with unknown/ambiguous column errors. This adapter should either translate those expressions to use model placeholders/measure references or convert them into simple aggregations built from raw columns so the generator can resolve them.
Useful? React with 👍 / 👎.
| if segment.sql: | ||
| filter_def["expr"] = segment.sql |
There was a problem hiding this comment.
Strip {model} placeholders when exporting filters
Sidemantic segments are expected to use {model} placeholders, but Snowflake filters require bare column names. _export_filter writes segment.sql verbatim, so round‑tripping a Snowflake model will emit filters like {model}.status = 'delivered', which are not valid Snowflake semantic model expressions. This will break export for any segment created by this adapter (or by users who follow Sidemantic’s {model} convention). The export should remove {model}./{model} before serializing.
Useful? React with 👍 / 👎.
Kitchen sink test revealed that derived metrics with inline SQL aggregations
(like `SUM(quantity * unit_price) / COUNT(DISTINCT order_id)`) weren't being
properly handled. The column references inside the aggregations need to be
qualified to point to the CTE columns.
Changes:
- Update _build_metric_sql to properly qualify column references in metrics
that contain inline aggregate functions
- Dimensions are qualified as {model}_cte.{dim} (no _raw suffix)
- Metrics/facts are qualified as {model}_cte.{name}_raw
- Skip complex SQL expressions when building the column map
- Skip adding dimension columns to the metrics map to avoid incorrect _raw suffix
- Add test_avg_order_value to kitchen sink tests to verify the fix
- Extract _qualify_columns() as shared function in Snowflake adapter
- Qualify column references at parse time with {model} placeholder
- Fix CTE generation to include measure columns referenced in derived metrics
- Simplify generator to just replace {model} and add _raw suffix for measures
Summary
Key features
{model}.columnformat)