
Fix LookML filter-expression to SQL conversion#241

Open: nicosuave wants to merge 1 commit into main from fix/lookml-filters


Conversation

@nicosuave

Member

Summary

Part of a series fixing correctness bugs in the LookML/Looker import adapter (sidemantic/adapters/lookml.py), found by a deep audit.

_convert_lookml_filter_to_sql treated Looker filter values as opaque strings, so anything past simple comparisons produced wrong or invalid SQL:

| Input | Before | After |
| --- | --- | --- |
| `last 7 days` | `f = 'last 7 days'` (0 rows) | warns; left literal (date grammar is a follow-up) |
| `5 to 10` | `f = '5 to 10'` | `f >= 5 AND f <= 10` |
| `[1,10]` | `f IN ('[1','10]')` | `f >= 1 AND f <= 10` |
| `NOT 5` | `f = 'NOT 5'` | `f != 5` |
| `EMPTY` | `f = ''` | `(f IS NULL OR f = '')` |
| `-%foo%` | `f != '%foo%'` | `f NOT LIKE '%foo%'` |
| `O'Brien` | `f = 'O'Brien'` (broken / injection) | `f = 'O''Brien'` |
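
To make the "After" column concrete, here is a minimal sketch of a single-token converter that reproduces those rows. It is not the code from this PR: `convert_single`, `quote`, and `NUMBER` are hypothetical names, and open-ended ranges and date expressions are deliberately left out.

```python
import re

# Assumption: a plain signed decimal is enough for this illustration.
NUMBER = r"-?\d+(?:\.\d+)?"


def quote(value: str) -> str:
    """Render a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def convert_single(col: str, value: str) -> str:
    """Hypothetical converter for one filter token (not the adapter's code)."""
    v = value.strip()
    # EMPTY matches both NULL and the empty string.
    if v.upper() == "EMPTY":
        return f"({col} IS NULL OR {col} = '')"
    # "5 to 10" becomes an inclusive numeric range.
    m = re.fullmatch(rf"({NUMBER})\s+to\s+({NUMBER})", v, flags=re.IGNORECASE)
    if m:
        return f"{col} >= {m.group(1)} AND {col} <= {m.group(2)}"
    # "[1,10]" / "(1,10)": square brackets are inclusive, parentheses exclusive.
    m = re.fullmatch(rf"([\[(])\s*({NUMBER})\s*,\s*({NUMBER})\s*([\])])", v)
    if m:
        lo_op = ">=" if m.group(1) == "[" else ">"
        hi_op = "<=" if m.group(4) == "]" else "<"
        return f"{col} {lo_op} {m.group(2)} AND {col} {hi_op} {m.group(3)}"
    # "NOT 5" becomes a numeric inequality.
    m = re.fullmatch(rf"not\s+({NUMBER})", v, flags=re.IGNORECASE)
    if m:
        return f"{col} != {m.group(1)}"
    # "-%foo%" becomes a negated wildcard match.
    if v.startswith("-") and "%" in v:
        return f"{col} NOT LIKE {quote(v[1:])}"
    # Fallback: equality, with numbers unquoted and strings escaped.
    if re.fullmatch(NUMBER, v):
        return f"{col} = {v}"
    return f"{col} = {quote(v)}"
```

Ordering matters in a sketch like this: the range and interval patterns have to run before the equality fallback, otherwise `5 to 10` is quoted as a string again.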

Changes

  • Rewrite the filter converter to implement the representable parts of Looker's filter expression language: numeric ranges (a to b, including open-ended), interval brackets ([ ] and ( )), NOT/- negation, EMPTY as IS NULL OR = '', wildcard NOT LIKE, and mixed comma lists (includes are OR'd, excludes are AND'd).
  • Escape single quotes in all emitted string literals.
  • Date/interval expressions that aren't translated yet now log a warning instead of silently matching zero rows.
  • Add regression tests covering each case.
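
The "includes OR'd, excludes AND'd" rule for mixed comma lists can be sketched as follows. `combine` is a hypothetical helper written for illustration, assuming each token has already been converted to a SQL condition and classified as an include or an exclude; the adapter's actual structure may differ.

```python
def combine(includes: list[str], excludes: list[str]) -> str:
    """Join include conditions with OR and exclude conditions with AND."""
    clauses = []
    if includes:
        joined = " OR ".join(includes)
        # Parenthesize only when there is more than one alternative, so the
        # OR group cannot be split by the surrounding ANDs.
        clauses.append(f"({joined})" if len(includes) > 1 else joined)
    clauses.extend(excludes)
    return " AND ".join(clauses)


# A filter such as "foo, bar, -baz" would, under these assumptions, become:
print(combine(["f = 'foo'", "f = 'bar'"], ["f != 'baz'"]))
```

Keeping the include group in parentheses is the important design choice here: without them, SQL's precedence would bind the first exclude to the last include only.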

Notes

  • Date-filter translation (relative dates, this month, etc.) is deferred — it needs field-type + dialect handling.
  • Stacked series: this is the base PR.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 345f8c29cc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread sidemantic/adapters/lookml.py Outdated
# before / after <value> (date or numeric bound)
bm = re.match(r"(?i)^(before|after)\s+(.+)$", v)
if bm:
    op = "<" if bm.group(1).lower() == "before" else ">"


P2 Badge Preserve inclusive after-date filters

For LookML date filters, after 2018-10-05 includes the boundary date per Looker's filter docs (https://docs.cloud.google.com/looker/docs/filter-expressions#date_and_time), but this branch emits > for every after expression. Any imported measure using filters: [created_date: "after 2020-01-01"] will drop rows from 2020-01-01; after should be generated as >= while before remains exclusive.
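
A minimal sketch of the suggested behavior, assuming the reviewer's reading of the Looker docs (`after` inclusive, `before` exclusive). `before_after` is a hypothetical standalone function, not the adapter's code, and it quotes every operand as a string for simplicity.

```python
import re
from typing import Optional


def before_after(col: str, value: str) -> Optional[str]:
    """Translate 'before X' / 'after X' into a comparison, or return None."""
    m = re.match(r"(?i)^(before|after)\s+(.+)$", value.strip())
    if not m:
        return None
    # 'before' stays exclusive; 'after' includes the boundary value.
    op = "<" if m.group(1).lower() == "before" else ">="
    operand = m.group(2).strip().replace("'", "''")
    return f"{col} {op} '{operand}'"
```

With this mapping, a filter like `created_date: "after 2020-01-01"` keeps the rows dated 2020-01-01.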


Comment on lines +371 to +374
for p in parts:
    cond = single(p)
    is_exclude = (p.startswith("-") and not re.match(r"^-(\d|\.)", p)) or bool(re.match(r"(?i)^not\s", p))
    (excludes if is_exclude else includes).append(cond)


P2 Badge Apply leading NOT to the whole numeric list

Looker's numeric filter syntax treats NOT 66, 99, 4 as excluding all three values (https://docs.cloud.google.com/looker/docs/filter-expressions#number), but this mixed-list path only marks the first NOT 66 token as an exclusion and ORs the remaining plain values as inclusions, producing (f = 99 OR f = 4) AND f != 66. Measures imported from that documented syntax will count only 99/4 instead of everything except 66/99/4; detect a leading NOT with no other NOTs and emit a negated list/range for all parts.
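
One way to detect the leading-NOT case described above is sketched here. `negated_numeric_list` is a hypothetical helper, and emitting `NOT IN` is an assumption about the preferred output shape; a chain of `!=` conditions joined with AND would be equivalent.

```python
import re
from typing import Optional

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def negated_numeric_list(col: str, value: str) -> Optional[str]:
    """Handle 'NOT a, b, c' where the single leading NOT covers every value."""
    m = re.match(r"(?i)^not\s+(.+)$", value.strip())
    if not m:
        return None
    parts = [p.strip() for p in m.group(1).split(",")]
    # Bail out if any remaining token is not a plain number (for example a
    # second NOT or a range), leaving those cases to the mixed-list path.
    if not all(NUMBER.fullmatch(p) for p in parts):
        return None
    if len(parts) == 1:
        return f"{col} != {parts[0]}"
    return f"{col} NOT IN ({', '.join(parts)})"
```

Running this check before the mixed-list loop would keep `NOT 66, 99, 4` from being split into one exclusion and two inclusions.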


The LookML adapter treated Looker filter values as opaque strings, so
anything beyond simple comparisons produced wrong or invalid SQL:
- date/numeric ranges and intervals ([1,10], "5 to 10") became string
  equality or malformed IN lists (matched zero rows)
- "not X" / negation -> literal "= 'not X'"
- EMPTY -> omitted the NULL case
- single quotes were not escaped (broken SQL / injection risk)

Rewrite _convert_lookml_filter_to_sql to implement the representable parts
of Looker's filter expression language (numeric ranges, interval brackets,
NOT/negation, EMPTY = NULL OR '', wildcard NOT LIKE, mixed lists) and
escape quoted string literals. Untranslated date/interval expressions now
log a warning instead of silently emitting a zero-row equality.
@nicosuave nicosuave force-pushed the fix/lookml-filters branch from 345f8c2 to 00ff35d Compare June 26, 2026 19:40

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00ff35dc91


Comment on lines +283 to +286
op = "<" if bm.group(1).lower() == "before" else ">="
operand = bm.group(2).strip()
rhs = operand if is_number(operand) else q(operand)
return f"{col} {op} {rhs}"


P2 Badge Guard relative before/after date filters

When a LookML date filter uses documented relative bounds such as before 3 days ago or after Monday, this branch treats the operand as a literal SQL value before the date-expression warning can run, producing conditions like {model}.created_date < '3 days ago'. On dialects that don't parse those English phrases as dates, imported measures either fail or count the wrong rows; detect relative operands and leave them on the warning/fallback path instead of translating them as absolute bounds.
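
A possible guard for this case is sketched below: translate the operand only when it looks like a number or an absolute date, and send everything else to the warning path. The accepted date shapes in `ABSOLUTE_DATE` are an assumption made for illustration, not Looker's full grammar, and `is_absolute_operand` is a hypothetical name.

```python
import re

NUMBER = re.compile(r"^-?\d+(?:\.\d+)?$")
# Assumption: YYYY, YYYY-MM, YYYY-MM-DD, optionally followed by HH:MM[:SS],
# with "-" or "/" as the date separator.
ABSOLUTE_DATE = re.compile(
    r"^\d{4}(?:[-/]\d{2}(?:[-/]\d{2}(?:[ T]\d{2}:\d{2}(?::\d{2})?)?)?)?$"
)


def is_absolute_operand(operand: str) -> bool:
    """Return True only for operands that are safe to emit as literal bounds."""
    s = operand.strip()
    return bool(NUMBER.match(s) or ABSOLUTE_DATE.match(s))
```

In the before/after branch, a check such as `if not is_absolute_operand(operand)` could skip the translation so that operands like `3 days ago` or `Monday` reach the existing date-expression warning instead.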


Comment on lines +289 to +291
cm = re.match(r"^(>=|<=|!=|<>|>|<)\s*(.+)$", v)
if cm:
    operator, operand = cm.group(1), cm.group(2).strip()


P2 Badge Parse numeric AND ranges before single comparisons

When a numeric filter uses Looker's documented AND range syntax inside one condition, for example >1 AND <100, NOT 2, this regex captures 1 AND <100 as the operand for a single > comparison and quotes it because it isn't a number. The mixed-list path then emits a number-to-string comparison instead of field > 1 AND field < 100, so imported filtered measures fail or return nonsense for that supported syntax; split/parse AND subconditions before applying the single-comparison fallback.
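
The suggested ordering can be sketched as a small pre-pass that splits on `AND` and requires every piece to be a numeric comparison. `and_range` and `COMPARISON` are hypothetical names used only for this illustration.

```python
import re
from typing import Optional

# One comparison operator followed by a plain number.
COMPARISON = re.compile(r"^(>=|<=|!=|<>|>|<)\s*(-?\d+(?:\.\d+)?)$")


def and_range(col: str, value: str) -> Optional[str]:
    """Translate '>1 AND <100' style conditions, or return None to fall through."""
    pieces = re.split(r"(?i)\s+and\s+", value.strip())
    if len(pieces) < 2:
        return None
    conditions = []
    for piece in pieces:
        m = COMPARISON.match(piece.strip())
        if not m:
            # Not a pure numeric AND range; let the later branches decide.
            return None
        conditions.append(f"{col} {m.group(1)} {m.group(2)}")
    return " AND ".join(conditions)
```

Calling this before the single-comparison regex means `>1 AND <100` is never captured as one quoted operand.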

