Skip to content

fix(datafusion): coerce filter literals for dictionary-encoded columns#7003

Open
valkum wants to merge 1 commit into
lance-format:mainfrom
valkum:dict_expr
Open

fix(datafusion): coerce filter literals for dictionary-encoded columns#7003
valkum wants to merge 1 commit into
lance-format:mainfrom
valkum:dict_expr

Conversation

@valkum
Copy link
Copy Markdown
Contributor

@valkum valkum commented May 29, 2026

Closes #7002

SQL string filters on a dictionary-encoded column (e.g. Dictionary(Int16, Utf8)) failed to plan with "could not convert to literal of type 'Dictionary(...)'" because safe_coerce_scalar had no arm for a dictionary target and no ScalarValue::Dictionary input arm. This also silently lost scalar-index pushdown for =/IN predicates on dictionary columns.

Add two generic guard clauses to safe_coerce_scalar: unwrap a dictionary literal and coerce its inner value, and coerce a value to a dictionary target by recursing on the value type and re-wrapping. Nulls keep their untyped form, matching the existing behavior for all targets.

Enabling pushdown exposed a todo!() in OrderableScalarValue::cmp for Dictionary vs Dictionary (scalar indexes store dictionary columns as Dictionary scalar keys in a BTreeMap), which would have turned a previously-working full-scan query into a panic. Implement the comparison by delegating to the inner values.

This was created by Claude Code using Opus 4.8.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@github-actions github-actions Bot added the bug Something isn't working label May 29, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7df6035e4c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +24 to +25
if value.is_null() {
return Some(value.clone());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve dictionary type for typed null literals

When the filter literal is already a typed null, such as an API-built Expr::Literal(ScalarValue::Utf8(None), _) or an IN value that has been typed as Utf8, this early is_null() return leaves the literal as Utf8(None) instead of producing a Dictionary(Int16, Utf8) scalar for a dictionary column. resolve_value then installs that mismatched literal next to the dictionary column, so the same dictionary filter path fixed for non-null strings can still fail planning/evaluation for typed-null predicates; this guard should only preserve the untyped ScalarValue::Null case or coerce the inner null and wrap it in the dictionary type.

Useful? React with 👍 / 👎.

SQL string filters on a dictionary-encoded column (e.g.
`Dictionary(Int16, Utf8)`) failed to plan with "could not convert to
literal of type 'Dictionary(...)'" because `safe_coerce_scalar` had no
arm for a dictionary target and no `ScalarValue::Dictionary` input arm.
This also silently lost scalar-index pushdown for `=`/`IN` predicates on
dictionary columns.

Add two generic guard clauses to `safe_coerce_scalar`: unwrap a
dictionary literal and coerce its inner value, and coerce a value to a
dictionary target by recursing on the value type and re-wrapping. Nulls
keep their untyped form, matching the existing behavior for all targets.

Enabling pushdown exposed a `todo!()` in `OrderableScalarValue::cmp` for
`Dictionary` vs `Dictionary` (scalar indexes store dictionary columns as
`Dictionary` scalar keys in a BTreeMap), which would have turned a
previously-working full-scan query into a panic. Implement the
comparison by delegating to the inner values.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@valkum
Copy link
Copy Markdown
Contributor Author

valkum commented May 29, 2026

Rebased on top of the recent BinaryView PR

@codecov
Copy link
Copy Markdown

codecov Bot commented May 29, 2026

Codecov Report

❌ Patch coverage is 99.15966% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/dataset/scanner.rs 98.21% 0 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@valkum
Copy link
Copy Markdown
Contributor Author

valkum commented May 29, 2026

Currently working on the feedback from codex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LanceDB's only_if("col IN ('x', 'y') fails if col is of type dictionary

1 participant