
fix: widen narrow numeric literals in zonemap pruner #559

Merged
fangbo merged 1 commit into lance-format:main from LuciferYang:fix-557-zonemap-int-literal
May 26, 2026


@LuciferYang (Contributor) commented May 26, 2026

Closes #557.

ZonemapFragmentPruner throws ClassCastException: Long cannot be cast to Integer whenever a WHERE int32_col = N query hits a column with a zonemap index. The two sides of target.compareTo(min) use different Java boxes:

  • Lance JNI (scalar_value_to_java in lance-jni/utils.rs) materializes ZoneStats.min/max with every integer width as Long and every float width as Double.
  • Spark V2 Literal.value() keeps the Catalyst type — Integer for IntegerType, Short for ShortType, Byte for ByteType, Float for FloatType, Integer epoch days for DateType.

Integer.compareTo(Object) then rejects the Long. The same mismatch affects tinyint, smallint, float, and date predicates, as well as the IN-list path. On the reported workload, the existing catch (ClassCastException) in zoneMatchesComparison does fire, so the query still succeeds, but pruning is silently skipped instead of applied.
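To make the failure mode concrete, here is a minimal standalone reproduction. It is an illustrative sketch, not code from ZonemapFragmentPruner: BoxMismatchDemo and its helpers are hypothetical names, and it assumes the comparison goes through the raw Comparable interface, as target.compareTo(min) on Object-typed values implies.

```java
/**
 * Hypothetical reproduction of the box mismatch (not the pruner's real code).
 * The zone bound arrives from JNI as a Long, while the Spark literal for an
 * int32 column is an Integer.
 */
final class BoxMismatchDemo {

    // Compares through the raw Comparable interface, as a pruner that holds
    // literals and bounds as plain Objects would have to.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compareRaw(Object target, Object bound) {
        return ((Comparable) target).compareTo(bound);
    }

    // Returns true when the comparison fails with ClassCastException.
    static boolean throwsClassCast(Object target, Object bound) {
        try {
            compareRaw(target, bound);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object zoneMin = 3L;  // ZoneStats.min as boxed by Lance JNI (Long)
        Object literal = 5;   // Literal.value() for an IntegerType column (Integer)
        System.out.println(throwsClassCast(literal, zoneMin)); // true: Integer vs Long
        System.out.println(throwsClassCast(5L, zoneMin));      // false: Long vs Long
    }
}
```

The same failure appears for Short, Byte, and Float literals, since each box's compareTo only accepts its own type.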

Fix

Widen the narrow boxes inside normalizeLiteral to match the JNI wire types: Byte/Short/Integer → Long, Float → Double. Both conversions are lossless and order-preserving; analyzeIn already routes each list element through the same normalizer, so the IN-list path gets the fix for free.
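A minimal sketch of the widening rule, assuming literals and zone bounds arrive as plain boxed Objects. LiteralWidening, widen, and mayContain are hypothetical names; the real normalizeLiteral has its own signature and may handle additional types.

```java
/**
 * Hypothetical sketch of the widening described above; not the project's code.
 */
final class LiteralWidening {

    /** Widens narrow boxes to the JNI wire types: Byte/Short/Integer to Long, Float to Double. */
    static Object widen(Object value) {
        if (value instanceof Byte || value instanceof Short || value instanceof Integer) {
            // longValue() is exact for every byte, short, and int, so ordering is preserved.
            return ((Number) value).longValue();
        }
        if (value instanceof Float) {
            // Every float is exactly representable as a double, so this is lossless too.
            return ((Float) value).doubleValue();
        }
        return value; // Long, Double, and other types pass through unchanged
    }

    /** Zone-level equality check: could a zone with bounds [min, max] contain the literal? */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean mayContain(Object literal, Object min, Object max) {
        Comparable target = (Comparable) widen(literal);
        return target.compareTo(min) >= 0 && target.compareTo(max) <= 0;
    }
}
```

Widening the literal rather than narrowing the bound is the safe direction: a Long bound may not fit in an Integer, but every Integer fits in a Long.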

Tests

Seven new cases in ZonemapFragmentPrunerTest — one per affected literal type (Integer / Short / Byte / Float / Date) plus IN-list coverage (homogeneous and mixed-width). Verified by reverting normalizeLiteral locally and watching all seven fail through the conservative-catch fallback.
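The mixed-width IN-list case can be sketched the same way. This is a hypothetical illustration of what such a test exercises, not the project's test code: InListPruningSketch and zoneMayMatchIn are made-up names, and the sketch assumes each element is widened independently before comparison, as the PR describes for analyzeIn.

```java
import java.util.List;

/**
 * Hypothetical sketch: each IN-list element is widened on its own, so a
 * mixed-width list (Byte, Short, Integer) compares cleanly against Long bounds.
 */
final class InListPruningSketch {

    static Object widen(Object v) {
        if (v instanceof Byte || v instanceof Short || v instanceof Integer) {
            return ((Number) v).longValue();
        }
        if (v instanceof Float) {
            return ((Float) v).doubleValue();
        }
        return v;
    }

    /** A zone may match `col IN (...)` if any element falls inside [min, max]. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean zoneMayMatchIn(List<?> values, Object min, Object max) {
        for (Object v : values) {
            Comparable t = (Comparable) widen(v);
            if (t.compareTo(min) >= 0 && t.compareTo(max) <= 0) {
                return true;
            }
        }
        return false;
    }
}
```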

Out of scope (surfaced during review)

Pre-existing JNI gaps, not regressions from this PR — noting for follow-up:

  • ScalarValue::Decimal* and other unhandled types box to null for both min and max. The pruner then treats those zones as all-null and excludes them, so decimal predicates risk silently skipping zones that may contain matching rows.
  • Date64 and the four Timestamp* variants are all flattened to Long without a unit tag, so columns written with non-Date32 / non-microsecond units cannot be safely pruned against Spark's days/micros literals.
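The timestamp gap is easiest to see with numbers. The sketch below assumes a column written with millisecond precision whose zone bounds come back from JNI as raw Long milliseconds, compared against a Spark TimestampType literal in microseconds; TimestampUnitDemo and inRange are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical illustration of why an untagged Long is unsafe for timestamp pruning.
 */
final class TimestampUnitDemo {

    static boolean inRange(long literal, long min, long max) {
        return literal >= min && literal <= max;
    }

    public static void main(String[] args) {
        long minMillis = 1_700_000_000_000L;         // zone min, millisecond unit (assumed)
        long maxMillis = 1_700_000_100_000L;         // zone max, millisecond unit (assumed)
        long literalMicros = 1_700_000_050_000_000L; // Spark literal in micros, inside the zone

        // Raw comparison: the micros literal is ~1000x larger than the millis
        // bounds, so the zone looks like it cannot match and would be wrongly pruned.
        System.out.println(inRange(literalMicros, minMillis, maxMillis)); // false

        // With the unit known, rescaling the bounds gives the correct answer.
        System.out.println(inRange(literalMicros,
                TimeUnit.MILLISECONDS.toMicros(minMillis),
                TimeUnit.MILLISECONDS.toMicros(maxMillis))); // true
    }
}
```

Without a unit tag on the JNI value, the pruner cannot know that this rescaling is needed.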

Test plan

  • `make lint` clean
  • `./mvnw test -pl lance-spark-base_2.12 -Dtest=ZonemapFragmentPrunerTest` (37/37 tests pass locally)
  • CI green across all Spark/Scala modules

Lance JNI normalizes every integer width in ZoneStats.min/max to Long
and every float width to Double, while Spark V2 Literal.value() keeps
the Catalyst type (Integer for int32, Short for smallint, Byte for
tinyint, Float for float32, Integer epoch days for date). The boxes
disagree, and Integer.compareTo(Object) rejects the Long with a
ClassCastException — the existing conservative catch swallows it and
silently drops pruning instead of crashing the query.

Widen Byte/Short/Integer to Long and Float to Double inside
normalizeLiteral. Both conversions are lossless and order-preserving.
Also fixes the IN-list path, which routes each element through the
same normalizer.

Closes lance-format#557.
@github-actions (bot) added the bug (Something isn't working) label May 26, 2026
@LuciferYang (Contributor, Author)

cc @fangbo FYI

@fangbo (Collaborator) commented May 26, 2026

+1, thanks for fixing it!

@fangbo merged commit fd2e132 into lance-format:main May 26, 2026
19 of 23 checks passed
@LuciferYang (Contributor, Author)

Thank you @fangbo


Labels: bug (Something isn't working)

Linked issue: exception throws when zonemap pruner on int column

2 participants