Skip to content

test: add unit tests for utility classes#416

Open
LuciferYang wants to merge 3 commits intolance-format:mainfrom
LuciferYang:add-utils-unit-tests
Open

test: add unit tests for utility classes#416
LuciferYang wants to merge 3 commits intolance-format:mainfrom
LuciferYang:add-utils-unit-tests

Conversation

@LuciferYang
Copy link
Copy Markdown
Contributor

Summary

  • Add unit tests for BlobUtils, LargeVarCharUtils, Utils, and VectorUtils covering all reachable branches in both Spark and Arrow field-checking methods
  • Tests cover null inputs, missing/wrong metadata, case-insensitive value matching, type-check isolation (non-String for LargeVarChar, non-numeric for Vector), and edge cases like empty version lists and malformed metadata
  • UtilsTest helper builds ZonedDateTime directly instead of re-implementing the production micros-to-Instant conversion, keeping test logic independent from source logic

Test coverage details

Class Methods tested Tests
BlobUtils isBlobSparkField, isBlobArrowField 12
LargeVarCharUtils isLargeVarCharSparkField, isLargeVarCharArrowField, createPropertyKey 14
Utils parseVersion, findVersion 11
VectorUtils isVectorField, getVectorDimension, isVectorArrowField, getVectorArrowDimension, shouldBeFixedSizeList, createVectorSizePropertyKey 25

Test plan

  • mvn test -pl lance-spark-base_2.12 -Dtest="org.lance.spark.utils.*Test" — 108 tests pass

@github-actions github-actions Bot added the chore Features related to test, build, style improvements label Apr 10, 2026
Copy link
Copy Markdown
Collaborator

@hamersaw hamersaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the majority of these tests are essentially "x=10; assertTrue(isXTen(x))". It doesn't feel like there is a great deal of added value, the regressions would occur if upstream began setting X to something else, which we really can't test. I'm wondering if there's a better way to detect these kinds of issues.

LuciferYang added a commit to LuciferYang/lance-spark that referenced this pull request Apr 20, 2026
Addresses review feedback on PR lance-format#416: positive-match tests that used the
util's own constants on both sides ("x=KEY; assertTrue(isKey(x))") were
pure tautologies.

- Drop the `*Valid` tests from BlobUtilsTest and LargeVarCharUtilsTest;
  the `CaseInsensitive` tests already cover the positive path with a
  non-tautological literal input.
- Replace constant references in test inputs with string literals
  (`"lance-encoding:blob"`, `"arrow:large-var-char"`,
  `"arrow.fixed-size-list.size"`) so a rename of the production constant
  breaks the test.
- Fix `testCreateVectorSizePropertyKey` to use a hardcoded expected
  literal instead of concatenating the constant.

Upstream-convention drift remains an integration-test concern
(BaseBlobCreateTableTest, SchemaConverterTest, LanceArrowUtilsSuite),
noted in the updated class javadocs.
@LuciferYang
Copy link
Copy Markdown
Contributor Author

@hamersaw Fair point — pushed 5b8bb14 trimming this:

  • Dropped the *Valid tests from BlobUtilsTest and LargeVarCharUtilsTest — they were the literal "x=KEY; assertTrue(isKey(x))" tautology you called out. The CaseInsensitive cases already cover the positive path with non-tautological input (and would catch an accidental .equals regression).
  • Replaced constant references in test inputs with string literals ("lance-encoding:blob", "arrow:large-var-char", "arrow.fixed-size-list.size"). A rename of the production constant now breaks the test — so the wire-format contract is pinned on the test side, which it wasn't before.
  • Fixed testCreateVectorSizePropertyKey the same way (was concatenating the constant into the expected value).

58 tests total, all pass locally. Javadoc on each class now notes that upstream-convention drift is the integration layer's job — BaseBlobCreateTableTest, SchemaConverterTest, and the LanceArrow* suites already round-trip through real lance data and would catch that.

Let me know if this is the direction you had in mind, or if you'd rather drop the unit tests entirely and lean on the integration coverage.

Addresses review feedback on PR lance-format#416: positive-match tests that used the
util's own constants on both sides ("x=KEY; assertTrue(isKey(x))") were
pure tautologies.

- Drop the `*Valid` tests from BlobUtilsTest and LargeVarCharUtilsTest;
  the `CaseInsensitive` tests already cover the positive path with a
  non-tautological literal input.
- Replace constant references in test inputs with string literals
  (`"lance-encoding:blob"`, `"arrow:large-var-char"`,
  `"arrow.fixed-size-list.size"`) so a rename of the production constant
  breaks the test.
- Fix `testCreateVectorSizePropertyKey` to use a hardcoded expected
  literal instead of concatenating the constant.

Upstream-convention drift remains an integration-test concern
(BaseBlobCreateTableTest, SchemaConverterTest, LanceArrowUtilsSuite),
noted in the updated class javadocs.
@LuciferYang LuciferYang force-pushed the add-utils-unit-tests branch from 5b8bb14 to 75ec172 Compare April 20, 2026 03:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore Features related to test, build, style improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants