[DataLoader] Support more literal types in filters DSL to DataFusion conversion #569

Merged
robreeves merged 10 commits into linkedin:main from robreeves:datetime on May 5, 2026
@robreeves robreeves commented May 4, 2026

Summary

Filter expressions using datetime, date, time, Decimal, or UUID values fail with TypeError when a TableTransformer is active because _literal_to_sql() only handled str, bool, int, and float. This extends support for other types.
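
To make the failure mode concrete, here is a minimal sketch of a converter that handles only the four original types. The function name `literal_to_sql_before` and the error text are hypothetical; the actual body of `_literal_to_sql()` is not shown in this PR description.

```python
from datetime import datetime


def literal_to_sql_before(value):
    """Hypothetical stand-in for a converter that knows only four types."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Escape embedded single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"
    # Anything else (datetime, date, time, Decimal, UUID) falls through.
    raise TypeError(f"Unsupported literal type: {type(value).__name__}")


try:
    literal_to_sql_before(datetime(2026, 5, 4, 23, 27))
except TypeError as exc:
    error_message = str(exc)
```

Any filter value outside the handled types reaches the final `raise`, which matches the TypeError described above.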

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

Testing Done

  • Manually tested on local docker setup. Please include the commands run and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

All 234 tests pass, and make verify runs clean.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

robreeves added 2 commits May 4, 2026 23:27
…ter conversion

_literal_to_sql() only handled str, bool, int, and float, causing a
TypeError when datetime/date filter values were converted to DataFusion
SQL (the code path used when a TableTransformer is active).
@robreeves robreeves changed the title from "[DataLoader] Support datetime and other literal types in DataFusion SQL filters" to "[DataLoader] Support more literal types in filters DSL" May 4, 2026
@robreeves robreeves changed the title from "[DataLoader] Support more literal types in filters DSL" to "[DataLoader] Support more literal types in filters DSL to DataFusion conversion" May 4, 2026
Copilot AI left a comment

Pull request overview

Fixes DataFusion SQL generation for the DataLoader filters DSL so filters containing additional Python literal types (datetime/date/time/Decimal/UUID) no longer raise TypeError when a DataFusion-based TableTransformer path is used.

Changes:

  • Extend _literal_to_sql() to convert datetime, date, time, Decimal, and UUID values into DataFusion-compatible SQL literals.
  • Add unit tests asserting the generated SQL for the new literal types.
  • Add integration-style tests that execute the generated SQL against a real DataFusion SessionContext.
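
The conversion in the first bullet could look roughly like the sketch below. This is an illustration, not the merged code: the name `literal_to_sql_sketch` is hypothetical, and rendering temporal values as `CAST('...' AS <type>)` and UUIDs as quoted strings are assumptions about what DataFusion accepts. Only naive datetimes and times and finite Decimals are considered here.

```python
from datetime import date, datetime, time
from decimal import Decimal
from uuid import UUID


def literal_to_sql_sketch(value):
    """Hypothetical converter for the newly supported literal types."""
    # datetime must be tested before date: datetime is a subclass of date.
    if isinstance(value, datetime):
        return f"CAST('{value.isoformat(sep=' ')}' AS TIMESTAMP)"
    if isinstance(value, date):
        return f"CAST('{value.isoformat()}' AS DATE)"
    if isinstance(value, time):
        return f"CAST('{value.isoformat()}' AS TIME)"
    if isinstance(value, Decimal):
        # Fixed-point formatting avoids exponent notation such as 1E+3.
        return format(value, "f")
    if isinstance(value, UUID):
        # Assumption: the UUID is compared as its canonical string form.
        return f"'{value}'"
    raise TypeError(f"Unsupported literal type: {type(value).__name__}")
```

The order of the `isinstance` checks matters: testing `date` first would also match every `datetime` and silently drop the time of day.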

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| integrations/python/dataloader/src/openhouse/dataloader/filters.py | Adds DataFusion literal serialization for additional Python literal types used in filter expressions. |
| integrations/python/dataloader/tests/test_filters.py | Adds coverage for the new literal conversions (string-level assertions + execution against DataFusion). |

Comment thread integrations/python/dataloader/src/openhouse/dataloader/filters.py Outdated
@ShreyeshArangath ShreyeshArangath left a comment

LGTM, just a couple of questions

Comment thread integrations/python/dataloader/src/openhouse/dataloader/filters.py
Comment thread integrations/python/dataloader/src/openhouse/dataloader/filters.py
…literals

DataFusion does not support TIME WITH TIME ZONE, so tz-aware Python
time values are converted to UTC before formatting the SQL literal.
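
For reference, a UTC normalization of this kind can be sketched with the standard library alone. The helper name `time_to_utc` and the fixed reference date are assumptions made for illustration; a `time` carries no date, so the result for zones with daylight saving depends on which date is chosen.

```python
from datetime import date, datetime, time, timedelta, timezone


def time_to_utc(value, reference_date=date(2000, 1, 1)):
    """Hypothetical helper: return a naive time expressed in UTC."""
    if value.tzinfo is None:
        # Naive values are passed through unchanged.
        return value
    # Attach an arbitrary date so the offset arithmetic is well defined.
    aware = datetime.combine(reference_date, value)
    # Shift to UTC, then drop the tzinfo to get a plain wall-clock time.
    return aware.astimezone(timezone.utc).time()


noon_plus_two = time(12, 0, tzinfo=timezone(timedelta(hours=2)))
utc_value = time_to_utc(noon_plus_two)
```

A conversion like this presumes that the stored TIME column is itself in UTC.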
Comment thread integrations/python/dataloader/src/openhouse/dataloader/filters.py Outdated
robreeves added 2 commits May 5, 2026 18:54
…erals

DataFusion interprets bare NaN/inf as column names, so emit
CAST('...' AS DOUBLE) for non-finite float and Decimal values.
DataFusion does not support timezones for TIME. Rather than converting
to UTC (which assumes the data is stored in UTC), raise a TypeError
telling the user to match the timezone used in the dataset.
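
The two behaviors described in these commit messages can be sketched as follows. The function names, the error text, and the spelling of the non-finite values inside the quotes are assumptions; the commit messages only state that `CAST('...' AS DOUBLE)` is emitted and that a TypeError is raised for tz-aware times.

```python
import math
from datetime import time, timedelta, timezone
from decimal import Decimal


def number_to_sql(value):
    """Hypothetical helper for float and Decimal literals."""
    is_decimal = isinstance(value, Decimal)
    finite = value.is_finite() if is_decimal else math.isfinite(value)
    if finite:
        return format(value, "f") if is_decimal else repr(value)
    # A bare NaN or inf would be parsed as a column name, so quote and cast.
    return f"CAST('{value}' AS DOUBLE)"


def time_to_sql(value):
    """Hypothetical helper that rejects tz-aware times instead of converting."""
    if value.tzinfo is not None:
        raise TypeError(
            "tz-aware time values are not supported; "
            "use a naive time in the timezone used by the dataset"
        )
    return f"CAST('{value.isoformat()}' AS TIME)"
```

Raising instead of converting leaves the timezone choice to the caller, who knows how the dataset stores its TIME values.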
@robreeves robreeves merged commit 7431342 into linkedin:main May 5, 2026
1 of 3 checks passed

3 participants