[DataLoader] Support more literal types in filters DSL to DataFusion conversion #569
Merged
…ter conversion
`_literal_to_sql()` only handled `str`, `bool`, `int`, and `float`, causing a `TypeError` when `datetime`/`date` filter values were converted to DataFusion SQL (the code path used when a `TableTransformer` is active).
…QL filter conversion
Contributor
Pull request overview
Fixes DataFusion SQL generation for the DataLoader filters DSL so filters containing additional Python literal types (datetime/date/time/Decimal/UUID) no longer raise TypeError when a DataFusion-based TableTransformer path is used.
Changes:
- Extend `_literal_to_sql()` to convert `datetime`, `date`, `time`, `Decimal`, and `UUID` values into DataFusion-compatible SQL literals.
- Add unit tests asserting the generated SQL for the new literal types.
- Add integration-style tests that execute the generated SQL against a real DataFusion `SessionContext`.
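As a rough illustration of the kind of type dispatch described above, here is a minimal sketch. The function name `literal_to_sql_sketch` is hypothetical, and the `CAST(... AS ...)` spellings are assumptions chosen for readability; the PR's actual output format is not shown on this page. Non-finite numbers and tz-aware `time` values are deliberately not handled here.

```python
from datetime import date, datetime, time
from decimal import Decimal
from uuid import UUID


def literal_to_sql_sketch(value: object) -> str:
    """Hypothetical stand-in for _literal_to_sql(); SQL spellings are assumed."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float, Decimal)):
        return str(value)
    if isinstance(value, str):
        # Escape embedded single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"
    # datetime must be checked before date because datetime subclasses date.
    if isinstance(value, datetime):
        return f"CAST('{value.isoformat(sep=' ')}' AS TIMESTAMP)"
    if isinstance(value, date):
        return f"CAST('{value.isoformat()}' AS DATE)"
    if isinstance(value, time):
        return f"CAST('{value.isoformat()}' AS TIME)"
    if isinstance(value, UUID):
        return f"'{value}'"
    raise TypeError(f"Unsupported literal type: {type(value).__name__}")


print(literal_to_sql_sketch(date(2024, 1, 2)))  # CAST('2024-01-02' AS DATE)
print(literal_to_sql_sketch("it's"))            # 'it''s'
```

The ordering of the `isinstance` checks matters: swapping `datetime` and `date` would silently format timestamps as dates.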
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| integrations/python/dataloader/src/openhouse/dataloader/filters.py | Adds DataFusion literal serialization for additional Python literal types used in filter expressions. |
| integrations/python/dataloader/tests/test_filters.py | Adds coverage for the new literal conversions (string-level assertions + execution against DataFusion). |
ShreyeshArangath
previously approved these changes
May 5, 2026
Collaborator
ShreyeshArangath
left a comment
LGTM, just a couple of questions
…literals
DataFusion does not support `TIME WITH TIME ZONE`, so tz-aware Python `time` values are converted to UTC before formatting the SQL literal.
…erals
DataFusion interprets bare NaN/inf as column names, so emit
CAST('...' AS DOUBLE) for non-finite float and Decimal values.
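The non-finite handling described in this commit message could look roughly like the sketch below. The helper name `number_to_sql_sketch` is hypothetical, and the strings placed inside the cast (`NaN`, `Infinity`, `-Infinity`) are assumptions, since the commit message elides them as `'...'`.

```python
import math
from decimal import Decimal
from typing import Union


def number_to_sql_sketch(value: Union[float, Decimal]) -> str:
    """Hypothetical helper: wrap non-finite numbers in a CAST so they are
    not parsed as bare identifiers. The quoted spellings are assumed."""
    if isinstance(value, Decimal):
        # Decimal has its own predicates; is_signed() also works for -Infinity.
        is_nan = value.is_nan()
        is_inf = value.is_infinite()
        negative = value.is_signed()
    else:
        is_nan = math.isnan(value)
        is_inf = math.isinf(value)
        negative = value < 0
    if is_nan:
        return "CAST('NaN' AS DOUBLE)"
    if is_inf:
        text = "-Infinity" if negative else "Infinity"
        return f"CAST('{text}' AS DOUBLE)"
    # Finite values can be emitted as plain numeric literals.
    return str(value)


print(number_to_sql_sketch(float("nan")))  # CAST('NaN' AS DOUBLE)
print(number_to_sql_sketch(1.5))           # 1.5
```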
DataFusion does not support timezones for TIME. Rather than converting to UTC (which assumes the data is stored in UTC), raise a TypeError telling the user to match the timezone used in the dataset.
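A minimal sketch of the reject-instead-of-convert behavior this commit describes is shown below. The function name `time_to_sql_sketch`, the error wording, and the `CAST(... AS TIME)` spelling are all assumptions for illustration, not the PR's actual code.

```python
from datetime import time, timezone


def time_to_sql_sketch(value: time) -> str:
    """Hypothetical helper: format naive times, reject tz-aware ones."""
    # A time is aware only when utcoffset() returns a non-None value.
    if value.utcoffset() is not None:
        raise TypeError(
            "tz-aware time values are not supported; pass a naive time "
            "expressed in the same timezone as the dataset"
        )
    return f"CAST('{value.isoformat()}' AS TIME)"


print(time_to_sql_sketch(time(9, 15)))  # CAST('09:15:00' AS TIME)

try:
    time_to_sql_sketch(time(9, 15, tzinfo=timezone.utc))
except TypeError as exc:
    print(f"rejected: {exc}")
```

Raising here avoids silently shifting the value by an offset that may not match how the stored data was written.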
ShreyeshArangath
approved these changes
May 5, 2026
Summary
Filter expressions using `datetime`, `date`, `time`, `Decimal`, or `UUID` values fail with `TypeError` when a `TableTransformer` is active because `_literal_to_sql()` only handled `str`, `bool`, `int`, and `float`. This extends support for other types.
Changes
Testing Done
All 234 tests pass.
`make verify` clean.
Additional Information