ENH: Add support for date/time operations in PySpark backend #1974
Conversation
Mostly looks good. I think there are a few places we can simplify the implementation. Please see comments.
7a12a1d to 4465f9a
@hjoo One small comment here #1974 (comment), otherwise LGTM
LGTM
@toryhaavik @DiegoAlbertoTorres can you look over and help merge?
some small changes
…andas dt functions
4534824 to 8ee0578
Implemented date/time operations for the PySpark backend to pass all supported tests in `ibis/tests/all/test_temporal.py`.

Features that are supported:
- `Date`
- `Timestamp`
- Date and time extraction
- Date and timestamp truncation
- Timestamp now
- String to timestamp
- Timestamp to formatted string
- `strftime`
- Day of week index
- Day of week name
- Date add, subtract
- Timestamp add, subtract
- Interval add, subtract

Additional operations implemented to pass `test_temporal.py`:
- `cast`
- `limit`
- `array` index

Features that are not supported:
- `ms`, `us`, `ns` time units - unsupported in native PySpark SQL functions
- non-literal `interval` (e.g. casting an integer column to an interval column) - this is because there is no timedelta type in PySpark
- Date and timestamp diff - again, no support for a timedelta type

Author: Hyonjee <hyonjee.joo@twosigma.com>

Closes ibis-project#1974 from hjoo/pyspark-temporal and squashes the following commits:

- 8ee0578 [Hyonjee] xfail failing param mutate test for pyspark backend
- 142daae [Hyonjee] minor refactor in pyspark compile_limit
- 2dc7e62 [Hyonjee] pyspark backend: rename extract_x_from_datetime to extract_component_from_datetime
- 42ddc34 [Hyonjee] switch pyspark client day_of_week_index and day_of_week_name to use pandas dt functions
- f6bcff1 [Hyonjee] temporal operations cleanup, basic test for casting
- c163ff0 [Hyonjee] PySpark backend temporal operations
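The extraction, day-of-week, truncation, and formatting semantics listed above can be sketched with plain Python `datetime`. This is a minimal illustration of the intended behavior under assumed ISO-style format strings, not the actual PySpark backend implementation; variable names here are hypothetical:

```python
from datetime import datetime

ts = datetime(2019, 8, 1, 10, 30, 45)

# Date and time extraction: pull out individual components
year, month, day, hour = ts.year, ts.month, ts.day, ts.hour

# Day of week index (Monday = 0) and day of week name, mirroring the
# pandas dt.dayofweek / dt.day_name() accessors the backend delegates to
dow_index = ts.weekday()
dow_name = ts.strftime("%A")

# Timestamp truncation to day granularity: zero out the time part
truncated = ts.replace(hour=0, minute=0, second=0, microsecond=0)

# Timestamp to formatted string (strftime) and string to timestamp
formatted = ts.strftime("%Y-%m-%d %H:%M:%S")
parsed = datetime.strptime(formatted, "%Y-%m-%d %H:%M:%S")

print(year, month, day, hour)  # 2019 8 1 10
print(dow_index, dow_name)     # 3 Thursday
print(truncated.isoformat())   # 2019-08-01T00:00:00
print(parsed == ts)            # True
```

Note that `datetime` carries microsecond precision, whereas the PR above cannot support `ms`/`us`/`ns` units because native PySpark SQL functions lack them.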