
ENH: Add support for date/time operations in PySpark backend #1974

Merged: 6 commits, Sep 26, 2019


@hjoo (Contributor) commented Sep 21, 2019

Implemented date/time operations for the PySpark backend so that it passes all supported tests in ibis/tests/all/test_temporal.py.

Features that are supported:

  • Date
  • Timestamp
  • Date and time extraction
  • Date and timestamp truncation
  • Timestamp now
  • String to timestamp
  • Timestamp to formatted string - strftime
  • Day of week index
  • Day of week name
  • Date add, subtract
  • Timestamp add, subtract
  • Interval add, subtract
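The day-of-week index and name operations need a convention check, because the engines number weekdays differently. Spark SQL's `dayofweek` returns 1 for Sunday through 7 for Saturday, while pandas' `Series.dt.dayofweek` (which commit 42ddc34 switched the client to) returns 0 for Monday through 6 for Sunday. The pure-Python sketch below assumes Ibis follows the pandas convention. The helper names are hypothetical and not part of this PR. The sketch shows the remapping a backend would need if it relied on Spark's `dayofweek` instead.

```python
import datetime

# Names indexed 0 (Monday) .. 6 (Sunday); assumed to match the pandas
# Series.dt.dayofweek / Series.dt.day_name() conventions Ibis expects.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def spark_dayofweek_to_index(spark_dow: int) -> int:
    """Map Spark's dayofweek (1=Sunday .. 7=Saturday) to 0=Monday .. 6=Sunday."""
    if not 1 <= spark_dow <= 7:
        raise ValueError(f"Spark dayofweek must be in 1..7, got {spark_dow}")
    return (spark_dow + 5) % 7


def reference_spark_dayofweek(d: datetime.date) -> int:
    """Pure-Python stand-in for Spark's dayofweek(), used only for checking."""
    # isoweekday() is Monday=1 .. Sunday=7: move Sunday to 1, shift the rest up by one.
    return d.isoweekday() % 7 + 1


d = datetime.date(2019, 9, 26)  # a Thursday
idx = spark_dayofweek_to_index(reference_spark_dayofweek(d))
assert idx == d.weekday() == 3
print(DAY_NAMES[idx])  # prints: Thursday
```

Computing the index with pandas directly, as the client does, sidesteps this remapping entirely.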

Additional operations implemented to pass test_temporal.py:

  • cast
  • limit
  • array index

Features that are not supported:

  • ms, us, ns time units - unsupported in native PySpark SQL functions
  • non-literal interval (e.g. casting an integer column to an interval column) - PySpark has no timedelta type
  • Date and timestamp diff - again, because there is no timedelta type
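Both truncation and literal interval arithmetic come down to translating a unit code into a Spark SQL unit keyword, and that is where the limitations listed above show up. The sketch below is hypothetical pure Python. The helper names and the Ibis-style unit codes (`Y`, `Q`, `M`, `W`, `D`, `h`, `m`, `s`) are assumptions, not code from this PR. It builds SQL text for Spark's `date_trunc` and for an `INTERVAL` literal, and it refuses `ms`, `us`, `ns` and non-literal values.

```python
# Hypothetical sketch: translate Ibis-style unit codes into Spark SQL
# unit keywords. Sub-second units are rejected, mirroring the PR's
# "ms, us, ns" limitation.
SPARK_UNITS = {
    "Y": "year",
    "Q": "quarter",
    "M": "month",
    "W": "week",
    "D": "day",
    "h": "hour",
    "m": "minute",
    "s": "second",
}
SUBSECOND_UNITS = {"ms", "us", "ns"}


def spark_unit(unit: str) -> str:
    """Return the Spark SQL keyword for a unit code, or raise."""
    if unit in SUBSECOND_UNITS:
        raise NotImplementedError(
            f"unit {unit!r} is not supported by the PySpark backend")
    try:
        return SPARK_UNITS[unit]
    except KeyError:
        raise ValueError(f"unknown unit {unit!r}") from None


def trunc_expr(column_sql: str, unit: str) -> str:
    """SQL text truncating a timestamp column, e.g. date_trunc('month', ts)."""
    return f"date_trunc('{spark_unit(unit)}', {column_sql})"


def interval_literal(value: int, unit: str) -> str:
    """SQL text for a literal interval, e.g. INTERVAL 3 day.

    Only Python int literals are accepted: without a timedelta type there is
    no way to turn an arbitrary integer column into an interval column.
    """
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("only integer literal intervals are supported")
    if unit == "Q":
        # Assumption: Spark 2.x interval literals have no quarter unit,
        # so express quarters as months.
        value, unit = value * 3, "M"
    return f"INTERVAL {value} {spark_unit(unit)}"


print(trunc_expr("ts", "M"))     # date_trunc('month', ts)
print(interval_literal(2, "Q"))  # INTERVAL 6 month
```

Strings like these could be handed to `pyspark.sql.functions.expr`, for example `df.select(F.expr(trunc_expr("ts", "M")))`. For truncation alone, `pyspark.sql.functions.date_trunc` is the column-level equivalent.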

@icexelloss (Contributor) left a comment

Mostly looks good. I think there are a few places we can simplify the implementation. Please see comments.

@pep8speaks commented Sep 23, 2019

Hello @hjoo! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-09-25 20:56:38 UTC

@icexelloss (Contributor)

@hjoo One small comment here #1974 (comment), otherwise LGTM.

@icexelloss (Contributor) left a comment

LGTM

@hjoo (Contributor, Author) commented Sep 24, 2019

@toryhaavik @DiegoAlbertoTorres can you look over and help merge?

@toryhaavik (Contributor) left a comment

some small changes

ibis/pyspark/compiler.py: 5 resolved review threads
@toryhaavik toryhaavik merged commit eb1b6a0 into ibis-project:master Sep 26, 2019
costrouc pushed a commit to costrouc/ibis that referenced this pull request Oct 10, 2019
Author: Hyonjee <hyonjee.joo@twosigma.com>

Closes ibis-project#1974 from hjoo/pyspark-temporal and squashes the following commits:

8ee0578 [Hyonjee] xfail failing param mutate test for pyspark backend
142daae [Hyonjee] minor refactor in pyspark compile_limit
2dc7e62 [Hyonjee] pyspark backend: rename extract_x_from_datetime to extract_component_from_datetime
42ddc34 [Hyonjee] switch pyspark client day_of_week_index and day_of_week_name to use pandas dt functions
f6bcff1 [Hyonjee] temporal operations cleanup, basic test for casting
c163ff0 [Hyonjee] PySpark backend temporal operations
4 participants