[BUG] datetime column (pd.DataFrame) returned in Transformer is causing spark error #50

Closed
goodwanghan opened this issue Sep 27, 2020 · 1 comment · Fixed by #54
@goodwanghan
Collaborator

Minimal Code To Reproduce

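import pandas as pd
# Import paths below are assumed for the Fugue release in use at the time
from fugue_spark import SparkExecutionEngine
from fugue_sql import FugueSQLWorkflow

# schema: a:datetime
def t(sdf: pd.DataFrame) -> pd.DataFrame:
    # Parse the string column into pandas datetimes (pd.Timestamp values)
    sdf["a"] = pd.to_datetime(sdf["a"])
    return sdf

with FugueSQLWorkflow(SparkExecutionEngine()) as dag:
    dag.df([["2020-01-01"]], "a:str").transform(t).show()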

Describe the bug

TypeError: field a: TimestampType can not accept object Timestamp('2020-01-01 00:00:00') in type <class 'pandas._libs.tslibs.timestamps.Timestamp'>
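The message suggests that Spark's row verification rejects pandas.Timestamp values for a TimestampType field, even though pandas.Timestamp subclasses datetime.datetime. Below is a minimal sketch of one possible workaround, assuming the rows are converted by hand before they reach Spark. The helper name to_plain_rows is hypothetical, and this is not necessarily what the linked fix does.

```python
import datetime

import pandas as pd


def to_plain_rows(df: pd.DataFrame) -> list:
    """Return the frame as a list of rows holding plain Python datetimes.

    Hypothetical helper for illustration only; not part of Fugue or Spark.
    """
    rows = []
    for row in df.itertuples(index=False, name=None):
        converted = []
        for value in row:
            if value is pd.NaT:
                # Missing timestamps become None
                converted.append(None)
            elif isinstance(value, pd.Timestamp):
                # to_pydatetime() yields an exact datetime.datetime instance
                converted.append(value.to_pydatetime())
            else:
                converted.append(value)
        rows.append(converted)
    return rows


df = pd.DataFrame({"a": pd.to_datetime(["2020-01-01"])})
rows = to_plain_rows(df)
print(type(rows[0][0]) is datetime.datetime)
```

The converted rows contain only datetime.datetime and None in the datetime column, which are the value types the error message implies TimestampType expects.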

Expected behavior
The transformer should run on Spark without raising an error.
Datetime tests should also be added to the general execution engine test suites.

Environment (please complete the following information):

  • Backend: spark
  • Backend version: 3
  • Python version: 3.6.9
  • OS: linux
@goodwanghan goodwanghan added the bug and spark labels Sep 27, 2020
@goodwanghan goodwanghan added this to the 0.4.1 milestone Sep 27, 2020
@goodwanghan goodwanghan added the version dependent label and removed the bug label Sep 27, 2020
@goodwanghan
Collaborator Author

This problem seems to affect only pyarrow<0.15.1.
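To check whether an environment falls in that range, a small version comparison can help. This is only a sketch based on the observation above: the 0.15.1 threshold comes from this comment, and the function names are hypothetical. The version string would typically come from pyarrow.__version__.

```python
import re

# Threshold taken from the comment above; treat it as an observation, not a guarantee
AFFECTED_BELOW = (0, 15, 1)


def parse_version(version: str) -> tuple:
    """Parse 'major.minor.patch' into a tuple of ints, ignoring suffixes like 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def may_be_affected(pyarrow_version: str) -> bool:
    """Return True if the given pyarrow version is below the reported threshold."""
    return parse_version(pyarrow_version) < AFFECTED_BELOW


print(may_be_affected("0.15.0"))
print(may_be_affected("0.15.1"))
```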

@goodwanghan goodwanghan linked a pull request Sep 27, 2020 that will close this issue