
ENH: Add way to write DATE types to Hyper #100

Closed
mhadi813 opened this issue May 7, 2020 · 16 comments · Fixed by #220
Labels
enhancement New feature or request

Comments

@mhadi813

mhadi813 commented May 7, 2020

I'm trying to write a DataFrame that contains datetime.date objects to Hyper using the pantab.frame_to_hyper method, and it raises a TypeError.

Steps to reproduce the problem:

import pandas as pd
import datetime
date = datetime.date(2020,5,8)
df = pd.DataFrame({'Date': [date,date,date], 'Col' : list('ABC') })
df.head()
df.info()
import pantab
from tableauhyperapi import TableName
table = TableName('Extract','Extract')
pantab.frame_to_hyper(df, 'random_db.hyper', table=table)

=> TypeError: Invalid value "datetime.date(2020, 5, 8)" found (row 0 column 0)

Converting the datetime.date values with pd.to_datetime solves the problem:
df.iloc[0,0]
df['Date'] = pd.to_datetime(df['Date'])
pantab.frame_to_hyper(df, 'random_db.hyper', table=table)
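A minimal sketch of that workaround without the pantab call, showing the dtype change involved: the `datetime.date` values sit in a generic `object` column, and `pd.to_datetime` turns them into a datetime64 column of midnight timestamps. That dtype is what lets pantab write the column, though as timestamps rather than DATE values.

```python
import datetime

import pandas as pd

date = datetime.date(2020, 5, 8)
df = pd.DataFrame({"Date": [date, date, date], "Col": list("ABC")})

# Plain datetime.date values end up in a generic object column
print(df["Date"].dtype)  # object

# pd.to_datetime converts them to a datetime64 column (midnight timestamps)
df["Date"] = pd.to_datetime(df["Date"])
print(df["Date"].dtype)
```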

other info:
OS: macOS Catalina 10.15.3
pandas version 1.0.0
pantab version 1.1.0

Thanks

Hadi

@WillAyd
Collaborator

WillAyd commented May 7, 2020

Thanks for the report. This is “by design” in today’s world because there isn't a first-class dtype in pandas for dates.

Your workaround is the suggested approach, though if you really want DATE and not datetime in the extract, it falls short. I think we could use a keyword argument that lets you explicitly store datetime dtypes as dates. Interested in trying a PR for that?

@WillAyd WillAyd changed the title TypeError: Invalid value datetime.date No way to write DATE types to Hyper May 8, 2020
@WillAyd WillAyd changed the title No way to write DATE types to Hyper ENH: Add way to write DATE types to Hyper May 8, 2020
@WillAyd WillAyd added the enhancement New feature or request label May 8, 2020
@mhadi813
Author

Thanks Will, I'll make a PR.

@mhadi813
Author

I'm trying to make a PR adding a keyword argument that casts datetime.date columns to pandas datetimes. Can you grant me permission? Thanks

def frame_to_hyper(
    df: pd.DataFrame,
    database: Union[str, pathlib.Path],
    *,
    table: pantab_types.TableType,
    table_mode: str = "w",
    **kwargs: Union[str, list]
) -> None:
    """See api.rst for documentation"""
    if 'date_column' in kwargs:
        date_column = kwargs.get('date_column')
        if isinstance(date_column, list):
            for col in date_column:
                df[col] = pd.to_datetime(df[col])
        elif isinstance(date_column, str):
            df[date_column] = pd.to_datetime(df[date_column])
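The proposal above converts the columns on the caller's `df` in place. A sketch of the same idea as a standalone helper that works on a copy instead; the name `coerce_date_columns` is hypothetical and not part of pantab's API, and the actual write to Hyper is left out. Like the workaround, it still yields timestamps rather than DATE values.

```python
import datetime
from typing import List, Optional, Union

import pandas as pd


def coerce_date_columns(
    df: pd.DataFrame,
    date_column: Optional[Union[str, List[str]]] = None,
) -> pd.DataFrame:
    """Return a copy of ``df`` with the named column(s) converted to datetime64.

    Hypothetical helper mirroring the proposed ``date_column`` keyword;
    it is not part of pantab's API.
    """
    if date_column is None:
        return df
    # Accept a single column name or a list of names, as in the proposal
    columns = [date_column] if isinstance(date_column, str) else list(date_column)
    out = df.copy()  # work on a copy so the caller's DataFrame is not mutated
    for col in columns:
        out[col] = pd.to_datetime(out[col])
    return out


date = datetime.date(2020, 5, 8)
df = pd.DataFrame({"Date": [date, date], "Col": ["A", "B"]})
converted = coerce_date_columns(df, "Date")
print(df["Date"].dtype)  # object (original left untouched)
print(converted["Date"].dtype)
```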

@WillAyd
Collaborator

WillAyd commented May 10, 2020 via email

@WillAyd
Collaborator

WillAyd commented Jun 18, 2020

So there is a discussion of adding this as a type upstream in pandas:

pandas-dev/pandas#32473

I think any work we do here would have to wait on that, so let's see if that gets traction.

@joshuataylor

Work on a date type seems to have stalled in pandas; can this be reconsidered?

We have a fair few dates in our project, and would love to use pantab for this.

@WillAyd
Collaborator

WillAyd commented Apr 11, 2022

@joshuataylor have you looked at hyperarrow? It is a similar tool, but with Arrow as the backend you get first-class DATE support.

https://hyperarrow.readthedocs.io/en/latest/

@joshuataylor

I didn't know that library existed, awesome work 😍 . Will give it a go.

@jstrauss18

Is this still open? Running into this issue right now using pandas.

TypeError: Invalid value "datetime.date(2023, 10, 5)" found (row 0 column 5)

@WillAyd
Collaborator

WillAyd commented Oct 23, 2023

@jstrauss18 your column dtype is likely object. If you want to write timestamps, make sure you use a datetime64 dtype column. pandas does not natively support plain DATE types (pyarrow does, but pantab currently does not leverage pyarrow types).
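To check which columns are affected, a small sketch that lists object-dtype columns whose non-null values are all plain `datetime.date` objects. The helper name `date_object_columns` is hypothetical, not part of pandas or pantab.

```python
import datetime

import pandas as pd


def date_object_columns(df: pd.DataFrame) -> list:
    """List object-dtype columns whose non-null values are all datetime.date.

    Hypothetical diagnostic helper, not part of pandas or pantab.
    """
    found = []
    for name, series in df.items():
        if not pd.api.types.is_object_dtype(series):
            continue
        values = series.dropna()
        # datetime.datetime subclasses datetime.date, so exclude it explicitly
        is_plain_date = values.map(
            lambda v: isinstance(v, datetime.date)
            and not isinstance(v, datetime.datetime)
        )
        if len(values) > 0 and is_plain_date.all():
            found.append(name)
    return found


df = pd.DataFrame(
    {
        "Date": [datetime.date(2023, 10, 5), None],
        "Stamp": pd.to_datetime(["2023-10-05", "2023-10-06"]),
        "Col": ["A", "B"],
    }
)
print(date_object_columns(df))  # ['Date']
```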

@jstrauss18

jstrauss18 commented Oct 23, 2023

Not sure what to do. I'm using Databricks Delta Sharing to load the data frame, and I don't name the columns.

df

@WillAyd
Collaborator

WillAyd commented Oct 23, 2023

Sorry, I'm not familiar with Databricks, so I can't give specific advice. You might want to try StackOverflow for something more tailored. Most I/O methods in pandas provide a parse_dates= argument that you can use when type inference is not correct, although there may be something more foundational to be fixed in your code.
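As an illustration of `parse_dates=` with `pandas.read_csv` (the CSV content here is made up for the example; whether a given Databricks loader exposes the same argument is not covered in this thread):

```python
import io

import pandas as pd

csv_text = "Date,Col\n2023-10-05,A\n2023-10-06,B\n"

# Without parse_dates the column is read as text, not as dates
df_default = pd.read_csv(io.StringIO(csv_text))

# parse_dates asks read_csv to convert the listed columns to datetime64
df_parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(df_default["Date"].dtype)
print(df_parsed["Date"].dtype)
```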

As a hack you could try df[df.columns[5]] = pd.to_datetime(df.iloc[:, 5]), since the error says column 5 (counted from zero, so the sixth column) is where you are having an issue. But beyond that I would try StackOverflow or a Databricks support forum.
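A sketch of the positional conversion suggested above, on a made-up frame. The error's column numbers appear to be zero-based (the original report got "column 0" for the first column), so "column 5" matches `iloc` position 5. The label is looked up from the position and assigned through `df[label] = ...`: since pandas 2.0, `df.iloc[:, 5] = ...` first tries to write the values into the existing column in place, which can leave an object column holding Timestamps instead of giving it a datetime64 dtype.

```python
import datetime

import pandas as pd

# Six columns; the one at position 5 (the sixth) holds plain datetime.date objects
df = pd.DataFrame({f"c{i}": [i, i] for i in range(5)})
df["when"] = [datetime.date(2023, 10, 5), datetime.date(2023, 10, 6)]

# Look up the label for position 5, then assign through the label so the
# column is replaced with a new datetime64 column
col = df.columns[5]
df[col] = pd.to_datetime(df[col])
print(df.dtypes[col])
```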

@mohamedhamnache

mohamedhamnache commented Jan 9, 2024

Hello,
I am facing the same problem. I moved my application away from the native Tableau Server API for converting my CSV files to Hyper, in order to gain performance in conversion time. However, my dataset contains DATE columns, and the dates are converted to datetime. I need the DATE format. Any news about this issue?
@WillAyd

@WillAyd
Collaborator

WillAyd commented Jan 9, 2024

Your best bet will be to keep track of the pantab 4.0 development, which will be a significant overhaul of the code base.

@mohamedhamnache

Your best bet will be to keep track of the pantab 4.0 development, which will be a significant overhaul of the code base.

Any idea about the release date, and who is handling this?

@WillAyd
Collaborator

WillAyd commented Jan 11, 2024

I am maintaining a checklist of things in #219 - feel free to comment there or ask questions.

As for a release date, I do not know. I am looking at using some new technology, so there are many variables at play. Because this is an open source project, things get developed as I or anyone in the community have the time and interest, which adds another layer. The best I can say is "maybe a couple of months", but without any guarantee :-)
