Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Localized dates cause crash with load_table_columnar method #219

Closed
stets opened this issue Apr 22, 2019 · 4 comments · Fixed by #230
Closed

Localized dates cause crash with load_table_columnar method #219

stets opened this issue Apr 22, 2019 · 4 comments · Fixed by #230
Labels

Comments

@stets
Copy link

stets commented Apr 22, 2019

While attempting to load a table into OmniSci via pymapd's columnar load with con.load_table_columnar(mapdtable, df, preserve_index=False, col_names_from_schema=True)

I receive back:

Traceback (most recent call last):
  File "/main.py", line 135, in <module>
    main()
  File "/main.py", line 123, in main
    con.load_table_columnar(mapdtable, df, preserve_index=False, col_names_from_schema=True)
  File "/usr/local/lib/python3.6/site-packages/pymapd/connection.py", line 593, in load_table_columnar
    col_names=col_names
  File "/usr/local/lib/python3.6/site-packages/pymapd/_pandas_loaders.py", line 136, in build_input_columnar
    data = thrift_cast(data, mapd_type, 0)
  File "/usr/local/lib/python3.6/site-packages/pymapd/_pandas_loaders.py", line 84, in thrift_cast
    return datetime_to_seconds(data)
  File "/usr/local/lib/python3.6/site-packages/pymapd/_utils.py", line 26, in datetime_to_seconds
    elif arr.dtype == 'object' or 'datetime64[ns,' in arr.dtype:
TypeError: argument of type 'DatetimeTZDtype' is not iterable

This is from latest master release pymapd==0.10.1.dev11+ga3fd5b9

Replication

Create a table with a single column as type TIMESTAMP(0):

create table example(test TIMESTAMP(0))

Run the following python code:

import os 
import pymapd
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

mapdhost =  'localhost'
mapdport = '9090'
mapduser = 'mapd'
mapdpass =  'HyperInteractive')
mapddbname = 'mapd'
mapdprotocol = 'http'

mapdcon = pymapd.connect(user=mapduser, password=mapdpass, host=mapdhost, dbname=mapddbname, port=mapdport, protocol=mapdprotocol)

# create a dataframe of random dates
date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(7), freq='D')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'test': days})
# convert the col test to UTC localized
df['test'] = df['test'].dt.tz_localize('UTC')
# load table
mapdcon.load_table_columnar('example', df, preserve_index=False, col_names_from_schema=True)
@randyzwitch
Copy link
Contributor

Can you provide a reproducible example? Even a printed example of what the 'DatetimeTZDtype column looks like would be helpful.

@stets
Copy link
Author

stets commented Apr 22, 2019

Thanks @randyzwitch -- I'll get this replicated and upload some code to show the issue.

@stets stets changed the title load_table_columnar Failing in Master Localized dates cause crash with load_table_columnar method Apr 22, 2019
@stets
Copy link
Author

stets commented Apr 22, 2019

@randyzwitch Just added a bit more detail, I'm not sure if this is expected behavior or not since the columns in OmniSci are TIMESTAMP(0) types and the Dataframe objects have to be of type datetime64[ns, UTC]. I believe this worked in the previous versions.

@randyzwitch
Copy link
Contributor

We should think through what the desired behavior is here. In your example, you are converting to UTC to get a DatetimeTZDtype, but what should we do if the data aren't UTC? I think the answer is to do the conversion to UTC, because that's what the backend expects, but we'd need to double check this since silent casting of data is frequently undesired.

wamsiv added a commit that referenced this issue Jul 22, 2019
`np.dtype('datetime64[ns, UTC]')` is not a numpy type. Therefore, we were getting the type error. I have also added a proper conversion of of timestamps from various Timezones to 'UTC'. Since that is a standard which we follow in core.
Also added unit tests to validate and catch similar exception in future.

With new changes:
1) TIMESTAMP type with different timezones would be accepted and converted to GMT/UTC time which is assumed/followed in omnisci core
2) Increase unit tests coverage

Relates to #219
fixes #253
wamsiv added a commit that referenced this issue Jul 25, 2019
`np.dtype('datetime64[ns, UTC]')` is not a numpy type. Therefore, we were getting the type error. I have also added a proper conversion of of timestamps from various Timezones to 'UTC'. Since that is a standard which we follow in core.
Also added unit tests to validate and catch similar exception in future.

With new changes:
1) TIMESTAMP type with different timezones would be accepted and converted to GMT/UTC time which is assumed/followed in omnisci core
2) Increase unit tests coverage

Relates to #219
fixes #253
wamsiv added a commit that referenced this issue Jul 25, 2019
`np.dtype('datetime64[ns, UTC]')` is not a numpy type. Therefore, we were getting the type error. I have also added a proper conversion of of timestamps from various Timezones to 'UTC'. Since that is a standard which we follow in core.
Also added unit tests to validate and catch similar exception in future.

With new changes:
1) TIMESTAMP type with different timezones would be accepted and converted to GMT/UTC time which is assumed/followed in omnisci core
2) Increase unit tests coverage

Relates to #219
fixes #253
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants