BUG: selecting from HDFStore with a tz-aware level of a multi-index #11926

Closed
iyer opened this issue Dec 30, 2015 · 3 comments

Comments

@iyer

commented Dec 30, 2015

I'm encountering a bug when I query a MultiIndex DataFrame that has a tz-aware DatetimeIndex as one of its levels.
This only happens

  1. for a MultiIndex where one of the levels holds timestamps with a timezone (as seen in [1]). If the timestamps have no timezone set, there is no issue (as seen in [2])
  2. if the query returns no rows
  3. in pandas 0.17.*; this was working fine in pandas 0.16.*

In [1]: periods = 10
   ...: dts = pd.date_range('20151201', periods=periods, freq='D', tz='UTC') #WITH TIMEZONE
   ...: mi = pd.MultiIndex.from_arrays([dts, range(periods)], names = ['DATE', 'NO'])
   ...: df = pd.DataFrame({'MYCOL':0}, index=mi)
   ...: file_path = 'table.h5'
   ...: key = 'mykey'
   ...: with pd.HDFStore(file_path, 'w') as store:
   ...:     store.append(key, df, format='table', append=True)
   ...:     dfres = store.select(key, where="""DATE > '20151220'""")
   ...:     print(dfres)
   ...: 
   ...: 
Traceback (most recent call last):

  File "<ipython-input-1-e0b7db50fd4d>", line 9, in <module>
    dfres = store.select(key, where="""DATE > '20151220'""")

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/io/pytables.py", line 669, in select
    return it.get_result()

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/io/pytables.py", line 1352, in get_result
    results = self.func(self.start, self.stop, where)

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/io/pytables.py", line 662, in func
    columns=columns, **kwargs)

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/io/pytables.py", line 4170, in read
    df = super(AppendableMultiFrameTable, self).read(**kwargs)

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/io/pytables.py", line 4029, in read
    df = concat(frames, axis=1, verify_integrity=False).consolidate()

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/tools/merge.py", line 813, in concat
    return op.get_result()

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/tools/merge.py", line 995, in get_result
    mgrs_indexers, self.new_axes, concat_axis=self.axis, copy=self.copy)

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/core/internals.py", line 4456, in concatenate_block_managers
    for placement, join_units in concat_plan]

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/core/internals.py", line 4456, in <listcomp>
    for placement, join_units in concat_plan]

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/core/internals.py", line 4553, in concatenate_join_units
    for ju in join_units]

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/core/internals.py", line 4553, in <listcomp>
    for ju in join_units]

  File "/export/data/anaconda/anaconda3.2.4/lib/python3.5/site-packages/pandas/core/internals.py", line 4801, in get_reindexed_values
    missing_arr = np.empty(self.shape, dtype=empty_dtype)

TypeError: data type not understood


In [2]: periods = 10
   ...: dts = pd.date_range('20151201', periods=periods, freq='D') #WITHOUT TIMEZONE
   ...: mi = pd.MultiIndex.from_arrays([dts, range(periods)], names = ['DATE', 'NO'])
   ...: df = pd.DataFrame({'MYCOL':0}, index=mi)
   ...: file_path = 'table.h5'
   ...: key = 'mykey'
   ...: with pd.HDFStore(file_path, 'w') as store:
   ...:     store.append(key, df, format='table', append=True)
   ...:     dfres = store.select(key, where="""DATE > '20151220'""")
   ...:     print(dfres)
   ...: 
   ...: 
Empty DataFrame
Columns: [MYCOL]
Index: []

In [3]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.11.2.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 19.1.1
Cython: 0.23.4
numpy: 1.10.2
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.5.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.10
pymysql: None
psycopg2: None
Jinja2: None

@iyer iyer changed the title Saving multindex dataframe with datetimes to HDFStore Saving multindex dataframe with timestamps to HDFStore Dec 30, 2015

@jreback

Contributor

commented Dec 30, 2015

So it's the readback, not the writing. I think it's taking the wrong path on the dtype conversion.

import numpy as np
import pandas as pd

periods = 10
dts = pd.date_range('20151201', periods=periods, freq='D', tz='UTC') #WITH TIMEZONE
mi = pd.MultiIndex.from_arrays([dts, range(periods)], names = ['DATE', 'NO'])
df = pd.DataFrame({'MYCOL':0}, index=mi)

file_path = 'table.h5'
key = 'mykey'

# write the frame in table format
with pd.HDFStore(file_path, 'w') as store:
    store.append(key, df, format='table', append=True)

# read the whole table back
print(pd.read_hdf(file_path, key))

# select with a condition that matches no rows (the failing case from the report)
dfres = pd.read_hdf(file_path, key, where="DATE > 20151220")
print(dfres)
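
A note on the last traceback frame: `np.empty(self.shape, dtype=empty_dtype)` fails with `TypeError: data type not understood`, which is how NumPy reacts to a dtype it cannot interpret. Below is a minimal sketch of that failure. It assumes that the dtype reaching that call is pandas' tz-aware datetime dtype; this is inferred from the traceback and is not confirmed anywhere in this thread. The variable names are illustrative, and the exact error text depends on the NumPy version.

```python
import numpy as np
import pandas as pd

# Assumption: the tz-aware level is carried as pandas' extension dtype.
tz_dtype = pd.DatetimeTZDtype(tz="UTC")

# NumPy has no native tz-aware datetime dtype, so allocating an array
# with this dtype is rejected.
try:
    np.empty((0,), dtype=tz_dtype)
    numpy_error = None
except TypeError as exc:
    numpy_error = exc

print("NumPy raised:", type(numpy_error).__name__)

# pandas can build an empty tz-aware container without going through np.empty.
empty = pd.Series([], dtype=tz_dtype)
print(len(empty), empty.dtype)
```

If that assumption holds, it would be consistent with the bug only showing up when the selection is empty, since that is when a placeholder array has to be allocated.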

@jreback jreback added this to the Next Major Release milestone Dec 30, 2015

@jreback jreback changed the title Saving multindex dataframe with timestamps to HDFStore BUG: selecting from HDFStore with a tz-aware level of a multi-index Dec 30, 2015

@mmongeon-aa


commented Feb 18, 2016

Has there been any update to patch this? Any ideas on which commit broke this since 0.16* -> 0.17*?

I'm encountering the same issue when selecting datetime64[ns, tz] data using an iterator.

@jreback

Contributor

commented Feb 18, 2016

There are vast changes to the way tz's work in 0.17 vs. 0.16; see the whatsnew.

This is a relatively simple fix, however. Pull requests are welcome.
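
Until a fix lands, one possible workaround follows from the observation in [2] that a tz-naive level selects fine: store the level as naive UTC and re-localize it after reading. The sketch below shows only the index conversion. The helpers `to_naive_utc` and `to_aware_utc` are hypothetical names, not pandas API, and this has not been tested against `HDFStore` in this thread.

```python
import pandas as pd


def to_naive_utc(df, level):
    """Return a copy of df whose tz-aware MultiIndex level is naive UTC."""
    idx = df.index
    pos = idx.names.index(level)
    # Convert to UTC first, then drop the timezone information.
    naive = idx.levels[pos].tz_convert("UTC").tz_localize(None)
    out = df.copy()
    out.index = idx.set_levels(naive, level=level)
    return out


def to_aware_utc(df, level):
    """Return a copy of df whose naive MultiIndex level is localized to UTC."""
    idx = df.index
    pos = idx.names.index(level)
    aware = idx.levels[pos].tz_localize("UTC")
    out = df.copy()
    out.index = idx.set_levels(aware, level=level)
    return out


# Same frame as in the report, with a tz-aware DATE level.
periods = 10
dts = pd.date_range("20151201", periods=periods, freq="D", tz="UTC")
mi = pd.MultiIndex.from_arrays([dts, range(periods)], names=["DATE", "NO"])
df = pd.DataFrame({"MYCOL": 0}, index=mi)

stored = to_naive_utc(df, "DATE")        # what would be passed to store.append
restored = to_aware_utc(stored, "DATE")  # applied to the result of store.select
print(stored.index.levels[0].tz, restored.index.levels[0].tz)
```

The cost is that the timezone is no longer recorded in the file, so the reader has to know the level was written as UTC.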
