HDFStore: unable to create index, no error message #28156

adamjstewart · 2019-08-26T22:12:39Z

I was trying to follow the documentation at https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#indexing but ran into an unintuitive bug with HDFStore index creation. I thought I would report it in case someone else runs across this problem.

First, I create 2 dataframes and an HDFStore:

>>> import pandas as pd
>>> import numpy as np
>>> df_1 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
>>> df_2 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
>>> st = pd.HDFStore('appends.h5', mode='w')

Now, when I append, if I do:

>>> st.append('df', df_1, data_columns=['B'], index=False)
>>> st.append('df', df_2, data_columns=['B'], index=False)

I can successfully create an index:

>>> st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
>>> st.get_storer('df').table
/df/table (Table(20,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
  "B": Float64Col(shape=(), dflt=0.0, pos=2)}
  byteorder := 'little'
  chunkshape := (2730,)
  autoindex := True
  colindexes := {
    "B": Index(9, full, shuffle, zlib(1)).is_csi=True}

But if I instead leave out the data_columns:

>>> st.append('df', df_1, index=False)
>>> st.append('df', df_2, index=False)

no index is created:

>>> st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
>>> st.get_storer('df').table
/df/table (Table(20,)) ''
  description := {
  "index": Int64Col(shape=(), dflt=0, pos=0),
  "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1)}
  byteorder := 'little'
  chunkshape := (2730,)

This is unintuitive for 2 reasons:

1. Why does HDFStore need to know the indexable columns during append and during create_table_index?
2. Why doesn't create_table_index raise an error message when it isn't able to create an index?

I think fixing either 1 or 2 would make things much more intuitive.
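
As long as the call fails silently, one way to catch the no-op is to compare the requested columns against the table's colindexes after calling create_table_index. The following is a minimal sketch, not part of pandas: unindexed_columns is a hypothetical helper name, and it assumes only that the table object exposes a dict-like colindexes attribute keyed by column name, as the PyTables table printed above appears to. A stand-in object replaces the real table so the example runs without an HDF5 file.

```python
from types import SimpleNamespace


def unindexed_columns(table, columns):
    """Return the requested columns that have no index on `table`.

    Hypothetical helper: `table` only needs a dict-like `colindexes`
    attribute whose keys are the names of the indexed columns.
    """
    indexed = set(table.colindexes)
    return [c for c in columns if c not in indexed]


# Stand-in for a table on which only the "index" column is indexed
# (an assumption for illustration; a real table comes from the store).
fake_table = SimpleNamespace(colindexes={"index": object()})

print(unindexed_columns(fake_table, ["B"]))  # prints ['B']
```

With a real store, the same check would presumably be unindexed_columns(st.get_storer('df').table, ['B']), run right after create_table_index.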


arw2019 · 2020-06-24T21:28:38Z

I can reproduce this bug exactly as described above on the master version of pandas.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : 526f404
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-106-generic
Version : #107-Ubuntu SMP Thu Jun 4 11:27:52 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+1940.g526f40431
numpy : 1.18.5
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 47.3.1.post20200616
Cython : 0.29.20
pytest : 5.4.3
hypothesis : 5.18.0
sphinx : 3.1.1
blosc : None
feather : None
xlsxwriter : 1.2.9
lxml.etree : 4.5.1
html5lib : 1.1
pymysql : 0.9.3
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.15.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.7.4
fastparquet : 0.4.0
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0

arw2019 · 2020-06-25T04:01:01Z

I think that the answer is that you should not be allowed to read a column if you did not specify it as a data_column. See this doc:
https://pandas.pydata.org/pandas-docs/version/0.15.1/io.html

If everyone agrees with that, then we should raise an AttributeError when a user attempts this.
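
To make the proposal concrete, here is a rough sketch of the kind of check I have in mind. It is only an illustration: check_index_columns is a hypothetical name, the error message is made up, and this is not how pandas implements anything today. It assumes the caller can supply the list of names that were stored as data_columns.

```python
def check_index_columns(columns, data_columns):
    """Raise AttributeError if any requested column is not a data column.

    Hypothetical validation: `columns` are the names passed to
    create_table_index, `data_columns` are the names given at append time.
    """
    missing = [c for c in columns if c not in data_columns]
    if missing:
        raise AttributeError(
            f"cannot create an index on {missing}: "
            "these columns were not stored as data_columns"
        )


# Mirrors the first case in the report: 'B' was passed as a data column.
check_index_columns(["B"], data_columns=["B"])

# Mirrors the second case: no data_columns were given at append time.
try:
    check_index_columns(["B"], data_columns=[])
except AttributeError as err:
    print(err)
```

Raising instead of returning quietly means the user learns at the create_table_index call that the store has to be rewritten with data_columns.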

jbrockmendel added the IO HDF5 read_hdf, HDFStore label Oct 16, 2019

mroeschke added the Bug label May 16, 2020

arw2019 mentioned this issue Jun 25, 2020: BUG: HDFStore unable to create colindex w/o error thrown #34983 (Merged)

jreback added the Error Reporting Incorrect or improved errors from pandas label Jun 25, 2020

jreback added this to the 1.1 milestone Jun 25, 2020

jreback closed this as completed in #34983 Jun 30, 2020
