I was trying to follow the documentation at https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#indexing but ran into an unintuitive bug with HDFStore index creation. I thought I would report it in case someone else runs across this problem.
First, I create 2 dataframes and an HDFStore:
    >>> import pandas as pd
    >>> import numpy as np
    >>> df_1 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
    >>> df_2 = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
    >>> st = pd.HDFStore('appends.h5', mode='w')
Now, when I append, if I do:
    >>> st.append('df', df_1, data_columns=['B'], index=False)
    >>> st.append('df', df_2, data_columns=['B'], index=False)
I can successfully create an index:
    >>> st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
    >>> st.get_storer('df').table
    /df/table (Table(20,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(1,), dflt=0.0, pos=1),
      "B": Float64Col(shape=(), dflt=0.0, pos=2)}
      byteorder := 'little'
      chunkshape := (2730,)
      autoindex := True
      colindexes := {
        "B": Index(9, full, shuffle, zlib(1)).is_csi=True}
But if I instead leave out the data_columns:
    >>> st.append('df', df_1, index=False)
    >>> st.append('df', df_2, index=False)
no index is created:
    >>> st.create_table_index('df', columns=['B'], optlevel=9, kind='full')
    >>> st.get_storer('df').table
    /df/table (Table(20,)) ''
      description := {
      "index": Int64Col(shape=(), dflt=0, pos=0),
      "values_block_0": Float64Col(shape=(2,), dflt=0.0, pos=1)}
      byteorder := 'little'
      chunkshape := (2730,)
This is unintuitive for 2 reasons:
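1. Why do I need to specify the data columns both during append and during create_table_index?
2. Why doesn't create_table_index raise an error message when it isn't able to create an index?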
I think fixing either 1 or 2 would make things much more intuitive.
I can reproduce this bug exactly as described above on the master branch of pandas.
    pandas           : 1.1.0.dev0+1940.g526f40431
    numpy            : 1.18.5
    pytz             : 2020.1
    dateutil         : 2.8.1
    pip              : 20.1.1
    setuptools       : 47.3.1.post20200616
    Cython           : 0.29.20
    pytest           : 5.4.3
    hypothesis       : 5.18.0
    sphinx           : 3.1.1
    blosc            : None
    feather          : None
    xlsxwriter       : 1.2.9
    lxml.etree       : 4.5.1
    html5lib         : 1.1
    pymysql          : 0.9.3
    psycopg2         : None
    jinja2           : 2.11.2
    IPython          : 7.15.0
    pandas_datareader: None
    bs4              : 4.9.1
    bottleneck       : 1.3.2
    fsspec           : 0.7.4
    fastparquet      : 0.4.0
    gcsfs            : None
    matplotlib       : 3.2.2
    numexpr          : 2.7.1
    odfpy            : None
    openpyxl         : 3.0.3
    pandas_gbq       : None
    pyarrow          : 0.17.1
    pytables         : None
    pyxlsb           : None
    s3fs             : 0.4.2
    scipy            : 1.4.1
    sqlalchemy       : 1.3.17
    tables           : 3.6.1
    tabulate         : 0.8.7
    xarray           : 0.15.1
    xlrd             : 1.2.0
    xlwt             : 1.3.0
    numba            : 0.48.0
I think that the answer is that you should not be allowed to read a column if you did not specify it as a data_column. See this doc: https://pandas.pydata.org/pandas-docs/version/0.15.1/io.html
If everyone agrees with that, then we should raise an AttributeError when a user attempts this.
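Until something like that lands, a user-side guard along these lines might help. This is only a rough sketch: `non_data_columns` and `create_index_checked` are hypothetical helper names, not part of pandas. It assumes that the PyTables table behind the key exposes its top-level column names through `colnames`, and that a column appended without `data_columns` is folded into a `values_block_*` column, as in the table descriptions shown in this issue.

```python
from typing import Iterable, List


def non_data_columns(table_colnames: Iterable[str], requested: Iterable[str]) -> List[str]:
    """Return the requested names that are not standalone columns of the table."""
    available = set(table_colnames)
    return [name for name in requested if name not in available]


def create_index_checked(store, key, columns, **kwargs):
    """Hypothetical wrapper around HDFStore.create_table_index.

    Raises AttributeError instead of silently doing nothing when a requested
    column was not stored as a data column.
    """
    # The PyTables Table node behind `key`; `colnames` lists its top-level columns,
    # e.g. ['index', 'values_block_0', 'B'] when B was appended as a data column.
    table = store.get_storer(key).table
    missing = non_data_columns(table.colnames, columns)
    if missing:
        raise AttributeError(
            f"columns {missing} were not appended as data_columns, "
            "so no index can be created on them"
        )
    # Extra keyword arguments (optlevel, kind) are passed through unchanged.
    store.create_table_index(key, columns=list(columns), **kwargs)
```

With the store from the report, `create_index_checked(st, 'df', ['B'], optlevel=9, kind='full')` would be expected to raise in the second scenario (no `data_columns` on append) and to build the index in the first.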