
Presence of softlink in HDF5 file breaks HDFStore.keys() #20523

Closed
dworvos opened this Issue Mar 28, 2018 · 1 comment

dworvos commented Mar 28, 2018

Code Sample, a copy-pastable example if possible

#! /path/to/python3.6

import pandas as pd

df = pd.DataFrame({ "a": [1], "b": [2] })
print(df.to_string())

hdf = pd.HDFStore("/tmp/test.hdf", mode="w")
hdf.put("/test/key", df)

# Brittle: uses the private _handle to add a soft link /test/symlink -> /test/key
hdf._handle.create_soft_link(hdf._handle.root.test, "symlink", "/test/key")
hdf.close()
print("Successful write")

hdf = pd.HDFStore("/tmp/test.hdf", mode="r")
'''
Traceback (most recent call last):
  File "snippet.py", line 31, in <module>
    print(hdf.keys())
  File "python3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 529, in keys
    return [n._v_pathname for n in self.groups()]
  File "python3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1077, in groups
    g for g in self._handle.walk_nodes()
  File "python3.6.3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1078, in <listcomp>
    if (getattr(g._v_attrs, 'pandas_type', None) or
  File "python3.6.3/lib/python3.6/site-packages/tables/link.py", line 79, in __getattr__
    "`%s` instance" % self.__class__.__name__)
KeyError: 'you cannot get attributes from this `NoAttrs` instance'
'''
print(hdf.keys())  # raises the KeyError shown in the traceback above
hdf.close()

print("Successful read")

Problem description

I know this is an esoteric problem: I'm building an HDF5 file with Pandas and then using PyTables to create a soft link to the Pandas DataFrame. I understand this is unsupported and brittle, but for my use case I haven't been able to come up with a better or simpler solution.

This issue is similar to: #6019

The root cause: HDFStore.keys() calls HDFStore.groups(), which walks every node of the underlying PyTables file and reads g._v_attrs on each one, including links.

https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L1076

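But g._v_attrs on a tables.link.SoftLink returns a NoAttrs object whose __getattr__ raises KeyError rather than AttributeError. getattr() only falls back to its default on AttributeError, so the None default in getattr(g._v_attrs, 'pandas_type', None) never takes effect and the KeyError propagates. See: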

https://github.com/PyTables/PyTables/blob/develop/tables/link.py#L76
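The reason the None default does not help can be shown in plain Python. This is a minimal sketch: NoAttrsLike is a hypothetical class that only mimics the behaviour of PyTables' NoAttrs (its __getattr__ raises KeyError); it is not the real class.

```python
# Hypothetical stand-in mimicking PyTables' NoAttrs: attribute lookups that
# fall through to __getattr__ raise KeyError instead of AttributeError.
class NoAttrsLike:
    def __getattr__(self, name):
        raise KeyError("you cannot get attributes from this "
                       "`%s` instance" % self.__class__.__name__)


class PlainAttrs:
    """Ordinary object that simply has no pandas_type attribute."""


def lookup_pandas_type(attrs):
    # getattr() with a default only swallows AttributeError.
    return getattr(attrs, "pandas_type", None)


# AttributeError is absorbed, so the default is returned.
print(lookup_pandas_type(PlainAttrs()))  # None

# KeyError is not absorbed, so it escapes to the caller.
try:
    lookup_pandas_type(NoAttrsLike())
except KeyError as exc:
    print("KeyError:", exc)
```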

There doesn't appear to be a way to guard against a NoAttrs instance directly, since that class is defined inside the method. One solution is to skip g when it is an instance of Link:

        return [
            g for g in self._handle.walk_nodes()
            if (not isinstance(g, _table_mod.link.Link) and
                (getattr(g._v_attrs, 'pandas_type', None) or
                 getattr(g, 'table', None) or
                (isinstance(g, _table_mod.table.Table) and
                 g._v_name != u('table'))))
        ]
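To check the guard logic without PyTables installed, here is a self-contained sketch that applies the same filter to hypothetical stand-in node classes. Group, Table, Link and NoAttrs below are simplified imitations, not the real tables classes; groups() and keys() are simplified versions of the HDFStore methods; and the plain string "table" replaces the u('table') compat call.

```python
# Hypothetical, simplified stand-ins for PyTables node types.
class Attrs:
    def __init__(self, **kw):
        self.__dict__.update(kw)


class NoAttrs:
    def __getattr__(self, name):
        raise KeyError("you cannot get attributes from this `NoAttrs` instance")


class Node:
    def __init__(self, pathname, **attrs):
        self._v_pathname = pathname
        self._v_name = pathname.rsplit("/", 1)[-1]
        self._v_attrs = Attrs(**attrs)


class Group(Node):
    pass


class Table(Node):
    pass


class Link(Node):
    def __init__(self, pathname):
        super().__init__(pathname)
        self._v_attrs = NoAttrs()  # links expose no usable attributes


def groups(nodes):
    """Same filter as the proposal: reject links before touching _v_attrs."""
    return [
        g for g in nodes
        if (not isinstance(g, Link) and
            (getattr(g._v_attrs, "pandas_type", None) or
             getattr(g, "table", None) or
             (isinstance(g, Table) and g._v_name != "table")))
    ]


def keys(nodes):
    return [n._v_pathname for n in groups(nodes)]


# Roughly the layout produced by the reproduction script above.
nodes = [
    Group("/"),
    Group("/test"),
    Group("/test/key", pandas_type="frame"),
    Link("/test/symlink"),
]
print(keys(nodes))  # ['/test/key']
```

Because "and" short-circuits, g._v_attrs is never evaluated for a Link, which is why placing the isinstance check first avoids the KeyError.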

I'd be happy to write a PR and tests if you find this change acceptable.

Expected Output

   a  b
0  1  2
Successful write
['/test/key']
Successful read

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf-8
LANG: en_US.utf-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.1
Cython: None
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

jreback commented Mar 29, 2018

Sure, would take a patch to avoid an error on this.

jreback added this to the Next Major Release milestone Mar 29, 2018

jreback modified the milestones: Next Major Release, 0.23.0 Mar 30, 2018
