BUG: HDFStore.select_as_multiple doesn't respect start/stop kwargs #16209

Closed
JosephWagner opened this Issue May 3, 2017 · 4 comments

Contributor

JosephWagner commented May 3, 2017

import pandas as pd

df = pd.DataFrame({"foo": [1, 2], "bar": [1, 2]})

with pd.HDFStore("foo.h5", 'w') as store:
    store.append_to_multiple({'selector': ['foo'], 'data': None}, df, selector='selector')
    single_row = store.select_as_multiple(['selector', 'data'], selector='selector', start=0, stop=1)
    assert len(single_row) == 1, "requested 1 row, got back {}".format(len(single_row))

Currently select_as_multiple returns the entire table. With start=0 and stop=1, I would expect just one row to be returned.

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-642.15.1.el6.centos.plus.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 32.3.1.post20170108
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.8.0
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.5.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.4.0
numexpr: 2.6.1
matplotlib: None
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: 0.7.9.None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

Contributor

jreback commented May 3, 2017

yep, looks like https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L829 needs to pass start/stop as well.

pull-requests welcome!
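
A minimal sketch of the kind of bug being described, using a toy model rather than the pandas code. ToyTable, select_all_buggy and select_all_fixed are hypothetical names made up for illustration; the only point is that a wrapper which accepts start/stop but never forwards them to the per-table read will silently return every row.

```python
class ToyTable:
    """Hypothetical stand-in for a stored table: just a list of rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def read(self, start=None, stop=None):
        # Slicing with None bounds returns every row.
        return self.rows[start:stop]


def select_all_buggy(tables, start=None, stop=None):
    # start/stop are accepted but never forwarded, so they have no effect.
    return [t.read() for t in tables]


def select_all_fixed(tables, start=None, stop=None):
    # Forward the bounds to each per-table read.
    return [t.read(start=start, stop=stop) for t in tables]


tables = [ToyTable([1, 2]), ToyTable([1, 2])]
buggy = select_all_buggy(tables, start=0, stop=1)
fixed = select_all_fixed(tables, start=0, stop=1)
print(buggy)  # [[1, 2], [1, 2]]
print(fixed)  # [[1], [1]]
```

In the toy model the fix is a one-line change; the later comments in this thread suggest the real change needed the bounds passed in more than one place.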

jreback added this to the Next Major Release milestone May 3, 2017

Contributor

JosephWagner commented May 6, 2017

Hmm, so I'm attempting to submit a PR. I've added a new test that mirrors the example above, but when I insert start/stop into line 829, I get the following test failure. I've spent some time digging around but can't quite figure out why this isn't working. It looks like this exception is raised here, which means the 'where' variable (an Int64Index) is passed to self.generate. I don't know enough about the code to know if that's expected or not.

self = [0, 1]

    def evaluate(self):
        """ create and return the numexpr condition and filter """
    
        try:
            self.condition = self.terms.prune(ConditionBinOp)
        except AttributeError:
            raise ValueError("cannot process expression [{0}], [{1}] is not a "
>                            "valid condition".format(self.expr, self))
E           ValueError: cannot process expression [[0 1]], [[0, 1]] is not a valid condition

Contributor

jreback commented May 6, 2017

You have to step through it. This is an evaluation tree, which can go pretty deep. However, I think all you need to do is make the change I suggested above.

Contributor

JosephWagner commented May 9, 2017

Ahh, I think I figured it out. I also needed to pass self.start/stop to this function call here: https://github.com/pandas-dev/pandas/blob/master/pandas/io/pytables.py#L1423

I'll submit a PR soon!
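
A toy sketch of why the bounds might need to be applied in two places when rows are selected in two steps (first compute row positions from the selector, then read those positions from each table). The functions toy_read_coordinates and toy_read_rows are hypothetical plain-Python stand-ins, and the range check inside toy_read_rows is an assumption made for illustration, not a statement about what the pandas source does.

```python
def toy_read_coordinates(selector_rows, start=None, stop=None):
    """Return row positions of the selector, limited to [start, stop)."""
    return list(range(len(selector_rows)))[start:stop]


def toy_read_rows(rows, coordinates, start=None, stop=None):
    """Read rows at the given positions (assumed to lie in [start, stop))."""
    lo = 0 if start is None else start
    hi = len(rows) if stop is None else stop
    if any(c < lo or c >= hi for c in coordinates):
        raise ValueError("coordinates must be >= start and < stop")
    return [rows[c] for c in coordinates]


selector = [1, 2]
data = [1, 2]

# Bounds applied only in the second step: the positions were computed over
# the whole table, so they conflict with stop=1.
all_coords = toy_read_coordinates(selector)
try:
    toy_read_rows(data, all_coords, start=0, stop=1)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# Bounds applied in both steps: the positions already respect start/stop.
bounded_coords = toy_read_coordinates(selector, start=0, stop=1)
single_row = toy_read_rows(data, bounded_coords, start=0, stop=1)
print(mismatch_error)
print(single_row)
```

Under these assumptions, passing start/stop only to the final read is not enough; the coordinate lookup has to respect the same bounds.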

jreback modified the milestones: 0.20.2, Next Major Release May 10, 2017

JosephWagner pushed a commit to JosephWagner/pandas that referenced this issue May 31, 2017

jreback closed this in #16317 May 31, 2017
