BUG: workaround PyTables 319, by not setting expected rows (GH8265, GH9676) #9681

Closed
jreback wants to merge 1 commit


jreback
Contributor

@jreback jreback commented Mar 19, 2015

closes #8265
closes #9676

@jreback jreback added Bug IO HDF5 read_hdf, HDFStore labels Mar 19, 2015
@jreback jreback added this to the 0.16.0 milestone Mar 19, 2015
@jreback
Contributor Author

jreback commented Mar 19, 2015

cc @rockg
cc @alexfields

give this a try and see if it fixes the issues you guys reported. lmk and I can put this in 0.16.0

…GH9676)

     seems that setting expected rows causes odd indexing issues in some cases
@rockg
Contributor

rockg commented Mar 19, 2015

That did not work for me and actually returned fewer records than previously. With this change, 2804 records were returned versus 2892 previously (the actual number should be 2972 records).

@rockg
Contributor

rockg commented Mar 19, 2015

Maybe important to note that I tested this on 0.14.0 with your change applied, as that is what I have at work. I don't know if there have been other substantive changes that might impact my test.

@alexfields

My test agrees with @rockg - resaving my "problem" hdf via this commit and then reading with where statements gives slightly fewer lines than before (192334 vs 193757 previously, full file is 202836). Sorry @jreback! BTW in my case I am not using chunks or start/stop, I am selecting on the full file, and it still fails.
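
The check described above (rows returned by an on-disk `where` selection versus the rows that should match) can be sketched roughly as follows. This is only an illustration, not the test anyone in this thread actually ran: the helper name `count_where_vs_full` is hypothetical, and it assumes the store object exposes `select(key, where=...)` the way `pandas.HDFStore` does and that the selected table supports `.query(...)` like a `DataFrame`.

```python
def count_where_vs_full(store, key, where, query):
    """Compare an on-disk `where` selection with an in-memory filter.

    `store` is assumed to behave like pandas.HDFStore; `where` is the
    on-disk selection string and `query` is an equivalent expression
    evaluated in memory on the fully loaded table.
    Returns (rows_from_where, rows_from_full_scan).
    """
    # Rows returned when the filter is evaluated against the stored table.
    n_where = len(store.select(key, where=where))
    # Rows returned when the whole table is loaded and filtered in memory.
    n_full = len(store.select(key).query(query))
    return n_where, n_full
```

With pandas and PyTables installed, this could be called as `count_where_vs_full(pd.HDFStore("problem.h5", mode="r"), "df", "A > 0", "A > 0")`, where the file name, key, and column `A` are placeholders. If the two counts differ, the on-disk selection is dropping rows.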

In the meantime I've just been calling ptrepack every time I save an HDF. I don't know if this always solves the problem but it has solved it every time I've tested, and gives an added bonus of speeding up on-disk selects anyway. As long as the tables saved by ptrepack are OK, this doesn't seem like such a bad workaround for now.
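
A minimal sketch of that workaround, assuming the `ptrepack` command-line tool that ships with PyTables is on the `PATH`. The helper names and the choice of flags (`--chunkshape=auto`, `--propindexes`) are assumptions for illustration; the thread does not say which options were used.

```python
import subprocess


def ptrepack_command(src, dst):
    """Build the ptrepack command line for copying `src` into `dst`.

    The flags are an assumption: --chunkshape=auto lets PyTables pick
    new chunk shapes, and --propindexes carries existing indexes over
    to the repacked file.
    """
    return ["ptrepack", "--chunkshape=auto", "--propindexes", src, dst]


def repack(src, dst):
    """Run ptrepack and raise CalledProcessError if it exits non-zero."""
    subprocess.run(ptrepack_command(src, dst), check=True)
```

Calling `repack("saved.h5", "saved_repacked.h5")` right after writing the store would mirror the "repack after every save" routine; both file names are placeholders.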

@jreback
Contributor Author

jreback commented Mar 19, 2015

hmm ok
can u guys test with master on your dataset when u have a chance and lmk?

@alexfields

Master 026a122 is giving me the same 193757 number as in 0.15.2 (different from the 7480a4b commit you sent earlier).

@jreback
Contributor Author

jreback commented Mar 19, 2015

ok, seems that even though MY test worked, something else is going on. Ok will close this and can bug the PyTables guys to see if they can fix.

@jreback jreback closed this Mar 19, 2015
This pull request was closed.

Successfully merging this pull request may close these issues.

BUG: entries missing when reading from pytables hdf store using "where" statement
HDF5 index corruption