This repository has been archived by the owner on Jun 21, 2022. It is now read-only.

Can't set upper entry limit for pandas data frame #86

Closed
daniel-saunders opened this issue May 29, 2018 · 3 comments

Comments

@daniel-saunders

Hi. I may have discovered a bug. I was trying to export a TTree as a pandas DataFrame with a fixed upper limit on the number of entries to load (this works fine with the arrays method). Instead I got an error:

return tree.pandas.df(["promptVolume"], entrystop = nEventsNeeded)
  File "/home/dsaunder/.local/lib/python2.7/site-packages/uproot/_connect/to_pandas.py", line 72, in df
    arrays = self._tree.arrays(newbranches, entrystart=entrystart, entrystop=entrystop, cache=cache, basketcache=basketcache, keycache=keycache, executor=executor)
UnboundLocalError: local variable 'newbranches' referenced before assignment

I had a quick look at the source, and I noticed that the dict newbranches is only defined when entrystart and entrystop are set so as to read the whole tree. Is this intended?

I installed uproot using pip and am running Python 2.7.
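The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not uproot's actual code: `load_branches` and `load_branches_fixed` are hypothetical stand-ins, and only the pattern (a local variable assigned in one branch and used after it) is taken from the report.

```python
def load_branches(entrystart, entrystop, numentries):
    # Hypothetical stand-in for the reported pattern, not uproot's source.
    if entrystart == 0 and entrystop == numentries:
        # The name is bound only when the whole tree is requested.
        newbranches = {"promptVolume": "whole-tree view"}
    # For any partial range the name was never bound, so this line
    # raises UnboundLocalError.
    return newbranches


def load_branches_fixed(entrystart, entrystop, numentries):
    # One possible fix (an assumption): bind the name on every path.
    newbranches = {"promptVolume": "plain copy"}
    if entrystart == 0 and entrystop == numentries:
        newbranches = {"promptVolume": "whole-tree view"}
    return newbranches


if __name__ == "__main__":
    print(load_branches(0, 10, 10))        # whole tree: works
    try:
        load_branches(0, 5, 10)            # partial range: fails
    except UnboundLocalError as err:
        print("UnboundLocalError:", err)
    print(load_branches_fixed(0, 5, 10))   # partial range: works
```

The exact wording of the exception message depends on the Python version, so only the exception type is worth relying on.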

@jpivarski
Member

It does go into a special mode for Pandas when entrystart == 0 and entrystop == numentries. Pandas manages its own NumPy arrays, so the only way to fill a DataFrame without a copy is to let Pandas make its arrays and then fill a view of them. If you try to feed Pandas some pre-filled NumPy arrays, it does some sort of reorganization with lots of copying, which is slow.

However, if entrystart and entrystop do not cover the whole array, we can't fill those NumPy arrays in place, because the filling process involves overfilling at basket boundaries and re-trimming, which would leave stale data in the DataFrame. Therefore, we check for entrystart == 0 and entrystop == numentries because that's the only case that can be optimized: anything else is a slow copy (but correct!).
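As a rough illustration of the two paths described above, here is a toy model that uses plain Python lists in place of NumPy arrays and ROOT baskets. The function `read_range` and its `out` parameter are hypothetical and do not correspond to uproot's or Pandas' API; the sketch only shows why whole baskets fit a preallocated buffer exactly when the full range is requested, while a partial range has to be overfilled and trimmed into a copy.

```python
def read_range(baskets, entrystart, entrystop, out=None):
    """Return the entries in [entrystart, entrystop) from a list of baskets.

    Toy model only: `baskets` is a list of lists, `out` is an optional
    preallocated list of length numentries (both are assumptions).
    """
    numentries = sum(len(b) for b in baskets)

    if out is not None and entrystart == 0 and entrystop == numentries:
        # Fast path: every basket is read whole, so it lands exactly
        # in its slot of the preallocated buffer and nothing is trimmed.
        pos = 0
        for basket in baskets:
            out[pos:pos + len(basket)] = basket
            pos += len(basket)
        return out

    # Slow path: read every basket that overlaps the range (this
    # overfills at the boundaries), then trim into a fresh list.
    overfilled = []
    first = None  # global index of the first entry in `overfilled`
    pos = 0
    for basket in baskets:
        if pos < entrystop and pos + len(basket) > entrystart:
            if first is None:
                first = pos
            overfilled.extend(basket)
        pos += len(basket)
    if first is None:
        return []
    return overfilled[entrystart - first:entrystop - first]


if __name__ == "__main__":
    baskets = [[0, 1, 2], [3, 4, 5], [6, 7]]
    buffer = [None] * 8
    print(read_range(baskets, 0, 8, out=buffer))  # fills buffer in place
    print(read_range(baskets, 2, 5, out=buffer))  # trimmed copy
```

In the partial-range call, the overfilled data covers entries 0 through 5 before trimming, which is why writing it straight into a caller-owned buffer would leave entries there that do not belong to the requested range.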

However, I don't see this newbranches variable anywhere (outside of iterate, which you're not using). What version are you using?

@daniel-saunders
Author

Thanks, the version is 2.6.15.

@jpivarski
Member

Do a pip install -U uproot. The current version is 2.8.25.

I'm going to close this issue so that I don't forget, but if you do see the same or a similar bug in the new version, please do reopen it!
