tree.pandas.df() with branches==None AttributeError #102
I have no idea how to go about figuring out the issue, but adding a test to tests/test_issues.py at least captures the error. |
Thanks for catching this. I'll fix it as soon as possible. It's not a mysterious bug, but it illustrates that I need to systematize some of the special-case handling for different branch types. You're getting this error because some of your branches have string type and the DataFrame-handling code doesn't handle that case. You can avoid this error (for now) by reading the data as arrays and converting them into a DataFrame.
The thing you would be missing by doing this is the ability to flatten jagged data into a DataFrame with a MultiIndex. (It's equivalent to …) I'll fix it as soon as possible (but it could be a week, since I'm on vacation). |
Perfect, no problem. Thanks for the hard work! Enjoy the vacation. |
An interesting observation on some 100M plain (not jagged, etc.) ROOT files.
This is a pretty notable difference in speed. It seems like the first should be faster, but maybe it has other stuff going on behind the scenes that slows it down. |
That's weird. The … The other confounding variable here is constructing the DataFrame, which has performance characteristics that are mysterious to me. My prescription of setting … If it turns out to be … |
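One way to separate the two confounded costs mentioned above (reading the data versus constructing the DataFrame) is to time each stage on its own. The following is only a sketch using the standard library: `time_stage`, `read_arrays`, and `build_table` are hypothetical names introduced for illustration, and the two stand-in functions do not touch uproot or pandas at all.

```python
import time


def time_stage(label, func, *args, repeat=3):
    """Call func(*args) `repeat` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    print(f"{label}: {best:.6f} s (best of {repeat})")
    return best, result


# Hypothetical stand-in for the "read" stage (e.g. reading branches into arrays).
def read_arrays():
    n = 10_000
    return {"x": list(range(n)), "y": [float(i) for i in range(n)]}


# Hypothetical stand-in for the "construct" stage (e.g. building a DataFrame).
def build_table(arrays):
    names = list(arrays)
    return [dict(zip(names, row)) for row in zip(*arrays.values())]


read_time, arrays = time_stage("read", read_arrays)
build_time, table = time_stage("construct", build_table, arrays)
```

In a real comparison the stand-ins would be swapped for the actual read call and for `pandas.DataFrame`, so that a slow read and a slow DataFrame construction show up as separate numbers.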
And the plot thickens with not understanding how the heck data frames are constructed. In my case
|
(Seeing as I'm traveling with my family, I can't try these things myself, but I can keep giving you suggestions of things to try. I don't know, however, if your physics case really needs more performance on reading these DataFrames. As a library developer, I'm on the lookout for speedups, but if your focus is on physics, the difference between a minute and half a minute isn't that significant.) I suggested using …
import collections
import pandas
df = pandas.DataFrame(tree.arrays(outputtype=collections.OrderedDict))
When uproot fills an OrderedDict, it does so in the TTree's natural branch order. But then again, maybe the column order doesn't matter to you. :) |
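To illustrate the point about column order, here is a minimal sketch, assuming pandas and NumPy are installed. A hand-built `OrderedDict` stands in for the result of `tree.arrays(outputtype=collections.OrderedDict)`, and the branch names are made up for the example.

```python
import collections

import numpy as np
import pandas as pd

# Stand-in for tree.arrays(outputtype=collections.OrderedDict);
# the branch names and values here are hypothetical.
arrays = collections.OrderedDict()
arrays["pz"] = np.array([3.0, 2.0, 1.0])
arrays["energy"] = np.array([10.0, 20.0, 30.0])
arrays["charge"] = np.array([1, -1, 1])

# The DataFrame columns follow the insertion order of the OrderedDict,
# not alphabetical order.
df = pd.DataFrame(arrays)
print(list(df.columns))
```

If the dictionary was filled in the TTree's branch order, the DataFrame columns come out in that same order.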
Thanks. I only point these out as interesting observations, as a way to understand more fully what is at the bottom of the whole system. It seems the actual bug will be easy to fix when you return, and the rest is something to file away deep in the brain as an "oh, I remember that" for when it comes back as an enhancement or when someone's application requires more speed than is currently there. As you point out, that is not me currently, other than being nerd-driven. My particular love of this package comes from moving away from ROOT as early in my processing as possible, which lets me use tools I am more comfortable with. I'm not a HEP guy but a space physics guy using Geant for instrument responses. |
…ther objects that use JaggedArrays to define their structure without being jagged conceptually
… really are jagged (as opposed to strings)
I got up before everyone else and fixed the original bug that started this thread. I couldn't find the performance difference, but I don't have your file. Considering the changes that are in store for this bit of code, however, it might not be worth tuning it until after the awkward-arrays are in. |
This seems like it crept in recently, as it used to work.
... works