Set up interface between Uproot and Awkward so that Awkward can be used to optimize object-reading. #96
…ary-dependent choices. Also enables TBasket.array to be more like TBranch.array.
@tamasgal The good news is that, with this PR and scikit-hep/awkward#448, the following code takes 3.6 seconds instead of 194 seconds (53× faster):

```python
import uproot4

branch = uproot4.open("issue-90.root:E/Evt/trks/trks.fitinf")
for i in range(branch.num_baskets):
    print(repr(branch.basket(i).array()))
```

In the above, we're reading each TBasket individually. The bad news is that it still takes a long time to concatenate TBaskets, because ak.concatenate has only been (internally) implemented for pairs: concatenating n arrays means creating and throwing away n - 2 temporary arrays of ever-increasing size, so the total amount of copying grows quadratically with n. That was fine for concatenating 2 or 3 arrays at a time, but with this file's 461 TBaskets it's a problem.

Clearly, ak.concatenate needs to be fixed anyway. There might already be an issue open about it. (It's been on my mind for a while...) Anyway, I'll tackle that next. We want the conclusion of the above story to be that the whole array is produced in 3.7 seconds!
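To make the pairwise-concatenation problem concrete, here is a small cost model. Plain Python lists stand in for Awkward arrays, and the function names are hypothetical; this is not how `ak.concatenate` is implemented. It only counts how many elements get copied when n arrays are folded together two at a time, versus copied once into a preallocated result.

```python
# Hypothetical cost model (not Awkward's code): count element copies for
# pairwise (left-fold) concatenation versus a single-pass concatenation.

def concatenate_pairwise(arrays):
    """Left fold: every step builds a new temporary holding everything so far."""
    copied = 0
    result = list(arrays[0])
    for arr in arrays[1:]:
        # The new temporary copies the accumulated result plus the next array.
        result = result + list(arr)
        copied += len(result)
    return result, copied

def concatenate_all_at_once(arrays):
    """Single pass: allocate the output once and copy each element exactly once."""
    total = sum(len(arr) for arr in arrays)
    result = [None] * total
    pos = 0
    for arr in arrays:
        result[pos:pos + len(arr)] = arr
        pos += len(arr)
    return result, total

# 461 "baskets" of 10 entries each, mirroring the TBasket count above.
baskets = [list(range(10)) for _ in range(461)]
merged_pairwise, cost_pairwise = concatenate_pairwise(baskets)
merged_once, cost_once = concatenate_all_at_once(baskets)
print(cost_pairwise, cost_once)  # 1064900 4610
```

With 461 equal-sized pieces, the pairwise fold copies about 231 times as many elements as the single pass, and the ratio keeps growing linearly with n.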
Awesome! Many thanks, Jim, that's really a huge leap.
…ds (78 times faster).
For cases that aren't covered by the new interpret-by-Awkward mechanism, how about parallelizing the pure Python interpretation?
So the baseline of 197 seconds wasn't unnecessarily harsh. It really is an 80× speedup.
…that it has the possibility of being parallelized, at the cost of possibly interpreting entries that will later be trimmed.
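A minimal sketch of that trade-off, assuming each TBasket can be interpreted independently: every basket overlapping the requested entry range is interpreted in full and in parallel, and entries outside the range are trimmed afterwards. `interpret_basket`, `read_range`, and the `basket_starts` layout are hypothetical stand-ins, not Uproot's API.

```python
from concurrent.futures import Executor, ThreadPoolExecutor

def interpret_basket(raw):
    """Hypothetical stand-in for the pure-Python interpretation of one TBasket."""
    return [x * 2 for x in raw]

def read_range(baskets, basket_starts, entry_start, entry_stop, executor: Executor):
    """Return interpreted entries in [entry_start, entry_stop).

    baskets[i] is a raw payload; basket_starts[i] is the global entry number
    of its first entry (a hypothetical layout for this sketch).
    """
    # Select every basket that overlaps the requested range.
    selected = [
        i for i, raw in enumerate(baskets)
        if basket_starts[i] < entry_stop and basket_starts[i] + len(raw) > entry_start
    ]
    # Interpret whole baskets concurrently; entries outside the range are
    # interpreted too (the extra cost) and discarded below.
    futures = [executor.submit(interpret_basket, baskets[i]) for i in selected]
    out = []
    for i, fut in zip(selected, futures):
        values = fut.result()
        lo = max(entry_start - basket_starts[i], 0)
        hi = min(entry_stop - basket_starts[i], len(values))
        out.extend(values[lo:hi])
    return out

baskets = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]
starts = [0, 4, 7]
with ThreadPoolExecutor(max_workers=2) as ex:
    print(read_range(baskets, starts, 2, 9, ex))  # [4, 6, 8, 10, 12, 14, 16]
```

A `ThreadPoolExecutor` keeps the example simple, but because of the GIL it won't speed up CPU-bound pure-Python interpretation; a `ProcessPoolExecutor` is the usual choice for that, and it can be passed in unchanged since `interpret_basket` is a module-level function.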
Making use of this will require Awkward 0.2.37 (but it won't break with earlier versions), which is being deployed now. I'll do one last test after that deployment so that GitHub Actions (GHA) pulls the new Awkward from PyPI.
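One way to make a speedup conditional on a minimum dependency version without breaking older installs is to compare parsed version tuples and fall back when the installed release is too old. This is a hypothetical sketch (`version_tuple`, `choose_reader`, and the returned labels are made up for illustration), not the check Uproot actually performs.

```python
import re

MIN_AWKWARD = (0, 2, 37)  # minimum version named above

def version_tuple(version):
    """Turn '0.2.37' into (0, 2, 37); pre-release tags like '0rc1' keep their leading digits."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def choose_reader(awkward_version):
    """Hypothetical dispatch: use the Awkward-based path only if it is available."""
    if version_tuple(awkward_version) >= MIN_AWKWARD:
        return "awkward"  # fast interpret-by-Awkward path
    return "python"       # older Awkward: pure-Python path, slower but still works

print(choose_reader("0.2.36"), choose_reader("0.2.37"))  # python awkward
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.10.0" would sort before "0.2.37".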
Thanks, that's really nice and helps a lot!
As discussed in issue #90.
Corresponds to scikit-hep/awkward#448.