Skip to content
This repository has been archived by the owner on Jan 7, 2023. It is now read-only.

Optimize BetterChain::GetEntry to mimic the behavior of TTreeFormula. #162

Merged
merged 1 commit into from
Oct 5, 2014

Conversation

xenoscopic
Copy link
Contributor

At present, BetterChain::GetEntry uses TChain::GetEntry, which calls down to TTree::GetEntry. Unfortunately, TTree::GetEntry iterates over ALL branches and calls TBranch::GetEntry, and even though TBranch::GetEntry is a no-op if the branch is not active, the iteration incurs considerable cost. This commit modifies BetterChain::GetEntry to manually iterate over only active branches and call TBranch::GetEntry. This mimics the behavior of TTreeFormula::EvalInstance, which only calls TBranch::GetEntry for required branches.

Given that BetterChain already re-implements a large portion of the functionality of TChain, the code complexity added here is not egregious. In the TTree::GetEntry documentation, manually calling TBranch::GetEntry is even an officially sanctioned alternative. This optimization only affects TTrees with a large number of branches, but the performance increase is substantial.

In tests on a TTree with 3,397,413 entries with 408 branches, I tried loading 14 branches of all entries using root2array, and I saw the following load times:

Without modification

  • 11.925s
  • 11.966s
  • 11.857s
  • 11.904s
  • 11.968s

Average: 11.924s

With modification

  • 4.910s
  • 3.851s
  • 3.879s
  • 4.057s
  • 3.887s

Average: 4.1168s

This is a ~2.9x speedup. While one might argue that this is a contrived case, the growing size and complexity of ntuples is going to make this type of tree more common, so I would argue that such an optimization is worth it. There is also 0 penalty to smaller trees, and even they will see some performance enhancement.

Unfortunately, perf doesn't reveal any more low-hanging fruit.

At present, BetterChain::GetEntry uses TChain::GetEntry, which calls
down to TTree::GetEntry.  Unfortunately, TTree::GetEntry iterates over
ALL branches and calls TBranch::GetEntry, and while TBranch::GetEntry is
a no-op if the branch is not active, the iteration incurs considerable
cost.  This commit modifies BetterChain::GetEntry to manually iterate
over only active branches and call TBranch::GetEntry.  This mimics the
behavior of TTreeFormula::EvalInstance, which only loads required
branches on-demand.
@ndawe
Copy link
Member

ndawe commented Oct 5, 2014

@havoc-io this is a huge speedup. Brilliant.

Speed optimization is the next big step for root_numpy and this already gets us off to a good start.

There are also places where the cython code can still be optimized, especially in tight loops. Running make show-cython will display the annotated cython code in your browser where bright yellow lines compile into more expensive C++. Definitely room for improvement there.

I'm merging this. Thanks!

ndawe added a commit that referenced this pull request Oct 5, 2014
Optimize BetterChain::GetEntry to mimic the behavior of TTreeFormula.
@ndawe ndawe merged commit 3a958b7 into scikit-hep:master Oct 5, 2014
@xenoscopic xenoscopic deleted the fastload branch October 5, 2014 00:51
@xenoscopic
Copy link
Contributor Author

That's pretty neat.

I've been using perf to try to isolate the bottlenecks, basically just diffing the profiles of root2array with some TTreeFormula-based code (since the performance of the latter is pretty good). Unfortunately beyond TTree::GetEntry, there are no smoking guns.

BasicConverter::write seems to be the next optimizable culprit according to profiling, I suspect due to its memcpy call. For scalar branches, this is probably more efficiently replaced with an assignment, though I'm not sure how to write the necessary cast/assignment in Cython.

@piti118
Copy link
Contributor

piti118 commented Oct 5, 2014

Awesome!!!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants