Optimize BetterChain::GetEntry to mimic the behavior of TTreeFormula. #162

xenoscopic · 2014-10-04T23:49:04Z

At present, BetterChain::GetEntry uses TChain::GetEntry, which calls down to TTree::GetEntry. Unfortunately, TTree::GetEntry iterates over ALL branches and calls TBranch::GetEntry, and even though TBranch::GetEntry is a no-op if the branch is not active, the iteration incurs considerable cost. This commit modifies BetterChain::GetEntry to manually iterate over only active branches and call TBranch::GetEntry. This mimics the behavior of TTreeFormula::EvalInstance, which only calls TBranch::GetEntry for required branches.

Given that BetterChain already re-implements a large portion of the functionality of TChain, the code complexity added here is not egregious. In the TTree::GetEntry documentation, manually calling TBranch::GetEntry is even an officially sanctioned alternative. This optimization only affects TTrees with a large number of branches, but the performance increase is substantial.

In tests on a TTree with 3,397,413 entries with 408 branches, I tried loading 14 branches of all entries using root2array, and I saw the following load times:

Without modification

11.925s
11.966s
11.857s
11.904s
11.968s

Average: 11.924s

With modification

4.910s
3.851s
3.879s
4.057s
3.887s

Average: 4.1168s

This is a ~2.9x speedup. While one might argue that this is a contrived case, the growing size and complexity of ntuples is going to make this type of tree more common, so I would argue that such an optimization is worth it. There is also 0 penalty to smaller trees, and even they will see some performance enhancement.

Unfortunately, perf doesn't reveal any more low-hanging fruit.

At present, BetterChain::GetEntry uses TChain::GetEntry, which calls down to TTree::GetEntry. Unfortunately, TTree::GetEntry iterates over ALL branches and calls TBranch::GetEntry, and while TBranch::GetEntry is a no-op if the branch is not active, the iteration incurs considerable cost. This commit modifies BetterChain::GetEntry to manually iterate over only active branches and call TBranch::GetEntry. This mimics the behavior of TTreeFormula::EvalInstance, which only loads required branches on-demand.

ndawe · 2014-10-05T00:17:44Z

@havoc-io this is a huge speedup. Brilliant.

Speed optimization is the next big step for root_numpy and this already gets us off to a good start.

There are also places where the cython code can still be optimized, especially in tight loops. Running make show-cython will display the annotated cython code in your browser where bright yellow lines compile into more expensive C++. Definitely room for improvement there.

I'm merging this. Thanks!

Optimize BetterChain::GetEntry to mimic the behavior of TTreeFormula.

xenoscopic · 2014-10-05T01:08:35Z

That's pretty neat.

I've been using perf to try to isolate the bottlenecks, basically just diffing the profiles of root2array with some TTreeFormula-based code (since the performance of the latter is pretty good). Unfortunately beyond TTree::GetEntry, there are no smoking guns.

BasicConverter::write seems to be the next optimizable culprit according to profiling, I suspect due to its memcpy call. For scalar branches, this is probably more efficiently replaced with an assignment, though I'm not sure how to write the necessary cast/assignment in Cython.

piti118 · 2014-10-05T02:50:22Z

Awesome!!!

ndawe added a commit that referenced this pull request Oct 5, 2014

Merge pull request #162 from havoc-io/fastload

3a958b7

Optimize BetterChain::GetEntry to mimic the behavior of TTreeFormula.

ndawe merged commit 3a958b7 into scikit-hep:master Oct 5, 2014

xenoscopic deleted the fastload branch October 5, 2014 00:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize BetterChain::GetEntry to mimic the behavior of TTreeFormula. #162

Optimize BetterChain::GetEntry to mimic the behavior of TTreeFormula. #162

xenoscopic commented Oct 4, 2014

ndawe commented Oct 5, 2014

xenoscopic commented Oct 5, 2014

piti118 commented Oct 5, 2014

Optimize BetterChain::GetEntry to mimic the behavior of TTreeFormula. #162

Optimize BetterChain::GetEntry to mimic the behavior of TTreeFormula. #162

Conversation

xenoscopic commented Oct 4, 2014

ndawe commented Oct 5, 2014

xenoscopic commented Oct 5, 2014

piti118 commented Oct 5, 2014