v6.22 Buffer merger improvement #7339

pcanal · 2021-03-03T01:03:24Z

See discussion at #7286

This set of improvements to TBufferMerger (and more) was inspired by the terrible performance of the parallel merging (and fast merging in general) in the case where the TTree has a very large number of branches (1000+).

Where in the original version a TBufferMerger with a file with 1000+ branches and only 50s and ran with any number of threads would take more than 3m (did not wait until the end) the new version takes 11s with 1 thread, 8s with 6 threads and 22s with 6 thread when increased to 500 events. (using the CMS file ../data//250202_181_RECO.root).

This PR includes:

Skipping the boxing/compressing/uncompressing/unboxing cycle if the TBufferMerger is available (not already merging) when the thread is writing its TMemFile.
Skipping SetBranchAddress and SetMakeClass in when doing fast cloning (where that information is not used anyway).
Replacing calling to the slow TObjArray::GetEntries (which counts the slot used) by calling GetEntriesFast.
Speeding up the GetMother implementation (caching parent's address sooner when reading, use that information in GetMother).
In fast cloning, delay writing the output until the last input is processed (instead of writing the output after each input),
Optimization of fast cloning handing of empty write basket.

It also contains a couple of bug fixes (RNtuple merging forgetting to merge the result of the objects in the file, iofeatures incorrectly cloned).

Also made TBufferMerger::GetQueueSize actually thread safe (it is necessary to use it to stop the producer from adding more data if the queue is too full).

Also improved TClass::GetBaseClassOffset parallelism (benefit parallel boxing/unboxing) and reduced/removed contention see in the mechanism use to determine the actual object type at run-time (TIsAProxy).

... instead of just always ignoring it

fParent was already set during the usual branch creation (TTree::Branch and friends) but might still be not set if the user create a hierarchy by hand

When fMother is not set, first try to use fParent rather than iterate through the TTree's list of top level branches

This allow the fast cloning of the first tree in the list when using TTree::MergeTrees (rare)

This speeds up the first step of merging (hadd)

This can now be safely use to avoid overloading the queue.

This can be used to monitor the queue size in bytes rather than number of files

We do not apply the same to CloneTree as the addresses copying is a side-effect (no reset at the end) that is relied upon by user code.g

Limited to TTree and Histogram.

TObjArray::GetEntries() counts the number of non-zero slots while TObjArray::GetEntriesFast() return (in most case) the value of a high water mark (the slot over which all remaining slots are empty). In one use case (doing read a TTree and do many TTree cloning (via TBufferMerger), TObjArray::GetEntries was taking 8% of the running time (compression was disabled) Note: TObjArray::IsEmpty use GetEntriesFast rather than GetEntries.

For iteration over the list of baskets, the existing code try to reduce the range of the iteration to stop as soon as all non-nullptr slot have been seen. However to know the number of non-nullptr slot, the code (has to) call GetEntries which scan the entire array anyway ...

Currently only in the case where we have a Merge function and a dictionary.

If TBufferMerger can take the merge lock, then rather than queuing the input file, merge it immediately and skip the Writing the TTree, doing a memcpy, Pushing to the queue then Reading and then deleting parts.

Add new enum for TObject::Write: kOnlyPrepStep = BIT(3) ///< Just prepare the objects do not write them (i.e. call TTree::FlushBaskets)

TClass::GetBaseClassOffset is a 'often' used routine in the I/O. In the fast path (looking up in a cache), rather than taking the global read lock (which is not only global but also 'slower' than a regular mutex), we are now use a instance specific mutex to protect the cache.

… file

To save space, the compression can be re-enabled by TBuffferMerger::SetCompressTemporaryKeys. The TBaskets are still compressed and thus the end result is unchanged.

This allows to significant reduce the number of search through the list of cached result ... That search was a noticeable cause of slow down with an increase of number threads due to the contention on updating the number of readers (used for the local read/write lock mechanism).

Since fClass is modified only 'once' at init time of the IsAProxy object, it is actual more efficient to protect the initialization by a lock rather than having fClass being atomic.

…tween now and the next search).

This is necessary to support the 'fast' incremental merge where the input will continue to be used.

This enum will be removed shortly (functionality moved from TObject::Write to *::Merge)

phsft-bot · 2021-03-03T01:03:34Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

pcanal · 2021-03-03T01:03:43Z

@phsft-bot build with flags -DCTEST_TEST_EXCLUDE_NONE=On

phsft-bot · 2021-03-03T01:03:49Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14 with flags -DCTEST_TEST_EXCLUDE_NONE=On
How to customize builds

pcanal added 30 commits March 2, 2021 16:20

Shortcut in TTree::GetBranch for exact match to top level branches

31f1aff

Actually give priority to the TFileMergeInfo's feature request

6ad3249

... instead of just always ignoring it

Remove stray duplicate line in TBranch::GetSubBranch

b4e01e6

Set TBranch::fParent when reading.

ee14793

fParent was already set during the usual branch creation (TTree::Branch and friends) but might still be not set if the user create a hierarchy by hand

Speed-up TBranch::GetMother.

bbc2544

When fMother is not set, first try to use fParent rather than iterate through the TTree's list of top level branches

TTree::MergeTrees pass options to first Cloning.

a88e070

This allow the fast cloning of the first tree in the list when using TTree::MergeTrees (rare)

Add TTree::InPlaceClone.

534c5fd

This speeds up the first step of merging (hadd)

Make TBufferMerger::GetQueueSize thread safe.

509c4f3

This can now be safely use to avoid overloading the queue.

Add TBufferMerger::GetBuffered.

0326479

This can be used to monitor the queue size in bytes rather than number of files

Do not call CopyAddresses during TTree::Merge fast cloning operation.

cff5d2b

We do not apply the same to CloneTree as the addresses copying is a side-effect (no reset at the end) that is relied upon by user code.g

TFileMerger allow to delay the object writes.

81a07d3

Limited to TTree and Histogram.

In TFileMerger use obj in memory if available.

2cb23c1

Currently only in the case where we have a Merge function and a dictionary.

Allow TBufferMerger to skip write/read cycle.

2acb2d2

If TBufferMerger can take the merge lock, then rather than queuing the input file, merge it immediately and skip the Writing the TTree, doing a memcpy, Pushing to the queue then Reading and then deleting parts.

TFileMerger: Use obj in memory even if it has no key (yet)

93faf48

Tree: micro-optimization

dde4b8f

TBranch::AddBasket allow to replace an empty write basket

e71acdd

AddLastBasket: do not update first basket start entry

2e1d66e

TTreeCloner: do not clone empty basket

5948e9a

TBufferMergerFile: Call FlushBasket in unlocked thread.

1fce8bf

Add new enum for TObject::Write: kOnlyPrepStep = BIT(3) ///< Just prepare the objects do not write them (i.e. call TTree::FlushBaskets)

Mark TBufferMerger::fBuffered as atomic

8c4a8c3

TBufferMerger: Allow to disable explict call to Flush when no tree in…

b5762f0

… file

Skip compression of the top TKeys in the TMemFile by default.

3dd0685

To save space, the compression can be re-enabled by TBuffferMerger::SetCompressTemporaryKeys. The TBaskets are still compressed and thus the end result is unchanged.

TIsAProxy switch fClass to non-atomic.

bf2fa1f

Since fClass is modified only 'once' at init time of the IsAProxy object, it is actual more efficient to protect the initialization by a lock rather than having fClass being atomic.

Sort data member of TIsAProxy

f427508

TIsAProxy: Don't record failed searches (a library might be loaded be…

3ac983e

…tween now and the next search).

Inline TBufferMerger::GetBuffered

853c76e

pcanal added 6 commits March 2, 2021 16:30

Fix typo in comments

2742c6f

TFileMerger: only delete the directory if the merger created it.

12d31d8

This is necessary to support the 'fast' incremental merge where the input will continue to be used.

TFileMerger delete directory only if we induced its creation/loading

33b0a4f

[NFC] Coding style

e5b7783

[NFC] Improve documentation (TTreeCloner, TBufferMerger, TFileMerger)

9ce4ecc

'Hide' kOnlyPrepStep as a protected member of TObject.

13b0a2e

This enum will be removed shortly (functionality moved from TObject::Write to *::Merge)

pcanal merged commit 7966dee into root-project:v6-22-00-patches Mar 3, 2021

pcanal deleted the v622-BufferMergerImprove branch March 3, 2021 05:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v6.22 Buffer merger improvement #7339

v6.22 Buffer merger improvement #7339

Uh oh!

pcanal commented Mar 3, 2021 •

edited

Loading

Uh oh!

phsft-bot commented Mar 3, 2021

Uh oh!

pcanal commented Mar 3, 2021

Uh oh!

phsft-bot commented Mar 3, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

v6.22 Buffer merger improvement #7339

v6.22 Buffer merger improvement #7339

Uh oh!

Conversation

pcanal commented Mar 3, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

phsft-bot commented Mar 3, 2021

Uh oh!

pcanal commented Mar 3, 2021

Uh oh!

phsft-bot commented Mar 3, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pcanal commented Mar 3, 2021 •

edited

Loading