TTreeCache scales (very) poorly with number of baskets/clusters. #12649

Closed
pcanal opened this issue Apr 12, 2023 · 24 comments · Fixed by #12650

Comments

pcanal (Member) commented Apr 12, 2023

As described in https://root-forum.cern.ch/t/interactively-working-with-large-ntuple-tree-files-discovered-big-difference-between-old-and-new-root-versions-in-speed-7s-vs-1-hour-response-time/54312, when a file/TTree is created such that each cluster is very small (in bytes), and thus the number of clusters in the file is very large (100,000+ clusters), the TTreeCache scales poorly (since v6.14/00).

In particular, in the described case, the run time increases from a few seconds (7s) to more than one hour.
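A rough cost model makes the scaling concrete. Assume (a simplification, not ROOT's exact bookkeeping) that locating the starting basket for cluster k requires scanning the k baskets before it, compared with resuming from where the previous search stopped. Over N clusters the total work is then quadratic instead of linear:

```math
C_{\text{restart}}(N) \approx \sum_{k=1}^{N} k = \frac{N(N+1)}{2},
\qquad
C_{\text{resume}}(N) \approx N
```

With N = 100,000 clusters, that is about 5 billion basket checks for the restarting search versus about 100,000 for the resuming one, which is consistent in spirit with a run going from seconds to an hour.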

pcanal added this to the 6.28/04 milestone Apr 12, 2023
pcanal self-assigned this Apr 12, 2023
pcanal added a commit to pcanal/root that referenced this issue Apr 12, 2023
Prior to this change, the cached position of the basket at which to start the search
was reset to the first basket at each cluster iteration
(i.e. while filling the cache with clusters).

On an extreme example:
        15,272,928 entries
           152,739 baskets (and as many clusters)
            10,000 actual TTreeCache buffer size in bytes (minimum allowed)
             8,442 estimated TTreeCache buffer size in bytes (1.5 times the compressed buffer size)
               400 bytes per basket
               100 entries per basket (i.e. per cluster)
                25 clusters per TTreeCache buffer for a single branch with the default size
                 1 float per entry (reading a single branch)

This brings the run time of a simple `TTree::Draw` of a single branch
back down from 1 hour to 7s (the performance seen in v6.12).

This corrects an issue introduced by commit 73f6223, present since v6.14/00.

This fixes root-project#12649
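The mechanism described in the commit message can be illustrated with a small, self-contained C++ sketch. It does not reproduce ROOT's TTreeCache or TBranch code: `BasketLookup`, `findRestarting`, `findResuming` and `countVisits` are hypothetical names, and the model assumes one basket per cluster (as in the example above) with entries requested in increasing order. Restarting the linear search at basket 0 for every cluster costs about N²/2 basket checks in total, while resuming from the last basket found costs about N.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of one branch's basket layout:
// basketFirstEntry[i] is the first entry stored in basket i (sorted ascending).
// This is NOT ROOT's implementation; names are illustrative only.
struct BasketLookup {
    std::vector<std::int64_t> basketFirstEntry;
    std::size_t cursor = 0;            // where the next search starts (the "fix")
    std::uint64_t basketsVisited = 0;  // instrumentation to compare strategies

    // Linear search for the basket containing `entry`, starting at `start`.
    std::size_t scanFrom(std::size_t start, std::int64_t entry) {
        std::size_t i = start;
        while (i + 1 < basketFirstEntry.size() && basketFirstEntry[i + 1] <= entry) {
            ++i;
            ++basketsVisited;
        }
        return i;
    }

    // Pre-fix behaviour: every lookup restarts at the first basket.
    std::size_t findRestarting(std::int64_t entry) { return scanFrom(0, entry); }

    // Post-fix behaviour: resume from the basket found last time.
    // Falls back to a full scan if an earlier entry is requested.
    std::size_t findResuming(std::int64_t entry) {
        if (cursor >= basketFirstEntry.size() || basketFirstEntry[cursor] > entry)
            cursor = 0;
        cursor = scanFrom(cursor, entry);
        return cursor;
    }
};

// Build `nBaskets` baskets of `entriesPerBasket` entries each, then look up
// the first entry of every cluster (one basket per cluster), as a cache-fill
// loop would, and report how many basket steps the search took in total.
std::uint64_t countVisits(std::size_t nBaskets, std::int64_t entriesPerBasket, bool resume) {
    BasketLookup lookup;
    for (std::size_t i = 0; i < nBaskets; ++i)
        lookup.basketFirstEntry.push_back(static_cast<std::int64_t>(i) * entriesPerBasket);
    for (std::size_t c = 0; c < nBaskets; ++c) {
        const std::int64_t entry = static_cast<std::int64_t>(c) * entriesPerBasket;
        if (resume)
            lookup.findResuming(entry);
        else
            lookup.findRestarting(entry);
    }
    return lookup.basketsVisited;
}
```

For example, with 1,000 baskets of 100 entries, `countVisits(1000, 100, false)` returns 499,500 while `countVisits(1000, 100, true)` returns 999. Keeping the fallback to a full scan means the resuming variant still returns the correct basket if access is not sequential.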
pcanal added a commit that referenced this issue Apr 19, 2023
@github-actions

Hi @pcanal,

It appears this issue is closed, but wasn't yet added to a project. Please add upcoming versions that will include the fix, or 'not applicable' otherwise.

Sincerely,
🤖

pcanal added a commit to pcanal/root that referenced this issue Apr 21, 2023
pcanal added a commit that referenced this issue Apr 25, 2023
pcanal added a commit that referenced this issue May 23, 2023
pcanal added a commit that referenced this issue May 23, 2023
pcanal added this to Issues in Fixed in 6.24/10 via automation May 24, 2023
pcanal added this to Issues in Fixed in 6.26/12 via automation May 24, 2023
pcanal added this to Issues in Fixed in 6.30/00 via automation May 24, 2023
pcanal added this to Issues in Fixed in 6.28/06 via automation May 24, 2023
enirolf pushed a commit to enirolf/root that referenced this issue May 26, 2023