Skip to content

[ntuple] Fix read request coalescing when opening file from anchor#16659

Merged
jblomer merged 3 commits intoroot-project:masterfrom
jblomer:ntuple-view-prefetch
Oct 14, 2024
Merged

[ntuple] Fix read request coalescing when opening file from anchor#16659
jblomer merged 3 commits intoroot-project:masterfrom
jblomer:ntuple-view-prefetch

Conversation

@jblomer
Copy link
Copy Markdown
Contributor

@jblomer jblomer commented Oct 11, 2024

When we open an RNTuple from an anchor, we don't read the anchor with the mini file reader and therefore we must initialize the max key size manually from the given anchor. Otherwise, page coalescing won't work.

@Dr15Jones FYI.

When we open an RNTuple from an anchor, we don't read the anchor with
the mini file reader and therefore we must initialize the max key size
manually from the given anchor. Otherwise, page coalescing won't work.
@jblomer jblomer self-assigned this Oct 11, 2024
Comment thread tree/ntuple/v7/src/RPageStorageFile.cxx Outdated
Comment on lines +318 to +320
} else {
fReader.SetMaxKeySize(fAnchor->GetMaxKeySize());
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused. Why is it not:

Suggested change
} else {
fReader.SetMaxKeySize(fAnchor->GetMaxKeySize());
}
if (!fAnchor) {
fAnchor = fReader.GetNTuple(fNTupleName).Unwrap();
fReader.SetMaxKeySize(fAnchor->GetMaxKeySize());
}

and then the SetMaxKeySize also on line 305 ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mini file reader will set the max key size as a side effect of reading the anchor.
If we already have the anchor, we don't need to read it again.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the diverging code path? I.e. Is there a reason to keep the mini file reader setting the value as a side effect rather than having set only in one place?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@silverweed What do you think? I agree that the current situation is not ideal.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added @pcanal's suggestion to the PR

@Dr15Jones
Copy link
Copy Markdown
Collaborator

So this affected reading an RNTuple back from a TFile? What would be the visible change in behavior after this fix?

@jblomer
Copy link
Copy Markdown
Contributor Author

jblomer commented Oct 11, 2024

So this affected reading an RNTuple back from a TFile? What would be the visible change in behavior after this fix?

The effect is that ROOT uses much fewer read requests to get the data. At the moment, every page is a read request. When the fix is merged, I expect more or less $(\text{number of top-level fields} + \text{number of clusters})$.

You'll be able to verify this in the metrics output, looking at the nPageRead and the nRead and nReadV counters.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Oct 11, 2024

Test Results

    17 files      17 suites   4d 2h 59m 31s ⏱️
 2 714 tests  2 713 ✅ 0 💤 1 ❌
43 530 runs  43 529 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit f36963c.

♻️ This comment has been updated with latest results.

@jblomer jblomer requested a review from pcanal October 12, 2024 08:23
Copy link
Copy Markdown
Member

@hahnjo hahnjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as a fix, other improvements can be done in future PRs. Some comments on the added test inline.

std::optional<ROOT::Experimental::RNTupleView<void>> viewPy;
std::optional<ROOT::Experimental::RNTupleView<void>> viewPz;

float px, py, pz;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe initialize these variables to silence the (spurious) compiler warnings...

auto model = RNTupleModel::Create();
auto ptrPx = model->MakeField<float>("px");
auto ptrPy = model->MakeField<float>("py");
model->MakeField<bool>("trigger");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is trigger used for?

Comment on lines +338 to +339
EXPECT_LT(reader->GetDescriptor().GetNClusters(),
reader->GetMetrics().GetCounter("RNTupleReader.RPageSourceFile.nClusterLoaded")->GetValueAsInt());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we expect to load more clusters than there are in the file?

@jblomer jblomer merged commit 0daab25 into root-project:master Oct 14, 2024
@jblomer jblomer deleted the ntuple-view-prefetch branch October 14, 2024 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants