Skip to content

Allow I/O of interpreted class if the underlying data format allows#22234

Open
vepadulano wants to merge 1 commit into
root-project:masterfrom
vepadulano:rdf-awkward-snapshot
Open

Allow I/O of interpreted class if the underlying data format allows#22234
vepadulano wants to merge 1 commit into
root-project:masterfrom
vepadulano:rdf-awkward-snapshot

Conversation

@vepadulano
Copy link
Copy Markdown
Member

@vepadulano vepadulano commented May 11, 2026

RDataFrame needs the type_info of input column types to create the column readers. This is searched first in a map of known simple types, then through TClass. Sometimes TClass might not know about a type_info even though it's available to the interpreter. This is the case for example of runtime-generated classes that are declared to the interpreter but do not have a dictionary, in which case tclass->IsLoaded() will return false and the TClass will not expose the type_info but the interpreter knows about it.

This commit proposes to extend the search in RDataFrame to include also a call to the interpreter in case the RTTI cannot be found through TClass.

This removes a long-standing safeguard introduced in RDataFrame that would prevent users from trying to Snapshot a column if the type had no dictionary. But both TTree and RNTuple support this to different degrees. By removing the safeguard and querying the interpreter for the RTTI RDataFrame delegates responsability to the underlying I/O technology.

As a side effect, this also helps in the integration with awkward arrays, where a C++ type is generated per different awkward array layout at runtime in the Python application. The awkward data source is still responsible to communicate correctly to TTree or RNTuple what they should store to disk when a Snapshot is called.

FYI @ianna

Fixes #22233

Comment thread tree/dataframe/src/RDFUtils.cxx Outdated
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 11, 2026

Test Results

    22 files      22 suites   3d 10h 33m 57s ⏱️
 3 855 tests  3 855 ✅ 0 💤 0 ❌
77 037 runs  77 037 ✅ 0 💤 0 ❌

Results for commit fb44bbe.

♻️ This comment has been updated with latest results.

@vepadulano vepadulano force-pushed the rdf-awkward-snapshot branch from 93748f1 to ed725fc Compare May 11, 2026 22:31
RDataFrame needs the type_info of input column types to create the column readers. This is searched first in a map of known simple types, then through TClass. Sometimes TClass might not know about a type_info even though it's available to the interpreter. This is the case for example of runtime-generated classes that are declared to the interpreter but do not have a dictionary, in which case `tclass->IsLoaded()` will return false and the TClass will not expose the type_info but the interpreter knows about it.

This commit proposes to extend the search in RDataFrame to include also a call to the interpreter in case the RTTI cannot be found through TClass.

This removes a long-standing safeguard introduced in RDataFrame that would prevent users from trying to Snapshot a column if the type had no dictionary. But both TTree and RNTuple support this to different degrees. By removing the safeguard and searching querying the interpreter for the RTTI RDataFrame delegates responsability to the underlying I/O technology.

As a side effect, this also helps in the integration with awkward arrays, where a C++ type is generated per different awkward array layout at runtime in the Python application. The awkward data source is still responsible to communicate correctly to TTree or RNTuple what they should store to disk when a Snapshot is called.
@vepadulano vepadulano force-pushed the rdf-awkward-snapshot branch from ed725fc to fb44bbe Compare May 12, 2026 14:06
@vepadulano vepadulano requested a review from bellenot as a code owner May 12, 2026 14:06
@vepadulano vepadulano changed the title Search more thoroughly for the RTTI of interpreted types Allow I/O of interpreted class if the underlying data format allows May 12, 2026
@vepadulano vepadulano requested review from jblomer and pcanal May 12, 2026 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression in previously implicit behaviour in RDataFrame Snapshot

2 participants