Large computation graphs cause serious memory and runtime overhead #14510

Closed
1 task done
vepadulano opened this issue Jan 31, 2024 · 0 comments · Fixed by #15264

Comments

@vepadulano
Member

Check duplicate issues.

  • Checked for duplicates

Description

Increasing the number of nodes in an RDataFrame computation graph can introduce serious overheads both in terms of performance and memory usage. In extreme (but very realistic) cases, this leads to OOM crashes.

A flamegraph (attached) shows that the main culprit is the allocation and deallocation of very large STL containers (std::map, std::vector) in the machinery of the RColumnRegister class. This class has a copy-on-write policy, introduced by #10899 and further explained in #11297.

[Attached flamegraph image: many_defines_original]

For large graphs (O(10K) nodes), we start seeing multiple GBs of memory used just to make the Define calls, and a large portion of the total runtime being spent in the destruction of the RDataFrame itself (i.e. at the end of the application the user is stuck at the terminal).

The copy-on-write policy is there for a reason. This way, any new branch of the computation graph can share the information about the columns defined (available) for that branch, without being contaminated by information coming from other branches of the graph (this is the cause of the CI errors in #14490 for example).

We need to find a way to keep the same functionality without incurring these performance and memory-usage penalties.
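The cost of the copy-on-write policy described above can be illustrated with a toy model. This is a minimal sketch, not ROOT's actual RColumnRegister: `ToyNode` and `TotalStoredNames` are hypothetical names, and the model assumes that every intermediate register stays alive for the lifetime of the graph. Under that assumption, a chain of N defines stores 1 + 2 + ... + N = N(N+1)/2 column names in total.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a graph node; NOT the real ROOT class.
struct ToyNode {
    std::vector<std::string> columns; // every column visible to this node

    // Hypothetical Define: copy the parent's register, then append one name.
    ToyNode Define(const std::string &name) const
    {
        ToyNode child{columns}; // full copy: cost grows with the chain depth
        child.columns.push_back(name);
        return child;
    }
};

// Count how many strings are stored in total when every node of a chain
// of `nDefines` Define calls is kept alive (assumption of this sketch).
std::size_t TotalStoredNames(std::size_t nDefines)
{
    std::vector<ToyNode> graph;
    graph.push_back(ToyNode{});
    std::size_t total = 0;
    for (std::size_t i = 0; i < nDefines; ++i) {
        graph.push_back(graph.back().Define("col_" + std::to_string(i)));
        total += graph.back().columns.size();
    }
    return total;
}
```

In this model, `TotalStoredNames(100)` stores 5050 names, and a chain of 10,000 defines would store 50,005,000. The real register entries are heavier than a bare string, so the toy only shows the quadratic growth, not the absolute numbers.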

Reproducer

#include <ROOT/RDataFrame.hxx>

#include <string>
#include <iostream>

int main()
{

    ROOT::RDataFrame df(1);

    auto node = df.Define("col_0",
                            []()
                            { return 42; });

    // Increase the number of iterations for a more evident effect
    for (std::size_t i = 0; i < 10000; i++)
    {
        node = node.Define("col_" + std::to_string(i + 1),
                           []()
                           { return 42; });
    }

    std::cout << "End of main\n";
}

ROOT version

Any

Installation method

Built from source

Operating system

Any

Additional context

No response

vepadulano added a commit to vepadulano/root that referenced this issue Apr 29, 2024
Every node of the computation graph needs to know which columns it has access to. This information is stored in the RColumnRegister class, which holds a map associating every column name available to a certain node with the corresponding RDefineReader. This object can become quite heavy as each column name is stored as a std::string and the readers are held by an RDefinesWithReaders object which itself is not a trivial type. For very deep computation graphs (e.g. O(10K) `Define` calls chained one after another in the same branch), just the creation of the graph can take up several GBs of memory and a large portion of the runtime is spent in the creation and subsequent destruction of such heavy objects.

This commit moves the creation of each RDefinesWithReaders, and of the string representing its column name, to a centralized register in the RLoopManager. The RColumnRegister only holds references to the cached values for the columns it needs. The internal map is also changed to a vector, since the definition order of the column names matters for subsequent operations when processing data.

Fixes root-project#14510
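The idea in this commit message can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in the commit: `ToyDefine`, `ToyManager` and `ToyRegister` are hypothetical stand-ins for RDefinesWithReaders, RLoopManager and RColumnRegister. The central manager owns each heavy object once, and each per-node register keeps a vector of non-owning pointers in definition order.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a heavy per-column object.
struct ToyDefine {
    std::string name;
};

// Hypothetical central owner: each define is created and stored exactly once.
class ToyManager {
    std::vector<std::unique_ptr<ToyDefine>> fDefines;

public:
    const ToyDefine *Register(std::string name)
    {
        fDefines.push_back(std::make_unique<ToyDefine>(ToyDefine{std::move(name)}));
        // The pointee lives on the heap, so the address stays valid
        // even when fDefines reallocates.
        return fDefines.back().get();
    }
    std::size_t Size() const { return fDefines.size(); }
};

// Per-node register: non-owning pointers, kept in definition order.
struct ToyRegister {
    std::vector<const ToyDefine *> defines;

    ToyRegister AddDefine(ToyManager &mgr, std::string name) const
    {
        ToyRegister child{defines}; // copies pointers only, not the objects
        child.defines.push_back(mgr.Register(std::move(name)));
        return child;
    }
};
```

Two branches created from the same parent register share the pointers to the common columns, while each branch sees only its own additions. In this sketch the per-node copy is still linear in the number of columns, but it copies pointers rather than strings and reader objects.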
vepadulano added this to the 6.32/00 milestone May 2, 2024
vepadulano added a commit to vepadulano/root that referenced this issue May 2, 2024
Every node of the computation graph needs to know which columns it has access
to. This information is stored in the RColumnRegister class, which holds a map
associating every column name available to a certain node with the corresponding
RDefineReader. This object can become quite heavy as each column name is stored
as a std::string and the readers are held by an RDefinesWithReaders object which
itself is not a trivial type. For very deep computation graphs (e.g. O(10K)
`Define` calls chained one after another in the same branch), just the creation
of the graph can take up several GBs of memory and a large portion of the
runtime is spent in the creation and subsequent destruction of such heavy
objects. Similar logic is used for the map of registered variations, but the
number of variations grows much more slowly than the number of calls to Define,
so its effects are even harder to notice.

This commit proposes a complete refactoring of how these objects are handled
within the RDataFrame computation graph. First, both the collection of define
readers and the collection of variation readers are stripped of their ownership
responsibilities. RDefinesWithReaders and RVariationsWithReaders objects are
created within the RColumnRegister class API, but they are registered centrally
by the RLoopManager, which now manages them all via unique_ptr. The
RColumnRegister class now only holds references to those objects. As a further
memory optimization, all the strings for the column/variation names are also
cached centrally in the RLoopManager and only views of those
strings are kept in the RColumnRegister. To avoid circular references in the
shared_ptr ownership of the RLoopManager itself, RColumnRegister does not own
the RLoopManager anymore. The owner(s) of the RLoopManager are the nodes of the
computation graph themselves (via RInterfaceBase). Now, when the last node of
the computation graph is destroyed, it will also trigger the destruction of the
RLoopManager. In turn, this triggers the deregistration of all the define and
variation readers.

Fixes root-project#14510
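The ownership layout described in this commit message can be sketched with toy types. This is an illustration under stated assumptions, not the code in the commit: `ToyNameCache`, `ToyManager`, `ToyColumnRegister` and `ToyNode` are hypothetical names, and the choice of `std::unordered_set` for the name cache is an assumption of the sketch. Names are stored once in the manager, registers keep only `std::string_view`s plus a non-owning pointer to the manager, and the nodes are the only owners of the manager.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical central cache: each distinct name is stored exactly once.
class ToyNameCache {
    std::unordered_set<std::string> fNames;

public:
    // Returns a view of the cached string. References to elements of an
    // unordered_set stay valid across rehashing, so the view remains
    // usable for as long as the cache itself is alive.
    std::string_view Intern(const std::string &name)
    {
        return *fNames.insert(name).first;
    }
    std::size_t Size() const { return fNames.size(); }
};

// Hypothetical stand-in for the loop manager.
struct ToyManager {
    ToyNameCache names;
};

// Hypothetical register: views and a raw pointer only, so it cannot
// take part in a shared_ptr ownership cycle.
struct ToyColumnRegister {
    ToyManager *manager; // non-owning
    std::vector<std::string_view> columns;
};

// Hypothetical graph node: the node owns the manager, not the register.
struct ToyNode {
    std::shared_ptr<ToyManager> manager; // destroyed after `reg`
    ToyColumnRegister reg;               // destroyed first: views die before the cache
};
```

When the last `ToyNode` goes out of scope, the reference count of the manager drops to zero and the cached names are released with it. Because members are destroyed in reverse declaration order, the register's views are gone before the cache they point into.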
vepadulano added a commit to vepadulano/root that referenced this issue May 2, 2024
vepadulano added a commit that referenced this issue May 2, 2024
vepadulano added a commit to vepadulano/root that referenced this issue May 3, 2024
vepadulano added this to Issues in Fixed in 6.32.00 via automation May 3, 2024
vepadulano added a commit that referenced this issue May 3, 2024
silverweed pushed a commit to silverweed/root that referenced this issue May 14, 2024
PPaye pushed a commit to PPaye/root that referenced this issue Jun 3, 2024