Large computation graphs cause serious memory and runtime overhead #14510
Comments
vepadulano
added a commit
to vepadulano/root
that referenced
this issue
Apr 29, 2024
Every node of the computation graph needs to know which columns it has access to. This information is stored in the RColumnRegister class, which holds a map associating every column name available to a given node with the corresponding RDefineReader. This object can become quite heavy, as each column name is stored as a std::string and the readers are held by an RDefinesWithReaders object, which is itself not a trivial type. For very deep computation graphs (e.g. O(10K) `Define` calls chained one after another in the same branch), creating the graph alone can take up several GB of memory, and a large portion of the runtime is spent creating and then destroying these heavy objects.
This commit moves the creation of each RDefinesWithReaders, and of the string holding its column name, to a centralized register in the RLoopManager. The RColumnRegister only holds references to the cached values for the columns it needs. The internal map is also changed to a vector, since the definition order of the column names matters for later operations when processing data. Fixes root-project#14510
vepadulano
added a commit
to vepadulano/root
that referenced
this issue
May 2, 2024
Every node of the computation graph needs to know which columns it has access to. This information is stored in the RColumnRegister class, which holds a map associating every column name available to a given node with the corresponding RDefineReader. This object can become quite heavy, as each column name is stored as a std::string and the readers are held by an RDefinesWithReaders object, which is itself not a trivial type. For very deep computation graphs (e.g. O(10K) `Define` calls chained one after another in the same branch), creating the graph alone can take up several GB of memory, and a large portion of the runtime is spent creating and then destroying these heavy objects. Similar logic is used for the map of registered variations, but the number of variations grows much more slowly than the number of calls to Define, so its effects are even harder to notice.
This commit proposes a complete refactoring of how these objects are handled within the RDataFrame computation graph. First, both the collection of define readers and the collection of variation readers are stripped of their ownership responsibilities. RDefinesWithReaders and RVariationsWithReaders objects are created within the RColumnRegister class API, but they are registered centrally by the RLoopManager, which now manages them all via unique_ptr. The RColumnRegister class now only holds references to those objects. As a further memory optimization, all the strings for the column/variation names are also cached centrally in the RLoopManager, and only views of those strings are kept in the RColumnRegister.
To avoid circular references in the shared_ptr ownership of the RLoopManager itself, RColumnRegister does not own the RLoopManager anymore. The owner(s) of the RLoopManager are the nodes of the computation graph themselves (via RInterfaceBase). Now, when the last node of the computation graph is destroyed, it also triggers the destruction of the RLoopManager. In turn, this triggers the deregistration of all the define and variation readers. Fixes root-project#14510
vepadulano
added a commit
that referenced
this issue
May 2, 2024
vepadulano
added a commit
to vepadulano/root
that referenced
this issue
May 3, 2024
vepadulano
added a commit
that referenced
this issue
May 3, 2024
silverweed
pushed a commit
to silverweed/root
that referenced
this issue
May 14, 2024
PPaye
pushed a commit
to PPaye/root
that referenced
this issue
Jun 3, 2024
Description
Increasing the number of nodes in an RDataFrame computation graph can introduce serious overheads both in terms of performance and memory usage. In extreme (but very realistic) cases, this leads to OOM crashes.
A flamegraph (attached) shows that the main culprit is the allocation and deallocation of very large STL containers (std::map, std::vector) in the machinery of the `RColumnRegister` class. This class has a copy-on-write policy, introduced by #10899 and further explained in #11297. For large graphs (O(10K) nodes), we start seeing multiple GBs of memory used just to make the `Define` calls, and a large portion of the total runtime is spent destroying the RDataFrame itself (i.e. at the end of the application the user is stuck at the terminal).
The copy-on-write policy is there for a reason: any new branch of the computation graph can share the information about the columns defined (available) for that branch without being contaminated by information coming from other branches of the graph (such contamination is the cause of the CI errors in #14490, for example).
We need to find a way to keep the same functionality without incurring these performance and memory usage penalties.
Reproducer
ROOT version
Any
Installation method
Built from source
Operating system
Any
Additional context
No response