Add pipeline online partition logic for pipeline#3996
Conversation
Tested with onnxruntime_training_bert with both naive pipeline and pipedream logic. Will add some unit tests too. #Resolved
onnxruntime/core/graph/graph.cc
Outdated
}

void Graph::AddValueInfo(const NodeArg* new_value_info){
  for(auto info : value_info_){
auto [](start = 6, length = 4)
const auto& #Resolved
Not sure it's worth changing to a reference -- value_info_ is a vector of pointers, and a reference is implemented as a pointer, so taking it by value vs. by reference takes the same memory space.
I believe auto will deduce the 'const', but it is a good idea to write it explicitly.
In reply to: 428256226 [](ancestors = 428256226)
ok, better to use const auto* or auto* for pointer types.
In reply to: 428394644 [](ancestors = 428394644,428256226)
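To illustrate the `const auto*` suggestion with a generic sketch (the `NodeArg` stand-in below is hypothetical, not ORT's actual class): copying the pointer is as cheap as referencing it, and spelling out `const auto*` documents both the pointer type and the read-only access.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for ORT's NodeArg; only the name matters here.
struct NodeArg {
  std::string name;
};

// Iterating a vector of pointers: `const auto*` copies the pointer (cheap)
// and makes it explicit that the pointee is not modified. `const auto&`
// would bind a reference to the pointer itself, which saves nothing.
inline bool ContainsName(const std::vector<const NodeArg*>& value_info,
                         const std::string& name) {
  for (const auto* info : value_info) {
    if (info->name == name) return true;
  }
  return false;
}
```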
// original node_arg pointer from the origin graph. This serves as the key in the
// updated_node_arg map and any reference for original node_arg name
auto original_node_arg = producer_node->MutableOutputDefs()[idx];
auto [](start = 6, length = 4)
better to use auto* for pointer type #Resolved
}

auto& new_receive_output = CreateNodeArg(graph, updated_node_arg);
auto old_shape = *(updated_node_arg->Shape());
auto [](start = 6, length = 4)
const auto& ? #Pending
I am not sure it is legitimate for two nodes to share the same shape object -- updated_node_arg->Shape() returns its shape object, and new_receive_output.SetShape(old_shape) takes old_shape by reference rather than by copy. So if old_shape is a reference to *updated_node_arg->Shape(), both updated_node_arg and new_receive_output would share the same shape object.
I tested the bert application with const auto&, and it seems to work, but I am not sure whether I just got lucky.
So I want to confirm before making the change.
In reply to: 428259076 [](ancestors = 428259076)
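A quick sketch of the aliasing question, using hypothetical simplified types (not ORT's real Shape/NodeArg): whether `const auto&` is safe here depends entirely on whether SetShape stores a copy. If it copies, as below, the two node args never alias; if it stored the reference, they would share one shape object.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for a shape proto: just a list of dimensions.
struct Shape {
  std::vector<int64_t> dims;
};

// Simplified stand-in for a node arg that owns its shape.
struct SimpleNodeArg {
  Shape shape;
  // Takes the shape by const reference but assigns (copies) it into the
  // member, so the caller's Shape and this node arg's Shape never alias.
  void SetShape(const Shape& s) { shape = s; }
};
```

Under these copy semantics, binding `const auto& old_shape` to the producer's shape is safe: mutating the producer afterwards does not affect the receiver.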
}

Status ApplyPipelinePartitionToMainGraph(Graph& graph,
                                         std::vector<TrainingSession::TrainingConfiguration::CutInfo> cut_info,
std::vector<TrainingSession::TrainingConfiguration::CutInfo> [](start = 41, length = 60)
const & #Resolved
                                         std::string& backward_waited_event_after_recv_name,
                                         std::string& backward_recorded_event_before_send_name);

Status ApplyPipelinePartitionToMainGraph(Graph& graph,
Graph& graph, [](start = 41, length = 13)
nit: put in new line as above? #Resolved
  return graph_inputs_including_initializers_;
}

bool IsInputsIncludingInitializers(const NodeArg* node_arg) const noexcept{
// Return true if "node_arg" is an input or an initializer. Otherwise, returns false.
bool IsInputsIncludingInitializers(const NodeArg* node_arg) const noexcept{
Could you add comments for those newly-added functions (like other existing ones)? Also, I am not sure if this should go in graph_utils or be a member of graph. If the design is to keep the core structure minimal, it'd be better to move those helper functions into a separate class. #Resolved
comment added.
I am not against moving it to util. The main concern is that this method should stay consistent with the graph object's graph_inputs_including_initializers_ variable; having it outside the class makes it hard to keep the two in sync.
In reply to: 429517645 [](ancestors = 429517645)
ConstReverseNodeIterator rbegin() const noexcept {
  return {nodes_.crbegin(), nodes_.crend()};
}
Are we sure the reverse of a topological sort can be computed like this? I think we need to invert all edges and do an actual sort.
[Update] I found a proof. Your design is correct. #Resolved
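The claim can be sanity-checked mechanically: reversing a topological order of a DAG yields a valid topological order of the graph with all edges inverted, since if u precedes v for every edge u->v, then v precedes u for every inverted edge v->u. A minimal generic checker sketch (not ORT code):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns true if `order` is a valid topological order for `edges`,
// i.e. every edge's source appears before its destination.
// Nodes are assumed to be labeled 0..order.size()-1.
inline bool IsTopoOrder(const std::vector<int>& order,
                        const std::vector<std::pair<int, int>>& edges) {
  std::vector<int> pos(order.size());
  for (std::size_t i = 0; i < order.size(); ++i) {
    pos[order[i]] = static_cast<int>(i);
  }
  for (const auto& e : edges) {
    if (pos[e.first] >= pos[e.second]) return false;
  }
  return true;
}
```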
                   onnx::TensorProto_DataType type,
                   std::vector<NodeArg*>& new_node_args,
                   std::vector<std::string>& new_names) {
  auto& event_id = CreateTypedNodeArg(graph, type, op_name);
event_id [](start = 8, length = 8)
new_node_arg #Resolved
}

template <typename T>
void AddNewNodeArgAndInitializer(Graph& graph,
AddNewNodeArgAndInitializer [](start = 5, length = 27)
AddNewScalarNodeArgAndInitializer #Resolved
}

template <typename T>
void AddNewNodeArgAndInitializer(Graph& graph,
// This function is used when you want to create a scalar constant in a graph.
// It may create a NodeArg so that other nodes can reference its value.
// It also creates an initializer to store its value. #Resolved
proto_data.set_data_type(type);

switch (type) {
  case ONNX_NAMESPACE::TensorProto_DataType_BOOL:
switch is not good. Can we deduce its type from typename T? #Resolved
the switch is inevitable even though we deduce the type from T. Because we still need to differentiate the calling methods (add_int32_data vs add_int64_data), it would just be one more step of deducing and then entering the switch statement.
In reply to: 429519064 [](ancestors = 429519064)
How about this, let's write

    void set_int32_scalar(const int32_t value, ONNX_NAMESPACE::TensorProto& tensor) {
      tensor.set_data_type(onnx::TensorProto_DataType::INT32);
      tensor.add_int32_data(value);
    }

    void set_int64_scalar(const int64_t value, ONNX_NAMESPACE::TensorProto& tensor) {
      tensor.set_data_type(onnx::TensorProto_DataType::INT64);
      tensor.add_int64_data(value);
    }

and call them in AddNewScalarNodeArgAndInitializer. Then proto_data.set_data_type(type) and the switch can be removed.
In reply to: 430577195 [](ancestors = 430577195,429519064)
Even though we have set_int32_scalar and set_int64_scalar, how are we going to call them without a type check (i.e. an if or switch)?
We still need to distinguish between data types to decide which function to call.
On another note, I am not sure why switch is so bad that we want to avoid it...
In reply to: 430690470 [](ancestors = 430690470,430577195,429519064)
The type parameter "T" determines which one to call.
In reply to: 430724590 [](ancestors = 430724590,430690470,430577195,429519064)
That's what I was referring to. Once inside AddNewNodeArgAndInitializer&lt;T&gt;, we still need to check T's type to decide which one to call. This check is inevitable.
In reply to: 430729278 [](ancestors = 430729278,430724590,430690470,430577195,429519064)
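For what it's worth, one way to reconcile the two views is compile-time dispatch with `if constexpr`: the branch is selected by the template parameter T at instantiation, so no runtime type check remains in the generated code. A hedged sketch against a hypothetical simplified tensor struct (not ONNX's actual TensorProto):

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for ONNX's TensorProto, holding one scalar.
// 6 and 7 mirror the ONNX enum values for INT32 and INT64.
struct FakeTensor {
  int data_type = 0;
  std::vector<int32_t> int32_data;
  std::vector<int64_t> int64_data;
};

// Compile-time dispatch on T: each instantiation contains exactly one
// branch, so there is no runtime switch over the data type.
template <typename T>
void SetScalar(FakeTensor& tensor, T value) {
  if constexpr (std::is_same_v<T, int32_t>) {
    tensor.data_type = 6;  // INT32
    tensor.int32_data.push_back(value);
  } else {
    static_assert(std::is_same_v<T, int64_t>, "only int32/int64 supported");
    tensor.data_type = 7;  // INT64
    tensor.int64_data.push_back(value);
  }
}
```

The check does still exist, but it happens at compile time rather than per call, which is the distinction the thread was circling around.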
                                                        TrainingSession::TrainingConfiguration::CutEdge("71", {"273"})};
TrainingSession::TrainingConfiguration::CutInfo cut1 = {TrainingSession::TrainingConfiguration::CutEdge("308"),
                                                        TrainingSession::TrainingConfiguration::CutEdge("71", {"395"})};
params.pipeline_partition_cut_list.emplace_back(cut0);
Those should be input arguments of main. For example, --cuts 186-71-273,308-71-395. #Resolved
yeah, preferably it should come as flexible command-line input instead of C++ code (see TODO on line 385).
Currently the parser is not capable of parsing structured data like this. (The example you gave is not sufficient, as there are many ways to interpret 186-71-273.) But it shouldn't be hard; I'll add the parser later.
In reply to: 429520899 [](ancestors = 429520899)
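Such a parser is indeed straightforward once the syntax is unambiguous. The sketch below assumes a hypothetical syntax (cuts separated by '|', edges by ',', optional consumer node names after ':' joined with '+'); the ParsedEdge struct, ParseCuts function, and the flag format itself are illustrative, not part of this PR:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parsed form of one cut edge: a tensor name plus optional
// consumer node names (mirroring the shape of CutEdge in this PR).
struct ParsedEdge {
  std::string tensor;
  std::vector<std::string> consumers;
};

// Parses e.g. "186,71:273|308,71:395" into two cuts of two edges each.
// '|' separates cuts, ',' separates edges, ':' introduces consumers,
// and '+' separates multiple consumers of one tensor.
inline std::vector<std::vector<ParsedEdge>> ParseCuts(const std::string& arg) {
  std::vector<std::vector<ParsedEdge>> cuts;
  std::istringstream cut_stream(arg);
  std::string cut;
  while (std::getline(cut_stream, cut, '|')) {
    std::vector<ParsedEdge> edges;
    std::istringstream edge_stream(cut);
    std::string edge;
    while (std::getline(edge_stream, edge, ',')) {
      ParsedEdge parsed;
      const auto colon = edge.find(':');
      parsed.tensor = edge.substr(0, colon);  // whole edge if no ':'
      if (colon != std::string::npos) {
        std::istringstream consumer_stream(edge.substr(colon + 1));
        std::string consumer;
        while (std::getline(consumer_stream, consumer, '+')) {
          parsed.consumers.push_back(consumer);
        }
      }
      edges.push_back(parsed);
    }
    cuts.push_back(edges);
  }
  return cuts;
}
```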
// If not provided, no optimizer is added.
optional<OptimizerConfiguration> optimizer_config{};

// struct to describe a specific edge (not node_arg). This information is
struct to describe a specific edge (not node_arg) [](start = 7, length = 49)
struct to describe a specific edge (not node_arg) ---> struct to describe the edges associated with the same tensor. Note that one edge is a connection from a tensor to an operator. If a tensor is consumed by multiple operators, there are multiple edges from that tensor to its consumers. #Resolved
wschin
left a comment
Overall looks good but I do feel there are some corners we can further clean up.
// Each pipeline stage should pick up some strings from this field..
std::vector<std::string> fetch_names;
// [TODO] Add cut information.
std::vector<CutInfo> cut_list;
std::vector<CutInfo> cut_list; [](start = 6, length = 30)
What is this? cut_list[i] contains the edges between the first and the second pipeline stages? For vector fields, we need to put a definition for its elements. #Resolved
cut_list[i] contains the CutInfo to make the partition between stage i and stage i+1. Comment updated.
In reply to: 429521450 [](ancestors = 429521450)
// split the graph into disconnected subgraph based on provided CutInfo
common::Status SplitGraph(Graph& graph,
                          std::vector<TrainingSession::TrainingConfiguration::CutInfo> split_edge_groups_,
split_edge_groups_ [](start = 87, length = 18)
split_edge_groups #Resolved
// add output node_arg for send/recv
AddNewNodeArg(graph, "send_output_signal" + cut_index_str, ONNX_NAMESPACE::TensorProto_DataType_BOOL, send_output_args, new_output_names);
AddNewNodeArg(graph, "receive_output_signal" + cut_index_str, ONNX_NAMESPACE::TensorProto_DataType_BOOL, recv_output_args, new_output_names);
Just curious. How long a line can be in ORT code base? #Resolved
for (auto& id : edgeIds) {
  // find node whose output contains id.node_arg_name
  auto producer_node = graph.GetMutableProducerNode(id.node_arg_name);
  if (producer_node == nullptr) {
producer_node == nullptr [](start = 10, length = 24)
if (!producer_node) #Resolved
I think basically they are the same. I have no problem changing it; just curious whether there is any guidance on preferring one over the other.
In reply to: 429526366 [](ancestors = 429526366)
It's shorter and more readable if people have smaller screens. :)
In reply to: 430585524 [](ancestors = 430585524,429526366)
wschin
left a comment
Thank you! Let's consider adding bool_data in TensorProto in ONNX.
Thank you all for reviewing :)
Description: Support online graph partition for pipeline training. Users can now switch between offline mode (partition the graph offline) and online mode (online partition) for pipeline training.
Motivation and Context
Users no longer need to pre-partition their graph before running ORT pipeline training.