Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Partitioner] User-defined partition #3237

Closed
wants to merge 1 commit into from
Closed

Conversation

beicy
Copy link
Contributor

@beicy beicy commented Jul 16, 2019

Summary:
This PR added user-defined partition flow. Basically, a struct "PartitionConfig" containing the partition info is passed into Partitioner to enable this flow. Now we let users have the full control of how to do the partitioning.
To use this flow, users can write their helper function to generate PartitionConfig, and call Partitioner directly.
In the following PR, we will add passing PartitionConfig through HostManager from a yaml file.

Related to #2298
Documentation:

[Optional Fixes #issue]

Test Plan:
Added unittest. ninja test.

Please see a detailed explanation of how to fill out the fields in the relevant sections in PULL_REQUEST.md.

@@ -52,6 +50,8 @@ static llvm::cl::opt<bool>
llvm::cl::desc("Enable dumping the graph of each partitions"),
llvm::cl::init(false), llvm::cl::cat(PartitionerCat));

#include <fstream>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like fstream is used. (If it is, put it with the other #includes above.

Copy link
Contributor

@bertmaher bertmaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have a handful of nits before landing. Thanks for prioritizing this :)

@@ -1235,6 +1250,11 @@ llvm::Error Partitioner::Partition(CompilationContext &cctx) {
// algorithm.
F_ = selectRepFunc(module_, memSize_);

if (partitionConfig_.enabled()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this go before selectRepFunc (and maybe before getBackendMap)? It seems like it would circumvent the whole rest of the partitioning process so it'd be good to not do any irrelevant work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do it in front of selectRepFunc. However, we do need to run getBackendMap for later partition validation and logging.

@@ -1348,27 +1368,97 @@ llvm::Error Partitioner::Partition(CompilationContext &cctx) {
dumpDAG("DAG.dot");
}

int i = 0;
for (Function *subF : funcList) {
(void)subF;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only semi-related to this diff but I don't think (void)subF; is needed anymore, since subF is legitimately used below, even in Debug mode.

/// a partition. 2. For i-th (0 <= i < m) partition, the nodes mapped to this
/// partition id are not in this map, and the nodes mapped to other partitions
/// ids must be in this map.
llvm::StringMap<size_t> nodeToParitition;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sp: Paritition -> Partition

}
}

// Memory usage Validatian.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sp and inconsistent caps. // Validate memory usage. would read better.

}
RETURN_IF_ERR(memoryUsageValidation(partitionMap));

// Logical device ID validataion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sp: validation

logicalDeviceID_ = assignLogicalDeviceID(partitionMap);
RETURN_IF_ERR(logicalDevicesValidation(partitionMap));

// TODO : loop-free validataion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sp: validation

@@ -171,6 +171,28 @@ struct HostConfig {
size_t executorThreads{3};
};

/// This is struct is for user defined partition.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// This is struct is for user defined partition.
/// This is struct for user defined partition.

@@ -171,6 +171,28 @@ struct HostConfig {
size_t executorThreads{3};
};

/// This is struct is for user defined partition.
struct PartitionConfig {
/// The name of the functiton to be partitioned.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The name of the functiton to be partitioned.
/// The name of the function to be partitioned.

struct PartitionConfig {
/// The name of the functiton to be partitioned.
std::string funcName;
/// The number of user defined partitions. The partition ids are [0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The number of user defined partitions. The partition ids are [0,
/// The number of user defined partitions.
/// The partition ids are between 0 and numOfPartitions - 1, inclusive.

/// a partition. 2. For i-th (0 <= i < m) partition, the nodes mapped to this
/// partition id are not in this map, and the nodes mapped to other partitions
/// ids must be in this map.
llvm::StringMap<size_t> nodeToParitition;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
llvm::StringMap<size_t> nodeToParitition;
llvm::StringMap<size_t> nodeToPartition;

std::vector<std::string> backendNames;
/// The name for each partition. partitionNames.size() == numOfPartitions.
std::vector<std::string> partitionNames;
/// The mapping between nodes and Partition ids. Assume there are n nodes and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a map between node's name and unsigned partition number? It wasn't clear to me initially that name is used here. Also, need to clarify that Glow mangles names, meaning that Glow's node name is never the same as name from Caffe2 protobuf.

NodeToFunctionMap partitionMap;
std::vector<Function *> funcList;
std::unordered_set<size_t> unUsed;
std::vector<NodesSet> nodesSets;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::vector<NodesSet> nodesSets;
std::vector<NodesSet> nodesSets(partitionConfig_.numOfPartitions);

Then you won't need nodesSets.push_back(NodesSet{}); later.

}
}

// If there is un used partition and un-mapped nodes, map those nodes to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If there is un used partition and un-mapped nodes, map those nodes to the
// If there is unused partition and unmapped nodes, map those nodes to the

// unMaped list.
unMapped.push_back(&node);
} else {
size_t partitionID = iter->second;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check that partitionID is < partitionConfig_.numOfPartitions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

// If there is un used partition and un-mapped nodes, map those nodes to the
// un-used partition.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// un-used partition.
// unused partition.

// If there is un used partition and un-mapped nodes, map those nodes to the
// un-used partition.
if (unMapped.size()) {
assert(unUsed.size() && "Missing node to partition mapping.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(unUsed.size() && "Missing node to partition mapping.");
assert(unUsed.size() == 1 && "There must be exactly 1 unused partition.");

@beicy beicy force-pushed the auto1 branch 2 times, most recently from bfb931b to c21d412 Compare July 16, 2019 23:14
@@ -209,6 +209,9 @@ class Partitioner {
// heterogeneous partition.
bool optimized_;

// The struct contain user-defined partition info.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doxygen style and above.

/// The number of user defined partitions.
/// The partition ids are between 0 and numOfPartitions - 1, inclusive.
size_t numOfPartitions;
/// The backend for each partition. backendNames.size() == numOfPartitions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is backendNames.size() == numOfPartitions. Why do we need numOfPartitions at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means backendNames.size() should be equal to numOfPartitions. Otherwise, there is an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just have backendNames alone and the size tells how many partitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also works. The reason I added this separate field is: I think it is clear for people just use a number directly as a reference. The we compare the size of partition names and backends in case something is missing by accident.


llvm::Error Partitioner::PartitionFromConfig() {
Function *F = module_->getFunction(partitionConfig_.funcName);
RETURN_ERR_IF_NOT(F, "Can't find the function in current module.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please, add function name to the error message.


NodeToFunctionMap partitionMap;
std::vector<Function *> funcList;
std::unordered_set<size_t> unUsed;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused.

// Create partitions based on the given number and names.
for (size_t i = 0; i < partitionConfig_.numOfPartitions; i++) {
Function *newF =
module_->createFunction(partitionConfig_.partitionNames[i]);

This comment was marked as resolved.

unMapped.push_back(&node);
} else {
size_t partitionID = iter->second;
assert(partitionID < partitionConfig_.numOfPartitions &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no asserts. DCHECK, add partition id to the error message.

// If there is unused partition and unmapped nodes, map those nodes to the
// unused partition.
if (unMapped.size()) {
assert(unUsed.size() == 1 && "There must be exactly 1 unused partition.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -886,3 +886,28 @@ TEST_F(PartitionerTest, memoryUsageValidation1) {
auto err = myPartitioner.Partition(cctx);
EXPECT_TRUE(errToBool(std::move(err)));
}

/// This one tests partition from a user-defined config.
TEST_F(PartitionerTest, partitionFromConfig1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: partitionFromConfig

Copy link

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beicy has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@@ -1211,7 +1225,6 @@ llvm::Error Partitioner::QuantizationProfilingPartition(
module_->eraseFunction(F_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that the function can be renamed and will no longer match the name in partitionConfig?

Copy link
Contributor Author

@beicy beicy Jul 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. If partitionConfig is set, Partitioner will go to user-defined flow without considering other options like "-load-profiling". So later renaming in the rest flow won't affect the function name.

/// mapped to this partition id are not in this map, and the nodes mapped to
/// other partitions ids must be in this map. The node's name should be the
/// name in Glow function and may be different from the original name from
/// models. Since Glow will mangle names to make them unique.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not only to make them unique. We also e.g. replace / with _. Maybe something else.
You may write: Since Glow will mangle names according to its own conventions.

Partitioner(Module *parent, const std::vector<DeviceInfo> &devices,
bool saturateHost = false, bool optimized = false);
bool saturateHost = false, bool optimized = false,
PartitionConfig partitionConfig = PartitionConfig());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see that it's already bad API. Let's say caller 1 has only partitionConfig. But now it has to pass false, false for some other 2 flags. Caller 1 has to figure out what is saturation, and what to pass for corresponding flag to disable.

It would be a good practice to make sure that incompatible flags are not passed together. E.g. saturation=false when user defined partition. But these checks can quickly explode if we have more flags/modes.

What if we later introduce another feature, third mode for partitioner? Say, PyTorch-defined partition? Then caller 3 will have to figure out what to pass for partitionConfig argument to disable user-defined partition feature.

This is why I think we should never accept a whole all-inclusive list of args in one function (in this case, partitioner constructor). I was going to propose create different method for user defined partition. But if you are going to refactor everything later into different classes, I'm also fine with that.

partitionMap.add(node, funcList[partitionID]);
nodesSets[partitionID].insert(node);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else {
what if there are unused partitions? Meaning that some partitions are empty. Is it valid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be at most 1 unused partition (there is the check). Otherwise, it is invalid. Since it means we have empty functions in the module and will cause problems later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will cause problems later

It is better to nail these problems early. Here we still can report what exactly is wrong: no nodes were mapped to a partitions #i, #j, #k... Later it will be difficult to track down where the empty function is coming from. In fact, it may fail with obscure error, so first debugging step would be to figure out that the function is empty, and only after that find out why it's empty. I really suggest to report as many problems with the partition as we can find.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. So far, if there is empty function, it will cause the assertion and stop. But you are right, the error message is not enough. Will improve the debugging ability. Thanks!

Copy link
Contributor

@gcatron gcatron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Great work!

@beicy
Copy link
Contributor Author

beicy commented Jul 17, 2019

@artemrakhov Thanks a lot for the suggestion on the interface. Let's have more discussion about the interface offline.

@facebook-github-bot
Copy link

@beicy merged this pull request in 809a3e8.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants