
Add quantization profiling flow in Partitioner (#3169)

Summary:
This PR is for #3112: if "-dump-profile" is enabled, we won't do any real partitioning. Instead, we just generate the DAG (with only 1 node) and force the backend to be CPU. HostManager will force all DeviceManagers to be CPUDeviceManager and overwrite the Provisioner and Executor. Therefore, the network is compiled and run on the CPU backend. There is no change to the "-load-profile" flow.
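In effect, profiling short-circuits the normal partitioning path. Below is a condensed sketch of the new control flow in `Partitioner::Partition()`, mirroring the diff further down (setup details and the regular partitioning path are elided):

```cpp
// Condensed sketch of the short-circuit this PR adds to
// Partitioner::Partition(); see the full diff below.
llvm::Error Partitioner::Partition(CompilationContext &cctx) {
  std::vector<Backend *> backends;
  // ... prepare the BackendName -> BackendInfo mapping (elided) ...
  F_ = selectRepFunc(module_, memSize_);

  if (cctx.precisionConfig.quantMode == QuantizationMode::Profile) {
    // Profiling: split per backend only to keep the quantized-to-original
    // tensor mapping, then compile and run everything on profilingBackend.
    return QuantizationProfilingPartition(cctx, F_, backends);
  }
  // ... the usual backend-based and heterogeneous partitioning follows ...
}
```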

We run resnet50 for testing (see Test Plan).

Fixes #3112
Pull Request resolved: #3169

Test Plan:
Added tests in ./tests/images/run.sh.

The following example shows dumping and then loading a quantization profile for resnet50. The profile is generated using the CPU backend, while the quantized model runs with the Interpreter backend.
```
wangm-mbp:buildR wangm$  ./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -interpreter-memory=20000 -num-devices=3 -dump-profile="partition_profile.yaml"
Model: resnet50
Running 1 thread(s).
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0711 12:45:56.583297 54956032 Partitioner.cpp:1221] Profiling a model to be partitioned cross different backends. Each sub-network will be optimized and run on cpu backend.
 File: tests/images/imagenet/cat_285.png	Label-K1: 285 (probability: 0.5823)
 File: tests/images/imagenet/dog_207.png	Label-K1: 207 (probability: 0.9616)
 File: tests/images/imagenet/zebra_340.png	Label-K1: 340 (probability: 0.9902)
wangm-mbp:buildR wangm$  ./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -interpreter-memory=20000 -num-devices=3 -load-profile="partition_profile.yaml"
Model: resnet50
Running 1 thread(s).
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0711 12:46:11.744526 9793536 Partitioner.cpp:1346] The number of partitions is : 2, and the DAG is dumped into DAG.dot file.
I0711 12:46:11.745139 9793536 Partitioner.cpp:69] Writing dotty graph for DAG after graph partitioning: DAG.dot
I0711 12:46:11.745558 9793536 Partitioner.cpp:1355] 	 Partition 0:
		 Name :	resnet50_part1
		 BackendKind :	Interpreter
		 Memory :	18233280
		 LogicalDeviceIDs :	0
I0711 12:46:11.745569 9793536 Partitioner.cpp:1355] 	 Partition 1:
		 Name :	resnet50_part2
		 BackendKind :	Interpreter
		 Memory :	10703512
		 LogicalDeviceIDs :	1
 File: tests/images/imagenet/cat_285.png	Label-K1: 285 (probability: 0.5565)
 File: tests/images/imagenet/dog_207.png	Label-K1: 207 (probability: 0.9551)
 File: tests/images/imagenet/zebra_340.png	Label-K1: 340 (probability: 0.9890)
```

This is the heterogeneous partition test (using the config file tests/runtime_test/heterogeneousConfigs.yaml):
```
wangm-mbp:buildR wangm$ ./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -load-device-configs="tests/runtime_test/heterogeneousConfigs.yaml" -dump-profile="quantiP.yaml"
Model: resnet50
Running 1 thread(s).
tests/runtime_test/heterogeneousConfigs.yaml
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0710 10:13:13.977905 142221312 Partitioner.cpp:1206] Profiling a model to be partitioned cross different backends. Each sub-network will be optimized and run on cpu backend.
 File: tests/images/imagenet/cat_285.png	Label-K1: 285 (probability: 0.5823)
 File: tests/images/imagenet/dog_207.png	Label-K1: 207 (probability: 0.9616)
 File: tests/images/imagenet/zebra_340.png	Label-K1: 340 (probability: 0.9902)
wangm-mbp:buildR wangm$ ./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -load-device-configs="tests/runtime_test/heterogeneousConfigs.yaml" -load-profile="quantiP.yaml"
Model: resnet50
Running 1 thread(s).
tests/runtime_test/heterogeneousConfigs.yaml
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0710 10:13:32.141547 184852480 Partitioner.cpp:1330] The number of partitions is : 3, and the DAG is dumped into DAG.dot file.
I0710 10:13:32.142370 184852480 Partitioner.cpp:69] Writing dotty graph for DAG after graph partitioning: DAG.dot
I0710 10:13:32.142781 184852480 Partitioner.cpp:1339] 	 Partition 0:
		 Name :	resnet50_part1_part1
		 BackendKind :	CPU
		 Memory :	26571712
		 LogicalDeviceIDs :	0
I0710 10:13:32.142792 184852480 Partitioner.cpp:1339] 	 Partition 1:
		 Name :	resnet50_part2_part1
		 BackendKind :	Interpreter
		 Memory :	1228800
		 LogicalDeviceIDs :	1
I0710 10:13:32.142815 184852480 Partitioner.cpp:1339] 	 Partition 2:
		 Name :	resnet50_part3_part1
		 BackendKind :	CPU
		 Memory :	2088600
		 LogicalDeviceIDs :	0
 File: tests/images/imagenet/cat_285.png	Label-K1: 285 (probability: 0.5676)
 File: tests/images/imagenet/dog_207.png	Label-K1: 207 (probability: 0.9563)
 File: tests/images/imagenet/zebra_340.png	Label-K1: 340 (probability: 0.9893)
```

Differential Revision: D16210568

Pulled By: beicy

fbshipit-source-id: be63473b9405fbb10b5d075976667ef9570c16e4
beicy authored and facebook-github-bot committed Jul 12, 2019
1 parent e43f123 commit dc7708297b0c9822f3ccb938263c7e9145b5e1f7
Backend.h:
```diff
@@ -147,6 +147,13 @@ class BackendUsingGlowIR : public Backend {
   static RegisterFactory<std::string, FactoryName, Backend>                   \
       FactoryName##_REGISTERED;
 
+/// The backend name used in Glow quantization profiling.
+#ifdef GLOW_WITH_CPU
+constexpr const char *profilingBackend = "CPU";
+#else
+constexpr const char *profilingBackend = "Interpreter";
+#endif
+
 } // namespace glow
 
 #endif // GLOW_BACKENDS_BACKEND_H
```
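Note that `profilingBackend` is fixed at build time: builds with `GLOW_WITH_CPU` profile on the CPU backend, while Interpreter-only builds fall back to the Interpreter. Both the Partitioner and HostManager changes below key off this single constant.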
Partitioner.h:
```diff
@@ -256,7 +256,8 @@ class Partitioner {
   void saturateHost(unsigned logicalDeviceCount);
 
   FunctionToBackendNameMap
-  backendBasedPartition(Function *F, std::vector<Backend *> &backends);
+  backendBasedPartition(Function *F, std::vector<Backend *> &backends,
+                        CompilationContext &cctx);
 
   /// Performs a load balancing optimization pass to optimize for load
   /// balance in addition to respecting memory constraints.
@@ -310,6 +311,14 @@ class Partitioner {
   /// the Function. \returns whether there was an error encountered.
   llvm::Error Partition(CompilationContext &cctx);
 
+  /// This partition approach is used in the Glow quantization profiling flow.
+  /// backendBasedPartition is applied first in case there are heterogeneous
+  /// backends; each resulting sub-function is then compiled and run on the
+  /// CPU backend for profiling.
+  llvm::Error QuantizationProfilingPartition(CompilationContext &cctx,
+                                             Function *F,
+                                             std::vector<Backend *> backends);
+
   /// Get the partitions.
   DAGListTy &getPartitionResult() { return partitions_; }
```
Partitioner.cpp:
```diff
@@ -901,9 +901,8 @@ void Partitioner::doPartitioning(llvm::StringRef funcName,
   }
 }
 
-FunctionToBackendNameMap
-Partitioner::backendBasedPartition(Function *F,
-                                   std::vector<Backend *> &backends) {
+FunctionToBackendNameMap Partitioner::backendBasedPartition(
+    Function *F, std::vector<Backend *> &backends, CompilationContext &cctx) {
   FunctionToBackendNameMap ret;
   NodeToFunctionMap mapping;
   llvm::DenseMap<Node *, std::string> nodeToBackendName;
@@ -953,8 +952,15 @@ Partitioner::backendBasedPartition(Function *F,
   newF = F->getParent()->createFunction(std::string(F->getName()) + "_part" +
                                         std::to_string(++color));
   auto backendName = nodeToBackendName[bfs[level - 1][0]];
-  mapping.createPartition(newF, backendName);
-  ret[newF] = backendName;
+  if (cctx.precisionConfig.quantMode == QuantizationMode::Profile) {
+    // When profiling, every partition is assigned to the
+    // profilingBackend.
+    mapping.createPartition(newF, profilingBackend);
+    ret[newF] = profilingBackend;
+  } else {
+    mapping.createPartition(newF, backendName);
+    ret[newF] = backendName;
+  }
   for (int i = level - 1; i >= 0; i--) {
     for (size_t j = 0, e = bfs[i].size(); j < e; j++) {
       Node *N = bfs[i][j];
@@ -963,17 +969,35 @@ Partitioner::backendBasedPartition(Function *F,
         backendName = bk;
         newF = F->getParent()->createFunction(
             std::string(F->getName()) + "_part" + std::to_string(++color));
-        mapping.createPartition(newF, backendName);
-        ret[newF] = backendName;
+        if (cctx.precisionConfig.quantMode == QuantizationMode::Profile) {
+          // When profiling, every partition is assigned to the
+          // profilingBackend.
+          mapping.createPartition(newF, profilingBackend);
+          ret[newF] = profilingBackend;
+        } else {
+          mapping.createPartition(newF, backendName);
+          ret[newF] = backendName;
+        }
       }
       mapping.add(N, newF);
     }
   }
 
-  // Here we just need to split the function without generating DAG.
   std::vector<Function *> funcs;
   funcs.push_back(F);
-  doPartitioning(F->getName(), funcs, mapping, false);
+  // When profiling, the partition flow stops after backendBasedPartition,
+  // so the DAG needs to be generated here. Otherwise, there is no need to
+  // generate the DAG.
+  bool genDAG = cctx.precisionConfig.quantMode == QuantizationMode::Profile
+                    ? true
+                    : false;
+  if (genDAG) {
+    DeviceIDTy logicalDeviceID = 0;
+    for (auto &func : mapping.getPartitions()) {
+      mapping.appendLogicalDeviceID(func, logicalDeviceID++);
+    }
+  }
+  doPartitioning(F->getName(), funcs, mapping, genDAG);
 
   return ret;
 }
@@ -1177,6 +1201,30 @@ llvm::Error Partitioner::loadBalancedPartitioning(Function *F,
   return llvm::Error::success();
 }
 
+llvm::Error Partitioner::QuantizationProfilingPartition(
+    CompilationContext &cctx, Function *F, std::vector<Backend *> backends) {
+  // The quantization profiling flow runs on the CPU backend, so we don't
+  // really need a concrete partition here. backendBasedPartition is still
+  // needed to keep the mapping between quantized and original tensors.
+  FunctionToBackendNameMap funcToBackend;
+  funcToBackend = backendBasedPartition(F_, backends, cctx);
+  module_->eraseFunction(F_);
+  auto backend = createBackend(profilingBackend);
+  for (Function *subF : module_->getFunctions()) {
+    (void)subF;
+    assert(subF->verify() && "Conversion led to invalid function");
+    if (!optimized_) {
+      RETURN_IF_ERR(::glow::optimizeFunction(subF, *backend, cctx));
+    }
+  }
+  if (logPartition) {
+    LOG(INFO)
+        << "Profiling a model to be partitioned cross different backends. Each "
+           "sub-network will be optimized and run on cpu backend.\n";
+  }
+  return llvm::Error::success();
+}
+
 llvm::Error Partitioner::Partition(CompilationContext &cctx) {
   // Prepare the mapping between BackendName and BackendInfo.
   std::vector<Backend *> backends;
@@ -1187,6 +1235,12 @@ llvm::Error Partitioner::Partition(CompilationContext &cctx) {
   // algorithm.
   F_ = selectRepFunc(module_, memSize_);
 
+  if (cctx.precisionConfig.quantMode == QuantizationMode::Profile) {
+    // Jump into the profiling flow, and leave without generating
+    // backend-specific partitions.
+    return QuantizationProfilingPartition(cctx, F_, backends);
+  }
+
   // Step 1 : do the partition based on backends type.
   FunctionToBackendNameMap funcToBackend;
   std::string origName(F_->getName().data());
@@ -1201,13 +1255,14 @@ llvm::Error Partitioner::Partition(CompilationContext &cctx) {
     if (logPartition) {
       LOG(INFO) << "The model is too small for applying partition.\n"
                 << "Model size : " << memSize_ << "\n"
                 << "Backend Name : " << backendName << "\n"
                 << "Device memory: " << backendMap_[backendName].memSize
                 << "\n";
     }
     return createDAGWithoutPartition(backendName, backendMap_, cctx);
   }
 } else {
-  funcToBackend = backendBasedPartition(F_, backends);
+  funcToBackend = backendBasedPartition(F_, backends, cctx);
   module_->eraseFunction(F_);
 }
```

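Design note: `backendBasedPartition` still runs in profiling mode even though every partition lands on the same backend, because it preserves the mapping between quantized tensors and the original tensors. The DAG, with sequentially appended logical device IDs, is generated here only in the profiling case, since the flow stops before the regular partitioning steps.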
HostManager.cpp:
```diff
@@ -85,6 +85,7 @@ llvm::Error HostManager::addNetwork(std::unique_ptr<Module> module,
                      name);
     }
   }
+
   std::vector<DeviceInfo> deviceInfo;
   for (auto &device : devices_) {
     DeviceInfo info = device.second->getDeviceInfo();
@@ -102,17 +103,21 @@ llvm::Error HostManager::addNetwork(std::unique_ptr<Module> module,
   auto partitioner = Partitioner(module.get(), deviceInfo, saturateHost);
   RETURN_IF_ERR(partitioner.Partition(cctx));
   auto nodeList = std::move(partitioner.getPartitionResult());
 
   if (cctx.precisionConfig.quantMode == QuantizationMode::Profile) {
-    // Check that all functions were not partitioned.
-    for (auto &network : nodeList) {
-      if (network.nodes.size() > 1) {
-        return MAKE_ERR(
-            GlowErr::ErrorCode::RUNTIME_ERROR,
-            "Failed to add network for profiling: Network was "
-            "partitioned, this is likely because the network was "
-            "larger than the configured memory of a single device manager.");
-      }
-    }
+    // For profiling, we use the CPU backend. Overwrite the Provisioner and
+    // Executor to force the network to be compiled and run with the
+    // profilingBackend.
+    size_t devicesNum = devices_.size();
+    for (size_t i = 0; i < devicesNum; i++) {
+      auto name = devices_[i]->getDeviceConfig().name;
+      auto config = llvm::make_unique<DeviceConfig>(profilingBackend, name);
+      devices_[i] = std::unique_ptr<DeviceManager>(
+          DeviceManager::createDeviceManager(*config));
+      RETURN_IF_ERR(devices_[i]->init());
+    }
+    provisioner_.reset(new Provisioner(devices_));
+    executor_.reset(new ThreadPoolExecutor(devices_, config_.executorThreads));
   }
 
   RETURN_IF_ERR(provisioner_->provision(nodeList, *module, cctx));
```
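Design note: instead of rejecting profiling runs whose network was partitioned (the removed check), `addNetwork` now rebuilds every DeviceManager from a fresh `DeviceConfig` for `profilingBackend` and resets the Provisioner and Executor, so whatever DAG the profiling partition produced is compiled and executed entirely on the profiling backend.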
run.sh:
```diff
@@ -145,10 +145,17 @@ done
 ./bin/image-classifier tests/images/imagenet/*.png -expected-labels=${imagenetIdxValues} -image-mode=0to1 -m=quant_resnet50 -model-input-name=gpu_0/data_0 -use-imagenet-normalization "$@"
 num_errors=$(($num_errors + $?))
 
-# Heterogeneous partition Resnet50 Caffe2 model test
+# Heterogeneous partition Resnet50 Caffe2 model test.
 ./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -cpu-memory=100000 -load-device-configs="tests/runtime_test/heterogeneousConfigs.yaml" "$@"
 num_errors=$(($num_errors + $?))
 
+# Quantization with heterogeneous partition Resnet50 Caffe2 model test. Dump and load profile.
+./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -load-device-configs="tests/runtime_test/heterogeneousConfigs.yaml" -dump-profile="quantiP.yaml" "$@"
+num_errors=$(($num_errors + $?))
+
+./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -load-device-configs="tests/runtime_test/heterogeneousConfigs.yaml" -load-profile="quantiP.yaml" "$@"
+num_errors=$(($num_errors + $?))
+
 # Emotion_ferplus onnx model test
 i=0
 for png_filename in tests/images/EmotionSampleImages/*.png; do
```
Loader.cpp:
```diff
@@ -406,6 +406,9 @@ void Loader::compile(PlaceholderBindings &bindings) {
     // Emit IR for the graph and compile it.
     auto error = hostManager_->addNetwork(std::move(M_), cctx);
     EXIT_ON_ERR(std::move(error));
+    // After partitioning, the original function may be removed. Need to
+    // update F_.
+    F_ = module->getFunctions().front();
   }
   if (dumpGraphOpt) {
     for (auto function : module->getFunctions()) {
@@ -445,9 +448,14 @@ void Loader::generateAndSerializeQuantizationInfos(
     PlaceholderBindings &bindings) {
   assert(!dumpProfileFileOpt.empty() &&
          "Filename to dump serialized profile to must not be empty.");
-  std::vector<NodeQuantizationInfo> QI =
-      quantization::generateNodeQuantizationInfos(
-          bindings, F_, loweredMap_, quantizationSchema, quantizationPrecision);
+  std::vector<NodeQuantizationInfo> QI;
+  for (auto F : getModule()->getFunctions()) {
+    std::vector<NodeQuantizationInfo> tmp =
+        quantization::generateNodeQuantizationInfos(bindings, F, loweredMap_,
+                                                    quantizationSchema,
+                                                    quantizationPrecision);
+    QI.insert(QI.end(), tmp.begin(), tmp.end());
+  }
   serializeToYaml(dumpProfileFileOpt, QI);
 }
```
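Design note: since profiling can now leave several sub-functions in the module, the profile serializer walks every function and concatenates their `NodeQuantizationInfo` vectors instead of reading only from `F_`, which may have been erased during partitioning.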
