
[Partitioner] Add quantization profiling flow in Partitioner #3169

Closed
wants to merge 1 commit into from

@beicy
Contributor

commented Jun 25, 2019

Summary:
This PR is for #3112: if "-dump-profile" is enabled, we skip partitioning entirely. Instead, we generate the DAG (with only one node) and force the backend to be CPU. HostManager forces all DeviceManagers to be CPUDeviceManager and overwrites the Provisioner and Executor. Therefore, the network is compiled and run on the CPU backend. The "-load-profile" flow is unchanged.

We run resnet50 for testing (see Test Plan)

Documentation:

Fixes #3112
Test Plan:
Added test in ./tests/images/run.sh

The following example shows dumping and loading the quantization profile for resnet50. The profile is generated using the CPU backend, while the quantized model runs with the Interpreter backend.

wangm-mbp:buildR wangm$  ./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -interpreter-memory=20000 -num-devices=3 -dump-profile="partition_profile.yaml"
Model: resnet50
Running 1 thread(s).
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0711 12:45:56.583297 54956032 Partitioner.cpp:1221] Profiling a model to be partitioned cross different backends. Each sub-network will be optimized and run on cpu backend.
 File: tests/images/imagenet/cat_285.png	Label-K1: 285 (probability: 0.5823)
 File: tests/images/imagenet/dog_207.png	Label-K1: 207 (probability: 0.9616)
 File: tests/images/imagenet/zebra_340.png	Label-K1: 340 (probability: 0.9902)
wangm-mbp:buildR wangm$  ./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -interpreter-memory=20000 -num-devices=3 -load-profile="partition_profile.yaml"
Model: resnet50
Running 1 thread(s).
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0711 12:46:11.744526 9793536 Partitioner.cpp:1346] The number of partitions is : 2, and the DAG is dumped into DAG.dot file.
I0711 12:46:11.745139 9793536 Partitioner.cpp:69] Writing dotty graph for DAG after graph partitioning: DAG.dot
I0711 12:46:11.745558 9793536 Partitioner.cpp:1355] 	 Partition 0:
		 Name :	resnet50_part1
		 BackendKind :	Interpreter
		 Memory :	18233280
		 LogicalDeviceIDs :	0
I0711 12:46:11.745569 9793536 Partitioner.cpp:1355] 	 Partition 1:
		 Name :	resnet50_part2
		 BackendKind :	Interpreter
		 Memory :	10703512
		 LogicalDeviceIDs :	1
 File: tests/images/imagenet/cat_285.png	Label-K1: 285 (probability: 0.5565)
 File: tests/images/imagenet/dog_207.png	Label-K1: 207 (probability: 0.9551)
 File: tests/images/imagenet/zebra_340.png	Label-K1: 340 (probability: 0.9890)

This is for heterogeneous partition testing (using this config file : tests/runtime_test/heterogeneousConfigs.yaml) :

wangm-mbp:buildR wangm$ ./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -load-device-configs="tests/runtime_test/heterogeneousConfigs.yaml" -dump-profile="quantiP.yaml" 
Model: resnet50
Running 1 thread(s).
tests/runtime_test/heterogeneousConfigs.yaml
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0710 10:13:13.977905 142221312 Partitioner.cpp:1206] Profiling a model to be partitioned cross different backends. Each sub-network will be optimized and run on cpu backend.
 File: tests/images/imagenet/cat_285.png	Label-K1: 285 (probability: 0.5823)
 File: tests/images/imagenet/dog_207.png	Label-K1: 207 (probability: 0.9616)
 File: tests/images/imagenet/zebra_340.png	Label-K1: 340 (probability: 0.9902)
wangm-mbp:buildR wangm$ ./bin/image-classifier tests/images/imagenet/*.png -image-mode=0to1 -m=resnet50 -model-input-name=gpu_0/data -load-device-configs="tests/runtime_test/heterogeneousConfigs.yaml" -load-profile="quantiP.yaml" 
Model: resnet50
Running 1 thread(s).
tests/runtime_test/heterogeneousConfigs.yaml
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0710 10:13:32.141547 184852480 Partitioner.cpp:1330] The number of partitions is : 3, and the DAG is dumped into DAG.dot file.
I0710 10:13:32.142370 184852480 Partitioner.cpp:69] Writing dotty graph for DAG after graph partitioning: DAG.dot
I0710 10:13:32.142781 184852480 Partitioner.cpp:1339] 	 Partition 0:
		 Name :	resnet50_part1_part1
		 BackendKind :	CPU
		 Memory :	26571712
		 LogicalDeviceIDs :	0
I0710 10:13:32.142792 184852480 Partitioner.cpp:1339] 	 Partition 1:
		 Name :	resnet50_part2_part1
		 BackendKind :	Interpreter
		 Memory :	1228800
		 LogicalDeviceIDs :	1
I0710 10:13:32.142815 184852480 Partitioner.cpp:1339] 	 Partition 2:
		 Name :	resnet50_part3_part1
		 BackendKind :	CPU
		 Memory :	2088600
		 LogicalDeviceIDs :	0
 File: tests/images/imagenet/cat_285.png	Label-K1: 285 (probability: 0.5676)
 File: tests/images/imagenet/dog_207.png	Label-K1: 207 (probability: 0.9563)
 File: tests/images/imagenet/zebra_340.png	Label-K1: 340 (probability: 0.9893)


@beicy beicy requested review from jfix71 and opti-mix Jun 25, 2019

@rdzhabarov
Contributor

left a comment

What do you think about having two partitioning schemes (two partitioner classes) at a high level: a regular partitioner and a profile-capturing partitioner?

You'd instantiate HostManager with the right partitioner (pass the required partitioner as a param in the HostManager ctor) and remove all custom logic inside the partitioner. I have not thought about the details, but this should allow a cleaner implementation without too many if/elses inside the concrete partitioner. It becomes increasingly harder to maintain all the custom logic inside this partitioner.

@beicy


Contributor Author

commented Jun 25, 2019

What do you think about having two partitioning schemes (two partitioner classes) at a high level: a regular partitioner and a profile-capturing partitioner?

You'd instantiate HostManager with the right partitioner (pass the required partitioner as a param in the HostManager ctor) and remove all custom logic inside the partitioner. I have not thought about the details, but this should allow a cleaner implementation without too many if/elses inside the concrete partitioner. It becomes increasingly harder to maintain all the custom logic inside this partitioner.

Thanks for your advice. Maybe not two classes, but two partition methods? I will think about it.

@jfix71
Copy link
Contributor

left a comment

Cool! Added one question in code, not sure if it makes sense, I don't know the Partitioner super well 🙂

Also, it's possible/valid to compile Glow without the CPU backend enabled. Can we check #ifdef GLOW_WITH_CPU to use CPU, otherwise fall back to Interpreter?

// in createDAGWithoutPartition.
BackendInfo backendInfo;
backendInfo.backend = createBackend(backendName);
backendMap_[backendName] = backendInfo;

@jfix71

jfix71 Jun 26, 2019

Contributor

So you are adding logic for if we're in profiling mode in the case where we have 1 backend (here) vs. not (below). I think if we're in profiling mode we could just always set the backend to the single backend we're profiling on (e.g. CPU like here), no? Then we wouldn't need to add different cases here?

E.g. above where we call getBackendMap(), instead we could check if we're in profiling mode -- if not then call getBackendMap() as is currently done, otherwise set up backends with e.g. the CPU backend like you do, and that's all that's needed, no?

@beicy

beicy Jun 26, 2019

Author Contributor

If there is only 1 backend, we can do that. But for different backends, we need to get the actual backend types first for backendBasedPartition.

@beicy beicy force-pushed the beicy:temp1 branch from c9dcefb to 4556421 Jun 26, 2019

@@ -935,6 +945,57 @@ llvm::Error Partitioner::createDAGWithoutPartition(
return llvm::Error::success();
}

llvm::Error
Partitioner::QuantizationProfilingPartition(CompilationContext &cctx) {

@opti-mix

opti-mix Jul 2, 2019

Contributor

This function seems to be very similar to other functions in the partitioner. Is it possible to have some code-reuse?

@@ -85,6 +85,22 @@ llvm::Error HostManager::addNetwork(std::unique_ptr<Module> module,
name);
}
}

if (cctx.precisionConfig.quantMode == QuantizationMode::Profile) {

@opti-mix

opti-mix Jul 2, 2019

Contributor

Is it possible to tweak this outside of the host manager? May be in the client of the host-manager like image-classifier or something?

The reason I ask is that I'm a bit worried about making HostManager aware of quantization.

@beicy

beicy Jul 2, 2019

Author Contributor

"Is it possible to tweak this outside of the host manager? May be in the client of the host-manager like image-classifier or something?"
-- It is the HostManager that calls the different partition methods, so I don't think it is a good idea to make this decision outside.
But I think we can let the Partitioner make this decision, e.g., whether or not to skip backendBasedPartition.

@opti-mix

opti-mix Jul 2, 2019

Contributor

But I think we can let the Partitioner make this decision, e.g., whether or not to skip backendBasedPartition.

Sounds reasonable. This would be better than having both HostManager and Partitioner being aware of it.

@opti-mix

Contributor

commented Jul 10, 2019

@beicy Ping! When do you plan to continue working on this PR?

@beicy beicy changed the title [WIP][Partitioner] Add quantization profiling flow in Partitioner [Partitioner] Add quantization profiling flow in Partitioner Jul 10, 2019

@beicy beicy force-pushed the beicy:temp1 branch from 4556421 to f0ec661 Jul 10, 2019

@beicy

Contributor Author

commented Jul 10, 2019

@beicy Ping! When do you plan to continue working on this PR?

Updated. Now added testing for heterogeneous partition.

@opti-mix
Contributor

left a comment

Looks better now!

BTW, what about tests? Can we add some tests to this PR?

funcToBackend = backendBasedPartition(F_, backends, cctx);
module_->eraseFunction(F_);
auto backend = createBackend("CPU");
// Leave the partitioner early due to quantization profiling mode.

@opti-mix

opti-mix Jul 10, 2019

Contributor

Is this still a correct comment? Where is the early exit? Is it the RETURN_IF_ERR below?

@beicy

beicy Jul 10, 2019

Author Contributor

I will move this comment to the end of this function for better understanding.

@opti-mix

opti-mix Jul 10, 2019

Contributor

But if it is at the end of this function, what does it mean to "leave the partitioner early"? Maybe you should add this comment at the place where you call this function and then immediately return from the Partitioner?

"partitioned, this is likely because the network was "
"larger than the configured memory of a single device manager.");
}
// For profiling, we use CPU backend. Overwrite Provisioner and Executor to

@opti-mix

opti-mix Jul 10, 2019

Contributor

I think we discussed in previous review rounds that it would be nice to have all the quantization-specific logic in one place, and you suggested using the Partitioner for it. Or do we need to do this in the HostManager because there is no easy way to hide it all in the Partitioner?

@beicy

beicy Jul 10, 2019

Author Contributor

We leave the decision of which partition flow to call to the Partitioner. However, this part is for Provisioner and Executor initialization, and that should be done in the HostManager.

@beicy beicy changed the title [Partitioner] Add quantization profiling flow in Partitioner [WIP][Partitioner] Add quantization profiling flow in Partitioner Jul 10, 2019

size_t devicesNum = devices_.size();
for (size_t i = 0; i < devicesNum; i++) {
auto name = devices_[i]->getDeviceConfig().name;
auto config = llvm::make_unique<DeviceConfig>("CPU", name);

@opti-mix

opti-mix Jul 10, 2019

Contributor

Nit: Can we define a named constexpr for the backend to be used for profiling and then use this named constexpr here and in the partitioner instead of hard-coding it?

@@ -1187,6 +1220,11 @@ llvm::Error Partitioner::Partition(CompilationContext &cctx) {
// algorithm.
F_ = selectRepFunc(module_, memSize_);

if (cctx.precisionConfig.quantMode == QuantizationMode::Profile) {
// Jump into profiling flow.

@opti-mix

opti-mix Jul 10, 2019

Contributor

You probably want the discussed comment about the early exit from the partitioner here.

@beicy beicy force-pushed the beicy:temp1 branch 3 times, most recently from 2eb00d4 to 494a774 Jul 11, 2019

@beicy beicy changed the title [WIP][Partitioner] Add quantization profiling flow in Partitioner [Partitioner] Add quantization profiling flow in Partitioner Jul 11, 2019

@beicy

Contributor Author

commented Jul 11, 2019

@opti-mix Added test in tests/images/run.sh

@opti-mix
Contributor

left a comment

Looking good (modulo two minor nits)! Thanks for fixing this!

@@ -147,6 +147,9 @@ class BackendUsingGlowIR : public Backend {
static RegisterFactory<std::string, FactoryName, Backend> \
FactoryName##_REGISTERED;

/// The backend name used in Glow quantization profiling.
#define ProfilingBackend "CPU"

@opti-mix

opti-mix Jul 11, 2019

Contributor

Could you rather use constexpr const char *profilingBackend = "CPU";?

@jfix71

jfix71 Jul 11, 2019

Contributor

To ensure we don't break builds without the CPU backend enabled, can you make this something like:

#ifdef GLOW_WITH_CPU
constexpr const char *profilingBackend = "CPU";
#else
constexpr const char *profilingBackend = "Interpreter";
#endif
std::vector<Function *> funcs;
funcs.push_back(F);
doPartitioning(F->getName(), funcs, mapping, false);
// When profiling, the partition flow will be stopped after
// backendBasedPartition. Therefore, the DAG need to be generated. Otherwise,

@opti-mix

opti-mix Jul 11, 2019

Contributor

need -> needs

@@ -310,6 +311,14 @@ class Partitioner {
/// the Function. \returns whether there was an error encountered.
llvm::Error Partition(CompilationContext &cctx);

/// This partition approch is used in Glow Quantization Profiling flow. The

@jfix71

jfix71 Jul 11, 2019

Contributor

typo: "approch" -> "approach"

@@ -406,6 +406,9 @@ void Loader::compile(PlaceholderBindings &bindings) {
// Emit IR for the graph and compile it.
auto error = hostManager_->addNetwork(std::move(M_), cctx);
EXIT_ON_ERR(std::move(error));
// After partitioning, the original function may be removed. Need to update
// F_.
F_ = module->getFunctions().front();

@jfix71

jfix71 Jul 11, 2019

Contributor

I'm curious, what do we use F_ for after this point? It seems weird that we would just reset F_ to the first function in the partition.

@beicy

beicy Jul 11, 2019

Author Contributor

We will use F_ to get the module and then get all the functions in this module for generating quantization info in Loader::generateAndSerializeQuantizationInfos. When the function Loader::generateAndSerializeQuantizationInfos is called, M_ has already been moved away. So we need to use a function in M_ to reach all functions in M_.

@jfix71

jfix71 Jul 11, 2019

Contributor

Got it -- but we already have M_ in the Loader, so could we just do getModule()?

@beicy

beicy Jul 11, 2019

Author Contributor

M_ is moved after addNetwork, and Loader doesn't have a field to store M_'s raw pointer. getModule() here is { return F_->getParent(); }, so we still need the updated F_ (the original F_ may be erased from this module after partitioning).

Now the code in generateAndSerializeQuantizationInfos is

 for (auto F : getModule()->getFunctions()) {
    std::vector<NodeQuantizationInfo> tmp =
        quantization::generateNodeQuantizationInfos(bindings, F, loweredMap_,
                                                    quantizationSchema,
                                                    quantizationPrecision);
    QI.insert(QI.end(), tmp.begin(), tmp.end());
  }

@beicy beicy force-pushed the beicy:temp1 branch from 494a774 to f47c40d Jul 11, 2019

@facebook-github-bot

left a comment

@beicy has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@beicy beicy force-pushed the beicy:temp1 branch 2 times, most recently from 5c59d26 to 4222174 Jul 11, 2019

@facebook-github-bot

left a comment

@beicy has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@beicy beicy force-pushed the beicy:temp1 branch from 4222174 to 1bd2dd1 Jul 12, 2019

@facebook-github-bot

left a comment

@beicy is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot


commented Jul 12, 2019

@beicy merged this pull request in dc77082.
