Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make HostManager own Devices #2393

Merged
merged 3 commits into from Feb 14, 2019

Conversation

jackm321
Copy link
Contributor

@jackm321 jackm321 commented Feb 13, 2019

Description:
Make it so HostManager owns DeviceManagers and the rest of the runtime just uses them. Make the blocking on shutdown in Executor explicit to ensure that no inflight requests will be using DeviceManagers. Prevent processing new requests while shutting down.

Testing:
ninja all
cd to downloaded_models and ../bin/resnet-runtime ../../tests/images/imagenet

Documentation:
Fixes #2384
Fixes #2297

@jackm321
Copy link
Contributor Author

Also it's necessary to block until requests are done because otherwise this would be a use-after-free

Copy link
Contributor

@gcatron gcatron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, might be worth running the resnet-runtime example too.

@@ -214,6 +215,12 @@ ThreadPoolExecutor::~ThreadPoolExecutor() {
void ThreadPoolExecutor::run(const DAGNode *root,
std::unique_ptr<Context> context,
RunIdentifierTy runId, ResultCBTy cb) {
// Don't process new requests if the executor is shutting down.
if (shuttingDown_) {
cb(runId, ResultCode::Canceled, std::move(context));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, should it be a ResultCode::Failed? It never really started.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gcatron I don't know what these mean, it would be good if they had some comments saying what each of them is. Canceled sounds more correct though just from guessing because there was nothing wrong with the request, it just got there too late.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you tell me what each of these are, I can add a comment in this PR to the ResultCode enum

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, they are a bit vague, canceled sounds good.

// requests from being serviced.
executor_->shutdown();
assert(activeRequestCount_ == 0 &&
"All requests should be finished when shutting downt HostManager.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

downt -> down

@jackm321
Copy link
Contributor Author

@gcatron thanks for the speedy review!

deviceManagerMap_.insert(std::make_pair(testDeviceId, deviceManager));
testDeviceManagerMap_.insert(std::make_pair(testDeviceId, deviceManager));
llvm::make_unique<TestDeviceManager>(deviceManagerThreads);
deviceManagerMap_.insert(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: deviceManagerMap_.emplace(testDeviceId, std::move(deviceManager)) and in other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdzhabarov I didn't write that but yeah I agree it's better I'll change it

@jackm321 jackm321 merged commit e195bb1 into pytorch:master Feb 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants