Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved Resource ownership #39

Merged
16 commits merged into from
Jun 24, 2022

Conversation

ryanolson
Copy link
Contributor

@ryanolson ryanolson commented Jun 17, 2022

Fixes #35
Fixes #36

The primary issue that was causing #36 is shared_ptr creep. Usually the deferred deletion / pseudo-garbage collection enabled by shared_ptrs is useful, but in this case, the resources that own the worker threads are held in objects that are managed by shared_ptrs. In some cases, the worker threads are the last owners of these shared ptrs and thus exhibit a deadlock where a thread ultimately tries to join itself.

This PR removes all shared_ptrs from object/resources that could be potentially captured by worker threads and replaces them with references.

The code is structured so that all resources and worker threads are owned by the Executor -> resources::Manager and that all threads must be joined before resources::Manager is destroyed.

This ensures that the resources outlive the pipeline execution provided that the Executor stays in scope. If it goes out of scope or is destroyed it should issue a kill to the pipeline, but still block until all threads are joined.

All executor resources should be created and managed by resources::Manager as a single point for life cycle management.

@ryanolson ryanolson requested review from a team as code owners June 17, 2022 08:02
@ryanolson ryanolson added bug Something isn't working non-breaking Non-breaking change labels Jun 17, 2022
Copy link
Contributor

@mdemoret-nv mdemoret-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a lot of changes to fix a deadlock on resource shutdown and brings up some concerns about how intertwined the design is.

Also, I dont see a new test checking that the deadlock has been resolved.

CMakeLists.txt Outdated Show resolved Hide resolved
include/srf/core/fiber_pool.hpp Show resolved Hide resolved
include/srf/internal/executor/iexecutor.hpp Show resolved Hide resolved
include/srf/internal/system/iresources.hpp Show resolved Hide resolved
src/internal/data_plane/instance.cpp Show resolved Hide resolved
src/internal/system/partitions.cpp Show resolved Hide resolved
src/internal/system/partitions.hpp Outdated Show resolved Hide resolved
src/internal/system/partitions.hpp Outdated Show resolved Hide resolved
src/internal/system/system.cpp Show resolved Hide resolved
src/internal/system/system_provider.cpp Show resolved Hide resolved
@ryanolson
Copy link
Contributor Author

It is a lot less intertwined now.

ryanolson and others added 4 commits June 23, 2022 18:26
Co-authored-by: Michael Demoret <42954918+mdemoret-nv@users.noreply.github.com>
Co-authored-by: Michael Demoret <42954918+mdemoret-nv@users.noreply.github.com>
Copy link
Contributor

@mdemoret-nv mdemoret-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Updating from branch-22.06 and will merge once CI passes

@mdemoret-nv
Copy link
Contributor

@gpucibot merge

@ghost ghost merged commit a7c0d09 into nv-morpheus:branch-22.06 Jun 24, 2022
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Race Condition in System and Resource Shutdown [BUG] TestNext.TapUniquePtr intermittently fails
2 participants