Infrastructure for GPU devices #162

@igor-davidyuk igor-davidyuk commented Aug 27, 2021

This PR is an attempt to allow assigning GPU devices through OpenFL.
It introduces an optional 'device monitor' plugin for the Envoy and two information flows:

  1. GPU status goes from the Envoy through the Director to the Frontend.
  2. The GPU utilization policy travels the same path in the reverse order.
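
The first flow above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `DeviceStatus`, `EnvoyStatus`, `NoDeviceMonitor`, `StaticDeviceMonitor`, `Director.update_envoy_status` and `report_health` are hypothetical names, and the gRPC transport is replaced by a direct method call.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeviceStatus:
    """Hypothetical status record for a single GPU."""
    index: int
    memory_total: int   # bytes
    memory_used: int    # bytes
    utilization: float  # percent, 0-100


class NoDeviceMonitor:
    """Stand-in for an Envoy started without the optional monitor plugin."""

    def get_devices(self) -> List[DeviceStatus]:
        return []


class StaticDeviceMonitor:
    """Stand-in monitor that reports a fixed list of devices (assumption)."""

    def __init__(self, devices: List[DeviceStatus]):
        self._devices = list(devices)

    def get_devices(self) -> List[DeviceStatus]:
        return list(self._devices)


@dataclass
class EnvoyStatus:
    """Hypothetical message an Envoy sends with each health check."""
    envoy_name: str
    devices: List[DeviceStatus] = field(default_factory=list)


class Director:
    """Keeps the latest status per Envoy so that a frontend can query it."""

    def __init__(self) -> None:
        self._statuses: Dict[str, EnvoyStatus] = {}

    def update_envoy_status(self, status: EnvoyStatus) -> None:
        self._statuses[status.envoy_name] = status

    def get_envoys(self) -> List[EnvoyStatus]:
        return list(self._statuses.values())


def report_health(envoy_name: str, monitor, director: Director) -> None:
    """Envoy side: read the monitor and push the result to the Director."""
    director.update_envoy_status(EnvoyStatus(envoy_name, monitor.get_devices()))
```

An Envoy without the plugin simply reports an empty device list, so the Director and the Frontend need no special case for CPU-only nodes.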

The Director_Pytorch_Kvasir_UNET example is modified to use the new infrastructure.
There are two envoys: one that uses GPUs and one that does not.

Device assignment for an experiment is controlled by a 'device assignment policy', which may be 'cuda preferred' or 'cpu only'.
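
As a rough illustration of how such a policy could be resolved on an Envoy, here is a minimal sketch. The enum, the function name `resolve_device` and the choice of the first available GPU are assumptions made for the example; only the two policy names come from this PR.

```python
from enum import Enum
from typing import Sequence


class DevicePolicy(Enum):
    """The two policies named in the PR description."""
    CPU_ONLY = 'cpu only'
    CUDA_PREFERRED = 'cuda preferred'


def resolve_device(policy: DevicePolicy, cuda_indices: Sequence[int]) -> str:
    """Return a PyTorch-style device string for the given policy.

    `cuda_indices` lists the GPU indices the Envoy is allowed to use
    (hypothetical input, e.g. taken from the Envoy configuration).
    """
    if policy is DevicePolicy.CUDA_PREFERRED and cuda_indices:
        # Assumption: pick the first allowed GPU; a real policy may balance load.
        return f'cuda:{cuda_indices[0]}'
    # 'cpu only', or 'cuda preferred' on a node without GPUs, falls back to CPU.
    return 'cpu'
```

With this shape, 'cuda preferred' degrades gracefully on the envoy that has no GPUs, so the same experiment can run on both envoys of the example.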

@igor-davidyuk igor-davidyuk added the enhancement New feature or request label Aug 27, 2021
Resolved review threads:
openfl/component/collaborator/collaborator.py (7 threads, 2 outdated)
openfl/component/director/director.py (2 threads)
openfl/component/envoy/envoy.py (2 threads, both outdated)
openfl/interface/envoy.py (1 thread)
openfl/plugins/processing_units_monitor/pynvml_monitor.py (1 thread, outdated)
openfl/protocols/director.proto (1 thread)
openfl/transport/grpc/director_client.py (1 thread, outdated)

        resp.health_check_period.seconds = health_check_period

        return resp

    async def GetEnvoys(self, request, context):  # NOQA:N802
        """Get a status information about envoys."""
        envoy_infos = self.director.get_envoys()

        return director_pb2.GetEnvoysResponse(envoy_infos=envoy_infos)
        response = []
Contributor

This is not the response; it is envoy_infos or envoy_info_messages. The response is director_pb2.GetEnvoysResponse.

Contributor Author

Changed to envoy_statuses.
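
The naming point in this thread can be illustrated with a small sketch. It assumes plain dataclasses as stand-ins for the generated protobuf messages; `EnvoyInfo`, its fields and `build_get_envoys_response` are hypothetical, and only the `envoy_infos` field name and the `envoy_statuses` variable name come from the discussion above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EnvoyInfo:
    """Stand-in for a per-envoy protobuf message (hypothetical fields)."""
    name: str
    is_online: bool


@dataclass
class GetEnvoysResponse:
    """Stand-in for director_pb2.GetEnvoysResponse."""
    envoy_infos: List[EnvoyInfo]


def build_get_envoys_response(envoys: Dict[str, bool]) -> GetEnvoysResponse:
    # The intermediate list holds per-envoy statuses, so it is named for
    # what it contains rather than called `response`.
    envoy_statuses = [
        EnvoyInfo(name=name, is_online=is_online)
        for name, is_online in sorted(envoys.items())
    ]
    # Only the wrapper message is the actual response.
    return GetEnvoysResponse(envoy_infos=envoy_statuses)
```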

          root_certificate, private_key, certificate):
    """Start the Envoy."""
    logger.info('🧿 Starting the Envoy.')

    shard_descriptor = shard_descriptor_from_config(shard_config_path)
    # Reed the Envoy config
Contributor

Typo: "Reed" should be "Read".

Contributor

It also looks like we could add a read_envoy_config method.

Contributor Author

I would argue that we need a separate builder component that would assemble our services and manage their plugins. For now, I would just leave the config-reading logic in the CLI.

@alexey-gruzdev
Contributor

ok to test

@alexey-gruzdev
Contributor

ok to test

@github-actions github-actions bot locked and limited conversation to collaborators Oct 11, 2021
5 participants