Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add gpu support to container runtime #2914

Merged
merged 15 commits into from
Jul 12, 2021
Merged

Conversation

bwanglzu
Copy link
Member

@bwanglzu bwanglzu commented Jul 11, 2021

this pr close #2900

should allow us deploy a dockerized jina pod with local gpu devices, e.g.

jina executor --uses jinahub+docker://CLIPImageEncoder --gpus all 
jina executor --uses jinahub+docker://CLIPImageEncoder --gpus 2  # use 2 gpu devices
jina executor --uses jinahub+docker://CLIPImageEncoder --gpus device=GPU-1234-567  # use gpu by device id
jina executor --uses jinahub+docker://CLIPImageEncoder --gpus device=GPU-1234-567,device=GPU-1234-789  # use multiple gpus by device id
jina executor --uses jinahub+docker://CLIPImageEncoder --gpus device=GPU-1234-567,driver=nvidia ## and other parameters

for more info check out this link.

@jina-bot jina-bot added size/S area/core This issue/PR affects the core codebase area/network This issue/PR affects network functionality area/testing This issue/PR affects testing component/peapod labels Jul 11, 2021
@jina-bot jina-bot added area/cli This issue/PR affects the command line interface component/flow labels Jul 11, 2021
@codecov
Copy link

codecov bot commented Jul 11, 2021

Codecov Report

Merging #2914 (13ca9de) into master (0094bbf) will increase coverage by 6.60%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2914      +/-   ##
==========================================
+ Coverage   82.21%   88.82%   +6.60%     
==========================================
  Files         106      138      +32     
  Lines        7064     9567    +2503     
==========================================
+ Hits         5808     8498    +2690     
+ Misses       1256     1069     -187     
Flag Coverage Δ
daemon 43.24% <ø> (?)
jina 88.77% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
jina/__init__.py 71.64% <ø> (ø)
jina/checker.py 96.55% <ø> (ø)
jina/clients/__init__.py 100.00% <ø> (ø)
jina/clients/base/__init__.py 90.90% <ø> (ø)
jina/clients/base/grpc.py 63.82% <ø> (ø)
jina/clients/base/http.py 93.18% <ø> (ø)
jina/clients/base/websocket.py 86.36% <ø> (ø)
jina/clients/grpc.py 100.00% <ø> (ø)
jina/clients/helper.py 100.00% <ø> (ø)
jina/clients/http.py 100.00% <ø> (ø)
... and 215 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f59c6c...13ca9de. Read the comment docs.

'device': [],
'driver': '',
}
gpu_args = self.args.gpus[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it is a list why to consider only first?

capabilities=[_gpus['capabilities']],
)
]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove gpus from args before doing the container run for better backwards compatibility with jinahub on older jina versions

@bwanglzu bwanglzu self-assigned this Jul 11, 2021
Copy link
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then add a test to make sure that the device request param is properly sent in the container runtime call

@@ -163,6 +164,36 @@ def _docker_run(self, replay: bool = False):
'mode': 'rw',
}

device_requests = []
if self.args.gpus:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe u can put this in an static method where u can do better unittesting of this logic

@jina-bot jina-bot added size/M and removed size/S labels Jul 12, 2021
@@ -74,6 +74,38 @@ def _set_network_for_dind_linux(self):
)
client.close()

@staticmethod
def _set_device_requests_for_gpu(gpu_args):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be a _get?

@bwanglzu bwanglzu marked this pull request as ready for review July 12, 2021 09:01
@bwanglzu bwanglzu requested a review from a team as a code owner July 12, 2021 09:01
@JoanFM
Copy link
Member

JoanFM commented Jul 12, 2021

@bwanglzu have u tried in the GPU server to see if it works?

@JoanFM JoanFM merged commit f9c50b7 into master Jul 12, 2021
@JoanFM JoanFM deleted the feat-docker-gpu-support branch July 12, 2021 12:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cli This issue/PR affects the command line interface area/core This issue/PR affects the core codebase area/network This issue/PR affects network functionality area/testing This issue/PR affects testing component/flow component/peapod size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose device-requests in our ContainerRuntime to the docker client
3 participants