This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Deployments issues on NVIDIA DGX2 #2742

Closed
5 of 9 tasks
abuccts opened this issue May 10, 2019 · 3 comments
Labels
deployment PAI deployment related pai-dev

Comments

@abuccts
Member

abuccts commented May 10, 2019

Background

NVIDIA DGX2 runs a customized DGX OS that already has GPU drivers, IB drivers, nvidia-docker, etc. installed. Installing PAI on a DGX2 server while keeping all of the pre-installed software causes several issues during PAI deployment.

Issues

Workaround

This branch, based on the pai-0.12.y release, can be used as a workaround to deploy PAI on DGX2 servers.

@abuccts abuccts added the deployment PAI deployment related label May 10, 2019
@fanyangCS
Contributor

@abuccts could you provide some links to explain the last issue?

@abuccts
Member Author

abuccts commented May 10, 2019

For the Docker zfs storage driver issue, this is the relevant output of docker info:

Storage Driver: zfs
 Zpool: error while getting pool information strconv.ParseUint: parsing "": invalid syntax
 Zpool Health: not available
 Parent Dataset: storage/docker
 Space Used By Parent: ******
 Space Available: ******
 Parent Quota: no
 Compression: off

Here are the related issues in Docker and in its upstream Go wrapper for zfs.

However, this has not affected Docker or k8s so far.
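To spot the same symptom on another host, the docker info output shown above can be checked programmatically. The following is only a minimal sketch: the function names (docker_info_text, parse_storage_driver, zpool_error) are hypothetical, and treating any Zpool value that starts with "error" as a failure is an assumption based solely on the output quoted in this comment.

```python
import subprocess


def docker_info_text():
    """Run `docker info` and return its stdout (requires Docker on the host)."""
    result = subprocess.run(
        ["docker", "info"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_storage_driver(info_text):
    """Collect the 'Storage Driver' line and its indented detail lines into a dict."""
    fields = {}
    base_indent = None
    for line in info_text.splitlines():
        indent = len(line) - len(line.lstrip())
        stripped = line.strip()
        if base_indent is None:
            if stripped.startswith("Storage Driver:"):
                base_indent = indent
                fields["Storage Driver"] = stripped.split(":", 1)[1].strip()
        elif indent > base_indent and ":" in stripped:
            # Lines indented deeper than the header are driver details.
            key, value = stripped.split(":", 1)
            fields[key.strip()] = value.strip()
        else:
            break  # the first line at the header's indent (or less) ends the block
    return fields


def zpool_error(info_text):
    """Return the Zpool error text if the zfs driver reports one, else None."""
    fields = parse_storage_driver(info_text)
    if fields.get("Storage Driver") != "zfs":
        return None
    zpool = fields.get("Zpool", "")
    # Assumption: a healthy pool reports a pool name here, not an "error ..." string.
    return zpool if zpool.startswith("error") else None


# Sample copied from the output quoted in this comment.
SAMPLE = '''Storage Driver: zfs
 Zpool: error while getting pool information strconv.ParseUint: parsing "": invalid syntax
 Zpool Health: not available
 Parent Dataset: storage/docker
'''

if __name__ == "__main__":
    # On a real host, pass docker_info_text() instead of SAMPLE.
    print(zpool_error(SAMPLE))
```

The parser keys off indentation relative to the Storage Driver line rather than a fixed column, so it should also tolerate Docker versions that nest this block under a Server section.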

@fanyangCS
Contributor

Closing this; it will be tracked in separate issues.
