
Create Dockerfiles for Accelerate #377

Merged — muellerzr merged 22 commits into main from Dockerfiles on May 23, 2022
Conversation

muellerzr
Collaborator

@muellerzr muellerzr commented May 19, 2022

This PR adds two Dockerfiles, one for building on the CPU and one for the GPU. I chose not to write one for DeepSpeed for now.

Eventually these will be integrated into the test runners and built nightly, similar to how transformers is set up.

The Dockerfiles use a multi-stage build process to reduce the size of the images a bit.
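For readers unfamiliar with the pattern: a multi-stage build installs dependencies in a throwaway "builder" stage and copies only the finished artifacts into a clean final image. A minimal sketch of the idea (illustrative only — the base images, stage names, and packages here are assumptions, not the PR's actual contents):

```dockerfile
# Stage 1 ("builder"): install everything into a self-contained virtualenv.
# Build tools and pip caches live only in this stage and are discarded.
FROM python:3.8-slim AS builder
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip install --no-cache-dir torch accelerate

# Stage 2: start from a fresh base image and copy in only the virtualenv,
# so nothing from the build step bloats the final image.
FROM python:3.8-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
CMD ["python"]
```

The size savings come from the final image never containing the intermediate layers of the builder stage.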

Uncompressed sizes of each docker image:

CPU: ~871 MB
GPU: ~13 GB

For perspective, the uncompressed size of the transformers docker image for torch is 11.2 GB.

The biggest difference is the inclusion of conda, which makes it easier for us to switch between Python versions when needed.

It's also recommended to use BuildKit to build the images, as it reduces the build time by a healthy chunk. E.g.:
`sudo DOCKER_BUILDKIT=1 docker build . -t accelerate-cpu -f docker/accelerate-cpu/Dockerfile`
(Note: times were measured without the cache)
GPU image:
5m56s -> 4m16s
CPU image:
3m20s -> 2m44s

@muellerzr muellerzr added the enhancement New feature or request label May 19, 2022
@muellerzr muellerzr requested a review from sgugger May 19, 2022 19:41
@muellerzr muellerzr changed the title Create Dockerfile's for Accelerate Create Dockerfiles for Accelerate May 19, 2022
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 19, 2022

The documentation is not available anymore as the PR was closed or merged.

@sgugger sgugger requested a review from LysandreJik May 19, 2022 20:42
@sgugger
Collaborator

sgugger commented May 19, 2022

I know nothing about Docker, so asking Lysandre for a review :-)

@pacman100
Contributor

Great to have docker support! Helpful for production scenarios. A few comments:

  1. The Python versions in CPU and GPU are different - 3.6 and 3.8, respectively. Would it be better to have the same Python version? Also, transformers will be discontinuing 3.6 support, so the CPU version can be bumped up to match the GPU version?
  2. Shouldn't the Dockerfiles be in the top-level directory to be able to copy the accelerate codebase and install it? Maybe have `Dockerfile_cpu` and `Dockerfile_gpu` in the top-level directory?
  3. Getting the below error while building the GPU image:
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease' is not signed.

@muellerzr
Collaborator Author

muellerzr commented May 20, 2022

> Great to have docker support! Helpful for production scenarios. A few comments:

These will also be used for the GPU and multi-GPU tests; they mimic the transformers docker images for this exact purpose. (This is actually the true purpose behind this PR.)

> 1. The Python versions in CPU and GPU are different - 3.6 and 3.8, respectively. Would it be better to have the same Python version? Also, transformers will be discontinuing 3.6 support, so the CPU version can be bumped up to match the GPU version?

Yes they are; it's a partial limitation of the GPU image. I could go through the effort of having it use conda instead, but Sylvain and I discussed that 3.6 support will be dropped in the next release. Since that's so soon, it's easier to just leave it this way.

> 2. Shouldn't the Dockerfiles be in the top-level directory to be able to copy the accelerate codebase and install it? Maybe have `Dockerfile_cpu` and `Dockerfile_gpu` in the top-level directory?

Nope, these assume the git repo is the top-level directory and then copy it in. So building the Dockerfiles looks something like this (assuming a fresh accelerate clone, cd'd into it): `docker build . -f docker/accelerate-cpu/Dockerfile`. See the transformers repo for similar behavior.

> 3. Getting the below error while building the GPU image:
> `W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC`
> `E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  InRelease' is not signed.`

Unsure about this one. Do you have docker's CUDA support properly set up? IIRC you might need to find the right keys to use as well. I was able to build it just fine yesterday.
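For context on the `NO_PUBKEY A4B469963BF863CC` error quoted above: in April 2022 NVIDIA rotated the signing keys for its CUDA apt repositories, which produces exactly this failure on images whose base layers predate the rotation. A common workaround (a sketch based on NVIDIA's published instructions, not part of this PR) is to install the new `cuda-keyring` package before any `apt-get update` in the Dockerfile:

```dockerfile
# Remove the stale repository key and install NVIDIA's rotated cuda-keyring
# package (URL per NVIDIA's key-rotation notice for Ubuntu 20.04 x86_64).
RUN apt-key del 7fa2af80 && \
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb && \
    dpkg -i cuda-keyring_1.0-1_all.deb && \
    rm cuda-keyring_1.0-1_all.deb && \
    apt-get update
```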

@muellerzr
Collaborator Author

muellerzr commented May 20, 2022

Thinking on it more, having the Python version be configurable in the CUDA image would be nice. Will change this.

Also, it would probably be better to mimic this Dockerfile and clone the repo instead: https://github.com/huggingface/transformers/blob/main/docker/transformers-pytorch-gpu/Dockerfile#L11
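A configurable Python version in the CUDA image could be done with a Docker build argument passed to conda. A sketch under stated assumptions: the arg name `PYTHON_VERSION`, the env name `accelerate`, and the base image tag are illustrative, not necessarily what the PR ended up using:

```dockerfile
FROM nvidia/cuda:11.3.1-devel-ubuntu20.04
# Overridable at build time; defaults to 3.8 to match the current GPU image.
ARG PYTHON_VERSION=3.8
# Assumes conda is installed earlier in the Dockerfile.
RUN conda create -n accelerate python=${PYTHON_VERSION} -y
```

Then a different version can be selected without editing the Dockerfile, e.g. `docker build --build-arg PYTHON_VERSION=3.9 -f docker/accelerate-gpu/Dockerfile .` (the `accelerate-gpu` path here is a guess modeled on the CPU layout).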

Member

@LysandreJik LysandreJik left a comment


Looks good to me!

@muellerzr muellerzr merged commit 5a00ece into main May 23, 2022
@muellerzr muellerzr deleted the Dockerfiles branch May 23, 2022 21:09
Labels: enhancement (New feature or request)
5 participants