Upgrade tf from 2.5.2 to 2.7.0. #1713
Merged
tgaddair
reviewed
Jan 25, 2022
tgaddair
reviewed
Jan 26, 2022
tgaddair
reviewed
Jan 26, 2022
…g specific cuda library versions.
w4nderlust
approved these changes
Jan 27, 2022
tgaddair
reviewed
Jan 28, 2022
tgaddair
approved these changes
Jan 28, 2022
Updated GitHub workflows to check Python 3.7-3.9 (instead of 3.6-3.8)
Neuropod doesn't yet support Python 3.9, so I've added pytest skip annotations to the relevant tests, contingent on the Python version.
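A version-contingent skip of this kind can be sketched with `pytest.mark.skipif`. This is only an illustration: the marker name `requires_neuropod` and the placeholder test are hypothetical and are not taken from the PR.

```python
import sys

import pytest

# Assumption: neuropod is unavailable on Python 3.9 and later.
NEUROPOD_UNSUPPORTED = sys.version_info >= (3, 9)

# Hypothetical reusable marker; apply it to every neuropod-dependent test.
requires_neuropod = pytest.mark.skipif(
    NEUROPOD_UNSUPPORTED,
    reason="Neuropod does not yet support Python 3.9",
)


@requires_neuropod
def test_neuropod_export_placeholder():
    # Hypothetical test body. pytest only runs it on interpreters where the
    # skip condition is False, so the real test could import neuropod here.
    assert not NEUROPOD_UNSUPPORTED
```

Defining the marker once keeps the version condition in a single place, so it can be removed in one edit when Neuropod gains Python 3.9 support.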
Updated requirements.txt files
Added a condition to requirements_serve.txt to install neuropod only if the Python version is < 3.9.
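Such a condition is usually written as an environment marker on the requirement line. The sketch below uses the third-party `packaging` library to show how a marker of that shape evaluates for different interpreters. The requirement string is an assumed form; the exact line in requirements_serve.txt may differ.

```python
from packaging.requirements import Requirement

# Assumed form of the requirements_serve.txt entry (not copied from the PR).
REQUIREMENT_LINE = 'neuropod ; python_version < "3.9"'

req = Requirement(REQUIREMENT_LINE)


def would_install(python_version: str) -> bool:
    """Return True if the marker selects the requirement for this Python version."""
    # Override only python_version; other marker variables keep their defaults.
    return req.marker.evaluate(environment={"python_version": python_version})


for version in ("3.7", "3.8", "3.9"):
    print(version, would_install(version))
```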
Updated Docker GPU images
Docker builds of the ludwig GPU images were failing after the upgrade to tensorflow 2.7.0 because of incompatible cuDNN and CUDA versions. Tensorflow's Linux docker builds only work with certain versions of cuDNN and CUDA. A table of supported versions is here: https://www.tensorflow.org/install/source#gpu
What's unclear is how the coarse cuDNN and CUDA versions map to the actual library versions that we need to reference in the Dockerfile. For instance, tensorflow 2.5.2 uses the same coarse cuDNN (8.1) and CUDA (11.2) versions as tensorflow 2.7.0, yet `libcudnn7=7.6.5.32-1+cuda10.1` installs fine from the `tensorflow:2.5.2-gpu` docker image, but not from `tensorflow:2.7.0-gpu`.
To find good versions, per @tgaddair's recommendation, I referred to Horovod's docker composition test, though even there it took a small amount of guesswork to map the declared versions, e.g. `CUDNN_VERSION: 8.1.1.33-1+cuda11.2`, to Dockerfile library versions, e.g. `libcudnn7=8.1.1.33-1+cuda11.2` -> `libcudnn8=8.1.1.33-1+cuda11.2`.
To verify the package versions, I tried manually running the apt install commands in a tensorflow 2.7.0 container:
Not specifying a version also works in the container:
However, going versionless in ludwig's Dockerfile fails in GitHub's workflow. Ludwig's horovod+tensorflow+gpu docker image seems to need specific versions for these libraries. @tgaddair, can you confirm this is expected?
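The guesswork in mapping a declared version string to a pinned apt package can be made explicit with a small helper. This is a rough sketch, assuming that the package name suffix follows the cuDNN major version (`libcudnn7` for 7.x, `libcudnn8` for 8.x) and that version strings have the shape shown above. The names `CudnnPin`, `cudnn_pin` and `apt_spec` are hypothetical.

```python
import re
from typing import NamedTuple


class CudnnPin(NamedTuple):
    package: str  # apt package name, e.g. "libcudnn8"
    version: str  # full apt version string
    cuda: str     # CUDA version the build targets, e.g. "11.2"


# Matches strings shaped like "8.1.1.33-1+cuda11.2".
_PIN_RE = re.compile(r"^(?P<major>\d+)\.[\d.]+-\d+\+cuda(?P<cuda>\d+\.\d+)$")


def cudnn_pin(declared_version: str) -> CudnnPin:
    """Derive the apt package name from a declared cuDNN version string."""
    match = _PIN_RE.match(declared_version)
    if match is None:
        raise ValueError(f"unrecognized cuDNN version: {declared_version!r}")
    # Assumption: the package suffix equals the cuDNN major version.
    return CudnnPin(
        package=f"libcudnn{match['major']}",
        version=declared_version,
        cuda=match["cuda"],
    )


def apt_spec(pin: CudnnPin) -> str:
    """Format the pin the way apt-get install expects: name=version."""
    return f"{pin.package}={pin.version}"


print(apt_spec(cudnn_pin("8.1.1.33-1+cuda11.2")))
```

A helper like this only checks that the package name and version string are consistent with each other. Whether the resulting package actually installs still depends on the apt repositories configured in the base image.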
To further verify that the new package versions install compatibly with the new tensorflow 2.7.0-based ludwig docker image, I ran:
docker build -t ludwig-tf-legacy-gpu docker/ludwig-gpu
This fails at the last step because `setup.py` does not seem to be visible to the build, but the first several `apt-get install` steps for the CUDA/NCCL libraries pass.