Upgrade tf from 2.5.2 to 2.7.0. #1713

Merged
merged 15 commits into from
Jan 28, 2022

@justinxzhao justinxzhao commented Jan 25, 2022

Updated GitHub workflows to check Python 3.7-3.9 (instead of 3.6-3.8)

Neuropod doesn't yet support Python 3.9, so I've added pytest skip annotations to the relevant tests, contingent on the Python version.
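
For reference, a version-contingent skip can be sketched roughly as below. This is a minimal illustration, not the exact annotation used in this PR: the marker name `skip_without_neuropod` and the test name are hypothetical, and it assumes the only gating condition is the interpreter version.

```python
import sys

import pytest

# Assumption: neuropod has no Python 3.9 build yet, as described above.
NEUROPOD_UNSUPPORTED = sys.version_info >= (3, 9)

# Reusable marker (hypothetical name) so every neuropod test shares one condition.
skip_without_neuropod = pytest.mark.skipif(
    NEUROPOD_UNSUPPORTED,
    reason="Neuropod does not support Python 3.9+ yet",
)


@skip_without_neuropod
def test_neuropod_export():  # hypothetical test name
    # importorskip also skips if neuropod is simply not installed.
    neuropod = pytest.importorskip("neuropod")
    assert neuropod is not None
```

Keeping the condition in one module-level marker means the skip can be lifted in a single place once neuropod supports 3.9.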

Updated requirements.txt files

Added a condition to requirements_serve.txt to install neuropod only if the python version is < 3.9.
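
A condition like this is usually written as an environment marker on the requirement line. The sketch below shows how such a line evaluates for different interpreters. It assumes the third-party `packaging` library is available, and the line itself is illustrative: the actual entry in requirements_serve.txt may also pin a neuropod version.

```python
from packaging.requirements import Requirement

# Illustrative requirement line; the real file may differ (e.g. a version pin).
LINE = 'neuropod; python_version < "3.9"'


def should_install(line: str, python_version: str) -> bool:
    """Return True if pip would install this requirement on the given Python."""
    req = Requirement(line)
    if req.marker is None:
        # No marker means the requirement applies everywhere.
        return True
    # Override only python_version; other marker variables keep their defaults.
    return req.marker.evaluate({"python_version": python_version})


if __name__ == "__main__":
    for version in ("3.7", "3.8", "3.9"):
        print(version, should_install(LINE, version))
```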

Updated docker GPU images

After upgrading TensorFlow to 2.7.0, Docker builds of the Ludwig GPU images were failing due to incompatible versions of cuDNN and CUDA. TensorFlow's Docker Linux builds only work with certain versions of cuDNN and CUDA. A table of supported versions is here: https://www.tensorflow.org/install/source#gpu

What's unclear is how the coarse cuDNN and CUDA versions map to the exact library versions that we need to reference in the Dockerfile. For instance, TensorFlow 2.5.2 and 2.7.0 share the same coarse cuDNN (8.1) and CUDA (11.2) versions, yet libcudnn7=7.6.5.32-1+cuda10.1 installs fine in the tensorflow:2.5.2-gpu Docker image but not in tensorflow:2.7.0-gpu.

To find working versions, on @tgaddair's recommendation, I referred to Horovod's docker composition test, though even there it took some guesswork to map the declared versions (e.g. CUDNN_VERSION: 8.1.1.33-1+cuda11.2) to Dockerfile library versions (e.g. libcudnn7=8.1.1.33-1+cuda11.2 -> libcudnn8=8.1.1.33-1+cuda11.2).
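
The guesswork in that mapping can be written down as a small helper. This is only a sketch under one assumption: that the apt package name suffix follows the cuDNN major version (libcudnn7 for 7.x, libcudnn8 for 8.x), which matches the examples in this description but is not confirmed for every release. The function name `cudnn_apt_pin` is hypothetical.

```python
def cudnn_apt_pin(cudnn_version: str) -> str:
    """Build an apt pin such as 'libcudnn8=8.1.1.33-1+cuda11.2'.

    Assumes the package name suffix equals the cuDNN major version.
    """
    major = cudnn_version.split(".", 1)[0]
    if not major.isdigit():
        raise ValueError(f"unexpected cuDNN version string: {cudnn_version!r}")
    return f"libcudnn{major}={cudnn_version}"


if __name__ == "__main__":
    # Version string as declared in the CUDNN_VERSION example above.
    print(cudnn_apt_pin("8.1.1.33-1+cuda11.2"))
```

Deriving the package name from the version string avoids the mismatch seen above, where an 8.x version was paired with the libcudnn7 package name.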

To verify the package versions, I tried manually running the apt install commands in a tensorflow 2.7.0 container:

docker run -it tensorflow/tensorflow:2.7.0-gpu /bin/bash
apt install libcudnn7=7.6.5.32-1+cuda10.1 --> error (same as github)
apt install libcudnn7=8.6.5.32-1+cuda10.1 --> error (no such package)
apt install libcudnn8=8.6.5.32-1+cuda10.1 --> works

Not specifying a version also works in the container:

apt install libcudnn8 --> works

However, leaving the version unpinned in Ludwig's Dockerfile fails in the GitHub workflow. Ludwig's horovod+tensorflow+gpu Docker image seems to need specific versions of these libraries. @tgaddair, can you confirm this is expected?

To further verify that the new package versions install compatibly with the new tensorflow 2.7.0-based ludwig docker image, I ran:

docker build -t ludwig-tf-legacy-gpu docker/ludwig-gpu

This fails at the last step, apparently because the setup.py file isn't visible to the build, but the first several apt-get install steps for the CUDA/NCCL libraries pass.

github-actions bot commented Jan 25, 2022

Unit Test Results

       6 files  ±0         6 suites  ±0   2h 49m 0s ⏱️ + 8m 30s
1 216 tests ±0  1 192 ✔️ ±0  24 💤 ±0  0 ±0 
3 648 runs  ±0  3 574 ✔️  - 2  74 💤 +2  0 ±0 

Results for commit bd5dfd7. ± Comparison against base commit 4f32c39.

♻️ This comment has been updated with latest results.

@tgaddair tgaddair merged commit 23b5264 into tf-legacy Jan 28, 2022
@tgaddair tgaddair deleted the upgrade_version branch January 28, 2022 03:17