Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: Fix finding nvcc (if not in $PATH) with older versions of CMake #3682

Merged

Conversation

maxhgerlach
Copy link
Collaborator

In #3444 we've added a mechanism that helps CMake find nvcc when it's missing in $PATH, but the CUDA toolkit can still be found at a typical location. It has now been observed that this does not work as intended on older versions of CMake like 3.13 or 3.16 (#3545).

CMake would go into an infinite loop with messages like

  You have changed variables that require your cache to be deleted.
  Configure will be re-run and you may have to reset some variables.
  The following variables have changed:
  CMAKE_CUDA_COMPILER= NOTFOUND

NOTFOUND is the initial value assigned to CMAKE_CUDA_COMPILER by check_language, which is written to CMake's build cache then. We override it to a proper value set(CMAKE_CUDA_COMPILER "${CUDAToolkit_BIN_DIR}/nvcc"), but this change only appears to be effective with more recent versions of cmake (I was using 3.23).

In my tests with CMake 3.13 and 3.16 the problem is resolved by clearing the CMAKE_CUDA_COMPILER variable from the cache before setting it to a new value.

Similar infinite loops have been reported earlier (https://public.kitware.com/Bug/view.php?id=14841) and it has been discouraged to override CMAKE_<LANG>_COMPILER inside CMakeLists.txt, but I'd still prefer to contain this logic inside the CMake script.

Fixes #3545

@github-actions
Copy link

github-actions bot commented Sep 8, 2022

Unit Test Results

  1 125 files  +     76    1 125 suites  +76   11h 11m 2s ⏱️ + 4m 6s
     813 tests ±       0       755 ✔️ ±    0       58 💤 ±    0  0 ±0 
22 228 runs  +1 636  15 484 ✔️ +948  6 744 💤 +688  0 ±0 

Results for commit 48c741e. ± Comparison against base commit 1fbdab5.

♻️ This comment has been updated with latest results.

@github-actions
Copy link

github-actions bot commented Sep 8, 2022

Unit Test Results (with flaky tests)

  1 303 files  +   140    1 303 suites  +140   11h 49m 43s ⏱️ + 13m 5s
     813 tests ±       0       755 ✔️ ±       0       58 💤 ±    0  0 ±0 
25 400 runs  +2 354  17 384 ✔️ +1 392  8 016 💤 +962  0 ±0 

Results for commit 48c741e. ± Comparison against base commit 1fbdab5.

♻️ This comment has been updated with latest results.

…ke 3.13 or 3.16)

Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Copy link
Collaborator

@EnricoMi EnricoMi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@maxhgerlach maxhgerlach merged commit daf0f41 into horovod:master Sep 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_TENSORFLOW=1 pip3 install horovod[pytorch] install failed
2 participants