Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[nccl] Remove NCCL auto-download due to licensing issues #17431

Merged
merged 10 commits into from May 6, 2021

Conversation

jacobkahn
Copy link
Contributor

@jacobkahn jacobkahn commented Apr 21, 2021

Follow up to #16031, mimics #16413, and makes additional changes as outlined in #17346

Fixes flashlight/wav2letter#950

  • Which triplets are supported/not supported? Have you updated the CI baseline?

Same triplets as before. This doesn't affect the CI baseline since CUDA 10.1 is still installed on CI Machines and NCCL 2.4.6 (compatible with 10.1) will still be downloaded from conda.

Yes

  • If you have added/updated a port: Have you run ./vcpkg x-add-version --all and committed the result?

Yes

@cenit
Copy link
Contributor

cenit commented Apr 21, 2021

please consider this:
#17331 (comment)

@jacobkahn
Copy link
Contributor Author

Aha - thanks for flagging @cenit -- I'll go ahead and remove it here too for this PR. cc @BillyONeal — if you want to tack NCCL onto your request, that would be helpful!

Too bad that we have to remove it, but makes sense.

@jacobkahn jacobkahn changed the title [nccl] Only download if compatible with found CUDA version [nccl] Remove NCCL auto-download due to licensing issues Apr 21, 2021
@BillyONeal
Copy link
Member

Will have this pending shortly:
image

@PhoebeHui PhoebeHui added the category:port-bug The issue is with a library, which is something the port should already support label Apr 22, 2021
@PhoebeHui
Copy link
Contributor

Depends on PR #17331.

Failures on linux:

-- Could NOT find NCCL: Found unsuitable version "", but required is at least "2.4.6.1" (found NCCL_INCLUDE_DIRS-NOTFOUND)
CMake Error at ports/nccl/portfile.cmake:16 (message):
  Please install NCCL using your system package manager (the same way you
  installed CUDA).  For example: apt install libnccl2 libnccl-dev.
Call Stack (most recent call first):
  scripts/ports.cmake:142 (include)

@PhoebeHui PhoebeHui added the depends:different-pr This PR or Issue depends on a PR which has been filed label Apr 23, 2021
@PhoebeHui
Copy link
Contributor

@jacobkahn, thanks for your contribution!

Could you update the baseline version by executing ' ./vcpkg x-add-version --overwrite-version nccl'?

Error: While reading versions for port nccl from file: C:\a\1\s\versions\n-\nccl.json
       File declares version `2.4.6#1` with SHA: 78f69d23214c0c0849fbe1a5d0fe62f934dc72e1
       But local port with the same verion has a different SHA: 65d03efc8a527afe223125d2cac2e685517297e3
       Please update the port's version fields and then run:

           vcpkg x-add-version nccl

       to add a new version.

@jacobkahn
Copy link
Contributor Author

jacobkahn commented Apr 23, 2021

@PhoebeHui done.

I think CI needs a rerun now that NCCL is installed.

@BillyONeal
Copy link
Member

I think CI needs a rerun now that NCCL is installed.

You can always trigger that by merging with origin/master.

BillyONeal
BillyONeal previously approved these changes Apr 27, 2021
Copy link
Member

@BillyONeal BillyONeal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK to me assuming CI passes.

@BillyONeal
Copy link
Member

Looks like CI failures are a regression from #13100 + updating the compilers on the build machines to 16.9.4

@PhoebeHui
Copy link
Contributor

Yes, For gdk-pixbuf's failure, it's not relate to this PR, we would take the baseline error.

Failures:

E:\vcpkg\clean\vcpkg\buildtrees\gdk-pixbuf\src\k-pixbuf-2-c86079bed6\gdk-pixbuf\pixops\../fallback-c89.c(38): error C2169: 'lrint': intrinsic function, cannot be defined

For nccl:x64-linux, it still failed with following failure:

-- Could NOT find NCCL: Found unsuitable version "", but required is at least "2.4.6.1" (found NCCL_INCLUDE_DIRS-NOTFOUND)
CMake Error at ports/nccl/portfile.cmake:16 (message):
  Please install NCCL using your system package manager (the same way you
  installed CUDA).  For example: apt install libnccl2 libnccl-dev.
Call Stack (most recent call first):
  scripts/ports.cmake:142 (include)

@Neumann-A
Copy link
Contributor

gdk-pixbuf check for lrint in meson.build needs to be updated to use the correct parameters. I often see function checks in the form of somefunction(void) and VS is eagerly reporting errors with release build flags while with debug build flags the check passes fine.

@jacobkahn
Copy link
Contributor Author

Looks like the dependent PR has merged -- are we good here @BillyONeal @PhoebeHui ?

@PhoebeHui
Copy link
Contributor

@jacobkahn, I have rerun the CI testing, let's wait for the results.

@BillyONeal
Copy link
Member

BillyONeal commented Apr 29, 2021

Looks like the dependent PR has merged -- are we good here @BillyONeal @PhoebeHui ?

This one looks like the port actually not being happy:

-- Could NOT find NCCL: Found unsuitable version "", but required is at least "2.4.6.1" (found NCCL_INCLUDE_DIRS-NOTFOUND)
CMake Error at ports/nccl/portfile.cmake:16 (message):
  Please install NCCL using your system package manager (the same way you
  installed CUDA).  For example: apt install libnccl2 libnccl-dev.
Call Stack (most recent call first):
  scripts/ports.cmake:142 (include)

We installed nccl into our base VMs like this:

sudo apt install -y --no-install-recommends cuda-compiler-11-3 cuda-libraries-dev-11-3 cuda-driver-dev-11-3 \

@BillyONeal BillyONeal dismissed their stale review April 29, 2021 16:40

No longer relevant

@BillyONeal BillyONeal added requires:author-response and removed depends:different-pr This PR or Issue depends on a PR which has been filed labels Apr 29, 2021
@jacobkahn
Copy link
Contributor Author

@BillyONeal reprod and fixed. It seems that like cuDNN, NCCL throws bits directly in /usr/include and /usr/lib, and there's no pkgconfig, so added those hints as we have for the cuDNN package.

@PhoebeHui PhoebeHui added the info:reviewed Pull Request changes follow basic guidelines label May 6, 2021
@strega-nil-ms strega-nil-ms merged commit ac9da2c into microsoft:master May 6, 2021
@strega-nil-ms
Copy link
Contributor

Thanks @jacobkahn and @BillyONeal !

@jacobkahn jacobkahn deleted the nccl_cuda_fix branch May 6, 2021 21:34
@longnguyen2004
Copy link
Contributor

Can we also build nccl from source? The source is at https://github.com/NVIDIA/nccl/tree/master/src

@BillyONeal
Copy link
Member

@longnguyen2004 I would really like to get some form of nvidia comment before going there; the licensing in that repo is not consistent.

@longnguyen2004
Copy link
Contributor

Seems to me that it's mostly BSD 3-clause, judging from this page https://docs.nvidia.com/deeplearning/nccl/bsd/index.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:port-bug The issue is with a library, which is something the port should already support info:reviewed Pull Request changes follow basic guidelines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Installation Failed: flashlight-{cuda/cpu}[asr]
7 participants