Allow users to use mis-matched CUDA versions #3436
print(
    f"{WARNING} DeepSpeed Op Builder: Installed CUDA version {sys_cuda_version} does not match the "
    f"version torch was compiled with {torch.version.cuda}. "
    "Detected `DS_SKIP_CUDA_CHECK=1`: Allowing this combination of CUDA, but it may result in unexpected behavior."
)
Is it worth printing the versions that mismatch even if they're skipping?
I think it will still be useful to show the mismatched versions, especially if users run into errors caused by that mismatch!
Oh, I meant: should the versions also be added to the warning string shown when users override the check? I agree they belong in the other message too.
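A minimal sketch of the suggestion in this thread, using a hypothetical helper `format_cuda_mismatch_message` (not DeepSpeed's actual code): both the override warning and the no-override message name the two mismatched versions, and only the trailing sentence differs. The no-override wording is an assumption for illustration.

```python
def format_cuda_mismatch_message(
    sys_cuda_version: str, torch_cuda_version: str, skip_check: bool
) -> str:
    """Hypothetical helper: build the mismatch message, always including both versions."""
    base = (
        f"Installed CUDA version {sys_cuda_version} does not match the "
        f"version torch was compiled with {torch_cuda_version}."
    )
    if skip_check:
        # Override in effect: warn, but keep the versions so later errors are easier to diagnose.
        return (
            f"[WARNING] DeepSpeed Op Builder: {base} "
            "Detected `DS_SKIP_CUDA_CHECK=1`: Allowing this combination of CUDA, "
            "but it may result in unexpected behavior."
        )
    # No override (assumed wording): the same versions appear, plus a hint about the escape hatch.
    return f"DeepSpeed Op Builder: {base} Set `DS_SKIP_CUDA_CHECK=1` to skip this check."


print(format_cuda_mismatch_message("11.8", "12.1", skip_check=True))
```

Building the shared `base` string once keeps the two messages from drifting apart if the wording is edited later.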
Hi,
Even though I am getting the warning:
I still get an error:
Full Stack
I should note that installing without pre-compiling ops works normally. Thanks!
Hi @FarzanT, are you still seeing this error? If so, could you please open a new issue so we can track it there rather than on this PR?
We strictly enforce that the major version of the installed CUDA matches the major version of the CUDA that torch was compiled with. However, in some cases a major-version mismatch is OK. Rather than iterating over and testing every possible mismatch, I'm adding an environment variable that can be set to skip this check and allow our kernels to compile:
DS_SKIP_CUDA_CHECK=1
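The behavior described above can be sketched as follows. This is a hypothetical illustration, not DeepSpeed's op builder code: `cuda_major` and `check_cuda_compatibility` are made-up names, both versions are assumed to be strings like `"11.8"`, and raising `RuntimeError` on a disallowed mismatch is an assumption. Only the major components are compared, and setting `DS_SKIP_CUDA_CHECK=1` downgrades a major mismatch from an error to a warning.

```python
import os


def cuda_major(version: str) -> int:
    """Return the major component of a version string like '11.8' (assumed format)."""
    return int(version.split(".")[0])


def check_cuda_compatibility(sys_cuda_version: str, torch_cuda_version: str) -> bool:
    """Hypothetical sketch: require matching CUDA major versions unless
    DS_SKIP_CUDA_CHECK=1 is set. Returns True if compilation may proceed."""
    if cuda_major(sys_cuda_version) == cuda_major(torch_cuda_version):
        # Minor-version differences (e.g. 11.7 vs 11.8) are allowed.
        return True
    if os.environ.get("DS_SKIP_CUDA_CHECK", "0") == "1":
        # Escape hatch: warn instead of failing, still naming both versions.
        print(
            f"[WARNING] Installed CUDA version {sys_cuda_version} does not match the "
            f"version torch was compiled with {torch_cuda_version}. "
            "Detected `DS_SKIP_CUDA_CHECK=1`: Allowing this combination of CUDA, "
            "but it may result in unexpected behavior."
        )
        return True
    # Assumed failure mode for a major mismatch without the override.
    raise RuntimeError(
        f"Installed CUDA version {sys_cuda_version} does not match the "
        f"version torch was compiled with {torch_cuda_version}; "
        "set DS_SKIP_CUDA_CHECK=1 to skip this check."
    )
```

In a real build the torch-side version would come from `torch.version.cuda`; how the installed CUDA version is detected is not shown in this thread, so the sketch simply takes both as arguments.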