CUDA runtime error: invalid device function sampling_topp_kernels.cu #14
Hmm, FasterTransformer has only been tested on Compute Capability >= 7.0, and the 1070 is 6.0. So it's possible something it uses is limited to more recent cards. For now I'll add a note to the README, but I'll leave this open to investigate further. The line that's failing is:

```cpp
check_cuda_error(
    cub::DeviceSegmentedRadixSort::SortPairsDescending(nullptr,
                                                       cub_temp_storage_size,
                                                       log_probs,
                                                       (T*)nullptr,
                                                       id_vals,
                                                       (int*)nullptr,
                                                       vocab_size * batch_size,
                                                       batch_size,
                                                       begin_offset_buf,
                                                       offset_buf + 1,
                                                       0,              // begin_bit
                                                       sizeof(T) * 8,  // end_bit = sizeof(KeyT) * 8
                                                       stream));       // cudaStream_t
```
Thanks for the reply! I will get a better card later.
After my testing, this now works fine on a 1060 (Compute Capability 6.1). BTW, the Compute Capability of the 1070 should also be 6.1.
Closing this as @Frederisk finds it working. If it's still an issue, please reopen.
Hi guys,
I'm using:
- kernel 5.18.16-200.fc36.x86_64
- Podman as the container runtime, with the NVIDIA Container Toolkit

The command `nvidia-smi` works fine in the container.

Problem: the Triton server starts fine, but it crashes when I send a request using the OpenAI API demo written in the README.
Is this a GPU compatibility issue? If yes, which GPU models are supported?
Any help would be appreciated!