
Publish updated candle version on crates.io #1761

Closed
edesalve opened this issue Feb 26, 2024 · 6 comments
@edesalve

Hi all. Over the last month several great updates have landed in candle, especially for quantized model inference (I have been waiting a long time for CUDA acceleration).
On crates.io, however, the crate version is still stuck at 0.3.3. When is the new version scheduled to be published on crates.io?

Thank you very much.

@LaurentMazare
Collaborator

Hopefully we'll release 0.4.0 in the next few days.

@LaurentMazare
Collaborator

Version 0.4.0 should now be available on crates.io!

@edesalve
Author

Hi @LaurentMazare, I believe the version published on crates.io doesn't include the latest upgrades (CUDA inference for quantized models).
Moreover, there seem to be some issues with the published version when loading CUDA tensors:

Cuda(Load { cuda: DriverError(CUDA_ERROR_UNSUPPORTED_PTX_VERSION, "the provided PTX was compiled with an unsupported toolchain."), module_name: "cast_u32_f32" })

My system:

Default host: x86_64-unknown-linux-gnu
rustup home: /home/ubuntu/.rustup

stable-x86_64-unknown-linux-gnu (default)
rustc 1.75.0 (82e1608df 2023-12-21)

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0

Thank you.

@LaurentMazare
Collaborator

I don't think the UNSUPPORTED_PTX_VERSION error has much to do with the release; it looks more like a mismatch between the nvcc compiler / CUDA toolkit and the actual driver on your host.
Also, I think this release does include the latest upgrades for CUDA inference with quantized models, e.g. this line was part of the change. Could you provide some code that doesn't work with this version but that you would expect to work?
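A quick way to check for such a mismatch is to print the two versions that must agree. This is a sketch (not part of candle) that shells out to the standard CUDA CLIs, `nvcc` and `nvidia-smi`, and falls back to a "not found" message when a tool is missing:

```rust
use std::process::Command;

// Print the last stdout line of a command, or a fallback if it's unavailable.
// A PTX module built by a newer nvcc than the installed driver understands
// fails to load with CUDA_ERROR_UNSUPPORTED_PTX_VERSION, so these two
// versions need to be compatible.
fn version_line(cmd: &str, args: &[&str]) -> String {
    Command::new(cmd)
        .args(args)
        .output()
        .ok()
        .and_then(|o| String::from_utf8(o.stdout).ok())
        .and_then(|s| s.lines().last().map(|l| l.trim().to_string()))
        .unwrap_or_else(|| format!("{cmd} not found"))
}

fn main() {
    // `nvcc --version` reports the toolkit release in its last lines.
    println!("toolkit: {}", version_line("nvcc", &["--version"]));
    // `nvidia-smi` can report just the driver version.
    println!(
        "driver:  {}",
        version_line("nvidia-smi", &["--query-gpu=driver_version", "--format=csv,noheader"])
    );
}
```

If the driver is older than what the toolkit that built the kernels requires, loading any PTX module (such as `cast_u32_f32` above) will fail.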

@edesalve
Author

@LaurentMazare you're right, the latest upgrades are included, sorry for the oversight.
As for the error, the problem appears when tensor.to_dtype() is called: I tried building different models and the same error shows up. Here is an example from mistral:

let inv_freq = Tensor::from_vec(inv_freq, (1, inv_freq_len), dev)?.to_dtype(dtype)?

that gives: Cuda(Load { cuda: DriverError(CUDA_ERROR_UNSUPPORTED_PTX_VERSION, "the provided PTX was compiled with an unsupported toolchain."), module_name: "cast_f32_bf16" }).

The problem arises only when the device is CUDA. With the previous version 0.3.3 everything works fine.

@edesalve
Author

edesalve commented Mar 1, 2024

Upgrading the CUDA driver to >= 545 solves the issue.
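The fix above boils down to a simple version floor, which can be sketched as a tiny check (a hypothetical helper, not part of candle; the 545 minimum comes from this thread's resolution, matching the CUDA 12.3 toolkit used to build the kernels):

```rust
// Hypothetical helper: parse a driver version string like "545.23.08"
// and check its major component against a required minimum.
fn driver_meets_minimum(version: &str, min_major: u32) -> bool {
    version
        .split('.')
        .next()
        .and_then(|s| s.trim().parse::<u32>().ok())
        .map(|major| major >= min_major)
        .unwrap_or(false)
}

fn main() {
    // Driver 545+ can load PTX emitted by the CUDA 12.3 toolchain.
    assert!(driver_meets_minimum("545.23.08", 545));
    // An older driver (e.g. 535.x) triggers CUDA_ERROR_UNSUPPORTED_PTX_VERSION.
    assert!(!driver_meets_minimum("535.104.05", 545));
    println!("ok");
}
```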
