This issue was moved to a discussion. You can continue the conversation there.

[Question] CUDA use in LLamaSharp #266

Closed
vvdb-architecture opened this issue Jan 19, 2024 · 6 comments

Comments

@vvdb-architecture

Context / Scenario

I'm using Kernel-memory with LLamaSharp. Despite having an RTX 3080 and the latest CUDA drivers installed, CUDA is not used.

Question

Not sure if this is a bug or I'm missing something, so here's a question instead:

The LlamaSharp.csproj contains

     <PackageReference Include="LLamaSharp.Backend.Cpu"/>
     <PackageReference Include="LLamaSharp.Backend.Cuda12"/>

I found that if both the Cpu and Cuda12 back-ends are referenced, only the CPU back-end is used, even though the CUDA DLL is loaded.
If I remove the reference to LLamaSharp.Backend.Cpu, the CUDA back-end is used.

It might be a "latest version thing", I don't know. But here you are.
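A possible workaround, sketched here until the backend selection is fixed, is to gate the two references behind an MSBuild condition so that only one backend is ever restored. The `UseCuda` property below is hypothetical (not something the Kernel-memory project defines); you would pass it as `dotnet build -p:UseCuda=true` on machines with a CUDA GPU:

```xml
<ItemGroup>
  <!-- Hypothetical switch: only one of the two backends is restored per build -->
  <PackageReference Include="LLamaSharp.Backend.Cpu" Condition="'$(UseCuda)' != 'true'" />
  <PackageReference Include="LLamaSharp.Backend.Cuda12" Condition="'$(UseCuda)' == 'true'" />
</ItemGroup>
```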

@vvdb-architecture vvdb-architecture added the question Further information is requested label Jan 19, 2024
@dluc
Collaborator

dluc commented Jan 21, 2024

@vvdb-architecture I've noticed something similar but couldn't reproduce it. I would report it to the LLamaSharp project; they will probably ask for logs.

@dluc
Collaborator

dluc commented Jan 22, 2024

If you add a call to NativeLibraryConfig.Instance.WithLogs() you should see logs about the backend selection.

For instance, if you run the code at https://github.com/microsoft/kernel-memory/tree/llamatest, the console should contain some useful information.
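A minimal sketch of that suggestion, assuming LLamaSharp's `LLamaWeights.LoadFromFile` API and a hypothetical local model path — the only call confirmed above is `NativeLibraryConfig.Instance.WithLogs()`:

```csharp
using LLama;
using LLama.Common;
using LLama.Native;

// Must run before any other LLamaSharp call: the native backend
// is selected and loaded once, then cached for the process lifetime.
NativeLibraryConfig.Instance.WithLogs();

// Hypothetical model path; backend-selection logs (CPU vs CUDA)
// should appear on the console while the weights are loaded.
var parameters = new ModelParams("models/llama-2-7b.Q4_K_M.gguf");
using var model = LLamaWeights.LoadFromFile(parameters);
```

The logs should show which native DLL was picked, which is the quickest way to confirm whether the CUDA back-end was actually selected.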

@vvdb-architecture
Author

vvdb-architecture commented Jan 26, 2024

It seems that in LLamaSharp the CPU back-end and the Cuda back-ends can't be installed at the same time.

I would suggest that the maintainers of Kernel-memory either add a comment in the .csproj file or a note in the README.md to this effect.

@dluc
Collaborator

dluc commented Jan 26, 2024

Considering that the service is also packaged as a Docker image, even if we add a comment the Docker image will still contain all the LLamaSharp packages, and the issue will persist. We could opt for ollama or LM Studio to support LLama models, perhaps removing LLamaSharp.

@martindevans

> It seems that in LLamaSharp the CPU back-end and the Cuda back-ends can't be installed at the same time.

See SciSharp/LLamaSharp#189 (comment).

It's intended that they should be installable at the same time now. If multiple backends are installed, LLamaSharp does runtime feature detection to pick the best one to use. There seems to be a bug in that right now, though :(

@dluc
Collaborator

dluc commented Mar 14, 2024

The runtime detection was available last year too, but it never worked in my tests; the runtime always used the CPU. It might be about the way assemblies are loaded and persist in memory, just guessing.

@microsoft microsoft locked and limited conversation to collaborators Jun 4, 2024
@dluc dluc converted this issue into discussion #545 Jun 4, 2024
@dluc dluc added discussion and removed question Further information is requested labels Jun 4, 2024
