
bug: Cortex-cpp continues to have 1 layer offload to CPU while using GPU #1104

Open
Van-QA opened this issue Jun 20, 2024 · 0 comments
Labels
type: bug Something isn't working

Comments

@Van-QA
Contributor

Van-QA commented Jun 20, 2024

Describe the bug
When generating responses with a local LLM, cortex-cpp still appears to make heavy use of the CPU.
https://discord.com/channels/1107178041848909847/1149558035971321886/1253148982188838954

To Reproduce

  1. Install cortex-cpp and the CUDA toolkit locally.
  2. Turn on GPU acceleration.
  3. Generate responses using a local LLM.
  4. Observe high CPU usage.

Expected behavior
Since cortex-cpp is running a local LLM with the CUDA toolkit installed and GPU acceleration enabled, it should do most of its processing on the GPU, and CPU usage should stay low.

Desktop

  • OS: Linux

Additional context
The logs indicate that 32 out of 33 layers are offloaded to the GPU, but 1 layer is still processed on the CPU. This behavior will be investigated further.
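To check the offload count in other runs without reading screenshots, a small log parser can help. This is a minimal sketch, assuming the logs contain a llama.cpp-style line of the form `offloaded 32/33 layers to GPU`; the function names `parse_offload` and `cpu_layers` are hypothetical helpers, not part of cortex-cpp.

```python
import re
from typing import Optional, Tuple

# Assumption: the log line follows the llama.cpp-style format
# "... offloaded <gpu>/<total> layers to GPU".
OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def parse_offload(log_text: str) -> Optional[Tuple[int, int]]:
    """Return (gpu_layers, total_layers) from the last matching line, or None."""
    matches = OFFLOAD_RE.findall(log_text)
    if not matches:
        return None
    gpu, total = matches[-1]  # use the most recent model load
    return int(gpu), int(total)


def cpu_layers(log_text: str) -> Optional[int]:
    """Return how many layers were NOT offloaded to the GPU, or None if unknown."""
    parsed = parse_offload(log_text)
    if parsed is None:
        return None
    gpu, total = parsed
    return total - gpu


if __name__ == "__main__":
    sample = "llm_load_tensors: offloaded 32/33 layers to GPU\n"
    print(parse_offload(sample))  # (32, 33)
    print(cpu_layers(sample))     # 1
```

A non-zero result from `cpu_layers` flags a run where at least one layer stayed on the CPU, matching the situation described in this issue.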

@Van-QA Van-QA changed the title bug: Cortex-cpp continues to have 1 layer offload to CPU why using GPU bug: Cortex-cpp continues to have 1 layer offload to CPU while using GPU Jun 20, 2024
@imtuyethan imtuyethan added the type: bug Something isn't working label Sep 2, 2024
@0xSage 0xSage transferred this issue from janhq/jan Sep 5, 2024
Projects
Status: Need Investigation