This repository has been archived by the owner on Sep 30, 2023. It is now read-only.

WIP: Integrate more direct GPU support #55

Merged
merged 29 commits into from
Aug 26, 2023

Conversation

ravenscroftj
Owner

Implement support for offloading inference to a GPU

@aperullo
Contributor

aperullo commented Aug 23, 2023

I'm sure this is still in progress, but I tried the branch out and got some weird behavior that I thought was worth mentioning. Offloading any number of layers to the GPU causes the completions to be mostly nonsense.

I tried running stablecode with --ngl 24, 12, and even 1.

My prompt was

main.py

def divide_by_2(x):

With no layers on the GPU, it completed the function with return x // 2.

With any layers on the GPU, it generally said return ________________ (continuing as a very long run of underscores).
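For anyone comparing runs at different --ngl values, a quick check for this kind of degenerate output could look like the minimal sketch below. The helper name `looks_degenerate` and the threshold of 20 repeated characters are assumptions made for illustration and are not part of this PR; the underscore string is a stand-in for the output above, not its exact length.

```python
import re


def looks_degenerate(completion: str, min_run: int = 20) -> bool:
    """Return True if any single character repeats min_run or more times in a row.

    Hypothetical helper for illustration only; the threshold is arbitrary.
    """
    # (.) captures one character, \1{n,} requires at least n further repeats of it.
    pattern = r"(.)\1{%d,}" % (min_run - 1)
    return re.search(pattern, completion, flags=re.DOTALL) is not None


# Completion reported with no layers on the GPU.
cpu_completion = "return x // 2"
# Stand-in for the underscore run seen with layers on the GPU (length is illustrative).
gpu_completion = "return " + "_" * 300

print(looks_degenerate(cpu_completion))
print(looks_degenerate(gpu_completion))
```

A check like this only catches repeated-character output; other kinds of nonsense completions would still need to be compared against the CPU-only result by eye.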

@ravenscroftj changed the title from "Integrate more direct GPU support" to "WIP: Integrate more direct GPU support" on Aug 24, 2023
@ravenscroftj
Owner Author

Yeah, this is currently WIP. The CLBlast version of the code works fine, but the Nvidia/CUDA implementation is misbehaving, so I need to work out what's going on before I merge it. The alternative is to merge just the CLBlast build and disable pure CUDA offloading support for now (basically leave the current mainline implementation of CUDA turned on without any change).

@ravenscroftj
Owner Author

I was able to get CUDA working again by bringing the ggml submodule up to date with the current upstream main branch.

I've made some changes to the docker launch script that allow you to use GPU offloading.

I'm getting sub-10s responses for non-trivial prompts on my NVIDIA 4070 using stablecode.

@ravenscroftj merged commit 2b27760 into main on Aug 26, 2023
15 of 30 checks passed

2 participants