WIP: Integrate more direct GPU support #55

ravenscroftj · 2023-08-23T16:24:40Z

Implement support for offloading inference to a GPU

aperullo · 2023-08-23T22:05:32Z

I'm sure this is still in progress, but I tried the branch out and got some weird behavior that I thought was worth mentioning. Offloading any number of layers to the gpu causes the completions to be mostly nonsense.

I tried running stablecode with --ngl 24, 12, and even 1.

My prompt was:

main.py

def divide_by_2(x):

With no layers on the gpu, it completed the prompt with `return x // 2`.

With any layers on the gpu it generally said return ___________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
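For reference, the prompt combined with the CPU-only completion gives the function below. This is only a sketch of the expected result. It assumes that `main.py` was a file-name header line in the prompt and that the completion was indented as the function body.

```python
# main.py  (assumed to be a file-name header in the prompt)

def divide_by_2(x):
    # Completion produced with no layers offloaded to the gpu
    return x // 2
```

A correct gpu-offloaded run would be expected to produce an equivalent body rather than a run of underscores.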

ravenscroftj · 2023-08-24T10:52:44Z

Yeah, this is currently WIP. The CLBlast version of the code works fine, but the Nvidia/CUDA implementation is behaving erratically, so I need to work out what's going on before I merge it. The alternative is to merge just the CLBlast build and disable pure CUDA offloading support for now (basically leave the current mainline implementation of CUDA turned on without any change).

ravenscroftj · 2023-08-26T15:34:39Z

I was able to get CUDA working again by bringing the ggml submodule up to date with the current upstream main branch.

I've made some changes to the docker launch script that allow you to use GPU offloading.

I'm getting sub-10s responses for non-trivial prompts with my NVIDIA 4070 using stablecode.

ravenscroftj added 8 commits August 21, 2023 20:03

add gpu offload for gptneox

f818e2d

increase scratch on starcoder

364168d

Merge branch 'fix/starcoder_segfault' into feature/gpu_layers

6876043

update for gpu build

5f5e9f9

add gpu offload for gptneox

cf7c528

increase scratch on starcoder

1e14d91

update for gpu build

b3d8d99

Merge branch 'feature/gpu_layers' of github.com:ravenscroftj/turbopilot into feature/gpu_layers

c164deb

ravenscroftj changed the title ~~Integrate more direct GPU support~~ WIP: Integrate more direct GPU support Aug 24, 2023

ravenscroftj added 19 commits August 26, 2023 15:12

add gpu offload for gptneox

5f7155a

increase scratch on starcoder

b2b4a14

update for gpu build

4a47251

add gpu offload for gptneox

b79ab46

update for gpu build

8fa70e1

use ggerganov ggml instead of mine

a5517b0

remove crow submodule

356a83c

remove llama

0cf7a9c

tidy cmakelist

63b5547

Merge branch 'feature/gpu_layers' of github.com:ravenscroftj/turbopilot into feature/gpu_layers

6d26c9b

remove llama.cpp submodule

97a0377

use latest upstream ggml instead of mine

e9dc6a3

use latest upstream ggml instead of mine

31bb33c

Merge branch 'feature/gpu_layers' of github.com:ravenscroftj/turbopilot into feature/gpu_layers

23c0a3d

Merge branch 'main' into feature/gpu_layers

326e76c

update run script to incorporate GPU layers

88683ab

tidy up prints in stablecoder and starcoder

6041833

add gpu offload for gpt-j models (codegen)

0b40851

disable clblast docker images

91639b8

ravenscroftj added 2 commits August 26, 2023 16:16

update clblast code in gpt-j model

215a69b

recomment the cuda preprocessor check

a00de2a

ravenscroftj merged commit 2b27760 into main Aug 26, 2023
15 of 30 checks passed
