Issue regarding falcon-7b quantized #728

Closed
Pablo1107 opened this issue Jul 7, 2023 · 8 comments · Fixed by #743
Labels
bug Something isn't working

Comments

@Pablo1107

LocalAI version:
LocalAI version LocalAI v1.20.1-dirty (92614b9)

Environment, CPU architecture, OS, and Version:
Linux t14s 6.4.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 01 Jul 2023 16:17:21 +0000 x86_64 GNU/Linux

Describe the bug
Running LocalAI with falcon7b-instruct.ggmlv3.fp16.bin from TheBloke runs out of memory on my machine with 16 GB of RAM. So I tried falcon7b-instruct.ggmlv3.q8_0.bin, which needs a little less RAM but makes the backend segfault.
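
The out-of-memory result with the fp16 file is consistent with a back-of-the-envelope size estimate. The sketch below scales the 7313.92 MB that the loader reports for the q8_0 file (see the logs) up to fp16. It rests on assumptions that are not stated in this issue: fp16 uses 16 bits per weight, and ggml's Q8_0 stores blocks of 32 int8 weights plus one fp16 scale, which is 8.5 bits per weight. `estimate_fp16_size_mb` is a hypothetical helper name.

```python
# Rough memory estimate for the two model files mentioned above.
# Assumptions (not from the issue): fp16 stores 16 bits per weight, and
# ggml's Q8_0 stores blocks of 32 int8 weights plus one fp16 scale,
# i.e. 34 bytes per 32 weights = 8.5 bits per weight.

Q8_0_SIZE_MB = 7313.92  # "ggml ctx size" reported by the loader for the q8_0 file

FP16_BITS_PER_WEIGHT = 16.0
Q8_0_BITS_PER_WEIGHT = (32 * 8 + 16) / 32  # 8.5


def estimate_fp16_size_mb(q8_0_size_mb: float) -> float:
    """Scale a Q8_0 size up to fp16, treating every tensor as quantized."""
    return q8_0_size_mb * FP16_BITS_PER_WEIGHT / Q8_0_BITS_PER_WEIGHT


if __name__ == "__main__":
    fp16_mb = estimate_fp16_size_mb(Q8_0_SIZE_MB)
    print(f"q8_0: {Q8_0_SIZE_MB / 1024:.1f} GiB")
    print(f"fp16 (estimated): {fp16_mb / 1024:.1f} GiB")
```

Under these assumptions the fp16 weights alone come to somewhere above 13 GB, before counting the KV cache, the operating system, and other processes, so a 16 GB machine has very little headroom. The estimate is only approximate, since some tensors are typically kept unquantized.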

To Reproduce

  1. Download this version of falcon-7b (falcon7b-instruct.ggmlv3.q8_0.bin from TheBloke).
  2. Run any prompt.
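
The second step can be sketched as a small client. This is a minimal sketch, assuming LocalAI is listening on http://127.0.0.1:8080 (the address printed in the logs) and that the model file is already in the models path. The endpoint and the `model`, `messages`, and `temperature` fields are taken from the logged request; the helper names `build_chat_request` and `send_chat_request` are hypothetical. Unlike the logged request, this sketch does not set `stream`.

```python
import json
import urllib.request

# Assumptions: LocalAI listens on the address printed in the logs and the
# model file named below is already present in the models path.
BASE_URL = "http://127.0.0.1:8080"
MODEL = "falcon7b-instruct.ggmlv3.q8_0.bin"


def build_chat_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 600.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `send_chat_request(build_chat_request("concat two .bin files into one"))` against a running instance is what triggers model loading and, in this report, the crash.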

Expected behavior
The backend should not segfault.

Logs

  ❯ local-ai --debug
Starting LocalAI using 4 threads, with models path: /home/pablo/.local/share/local-ai/models
unexpected end of JSON input

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.47.0                   │
 │               http://127.0.0.1:8080               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 32  Processes ........... 1 │
 │ Prefork ....... Disabled  PID .............. 8181 │
 └───────────────────────────────────────────────────┘

12:55PM DBG Request received: {"model":"falcon7b-instruct.ggmlv3.q8_0.bin","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"###\nRole name: shell\nProvide only zsh commands for Linux/Arch Linux without any description.\nIf there is a lack of details, provide most logical solution.\nEnsure the output is a valid shell command.\nIf multiple steps required try to combine them together.\n\nRequest: concat two .bin files into one\n###\nCommand:"}],"stream":true,"echo":false,"top_p":1,"top_k":0,"temperature":0.1,"max_tokens":0,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"frequency_penalty":0,"tfz":0,"seed":0,"mode":0,"step":0,"typical_p":0}
12:55PM DBG Parameter Config: &{OpenAIRequest:{Model:falcon7b-instruct.ggmlv3.q8_0.bin File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:1 TopK:80 Temperature:0.1 Maxtokens:512 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 Seed:0 Mode:0 Step:0 TypicalP:0} Name: StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:512 F16:false NUMA:false Threads:4 Debug:true Roles:map[] Embeddings:false Backend: TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false PromptStrings:[] InputStrings:[] InputToken:[]}
12:55PM DBG Stream request received
[127.0.0.1]:43774  200  -  POST     /v1/chat/completions
12:55PM DBG Loading model 'falcon7b-instruct.ggmlv3.q8_0.bin' greedly
12:55PM DBG [llama] Attempting to load
12:55PM DBG Loading model llama from falcon7b-instruct.ggmlv3.q8_0.bin
12:55PM DBG Loading model in memory from file: /home/pablo/.local/share/local-ai/models/falcon7b-instruct.ggmlv3.q8_0.bin
12:55PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"falcon7b-instruct.ggmlv3.q8_0.bin","choices":[{"delta":{"role":"assistant"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

llama.cpp: loading model from /home/pablo/.local/share/local-ai/models/falcon7b-instruct.ggmlv3.q8_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 65024
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 4544
llama_model_load_internal: n_mult     = 71
llama_model_load_internal: n_head     = 1
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 7
llama_model_load_internal: ftype      = 7 (mostly Q8_0)
llama_model_load_internal: n_ff       = 12141
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 7313.92 MB
error loading model: llama.cpp: tensor 'tok_embeddings.weight' is missing from model
llama_load_model_from_file: failed to load model
12:55PM DBG [llama] Fails: failed loading model
12:55PM DBG [gpt4all] Attempting to load
12:55PM DBG Loading model gpt4all from falcon7b-instruct.ggmlv3.q8_0.bin
12:55PM DBG Loading model in memory from file: /home/pablo/.local/share/local-ai/models/falcon7b-instruct.ggmlv3.q8_0.bin
falcon_model_load: loading model from '/home/pablo/.local/share/local-ai/models/falcon7b-instruct.ggmlv3.q8_0.bin' - please wait ...
falcon_model_load: n_vocab   = 65024
falcon_model_load: n_embd    = 4544
falcon_model_load: n_head    = 71
falcon_model_load: n_head_kv = 1
falcon_model_load: n_layer   = 32
falcon_model_load: ftype     = 7
falcon_model_load: qntvr     = 0
falcon_model_load: ggml ctx size = 7313.92 MB
falcon_model_load: memory_size =    32.00 MB, n_mem = 65536
falcon_model_load: ........................ done
falcon_model_load: model size =  7313.87 MB / num tensors = 196
12:55PM DBG [gpt4all] Loads OK
fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x80000000027 pc=0xc53820]

runtime stack:
runtime.throw({0xe263d8?, 0xc49d6b?})
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/panic.go:1047 +0x5d fp=0x7f38c77e5610 sp=0x7f38c77e55e0 pc=0x47b4dd
runtime.sigpanic()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/signal_unix.go:825 +0x3e9 fp=0x7f38c77e5670 sp=0x7f38c77e5610 pc=0x491989

goroutine 51 [syscall]:
runtime.cgocall(0xb62770, 0xc0003f1238)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/cgocall.go:157 +0x5c fp=0xc0003f1210 sp=0xc0003f11d8 pc=0x44a2bc
github.com/nomic-ai/gpt4all/gpt4all-bindings/golang._Cfunc_model_prompt(0x7f38b840e950, 0x7f38b83fdb90, 0xc0003f8600, 0xa, 0x3f99999a, 0x400, 0x200, 0x50, 0x3f800000, 0x3dcccccd, ...)
	_cgo_gotypes.go:127 +0x45 fp=0xc0003f1238 sp=0xc0003f1210 pc=0x8ee9c5
github.com/nomic-ai/gpt4all/gpt4all-bindings/golang.(*Model).Predict.func1(0xe172f0?, 0x19?, {0xc0003f8600, 0x4539ea?, 0xc0003e6480?}, {0x400, 0xa, 0x200, 0x50, 0x1, ...})
	/home/runner/work/LocalAI/LocalAI/gpt4all/gpt4all-bindings/golang/gpt4all.go:61 +0x185 fp=0xc0003f12e0 sp=0xc0003f1238 pc=0x8ef465
github.com/nomic-ai/gpt4all/gpt4all-bindings/golang.(*Model).Predict(0x0?, {0xc0005f6000, 0x135}, {0xc0003f1508, 0x4, 0xd0?})
	/home/runner/work/LocalAI/LocalAI/gpt4all/gpt4all-bindings/golang/gpt4all.go:61 +0x225 fp=0xc0003f1440 sp=0xc0003f12e0 pc=0x8ef0e5
github.com/go-skynet/LocalAI/api.ModelInference.func11()
	/home/runner/work/LocalAI/LocalAI/api/prediction.go:523 +0x270 fp=0xc0003f1538 sp=0xc0003f1440 pc=0xaa2a30
github.com/go-skynet/LocalAI/api.ModelInference.func14()
	/home/runner/work/LocalAI/LocalAI/api/prediction.go:585 +0x1aa fp=0xc0003f15f0 sp=0xc0003f1538 pc=0xaa228a
github.com/go-skynet/LocalAI/api.ComputeChoices({0xc0005f6000, 0x135}, 0xc0001f0140, 0xc0001d4b00, 0xc0003a81a0?, 0xc000339dd0?, 0x1579460, 0xc0000123c0?)
	/home/runner/work/LocalAI/LocalAI/api/prediction.go:609 +0x246 fp=0xc0003f1eb0 sp=0xc0003f15f0 pc=0xaa55e6
github.com/go-skynet/LocalAI/api.chatEndpoint.func1({0xc0005f6000, 0x135}, 0xc0001f0140, 0xd10b60?, 0xc0001bed20?, 0xc000182240)
	/home/runner/work/LocalAI/LocalAI/api/openai.go:357 +0x1db fp=0xc0003f1fa0 sp=0xc0003f1eb0 pc=0xa9bd3b
github.com/go-skynet/LocalAI/api.chatEndpoint.func2.3()
	/home/runner/work/LocalAI/LocalAI/api/openai.go:428 +0x3f fp=0xc0003f1fe0 sp=0xc0003f1fa0 pc=0xa9bb1f
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0003f1fe8 sp=0xc0003f1fe0 pc=0x4ad401
created by github.com/go-skynet/LocalAI/api.chatEndpoint.func2
	/home/runner/work/LocalAI/LocalAI/api/openai.go:428 +0x7f1

goroutine 1 [IO wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc0001271d8 sp=0xc0001271b8 pc=0x47e236
runtime.netpollblock(0xc000127220?, 0x44994f?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/netpoll.go:527 +0xf7 fp=0xc000127210 sp=0xc0001271d8 pc=0x476a37
internal/poll.runtime_pollWait(0x7f38e1624df8, 0x72)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/netpoll.go:306 +0x89 fp=0xc000127230 sp=0xc000127210 pc=0x4a7ca9
internal/poll.(*pollDesc).wait(0xc0001fc480?, 0x1001272a0?, 0x0)
	/opt/hostedtoolcache/go/1.20.5/x64/src/internal/poll/fd_poll_runtime.go:84 +0x32 fp=0xc000127258 sp=0xc000127230 pc=0x5254f2
internal/poll.(*pollDesc).waitRead(...)
	/opt/hostedtoolcache/go/1.20.5/x64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0001fc480)
	/opt/hostedtoolcache/go/1.20.5/x64/src/internal/poll/fd_unix.go:614 +0x2bd fp=0xc000127300 sp=0xc000127258 pc=0x52adfd
net.(*netFD).accept(0xc0001fc480)
	/opt/hostedtoolcache/go/1.20.5/x64/src/net/fd_unix.go:172 +0x35 fp=0xc0001273b8 sp=0xc000127300 pc=0x5ad2b5
net.(*TCPListener).accept(0xc0001a2660)
	/opt/hostedtoolcache/go/1.20.5/x64/src/net/tcpsock_posix.go:148 +0x25 fp=0xc0001273e0 sp=0xc0001273b8 pc=0x5c3665
net.(*TCPListener).Accept(0xc0001a2660)
	/opt/hostedtoolcache/go/1.20.5/x64/src/net/tcpsock.go:297 +0x3d fp=0xc000127410 sp=0xc0001273e0 pc=0x5c275d
github.com/valyala/fasthttp.acceptConn(0xc0003a4200, {0x1628f70, 0xc0001a2660}, 0xc000127608)
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/server.go:1928 +0x62 fp=0xc0001274f0 sp=0xc000127410 pc=0x80f562
github.com/valyala/fasthttp.(*Server).Serve(0xc0003a4200, {0x1628f70?, 0xc0001a2660})
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/server.go:1821 +0x4f4 fp=0xc000127638 sp=0xc0001274f0 pc=0x80eb74
github.com/gofiber/fiber/v2.(*App).Listen(0xc0001dd680, {0xe03ccd?, 0x7?})
	/home/runner/go/pkg/mod/github.com/gofiber/fiber/v2@v2.47.0/listen.go:88 +0x11d fp=0xc000127698 sp=0xc000127638 pc=0x8a5a5d
main.main.func1(0xc0003a6160?)
	/home/runner/work/LocalAI/LocalAI/main.go:161 +0x825 fp=0xc000127950 sp=0xc000127698 pc=0xad4845
github.com/urfave/cli/v2.(*Command).Run(0xc0003a6160, 0xc0001def00, {0xc0001aa000, 0x2, 0x2})
	/home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/command.go:274 +0x9eb fp=0xc000127bf0 sp=0xc000127950 pc=0xac190b
github.com/urfave/cli/v2.(*App).RunContext(0xc0003a2000, {0x1629478?, 0xc000198030}, {0xc0001aa000, 0x2, 0x2})
	/home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:332 +0x616 fp=0xc000127c60 sp=0xc000127bf0 pc=0xabe236
github.com/urfave/cli/v2.(*App).Run(...)
	/home/runner/go/pkg/mod/github.com/urfave/cli/v2@v2.25.7/app.go:309
main.main()
	/home/runner/work/LocalAI/LocalAI/main.go:165 +0x12b6 fp=0xc000127f80 sp=0xc000127c60 pc=0xad3f56
runtime.main()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:250 +0x207 fp=0xc000127fe0 sp=0xc000127f80 pc=0x47de07
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000127fe8 sp=0xc000127fe0 pc=0x4ad401

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000084fb0 sp=0xc000084f90 pc=0x47e236
runtime.goparkunlock(...)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:387
runtime.forcegchelper()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:305 +0xb0 fp=0xc000084fe0 sp=0xc000084fb0 pc=0x47e070
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x4ad401
created by runtime.init.6
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:293 +0x25

goroutine 3 [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000085780 sp=0xc000085760 pc=0x47e236
runtime.goparkunlock(...)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:387
runtime.bgsweep(0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgcsweep.go:319 +0xde fp=0xc0000857c8 sp=0xc000085780 pc=0x46a33e
runtime.gcenable.func1()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:178 +0x26 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x45f586
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x4ad401
created by runtime.gcenable
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:178 +0x6b

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x17f724bd142?, 0x3ba297ad?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000085f70 sp=0xc000085f50 pc=0x47e236
runtime.goparkunlock(...)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:387
runtime.(*scavengerState).park(0x1b543a0)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgcscavenge.go:400 +0x53 fp=0xc000085fa0 sp=0xc000085f70 pc=0x4681f3
runtime.bgscavenge(0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgcscavenge.go:633 +0x65 fp=0xc000085fc8 sp=0xc000085fa0 pc=0x4687e5
runtime.gcenable.func2()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:179 +0x26 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x45f526
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x4ad401
created by runtime.gcenable
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:179 +0xaa

goroutine 18 [finalizer wait]:
runtime.gopark(0x1a0?, 0x1b55080?, 0xa0?, 0x61?, 0xc000084770?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000084628 sp=0xc000084608 pc=0x47e236
runtime.runfinq()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mfinal.go:193 +0x107 fp=0xc0000847e0 sp=0xc000084628 pc=0x45e5c7
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x4ad401
created by runtime.createfing
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mfinal.go:163 +0x45

goroutine 19 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000080750 sp=0xc000080730 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000807e0 sp=0xc000080750 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 5 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000086750 sp=0xc000086730 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000867e0 sp=0xc000086750 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 20 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000080f50 sp=0xc000080f30 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc000080fe0 sp=0xc000080f50 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 6 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000086f50 sp=0xc000086f30 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc000086fe0 sp=0xc000086f50 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000086fe8 sp=0xc000086fe0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 21 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000081750 sp=0xc000081730 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000817e0 sp=0xc000081750 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 22 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000081f50 sp=0xc000081f30 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc000081fe0 sp=0xc000081f50 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 34 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000114750 sp=0xc000114730 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc0001147e0 sp=0xc000114750 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001147e8 sp=0xc0001147e0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 23 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000082750 sp=0xc000082730 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000827e0 sp=0xc000082750 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000827e8 sp=0xc0000827e0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 35 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000114f50 sp=0xc000114f30 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc000114fe0 sp=0xc000114f50 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000114fe8 sp=0xc000114fe0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 36 [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000115750 sp=0xc000115730 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc0001157e0 sp=0xc000115750 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001157e8 sp=0xc0001157e0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 7 [GC worker (idle)]:
runtime.gopark(0x17f724a65e0?, 0x3?, 0x6d?, 0xb0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000087750 sp=0xc000087730 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000877e0 sp=0xc000087750 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000877e8 sp=0xc0000877e0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 24 [GC worker (idle)]:
runtime.gopark(0x17f72452981?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000082f50 sp=0xc000082f30 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc000082fe0 sp=0xc000082f50 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000082fe8 sp=0xc000082fe0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 37 [GC worker (idle)]:
runtime.gopark(0x17f724125fb?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000115f50 sp=0xc000115f30 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc000115fe0 sp=0xc000115f50 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000115fe8 sp=0xc000115fe0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 25 [GC worker (idle)]:
runtime.gopark(0x17f724482d8?, 0x1?, 0x8f?, 0x4c?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000083750 sp=0xc000083730 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc0000837e0 sp=0xc000083750 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0000837e8 sp=0xc0000837e0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 38 [GC worker (idle)]:
runtime.gopark(0x17f72412225?, 0x1?, 0xd3?, 0xbe?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000116750 sp=0xc000116730 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc0001167e0 sp=0xc000116750 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001167e8 sp=0xc0001167e0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 8 [GC worker (idle)]:
runtime.gopark(0x17f72448671?, 0x0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000087f50 sp=0xc000087f30 pc=0x47e236
runtime.gcBgMarkWorker()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1275 +0xf1 fp=0xc000087fe0 sp=0xc000087f50 pc=0x4612f1
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000087fe8 sp=0xc000087fe0 pc=0x4ad401
created by runtime.gcBgMarkStartWorkers
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/mgc.go:1199 +0x25

goroutine 26 [select]:
runtime.gopark(0xc0001126b0?, 0x2?, 0x0?, 0x0?, 0xc000112674?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000093c20 sp=0xc000093c00 pc=0x47e236
runtime.selectgo(0xc000093eb0, 0xc000112670, 0x0?, 0x0, 0x0?, 0x1)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/select.go:327 +0x7be fp=0xc000093d60 sp=0xc000093c20 pc=0x48ddbe
github.com/go-skynet/LocalAI/api.(*galleryApplier).start.func1()
	/home/runner/work/LocalAI/LocalAI/api/gallery.go:78 +0xee fp=0xc000093fe0 sp=0xc000093d60 pc=0xa9718e
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000093fe8 sp=0xc000093fe0 pc=0x4ad401
created by github.com/go-skynet/LocalAI/api.(*galleryApplier).start
	/home/runner/work/LocalAI/LocalAI/api/gallery.go:76 +0xaa

goroutine 27 [sleep]:
runtime.gopark(0x182f0ca74b7?, 0xc0001ac3f0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000112f00 sp=0xc000112ee0 pc=0x47e236
time.Sleep(0x12a05f200)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/time.go:195 +0x135 fp=0xc000112f40 sp=0xc000112f00 pc=0x4aa275
github.com/valyala/fasthttp.(*FS).initRequestHandler.func1()
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/fs.go:482 +0x13c fp=0xc000112fe0 sp=0xc000112f40 pc=0x7da75c
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000112fe8 sp=0xc000112fe0 pc=0x4ad401
created by github.com/valyala/fasthttp.(*FS).initRequestHandler
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/fs.go:459 +0x4d6

goroutine 28 [sleep]:
runtime.gopark(0x182f0cb4066?, 0xc0001ac8c0?, 0x0?, 0x0?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000113700 sp=0xc0001136e0 pc=0x47e236
time.Sleep(0x12a05f200)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/time.go:195 +0x135 fp=0xc000113740 sp=0xc000113700 pc=0x4aa275
github.com/valyala/fasthttp.(*FS).initRequestHandler.func1()
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/fs.go:482 +0x13c fp=0xc0001137e0 sp=0xc000113740 pc=0x7da75c
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001137e8 sp=0xc0001137e0 pc=0x4ad401
created by github.com/valyala/fasthttp.(*FS).initRequestHandler
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/fs.go:459 +0x4d6

goroutine 29 [sleep]:
runtime.gopark(0x181c6b7c637?, 0xc000113f88?, 0xc5?, 0xd5?, 0xc0001bed50?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000113f58 sp=0xc000113f38 pc=0x47e236
time.Sleep(0x2540be400)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/time.go:195 +0x135 fp=0xc000113f98 sp=0xc000113f58 pc=0x4aa275
github.com/valyala/fasthttp.(*workerPool).Start.func2()
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/workerpool.go:67 +0x56 fp=0xc000113fe0 sp=0xc000113f98 pc=0x81c056
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000113fe8 sp=0xc000113fe0 pc=0x4ad401
created by github.com/valyala/fasthttp.(*workerPool).Start
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/workerpool.go:59 +0xdd

goroutine 50 [select]:
runtime.gopark(0xc000123a08?, 0x3?, 0x34?, 0x0?, 0xc0001239da?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc000123860 sp=0xc000123840 pc=0x47e236
runtime.selectgo(0xc000123a08, 0xc0001239d4, 0x5ab9a9?, 0x0, 0xc0000c3000?, 0x1)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/select.go:327 +0x7be fp=0xc0001239a0 sp=0xc000123860 pc=0x48ddbe
github.com/valyala/fasthttp/fasthttputil.(*pipeConn).readNextByteBuffer(0xc0001f0958, 0x1)
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/fasthttputil/pipeconns.go:188 +0x1b3 fp=0xc000123a48 sp=0xc0001239a0 pc=0x7ccd73
github.com/valyala/fasthttp/fasthttputil.(*pipeConn).read(0xc0001f0958, {0xc0000c6000, 0x1000, 0xc0001a2bb8?}, 0x0?)
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/fasthttputil/pipeconns.go:165 +0x3a fp=0xc000123a78 sp=0xc000123a48 pc=0x7ccaba
github.com/valyala/fasthttp/fasthttputil.(*pipeConn).Read(0x1a94880?, {0xc0000c6000?, 0xc4?, 0x1000?})
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/fasthttputil/pipeconns.go:148 +0x88 fp=0xc000123af8 sp=0xc000123a78 pc=0x7cc9a8
github.com/valyala/fasthttp.writeBodyChunked(0xc000194930?, {0x7f38e0dceb20, 0xc0001f0958})
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/http.go:2062 +0x95 fp=0xc000123b68 sp=0xc000123af8 pc=0x807bd5
github.com/valyala/fasthttp.(*Response).writeBodyStream(0xc000194930, 0xc000123c48?, 0x1)
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/http.go:1974 +0x1f1 fp=0xc000123be0 sp=0xc000123b68 pc=0x807431
github.com/valyala/fasthttp.(*Response).Write(0xc0000c3000?, 0x1625260?)
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/http.go:1875 +0x157 fp=0xc000123c38 sp=0xc000123be0 pc=0x8070b7
github.com/valyala/fasthttp.writeResponse(0xc000194600?, 0x1aa7868?)
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/server.go:2575 +0x5b fp=0xc000123c58 sp=0xc000123c38 pc=0x8126fb
github.com/valyala/fasthttp.(*Server).serveConn(0xc0003a4200, {0x162c658?, 0xc0005c6008})
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/server.go:2416 +0x1667 fp=0xc000123ec8 sp=0xc000123c58 pc=0x811527
github.com/valyala/fasthttp.(*Server).serveConn-fm({0x162c658?, 0xc0005c6008?})
	<autogenerated>:1 +0x39 fp=0xc000123ef0 sp=0xc000123ec8 pc=0x820959
github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc0001bed20, 0xc000036020)
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/workerpool.go:224 +0xa9 fp=0xc000123fa0 sp=0xc000123ef0 pc=0x81cb89
github.com/valyala/fasthttp.(*workerPool).getCh.func1()
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/workerpool.go:196 +0x38 fp=0xc000123fe0 sp=0xc000123fa0 pc=0x81c8f8
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc000123fe8 sp=0xc000123fe0 pc=0x4ad401
created by github.com/valyala/fasthttp.(*workerPool).getCh
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/workerpool.go:195 +0x1b0

goroutine 52 [chan receive]:
runtime.gopark(0x4b7c25?, 0x1a94400?, 0xa0?, 0x2b?, 0xc0003fe000?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc0003f5d08 sp=0xc0003f5ce8 pc=0x47e236
runtime.chanrecv(0xc000182240, 0xc0003f5f10, 0x1)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/chan.go:583 +0x49d fp=0xc0003f5d98 sp=0xc0003f5d08 pc=0x44d07d
runtime.chanrecv2(0xc0005e2200?, 0xc0005e2200?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/chan.go:447 +0x18 fp=0xc0003f5dc0 sp=0xc0003f5d98 pc=0x44cbb8
github.com/go-skynet/LocalAI/api.chatEndpoint.func2.1(0x0?)
	/home/runner/work/LocalAI/LocalAI/api/openai.go:432 +0xc5 fp=0xc0003f5fa0 sp=0xc0003f5dc0 pc=0xa9b745
github.com/valyala/fasthttp.NewStreamReader.func1()
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/stream.go:44 +0x38 fp=0xc0003f5fe0 sp=0xc0003f5fa0 pc=0x814b18
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0003f5fe8 sp=0xc0003f5fe0 pc=0x4ad401
created by github.com/valyala/fasthttp.NewStreamReader
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/stream.go:43 +0x37c

goroutine 53 [sleep]:
runtime.gopark(0x1840621e730?, 0xd1cb00?, 0x98?, 0x27?, 0x0?)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/proc.go:381 +0xd6 fp=0xc0005dff88 sp=0xc0005dff68 pc=0x47e236
time.Sleep(0x3b9aca00)
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/time.go:195 +0x135 fp=0xc0005dffc8 sp=0xc0005dff88 pc=0x4aa275
github.com/valyala/fasthttp.updateServerDate.func1()
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/header.go:2274 +0x1e fp=0xc0005dffe0 sp=0xc0005dffc8 pc=0x81cfde
runtime.goexit()
	/opt/hostedtoolcache/go/1.20.5/x64/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0005dffe8 sp=0xc0005dffe0 pc=0x4ad401
created by github.com/valyala/fasthttp.updateServerDate
	/home/runner/go/pkg/mod/github.com/valyala/fasthttp@v1.48.0/header.go:2272 +0x25

Additional context

Pablo1107 added the bug label Jul 7, 2023
@yunghoy

yunghoy commented Jul 7, 2023

Same here. I think the segmentation fault is not related to memory size.
I have 64 GB of RAM, and the Docker container can use half of it.
The segmentation fault happens with the falcon-7b model.

@Pablo1107
Author

> Same here. I think the segmentation fault is not related to memory size.
> I have 64 GB of RAM, and the Docker container can use half of it.
> The segmentation fault happens with the falcon-7b model.

What exact file are you using?

@yunghoy

yunghoy commented Jul 8, 2023

I tried all the files from TheBloke's repository and the gpt4all-falcon file. I think MPT and Falcon models do not work. GPT4All is working. I think this GitHub repository is not maintained properly.

Obviously, we can only use MPT or Falcon; we cannot use llama or gpt4all because of license issues. Talking about llama and gpt4all under K8s is therefore meaningless: since those models are only for personal work or research, there is no use for K8s. :p

@mudler
Owner

mudler commented Jul 8, 2023

> I tried all the files from TheBloke's repository and the gpt4all-falcon file. I think MPT and Falcon models do not work. GPT4All is working. I think this GitHub repository is not maintained properly.

Please file issues for the problems you find - this is how it works. If you keep what works and what doesn't to yourself, things will never get fixed.
This is a community, open source project - so everyone is trying to help each other here!

> Obviously, we can only use MPT or Falcon; we cannot use llama or gpt4all because of license issues. Talking about llama and gpt4all under K8s is therefore meaningless: since those models are only for personal work or research, there is no use for K8s. :p

You are wrong here: there are OpenLLaMA-based models that can be used freely, and gpt4all models based on GPT-J. MPT with gpt4all should work.

I haven't tried Falcon or MPT recently, as I'm busy with #726, but I think the model you are trying is not the one I tested; it looks somewhat newer.

@bnusunny
Contributor

bnusunny commented Jul 8, 2023

@mudler Thanks for building this great project. Could you share the Falcon 7B model file you tested with (#516)? That would unblock our use of Falcon with this nice tool.

@mudler
Owner

mudler commented Jul 8, 2023

I had a quick look at the current state, and it seems most of the work to support Falcon went into ggllm.cpp. I quickly gave creating bindings a shot, and they seem to work with wizardlm-uncensored: https://github.com/mudler/go-ggllm.cpp - I will integrate them into LocalAI soon; that should give support for at least 7b and 40b, plus GPU support.

@mudler
Copy link
Owner

mudler commented Jul 13, 2023

I'm having a closer look at it this weekend; a quick attempt seems to work here with falcon-7b. I'm looking into refactoring the backends first to get rid of some hacks, but this shouldn't take long.

mudler linked a pull request Jul 14, 2023 that will close this issue
@mudler
Owner

mudler commented Jul 15, 2023

Falcon should now be working on master. I've been testing locally with: https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-7B-GGML/tree/main

I've also kept the old ggml implementation as a fallback in the falcon-ggml backend.

Note: you need to be extra careful to use a matching prompt. Without it, the model hallucinates pretty quickly.
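
The point about a matching prompt can be illustrated with a small sketch. The template string below is a placeholder assumption, not something taken from this thread; the model card for the file you downloaded is the authority on the exact format. `TEMPLATE` and `build_prompt` are hypothetical names and are not part of LocalAI.

```python
# Hypothetical template: replace it with the format given on the model card.
TEMPLATE = "{instruction}\n### Response:\n"


def build_prompt(instruction: str, template: str = TEMPLATE) -> str:
    """Wrap a raw user instruction in the prompt format the model expects."""
    if "{instruction}" not in template:
        raise ValueError("template must contain an {instruction} placeholder")
    # Strip stray whitespace so the markers land exactly where the model expects them.
    return template.replace("{instruction}", instruction.strip())


if __name__ == "__main__":
    print(build_prompt("  concat two .bin files into one  "))
```

For comparison, the debug log in the original report shows the model running with an empty TemplateConfig (Completion, Chat, and Edit are all blank), so no template was applied to that request.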

4 participants