
feat: cancel stream generation if client disappears #792

Merged (4 commits) on Jul 24, 2023

Conversation

@tmm1 (Contributor) commented on Jul 23, 2023

Description

I added Context and Cancel fields to OpenAIRequest, and ensured that the new per-request context is wired up to the gRPC client.
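A minimal Go sketch of the idea (the Context/Cancel fields and the OpenAIRequest name come from this description; everything else, including the helper name, is an assumption, not the actual LocalAI code):

```go
package main

import "context"

// OpenAIRequest sketch: only the cancellation-related fields are shown;
// the other fields here are illustrative, not the full request schema.
type OpenAIRequest struct {
	Model  string `json:"model"`
	Stream bool   `json:"stream"`

	// Per-request lifetime: gRPC calls made with Context are aborted
	// when Cancel fires (e.g. because the HTTP client disconnected).
	Context context.Context    `json:"-"`
	Cancel  context.CancelFunc `json:"-"`
}

// newOpenAIRequest (hypothetical helper) derives the per-request context
// from a parent context, so the backend call can be cancelled later.
func newOpenAIRequest(parent context.Context) *OpenAIRequest {
	ctx, cancel := context.WithCancel(parent)
	return &OpenAIRequest{Context: ctx, Cancel: cancel}
}

func main() {
	req := newOpenAIRequest(context.Background())
	// req.Context would be passed to the gRPC backend client here
	// instead of a background context.
	defer req.Cancel()
	_ = req
}
```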

Now if I ctrl+c my curl request while it's streaming, the LocalAI server will show:

7:15PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"llama2","choices":[{"index":0,"delta":{"content":" a"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

7:15PM DBG Sending chunk failed: connection closed
Error rpc error: code = Canceled desc = context canceled
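A rough sketch of how that flow can work on the server side, assuming a chunk channel and a write callback (a hypothetical shape, not the actual LocalAI streaming handler): when writing a chunk fails because the connection is closed, the per-request context is cancelled, and the in-flight gRPC call returns the Canceled error shown above.

```go
package main

import (
	"context"
	"fmt"
)

// streamChunks forwards generated chunks to the client and cancels the
// per-request context as soon as a write fails (client disappeared).
func streamChunks(ctx context.Context, cancel context.CancelFunc,
	chunks <-chan string, write func(string) error) {
	for {
		select {
		case <-ctx.Done():
			return
		case chunk, ok := <-chunks:
			if !ok {
				return
			}
			if err := write(chunk); err != nil {
				fmt.Println("Sending chunk failed:", err)
				cancel() // propagates to the gRPC backend call
				return
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	chunks := make(chan string, 2)
	chunks <- " a"
	chunks <- " b"
	close(chunks)

	// Simulate a client that goes away after the first chunk.
	calls := 0
	write := func(s string) error {
		calls++
		if calls > 1 {
			return fmt.Errorf("connection closed")
		}
		fmt.Println("Sending chunk:", s)
		return nil
	}
	streamChunks(ctx, cancel, chunks, write)
	fmt.Println("request context:", ctx.Err()) // context canceled
}
```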

Notes for Reviewers

The next request after a cancellation is acting weird and produces some garbage tokens.

7:21PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"llama2","choices":[{"index":0,"delta":{"role":"assistant","content":""}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

7:21PM DBG Loading model llama from llama-2-13b-chat.ggmlv3.q4_K_M.bin
7:21PM DBG Model already loaded in memory: llama-2-13b-chat.ggmlv3.q4_K_M.bin
7:21PM DBG Model already loaded in memory: llama-2-13b-chat.ggmlv3.q4_K_M.bin
7:21PM DBG Sending chunk: {"object":"chat.completion.chunk","model":"llama2","choices":[{"index":0,"delta":{"content":" to"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

Error rpc error: code = Unavailable desc = error reading from server: EOF

The third request spins up the backend again and works, until another request is cancelled anyway.

Signed commits

  • Yes, I signed my commits.

Review thread on api/openai/request.go (outdated, resolved)
@mudler (Owner) commented on Jul 23, 2023

Thanks! Nice addition. Just a small note from my side regarding the context: if we keep using the top-level context, we are also sure that we can cancel it (particularly useful for tests).
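A small sketch of that suggestion as I read it (an assumed interpretation, not code from this PR): deriving the per-request context from the application's top-level context means that cancelling the top-level context also cancels every in-flight generation, which is handy in tests and on shutdown.

```go
package main

import (
	"context"
	"fmt"
)

func main() {
	// Top-level application context (e.g. created at server startup).
	appCtx, appCancel := context.WithCancel(context.Background())

	// Per-request context is derived from appCtx, not context.Background().
	reqCtx, reqCancel := context.WithCancel(appCtx)
	defer reqCancel()

	// Cancelling the parent cancels the child as well.
	appCancel()
	<-reqCtx.Done()
	fmt.Println(reqCtx.Err()) // context canceled
}
```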

@mudler changed the title from "Cancel stream generation if client disappears" to "feat: cancel stream generation if client disappears" on Jul 23, 2023
@mudler added the enhancement label on Jul 23, 2023
@tmm1 (Contributor, Author) commented on Jul 23, 2023

The next request after a cancellation is acting weird and produces some garbage tokens.

@mudler any thoughts on what might be causing this or how to go about fixing it?

@mudler (Owner) commented on Jul 24, 2023

The next request after a cancellation is acting weird and produces some garbage tokens.

@mudler any thoughts on what might be causing this or how to go about fixing it?

My guess is that, since llama.cpp doesn't support multiple concurrent states, the process is probably still busy inferencing when another request tries to book the same context, and that fails.
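Purely as an illustration of that constraint (not code from this PR or from llama.cpp's bindings): with a single inference state, access to the backend has to be serialized, e.g. behind a mutex; a second request grabbing the state while a cancelled generation is still draining is exactly the kind of clash described above.

```go
package main

import (
	"fmt"
	"sync"
)

// backend is a toy stand-in for a llama.cpp process that owns exactly
// one inference state, so requests must take turns using it.
type backend struct {
	mu sync.Mutex // guards the single inference state
}

func (b *backend) predict(prompt string) string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return "reply to " + prompt
}

func main() {
	b := &backend{}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			fmt.Println(b.predict(fmt.Sprintf("request %d", i)))
		}(i)
	}
	wg.Wait()
}
```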

@mudler enabled auto-merge (squash) on Jul 24, 2023 at 19:54
@mudler disabled auto-merge on Jul 24, 2023 at 21:10
@mudler merged commit 12fe093 into mudler:master on Jul 24, 2023
8 checks passed