
Ability to abort streaming completion #974

Open
curran opened this issue Aug 29, 2023 · 1 comment
Labels
enhancement New feature or request
curran commented Aug 29, 2023

Is your feature request related to a problem? Please describe.

As a developer of an app that leverages LocalAI and Llama-2 for streaming completions, I want to give users the ability to "abort" or "cancel" the streaming response, so that my self-hosted instance doesn't keep burning CPU/GPU cycles generating the rest of a stream that users will never see.

Describe the solution you'd like

Ideally, I'd like to use the Node.js OpenAI package to abort the stream. As documented in https://github.com/openai/openai-node#streaming-responses, we should be able to invoke

stream.controller.abort()

or simply `break` out of the `for await` loop.
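The openai-node README also documents passing request options, including an `AbortSignal`, as a second argument to `.create()`. A minimal sketch of that variant, assuming a recent openai-node client constructed as in the repro below (the helper name `streamWithTimeout` is mine, not part of any API):

```javascript
// Sketch only: wires a timeout-driven AbortController into the request
// options of openai-node's .create(). Assumes `openai` is an OpenAI client
// instance constructed as in the repro below; not verified against LocalAI.
async function streamWithTimeout(openai, content, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);

  try {
    const stream = await openai.chat.completions.create(
      {
        model: "llama-2-7b-chat.ggmlv3.q4_0.bin",
        messages: [{ role: "user", content }],
        stream: true,
      },
      // Request options go in the second argument, not the request body.
      { signal: controller.signal }
    );

    for await (const part of stream) {
      // The final chunk's delta may omit `content`; guard against undefined.
      process.stdout.write(part.choices[0]?.delta?.content ?? "");
    }
  } finally {
    clearTimeout(timer);
  }
}
```

Even if this cancels the client-side request, the underlying issue remains: LocalAI would still need to notice the dropped connection and stop generating.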

Describe alternatives you've considered

I've tried the following two approaches.

import OpenAI from "openai";

const content = `
Please write JavaScript code that creates
a scatter plot with D3.js.

Use \`const\` and \`let\` instead of \`var\`.
Use the arrow function syntax.

## JavaScript code
`;

const openai = new OpenAI({
  apiKey: "",
  baseURL: "http://192.168.0.140:8080/v1",
});

const stream = await openai.chat.completions.create({
  model: "llama-2-7b-chat.ggmlv3.q4_0.bin",
  messages: [{ role: "user", content }],
  stream: true,
});

let keepGoing = true;
setTimeout(() => {
  // Approach A: This appears to do nothing.
  stream.controller.abort();

  // Approach B:
  // This stops the client from iterating,
  // but the server keeps computing the response.
  keepGoing = false;
}, 10 * 1000);

for await (const part of stream) {
  if (!keepGoing) {
    break;
  }
  // The final chunk's delta may omit `content`, so guard against undefined.
  process.stdout.write(part.choices[0]?.delta?.content ?? "");
}
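To isolate the client-side control flow from LocalAI, the same abort mechanics can be demonstrated with a stand-in async generator (purely hypothetical; it only mimics the shape of the real stream). It shows that aborting stops iteration on the client, which is exactly why the server needs its own cancellation hook:

```javascript
// Stand-in for the completion stream: an async generator that yields token
// chunks until its AbortSignal fires. This only illustrates the client-side
// control flow; it does not exercise LocalAI itself.
async function* fakeStream(signal) {
  const tokens = ["const", " ", "x", " ", "=", " ", "1", ";"];
  for (const tok of tokens) {
    if (signal.aborted) return; // server-side analogue: stop generating here
    yield { choices: [{ delta: { content: tok } }] };
  }
}

const controller = new AbortController();
let received = "";

for await (const part of fakeStream(controller.signal)) {
  received += part.choices[0]?.delta?.content ?? "";
  if (received.length >= 5) {
    controller.abort(); // the generator observes signal.aborted and stops
  }
}

console.log(received); // → "const"
```

Only the first token arrives: the abort fires after "const" (5 characters), and the generator returns before yielding the next chunk.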

Additional context

@curran curran added the enhancement New feature or request label Aug 29, 2023
@MysticalMount

+1 on the ability to cancel the stream. Often we don't get what we want, and checking the LocalAI server shows it's still processing the previous generation, burning a fair amount of resources in the process.
