
Conversation


@slavarazum slavarazum commented Dec 21, 2022

Brings streaming support to return partial progress with server-sent events.

⚠️ requests with the best_of option can't be streamed.

You can think of it as the progress of the AI typing.

The main advantage of using streams is that you can return a response to the user much earlier.
See the examples below to compare the results.

Request completion model

openai-php-request.mp4

Stream completion model

openai-php-stream.mp4

Currently implemented for the completions endpoint only.

Usage

Pass the stream parameter and get back a generator object.
You can iterate over it until the stream ends.

$client = OpenAI::client('YOUR_API_KEY');

$stream = $client->completions()->create([
    'model' => 'davinci',
    'prompt' => 'PHP is',
    'stream' => true,
]);

$fullText = '';

foreach ($stream as $item) {
    $fullText .= $item['choices'][0]['text'];
}

Each iteration of the stream returns the same object as a non-streamed response, except for the usage and finishReason parameters, which are not present in partial streamed responses.
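To illustrate, here is a minimal, self-contained sketch of accumulating partial chunks and detecting the final one. The chunk arrays below are hypothetical stand-ins shaped like the $item values above; in the raw API, the finish reason arrives under the finish_reason key:

```php
<?php

// Hypothetical chunks standing in for items yielded by the generator above.
$chunks = [
    ['choices' => [['text' => 'PHP is ', 'finish_reason' => null]]],
    ['choices' => [['text' => 'a popular language.', 'finish_reason' => null]]],
    ['choices' => [['text' => '', 'finish_reason' => 'stop']]],
];

$fullText = '';
$finished = false;

foreach ($chunks as $item) {
    $fullText .= $item['choices'][0]['text'];

    // Only the final chunk carries a non-null finish reason.
    if ($item['choices'][0]['finish_reason'] !== null) {
        $finished = true;
    }
}

echo $fullText; // PHP is a popular language.
```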

Stream text results to the client with a Laravel response:

response()->stream(
    function () use ($stream) {
        foreach ($stream as $item) {
            echo $item['choices'][0]['text'];
            ob_flush();
            flush();
        }
    },
    200,
    [
        'X-Accel-Buffering' => 'no',
    ]
);

You can create your own event stream (server-sent events) to send partial data to the client.

Example with Laravel response:

response()->stream(
    function () use ($stream) {
        foreach ($stream as $item) {
            echo 'data: ' . json_encode($item) . PHP_EOL . PHP_EOL;
            ob_flush();
            flush();
        }
    },
    200,
    [
        'Content-Type' => 'text/event-stream',
        'Connection' => 'keep-alive',
        'Cache-Control' => 'no-cache',
        'X-Accel-Buffering' => 'no',
    ]
);
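The same event loop can be written without Laravel. A sketch in plain PHP, assuming $stream is the generator returned by create() above:

```php
<?php

// $stream is assumed to be the generator from $client->completions()->create([...]).
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
header('X-Accel-Buffering: no');

foreach ($stream as $item) {
    echo 'data: ' . json_encode($item) . PHP_EOL . PHP_EOL;

    // Flush output buffers so each event reaches the client immediately.
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}
```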

The current draft tries to create an implementation with minimal API changes.

  • completions model now returns array|Generator
  • usage parameter marked as optional in CreateResponse
  • finishReason changed to nullable
  • HttpTransporter client dependency changed to GuzzleHttp\ClientInterface

I would be happy to discuss this and bring the PR to a stable version.

@slavarazum
Author

slavarazum commented Dec 22, 2022

In the new commit, I replaced the custom Stream object with a Generator to stay more native.

You can iterate directly through returned generator object:

- foreach ($stream->read() as $item) {
+ foreach ($stream as $item) {
      $fullText .= $item['choices'][0]['text'];
  }

@slavarazum
Author

@nunomaduro @gehrisandro Any thoughts or suggestions? 💭

@nunomaduro
Contributor

@slavarazum Currently a little bit busy - I will check this as soon as possible.

@nunomaduro
Contributor

@slavarazum Is there any reason why this pull request does not make the client always use streams?

@slavarazum
Author

slavarazum commented Jan 8, 2023

@nunomaduro The main reason is that the stream parameter is optional and false by default in the OpenAI API.
A stream response requires handling an iterable type.
It might be confusing if we set it to true by default.

However, in my opinion, stream responses are absolutely necessary to provide a better user experience when the client expects a response as soon as possible.

@ijjimem

ijjimem commented Jan 8, 2023

How can I make it work now? Is it possible without changes to the core code?

  * @throws ErrorException|UnserializableResponse|TransporterException
  */
- public function requestObject(Payload $payload): array;
+ public function requestObject(Payload $payload, bool $stream = false): array|Generator;
Contributor

Can you somehow make this method return a single object for both cases, and have the stream option sent on the Payload object?

Author

@slavarazum slavarazum Jan 13, 2023

Created a new Response object, which Transporter::requestObject now returns. Stream option handling moved to the Payload object.


private function stream(Generator $stream): Generator
{
    foreach ($stream as $data) {
Contributor

Can't you create a single CreateResponse object that, behind the scenes, allows iterating over the stream?

Author

CreateResponse for completions now implements the Iterator interface, which allows iterating over the stream. How do you feel about that?
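As a self-contained sketch of that idea (a simplified stand-in, not the actual PR code), a response class can implement Iterator by delegating to the underlying generator:

```php
<?php

// Simplified stand-in for the PR's response class; the real class also
// holds the parsed response attributes.
final class StreamedCreateResponse implements Iterator
{
    public function __construct(private Generator $stream)
    {
    }

    public function current(): mixed
    {
        return $this->stream->current();
    }

    public function key(): mixed
    {
        return $this->stream->key();
    }

    public function next(): void
    {
        $this->stream->next();
    }

    public function rewind(): void
    {
        // Generators cannot be rewound once started, so this is a no-op;
        // foreach still works because current()/valid() auto-start the generator.
    }

    public function valid(): bool
    {
        return $this->stream->valid();
    }
}

$response = new StreamedCreateResponse((function (): Generator {
    yield ['choices' => [['text' => 'PHP ']]];
    yield ['choices' => [['text' => 'is great.']]];
})());

$text = '';
foreach ($response as $item) {
    $text .= $item['choices'][0]['text'];
}

echo $text; // PHP is great.
```

The trade-off of this approach is that the object can only be iterated once, which is inherent to generators.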

@slavarazum
Author

@ijjimem Let's wait for the implementation in this PR. I'm continuing to work on it.

@gehrisandro
Collaborator

Hi @slavarazum

Thank you very much for your work so far. And sorry for my delayed response.

I had a look at your implementation and I can see some good starting points, but nevertheless I would like to take a step back and first talk about the use cases and what the usage should look like.

Mainly I can see two use cases or goals to be achieved:

  1. Return the completion retrieved so far as quickly as possible. For example, as a stream response in Laravel: return response()->stream(...)
  2. Do something with the completion retrieved so far and, when done, fetch again the completion retrieved so far (not sure if this one is clear; maybe the code example below helps to clarify).

To achieve these two different use cases the user needs a way to use the response in different ways. Therefore I think it's better to have a dedicated CreateStreamResponse class which should be separated from the CreateResponse.

Example for use case 1:

$stream = $client->completions()->createStreamed([
    'model' => 'text-davinci-003',
    'prompt' => 'PHP is ',
    'max_tokens' => 100,
]); // CreateStreamResponse

return response()->stream(function () use ($stream) {
    foreach ($stream->iterator() as $newPart) {
        echo $newPart; // CompletionPartial
    }
});

Example for use case 2:

$stream = $client->completions()->createStreamed([
    'model' => 'text-davinci-003',
    'prompt' => 'PHP is ',
    'max_tokens' => 100,
]); // CreateStreamResponse

while (! $stream->finished()) {
    $response = $stream->response(); // CreateResponse object with the full completion received so far
    sleep(1); // do some work with the response
}

Some explanations of how I would structure the code:

CreateStreamResponse would implement a new interface StreamResponse with three methods:

  • iterator() returns a class which implements a new interface StreamResponseIterator (see below)
  • finished() returns a boolean indicating whether the stream (completion) has finished (maybe completed() would be a better name, but it's a bit weird in the context of "completions")
  • response() returns a Response object (in this case a CreateResponse) with the full completion retrieved so far

The StreamResponseIterator interface would extend the Iterator interface, yielding objects implementing a new interface StreamResponsePartial.

StreamResponsePartial is an interface for a newly received part of the stream response. In the case of completions this would be an object holding a simple string. When listing fine-tune events as a stream (the only other endpoint which supports streaming so far) this would be an object holding a new FineTune event.

CompletionPartial would implement StreamResponsePartial and hold only the new part of the completion as a string (and implement __toString())
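To make the proposal easier to discuss, here is a minimal sketch of these interfaces. The names come from the comment above; the exact signatures are my assumption:

```php
<?php

// A newly received part of a stream response.
interface StreamResponsePartial
{
    public function __toString(): string;
}

// Iterator over stream parts.
interface StreamResponseIterator extends Iterator
{
    public function current(): StreamResponsePartial;
}

// The object returned by createStreamed().
interface StreamResponse
{
    public function iterator(): StreamResponseIterator;

    public function finished(): bool;

    // Full completion retrieved so far, e.g. a CreateResponse.
    public function response(): object;
}

// Holds only the new part of the completion as a string.
final class CompletionPartial implements StreamResponsePartial
{
    public function __construct(private string $text)
    {
    }

    public function __toString(): string
    {
        return $this->text;
    }
}

echo new CompletionPartial('PHP is '); // PHP is
```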

@nunomaduro and @slavarazum: Can you come up with different use cases and what do you think about having a dedicated method (createStreamed()) and response (CreateStreamResponse)?

@slavarazum
Author

Hi @gehrisandro 🙌

In my vision, the stream option should be passed with the other options to the create method for 2 reasons:

  1. Stay consistent. Developers might expect that all options from the OpenAI API docs can be passed into the create method.
  2. A stream response is not guaranteed even if we pass the stream parameter, for example when the best_of option is present too.

A method that returns the full content of a stream response might make sense.

Let's define the final API.

  • should the create method return a single object for both cases, or separate ones?
  • to iterate through the stream, should we call some method, or would it be better to iterate over the returned object?

A single object with an isStream method and the ability to iterate over it may provide a simpler API:

$response = $client->completions()->create([
    // ...
]);

if ($response->isStream()) {
    foreach ($response as $part) {
        // ...
    }
}

vs

$response = $client->completions()->create([
    // ...
]);

if ($response instanceof CreateStreamResponse) {
    foreach ($response->iterator() as $part) {
        // ...
    }
}

@nunomaduro
Contributor

nunomaduro commented Jan 18, 2023

Can I have examples of how other OpenAI API clients (in other languages) solved this problem?

ps: @slavarazum really super sorry if this issue is taking forever to decide, but currently I am so busy that it's been difficult.

@slavarazum
Author

@nunomaduro NP 🤝 I also have some availability troubles in Ukraine 😅 It's OK not to rush this so we can create a truly good implementation. As far as I can see at first glance, it's currently only partially solved in other languages, so we can serve as a good example.

I had a look around the libraries listed in the docs and some others.

Examples:

Other:

@gehrisandro
Collaborator

gehrisandro commented Jan 19, 2023

I tried various combinations of the stream and best_of parameters, because the OpenAI documentation is not very clear about what happens if you provide both.

As far as I was able to see in my tests, if you provide both parameters together it still returns a stream response, but it waits to send the stream until the completion is done. This means it is not faster than a request without the stream option.
But technically it is still a stream, which contains the full completion within a single event. And consequently it does not include the usage.

So I would still prefer to have two different methods, but I understand your concern that developers will probably try the stream = true parameter on the normal create() method. To mitigate this issue we could throw an exception: "stream is not supported here, please use createStreamed() instead."

Conversely, we could throw an exception if best_of is passed to the createStreamed method as well, even though it technically works, because there is no benefit and it may lead to some confusion.
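A sketch of those guards (hypothetical helper functions, not code from this PR), assuming the parameters arrive as a plain array:

```php
<?php

// Guard for the normal create() method: reject stream = true.
function guardAgainstStream(array $parameters): void
{
    if (($parameters['stream'] ?? false) === true) {
        throw new InvalidArgumentException(
            'Streamed responses are not supported here, please use createStreamed() instead.'
        );
    }
}

// Guard for createStreamed(): reject best_of, which defeats streaming's purpose.
function guardAgainstBestOf(array $parameters): void
{
    if (array_key_exists('best_of', $parameters)) {
        throw new InvalidArgumentException(
            'The best_of option cannot be used with streamed completions.'
        );
    }
}
```

Either guard would run at the top of its method, turning the API's silent or slow behaviour into an immediate, explicit error.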

Additionally, I found more combinations where the API has, at least in my opinion, weird or unpredictable behaviour. For example, if you pass n = 2 and stream = true the API returns the expected stream response but with only one completion instead of two. In other words, the n parameter is completely ignored.
Update: I didn't test carefully enough. It actually returns two completions. Sorry about that.

@slavarazum Do you still think that having a single method is more convenient?
Personally, I do not like the if statements necessary to determine how to handle the response.
Furthermore, in most use cases developers will not reach for the stream option, and therefore I think we should keep the "normal" create() method and response as simple as possible.

@slavarazum
Author

It looks like a separate createStreamed method with appropriate exceptions might make sense.

At the moment I see more use cases for streamed responses than for conventional ones. Because of the longer response time, normal requests are more likely to be suitable for background operations where the user is not waiting for a response as soon as possible. Perhaps I'm missing something, since the industry is just emerging.

@jhull

jhull commented Feb 16, 2023

What is the latest on this? Would LOVE to be able to start using this for streaming, but I can't find a reasonable way of doing it anywhere ... hoping it's soon!

@slavarazum
Author

@nunomaduro How do you feel about separate createStreamed method, any thoughts about implementation in general?

@genesiscz

genesiscz commented Mar 12, 2023

Any update on this? Would love to see it working :-) I really like the implementation and agree on the single-method approach.

@CaReS0107

Any update on this PR? 🙏

@slavarazum
Author

Working on refactoring with chat endpoint support. Trying to simplify the implementation.

@Pierquinto

Has anyone of you dealt with this in Laravel with Inertia.js and Vue 3? I would love to see your implementation!

@oppsDayly

Oh, why not support stream responses? It is so nice for the user: it shows that the AI is working. If they have to wait a long time, users may think the AI is not working.

@huangdijia

I also hope to support stream as soon as possible.

@slavarazum
Author

Working on it right now. I will update the PR with a new Chat completions draft as soon as possible.
Stay tuned 😉

@Pierquinto When it's finished I will share examples with the client-side part.

@slavarazum
Author

Let's continue here - #84

@slavarazum slavarazum closed this Mar 24, 2023
@slavarazum
Author

@Pierquinto a simple stream reading implementation with fetch:

async function askAi() {
  const response = await fetch("/ask-ai");

  if (!response.body) {
    return;
  }

  const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();

  let delta = await reader.read();

  while (!delta.done) {
    // do something with the chunk in delta.value

    delta = await reader.read();
  }
}


10 participants