Rather than streaming output, I see all of the output show up at the end, all at once.
But if I add a print statement before an output item is yielded, I see the text generated line-by-line from my print().
```python
for item in stream:
    # Each item looks like this:
    # {'id': 'cmpl-00...', 'object': 'text_completion', 'created': .., 'model': '/path', 'choices': [
    #   {'text': '\n', 'index': 0, 'logprobs': None, 'finish_reason': None}
    # ]}
    print(item["choices"][0]["text"], end="")  # <- the print statement I added
    yield item["choices"][0]["text"]
```
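For reference, here is a minimal sketch of how the consuming side could print those yielded chunks as they arrive. The `stream_to_terminal` function and the explicit flush are my own illustration, not the plugin's actual code; without a flush, stdout buffering alone can make output appear all at once even when the generator yields incrementally:

```python
import sys

def stream_to_terminal(chunks):
    # Hypothetical consumer: write each yielded chunk immediately.
    for text in chunks:
        sys.stdout.write(text)
        sys.stdout.flush()
    sys.stdout.write("\n")
```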
- Using llm 0.8 on macOS M1, installed via pipx
- And the latest llama-cpp-python, force-reinstalled with no pip cache, rebuilt with Metal, following this repo's README.
- I see this running with default options, like `llm -m llamacode "My prompt here"`, on models added both with and without the `--llama2-chat` option.
- When I switch to an OpenAI model like `-m 4`, streaming works.
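To help narrow down where the buffering happens, a small sketch that streams from llama-cpp-python directly, bypassing llm entirely (the model path is a placeholder and `max_tokens` is arbitrary):

```python
from llama_cpp import Llama

# Placeholder path; point this at the same model file llm uses.
model = Llama(model_path="/path/to/model.bin")

# stream=True makes the call return a generator of completion chunks
# shaped like the dicts shown above.
for chunk in model("My prompt here", max_tokens=64, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```

If this prints token by token while `llm -m llamacode` still dumps everything at the end, the buffering is presumably happening on the llm/plugin side rather than inside llama-cpp-python.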