Rather than streaming output, I see all of the output show up at the end, all at once.
But if I add a print statement before an output item is yielded, I see the text generated line-by-line from my print().
```python
for item in stream:
    # Each item looks like this:
    # {'id': 'cmpl-00...', 'object': 'text_completion', 'created': .., 'model': '/path', 'choices': [
    #   {'text': '\n', 'index': 0, 'logprobs': None, 'finish_reason': None}
    # ]}
    print(item["choices"][0]["text"], end="")  # <- the print statement I added
    yield item["choices"][0]["text"]
```
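For reference, here is a minimal sketch of how the consuming side could print those yielded chunks as they arrive. The `stream_to_terminal` function and the explicit flush are my own illustration, not the plugin's actual code; without a flush, stdout buffering alone can make output appear all at once even when the generator yields incrementally:

```python
import sys

def stream_to_terminal(chunks):
    # Hypothetical consumer: write each yielded chunk immediately.
    for text in chunks:
        sys.stdout.write(text)
        sys.stdout.flush()
    sys.stdout.write("\n")
```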
- Using llm 0.8 on macOS M1, installed via pipx
- And the latest llama-cpp-python, force-reinstalled with no pip cache, rebuilt with Metal, following this repo's README.
- I see this running with default options, like `llm -m llamacode "My prompt here"`, on models added both with and without the `--llama2-chat` option.
- When I switch to an OpenAI model like `-m 4`, streaming works.
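To help narrow down where the buffering happens, a small sketch that streams from llama-cpp-python directly, bypassing llm entirely (the model path is a placeholder and `max_tokens` is arbitrary):

```python
from llama_cpp import Llama

# Placeholder path; point this at the same model file llm uses.
model = Llama(model_path="/path/to/model.bin")

# stream=True makes the call return a generator of completion chunks
# shaped like the dicts shown above.
for chunk in model("My prompt here", max_tokens=64, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```

If this prints token by token while `llm -m llamacode` still dumps everything at the end, the buffering is presumably happening on the llm/plugin side rather than inside llama-cpp-python.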