
Support Ollama streamed responses #644

Merged

Conversation

@dferrazm (Contributor) commented May 29, 2024

  • chat: Added support for streamed responses.
  • complete: Fixed; it was not working at all. It now works with both streamed and non-streamed responses.

To generate a non-streamed response, call the methods without passing a block. To generate a streamed response, pass a block. For example:

# Non-streamed response
resp = ollama.chat(messages: [{ role: "user", content: "Hi" }])
resp.chat_completion # => "Hello there!"

# Streamed response
resp = ollama.chat(messages: [{ role: "user", content: "Hi" }]) { |resp| print resp.chat_completion }
# Prints "Hello there!" incrementally as the chunks arrive
resp.chat_completion # => "Hello there!"
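
The fixed complete call follows the same pattern (a sketch; the prompt: keyword argument and the completion reader on the response object are assumed to mirror the chat example above):

# Non-streamed completion
resp = ollama.complete(prompt: "Hi")
resp.completion # => "Hello there!"

# Streamed completion: the block receives each partial response as it arrives
ollama.complete(prompt: "Hi") { |resp| print resp.completion }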

Note: Passing the stream parameter to the method no longer has any effect.

Closes #550

JSON.parse(chunk_line)
rescue JSON::ParserError
# In some cases the chunk exceeds the buffer size and the JSON parser fails.
# TODO: How to better handle this?

dferrazm (Contributor, Author):

it's not clear what we should do here, @andreibondarev. wdyt?

andreibondarev (Collaborator):

I wonder if the error just needs to be raised? There's no graceful way to recover from this and still provide a useful response, I don't think.

dferrazm (Contributor, Author):

yeah, let's remove that and let it blow up, until we figure out a way to handle this.
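
For reference, once the rescue is dropped the parsing step reduces to roughly the following (a sketch; chunk_line and the surrounding streaming loop come from the diff above):

# A malformed chunk now raises JSON::ParserError to the caller
# instead of being silently swallowed.
parsed_chunk = JSON.parse(chunk_line)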


yield json_chunk, size if block
yield Langchain::LLM::OllamaResponse.new(parsed_chunk, model: parameters[:model]) if block_given?

andreibondarev (Collaborator):

Maybe we start referencing these classes directly here since it's all under the same namespace: OllamaResponse.new .... It's cleaner that way, right?

dferrazm (Contributor, Author):

yep. it'll probably work too.

dferrazm (Contributor, Author):

updated
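
The updated yield presumably just drops the module prefix, since OllamaResponse lives in the same namespace (a sketch based on the diff line above):

yield OllamaResponse.new(parsed_chunk, model: parameters[:model]) if block_given?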

prompt_tokens + completion_tokens if done?
end

def done?

andreibondarev (Collaborator):

Should this be private?

dferrazm (Contributor, Author):

updated
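
The resulting shape of the response class, with done? moved under private (a sketch assembled from the hunk above; the total_tokens method name and the raw_response["done"] check are assumptions):

def total_tokens
  prompt_tokens + completion_tokens if done?
end

private

def done?
  !!raw_response["done"]
end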

@andreibondarev (Collaborator):

@dferrazm I just tried this but nothing was streamed to me:

irb(main):005> llm = Langchain::LLM::Ollama.new(url: ENV["OLLAMA_URL"])
=>
#<Langchain::LLM::Ollama:0x0000000128a3a3d0
...
irb(main):006* llm.chat messages: [{role:"user", content:"hey"}] do |chunk|
irb(main):007*   puts chunk
irb(main):008> end
#<Langchain::LLM::OllamaResponse:0x000000012803b988>
=>
#<Langchain::LLM::OllamaResponse:0x0000000128038170
 @model="llama3",
 @prompt_tokens=nil,
 @raw_response=
  {"model"=>"llama3",
   "created_at"=>"2024-05-29T15:56:04.473077Z",
   "message"=>{"role"=>"assistant", "content"=>"Hey! How's it going?"},
   "done_reason"=>"stop",
   "done"=>true,
   "total_duration"=>11215288666,
   "load_duration"=>10900135000,
   "prompt_eval_count"=>11,
   "prompt_eval_duration"=>152727000,
   "eval_count"=>8,
   "eval_duration"=>156123000}>

@dferrazm (Contributor, Author):

@andreibondarev you have to pass stream: true. Should it be set by default? Probably fall back to true if a block is given, right?
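
A hypothetical way to implement that fallback inside chat/complete (not the code in this PR; the parameters hash and block variable names are assumptions):

# Default to streaming whenever the caller supplied a block
# and did not set :stream explicitly.
parameters[:stream] = true if block && !parameters.key?(:stream)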

@dferrazm dferrazm marked this pull request as ready for review May 30, 2024 16:40
@andreibondarev andreibondarev merged commit bb1f89a into patterns-ai-core:main Jun 1, 2024
4 of 5 checks passed

Successfully merging this pull request may close these issues.

Fix Ollama#chat() streaming