
Support Ollama streamed responses #644

Merged

Conversation

@dferrazm (Contributor) commented May 29, 2024

  • chat: Added support for streamed responses.
  • complete: Fixed; it was not working at all. It now works with both streamed and non-streamed responses.

To generate a non-streamed response, call the methods without passing a block. To generate a streamed response, pass a block. For example:

# Non-streamed response
resp = ollama.chat(messages: [{ role: "user", content: "Hi" }])
resp.chat_completion # => "Hello there!"

# Streamed response
resp = ollama.chat(messages: [{ role: "user", content: "Hi" }]) { |resp| print resp.chat_completion }
# Prints "Hello there!" incrementally as the chunks arrive
resp.chat_completion # => "Hello there!"
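
The fixed complete call follows the same pattern (a sketch; the prompt: keyword argument and the completion reader on the response object are assumed to mirror the chat example above):

# Non-streamed completion
resp = ollama.complete(prompt: "Hi")
resp.completion # => "Hello there!"

# Streamed completion: the block receives each partial response as it arrives
ollama.complete(prompt: "Hi") { |resp| print resp.completion }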

Note: Passing the stream parameter to the method no longer has any effect.

Closes #550

JSON.parse(chunk_line)
rescue JSON::ParserError
# In some cases the chunk exceeds the buffer size and the JSON parser fails.
# TODO: How to better handle this?

dferrazm (Contributor, Author):

it's not clear what we should do here, @andreibondarev. wdyt?

andreibondarev (Collaborator):

I wonder if the error just needs to be raised? There's no graceful way to recover from this and still provide a useful response, I don't think.

dferrazm (Contributor, Author):

yeah, let's remove that and let it blow up, until we figure out a way to handle this.
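
For reference, once the rescue is dropped the parsing step reduces to roughly the following (a sketch; chunk_line and the surrounding streaming loop come from the diff above):

# A malformed chunk now raises JSON::ParserError to the caller
# instead of being silently swallowed.
parsed_chunk = JSON.parse(chunk_line)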


yield json_chunk, size if block
yield Langchain::LLM::OllamaResponse.new(parsed_chunk, model: parameters[:model]) if block_given?

andreibondarev (Collaborator):

Maybe we start referencing these classes directly here since it's all under the same namespace: OllamaResponse.new .... It's cleaner that way, right?

dferrazm (Contributor, Author):

yep. it'll probably work too.

dferrazm (Contributor, Author):

updated
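
The updated yield presumably just drops the module prefix, since OllamaResponse lives in the same namespace (a sketch based on the diff line above):

yield OllamaResponse.new(parsed_chunk, model: parameters[:model]) if block_given?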

prompt_tokens + completion_tokens if done?
end

def done?

andreibondarev (Collaborator):

Should this be private?

dferrazm (Contributor, Author):

updated
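
The resulting shape of the response class, with done? moved under private (a sketch assembled from the hunk above; the total_tokens method name and the raw_response["done"] check are assumptions):

def total_tokens
  prompt_tokens + completion_tokens if done?
end

private

def done?
  !!raw_response["done"]
end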

@andreibondarev (Collaborator):

@dferrazm I just tried this but nothing was streamed to me:

irb(main):005> llm = Langchain::LLM::Ollama.new(url: ENV["OLLAMA_URL"])
=>
#<Langchain::LLM::Ollama:0x0000000128a3a3d0
...
irb(main):006* llm.chat messages: [{role:"user", content:"hey"}] do |chunk|
irb(main):007*   puts chunk
irb(main):008> end
#<Langchain::LLM::OllamaResponse:0x000000012803b988>
=>
#<Langchain::LLM::OllamaResponse:0x0000000128038170
 @model="llama3",
 @prompt_tokens=nil,
 @raw_response=
  {"model"=>"llama3",
   "created_at"=>"2024-05-29T15:56:04.473077Z",
   "message"=>{"role"=>"assistant", "content"=>"Hey! How's it going?"},
   "done_reason"=>"stop",
   "done"=>true,
   "total_duration"=>11215288666,
   "load_duration"=>10900135000,
   "prompt_eval_count"=>11,
   "prompt_eval_duration"=>152727000,
   "eval_count"=>8,
   "eval_duration"=>156123000}>

@dferrazm (Contributor, Author):

@andreibondarev you have to pass stream: true. Should it be set by default? Probably fall back to true if a block is given, right?
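
A hypothetical way to implement that fallback inside chat/complete (not the code in this PR; the parameters hash and block variable names are assumptions):

# Default to streaming whenever the caller supplied a block
# and did not set :stream explicitly.
parameters[:stream] = true if block && !parameters.key?(:stream)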

@dferrazm dferrazm marked this pull request as ready for review May 30, 2024 16:40
@andreibondarev andreibondarev merged commit bb1f89a into patterns-ai-core:main Jun 1, 2024
4 of 5 checks passed

Successfully merging this pull request may close these issues.

Fix Ollama#chat() streaming