Ollama LLM: Implement async functionality #13306
Comments
🤖 Hello @valentimarco, I'm Dosu, a bot here to assist with bug resolution, answer questions, and guide you on becoming a contributor to LangChain. I'm here to help while we await the human maintainer's response. I've taken a look at your issue regarding the implementation of async functionality in the Ollama LLM and chat model classes. I'll be providing a more comprehensive answer shortly.
🤖 Thank you for your feature request, valentimarco. I understand that you're looking to implement async methods in the Ollama LLM and chat model classes. Based on the current implementation of the Ollama LLM and chat model classes in the LangChain codebase, you can indeed implement async methods such as _astream and _agenerate. Here's a sample code snippet showing how you can do it:

import asyncio
class ChatOllama(BaseChatModel, _OllamaCommon):
    # ... existing code ...

    async def _astream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> AsyncIterator[ChatGenerationChunk]:
        prompt = self._format_messages_as_text(messages)
        async for stream_resp in self._create_stream(prompt, stop, **kwargs):
            if stream_resp:
                chunk = _stream_response_to_chat_generation_chunk(stream_resp)
                yield chunk
                if run_manager:
                    run_manager.on_llm_new_token(
                        chunk.text,
                        verbose=self.verbose,
                    )

    async def _agenerate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        prompt = self._format_messages_as_text(messages)
        final_chunk = await super()._astream_with_aggregation(
            prompt, stop=stop, run_manager=run_manager, verbose=self.verbose, **kwargs
        )
        chat_generation = ChatGeneration(
            message=AIMessage(content=final_chunk.text),
            generation_info=final_chunk.generation_info,
        )
        return ChatResult(generations=[chat_generation])

Please note that you'll also need to implement async versions of the helper methods these snippets rely on, such as the streaming call to the Ollama endpoint. For reference, you can check out similar solved issues in the LangChain repository. I hope this helps! If you have any other questions or need further clarification, feel free to ask.
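As an illustration of what such an async streaming helper might look like, here is a minimal sketch using aiohttp. The method name _acreate_stream, the /api/generate endpoint handling, and the base_url and _default_params attributes are assumptions for the example, not LangChain's actual internals:

from typing import Any, AsyncIterator, List, Optional

import aiohttp


class _OllamaCommon:
    # base_url and _default_params are assumed to exist on the real class.

    async def _acreate_stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> AsyncIterator[str]:
        """Hypothetical async counterpart of _create_stream that yields raw
        JSON lines from Ollama's streaming /api/generate endpoint."""
        payload = {"prompt": prompt, **self._default_params, **kwargs}
        if stop is not None:
            payload["options"] = {**payload.get("options", {}), "stop": stop}
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.base_url}/api/generate", json=payload
            ) as response:
                response.raise_for_status()
                # Ollama streams one JSON object per line.
                async for line in response.content:
                    text = line.decode("utf-8").strip()
                    if text:
                        yield text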
@dosu-bot the callback must be async: on_llm_new_token should be awaited, with AsyncCallbackManagerForLLMRun as the run_manager type.
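For clarity, a minimal sketch of what that correction looks like, assuming AsyncCallbackManagerForLLMRun from langchain.callbacks.manager (0.0.x-era import paths) and the hypothetical _acreate_stream helper sketched above; only the streaming method is shown:

from typing import Any, AsyncIterator, List, Optional

from langchain.callbacks.manager import AsyncCallbackManagerForLLMRun
from langchain.schema.messages import BaseMessage
from langchain.schema.output import ChatGenerationChunk


class ChatOllama(BaseChatModel, _OllamaCommon):
    # ... existing code ...

    async def _astream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,  # async manager
        **kwargs: Any,
    ) -> AsyncIterator[ChatGenerationChunk]:
        prompt = self._format_messages_as_text(messages)
        async for stream_resp in self._acreate_stream(prompt, stop, **kwargs):
            if stream_resp:
                chunk = _stream_response_to_chat_generation_chunk(stream_resp)
                yield chunk
                if run_manager:
                    # The async callback must be awaited.
                    await run_manager.on_llm_new_token(
                        chunk.text,
                        verbose=self.verbose,
                    )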
🤖 To implement async methods in the Ollama LLM and chat model classes of the LangChain Python framework, you can use Python's built-in asyncio library and async/await syntax. For example:

class ChatOllama(BaseChatModel, _OllamaCommon):
    # ...

    async def _agenerate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Call out to Ollama's generate endpoint asynchronously.

        Args:
            messages: The list of base messages to pass into the model.
            stop: Optional list of stop words to use when generating.

        Returns:
            Chat generations from the model
        """
        prompt = self._format_messages_as_text(messages)
        final_chunk = await super()._astream_with_aggregation(
            prompt, stop=stop, run_manager=run_manager, verbose=self.verbose, **kwargs
        )
        chat_generation = ChatGeneration(
            message=AIMessage(content=final_chunk.text),
            generation_info=final_chunk.generation_info,
        )
        return ChatResult(generations=[chat_generation])

    async def _astream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> AsyncIterator[ChatGenerationChunk]:
        prompt = self._format_messages_as_text(messages)
        async for stream_resp in self._create_stream(prompt, stop, **kwargs):
            if stream_resp:
                chunk = _stream_response_to_chat_generation_chunk(stream_resp)
                yield chunk
                if run_manager:
                    await run_manager.on_llm_new_token(
                        chunk.text,
                        verbose=self.verbose,
                    )

Please note that the helper methods these snippets depend on, such as the streaming call to the Ollama endpoint, also need async counterparts. You can find more details about the structure of the Ollama LLM and chat model classes in the LangChain Python framework in the source code files libs/langchain/langchain/chat_models/ollama.py and libs/langchain/langchain/llms/ollama.py. I hope this helps! If you have any further questions, feel free to ask.
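For completeness, a small usage sketch of how the async paths would be exercised once the methods above are in place. The model name and prompt are placeholders, the import paths follow the 0.0.x langchain layout, and agenerate/astream are the public entry points that dispatch to _agenerate/_astream:

import asyncio

from langchain.chat_models import ChatOllama
from langchain.schema.messages import HumanMessage


async def main() -> None:
    # "llama2" is just a placeholder model name for the example.
    chat = ChatOllama(model="llama2")

    # Non-streaming async call; dispatches to _agenerate under the hood.
    result = await chat.agenerate([[HumanMessage(content="Tell me a joke")]])
    print(result.generations[0][0].text)

    # Streaming async call; dispatches to _astream under the hood.
    async for chunk in chat.astream("Tell me a joke"):
        print(chunk.content, end="", flush=True)


asyncio.run(main())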
Hi, @valentimarco, I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue you raised requested the implementation of async methods in the Ollama LLM and chat model classes. Dosu-bot provided a detailed response with code snippets and explanations on how to implement async functionality using Python's built-in asyncio library. Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
Yes, I confirm.
Feature request
Implement async methods in the Ollama LLM and chat model classes.
Motivation
The Ollama implementation doesn't include the async methods _astream and _agenerate, so I cannot create an async agent...
Your contribution
This is my first issue; I can try, but I am working on 3 different projects right now...