
Ollama LLM: Implement async functionality #13306

Closed
valentimarco opened this issue Nov 13, 2023 · 6 comments
Labels
🤖:enhancement · Ɑ: models

Comments

@valentimarco

Feature request

Implement async methods in the Ollama LLM and chat model classes.

Motivation

The Ollama implementation doesn't include the async methods _astream and _agenerate, so I cannot create an async agent.

Your contribution

This is my first issue. I can try, but I am working on three different projects right now.


dosubot bot commented Nov 13, 2023

🤖

Hello @valentimarco,

I'm Dosu, a bot here to assist with bug resolution, answer questions, and guide you on becoming a contributor to LangChain. I'm here to help while we await the human maintainer's response.

I've taken a look at your issue regarding the implementation of async functionality in Ollama LLM and chat model classes. I'll be providing a more comprehensive answer shortly.

dosubot added the Ɑ: models and 🤖:enhancement labels on Nov 13, 2023

dosubot bot commented Nov 13, 2023

🤖

Thank you for your feature request, valentimarco. I understand that you're looking to implement async methods in the Ollama LLM and chat model classes.

Based on the current implementation of the Ollama LLM and chat model classes in the LangChain codebase, you can indeed implement async methods such as _astream and _agenerate. Here's a sample code snippet on how you can do it:

# Additions to ChatOllama in libs/langchain/langchain/chat_models/ollama.py

class ChatOllama(BaseChatModel, _OllamaCommon):
    # ... existing code ...

    async def _astream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> AsyncIterator[ChatGenerationChunk]:
        prompt = self._format_messages_as_text(messages)
        async for stream_resp in self._acreate_stream(prompt, stop, **kwargs):
            if stream_resp:
                chunk = _stream_response_to_chat_generation_chunk(stream_resp)
                yield chunk
                if run_manager:
                    run_manager.on_llm_new_token(
                        chunk.text,
                        verbose=self.verbose,
                    )

    async def _agenerate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        prompt = self._format_messages_as_text(messages)
        final_chunk = await super()._astream_with_aggregation(
            prompt, stop=stop, run_manager=run_manager, verbose=self.verbose, **kwargs
        )
        chat_generation = ChatGeneration(
            message=AIMessage(content=final_chunk.text),
            generation_info=final_chunk.generation_info,
        )
        return ChatResult(generations=[chat_generation])

Please note that you'll also need async counterparts of the _create_stream and _stream_with_aggregation helpers (referred to above as _acreate_stream and _astream_with_aggregation); a sketch of the first follows. Also, the run_manager.on_llm_new_token callback should be able to handle being called from async code.
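For illustration, here is a minimal sketch of what an _acreate_stream coroutine could look like, assuming an aiohttp dependency and Ollama's newline-delimited JSON /api/generate streaming endpoint. The helper name, the payload layout, and the default base_url mirror the synchronous _create_stream but are illustrative rather than the exact LangChain implementation:

from typing import Any, AsyncIterator, List, Optional

import aiohttp


class _OllamaCommonAsyncSketch:
    """Illustrative mixin: an async counterpart to _create_stream."""

    base_url: str = "http://localhost:11434"

    @property
    def _default_params(self) -> dict:
        # Placeholder for the model parameters the real class already builds.
        return {"model": "llama2"}

    async def _acreate_stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> AsyncIterator[str]:
        """Yield raw JSON lines streamed from Ollama's /api/generate endpoint."""
        payload = {"prompt": prompt, **self._default_params, **kwargs}
        if stop is not None:
            payload["stop"] = stop
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.base_url}/api/generate", json=payload
            ) as response:
                response.raise_for_status()
                # Ollama streams one JSON object per line until it reports done.
                async for line in response.content:
                    text = line.decode("utf-8").strip()
                    if text:
                        yield text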

For reference, you can check out these similar solved issues in the LangChain repository:

  1. MultiRetrievalQAChain async mode
  2. OpenAIFunctionsAgent | Streaming Bug

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@valentimarco
Author

@dosu-bot the callback must be AsyncCallbackManagerForLLMRun, and I need to implement a streaming bool attribute. Can you give me an example?


dosubot bot commented Nov 13, 2023

🤖

To implement async methods in the Ollama LLM and chat model classes of the LangChain Python framework, you can use Python's built-in async and await keywords. Here's how you can add _agenerate and _astream counterparts to the existing _generate and _stream methods, wired to AsyncCallbackManagerForLLMRun:

class ChatOllama(BaseChatModel, _OllamaCommon):
    # ...

    async def _agenerate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Call out to Ollama's generate endpoint asynchronously.

        Args:
            messages: The list of base messages to pass into the model.
            stop: Optional list of stop words to use when generating.

        Returns:
            Chat generations from the model
        """

        prompt = self._format_messages_as_text(messages)
        final_chunk = await super()._astream_with_aggregation(
            prompt, stop=stop, run_manager=run_manager, verbose=self.verbose, **kwargs
        )
        chat_generation = ChatGeneration(
            message=AIMessage(content=final_chunk.text),
            generation_info=final_chunk.generation_info,
        )
        return ChatResult(generations=[chat_generation])

    async def _astream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> AsyncIterator[ChatGenerationChunk]:
        prompt = self._format_messages_as_text(messages)
        async for stream_resp in self._acreate_stream(prompt, stop, **kwargs):
            if stream_resp:
                chunk = _stream_response_to_chat_generation_chunk(stream_resp)
                yield chunk
                if run_manager:
                    await run_manager.on_llm_new_token(
                        chunk.text,
                        verbose=self.verbose,
                    )

Please note that the _stream_with_aggregation and _create_stream helpers also need async counterparts (_astream_with_aggregation and _acreate_stream) to work with the await and async for keywords, respectively; a sketch of the aggregation helper follows. The streaming bool attribute can be declared as a class field or simply passed through the **kwargs argument of these methods.
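For reference, below is a sketch of what the async aggregation helper could look like. It builds on the illustrative _acreate_stream coroutine from the earlier comment and assumes the GenerationChunk type and the _stream_response_to_generation_chunk helper from libs/langchain/langchain/llms/ollama.py; treat the names and import paths as a starting point rather than the exact implementation:

from typing import Any, List, Optional

from langchain.callbacks.manager import AsyncCallbackManagerForLLMRun
from langchain.llms.ollama import _OllamaCommon, _stream_response_to_generation_chunk
from langchain.schema.output import GenerationChunk


class _OllamaCommonWithAsync(_OllamaCommon):
    """Sketch: async counterpart of _stream_with_aggregation."""

    async def _astream_with_aggregation(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        verbose: bool = False,
        **kwargs: Any,
    ) -> GenerationChunk:
        final_chunk: Optional[GenerationChunk] = None
        # _acreate_stream is the hypothetical async helper sketched earlier.
        async for stream_resp in self._acreate_stream(prompt, stop, **kwargs):
            if stream_resp:
                chunk = _stream_response_to_generation_chunk(stream_resp)
                final_chunk = chunk if final_chunk is None else final_chunk + chunk
                if run_manager:
                    # AsyncCallbackManagerForLLMRun handlers are coroutines.
                    await run_manager.on_llm_new_token(chunk.text, verbose=verbose)
        if final_chunk is None:
            raise ValueError("No data received from the Ollama stream.")
        return final_chunk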

You can find more details about the structure of the Ollama LLM and chat model classes in the source files libs/langchain/langchain/chat_models/ollama.py and libs/langchain/langchain/llms/ollama.py.

The AsyncCallbackManagerForLLMRun is an asynchronous version of the CallbackManagerForLLMRun class in the LangChain framework. It manages callbacks for a Language Model (LLM) run and inherits from the AsyncRunManager and LLMManagerMixin classes. The AsyncCallbackManagerForLLMRun class has several asynchronous methods that are triggered on specific events during an LLM run. These methods include on_llm_new_token, on_llm_end, and on_llm_error. Each of these methods uses the ahandle_event function to handle the event and pass the relevant information to the registered handlers. You can find more details about this in the source code file libs/langchain/langchain/callbacks/manager.py.
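As a usage sketch, here is how an async callback handler could consume those events once the async methods above are in place. The handler subclass, the llama2 model name, and the assumption that ChatOllama gains a working agenerate path are illustrative:

import asyncio
from typing import Any

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.chat_models import ChatOllama
from langchain.schema import HumanMessage


class PrintTokenHandler(AsyncCallbackHandler):
    """Minimal async handler: print each streamed token as it arrives."""

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        print(token, end="", flush=True)


async def main() -> None:
    # Assumes a local Ollama server with the llama2 model pulled.
    chat = ChatOllama(model="llama2", callbacks=[PrintTokenHandler()])
    # Tokens only flow through the handler once _agenerate/_astream are implemented.
    result = await chat.agenerate([[HumanMessage(content="Hello!")]])
    print(result.generations[0][0].text)


asyncio.run(main())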

I hope this helps! If you have any further questions, feel free to ask.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.


dosubot bot commented Feb 12, 2024

Hi, @valentimarco,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue you raised requested the implementation of async methods in the Ollama LLM and chat model classes. Dosu-bot provided a detailed response with code snippets and explanations on how to implement async functionality using Python's built-in async and await keywords. Additionally, you sought further clarification on implementing the streaming bool attribute.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you!

dosubot added the stale label on Feb 12, 2024
@valentimarco
Author

Yes, I confirm.

dosubot removed the stale label on Feb 12, 2024
dosubot added the stale label on May 13, 2024
dosubot closed this as not planned on May 20, 2024
dosubot removed the stale label on May 20, 2024