
Ollama LLM: Implement async functionality #13306

Closed
valentimarco opened this issue Nov 13, 2023 · 6 comments
Labels
🤖:enhancement · Ɑ: models

Comments

@valentimarco

Feature request

Implement async methods in the Ollama LLM and chat model classes.

Motivation

The Ollama implementation doesn't include the async methods _astream and _agenerate, so I cannot create an async agent.

Your contribution

This is my first issue. I can try, but I am working on three different projects right now.


dosubot bot commented Nov 13, 2023

🤖

Hello @valentimarco,

I'm Dosu, a bot here to assist with bug resolution, answer questions, and guide you on becoming a contributor to LangChain. I'm here to help while we await the human maintainer's response.

I've taken a look at your issue regarding the implementation of async functionality in Ollama LLM and chat model classes. I'll be providing a more comprehensive answer shortly.

dosubot added the Ɑ: models and 🤖:enhancement labels on Nov 13, 2023

dosubot bot commented Nov 13, 2023

🤖

Thank you for your feature request, valentimarco. I understand that you're looking to implement async methods in the Ollama LLM and chat model classes.

Based on the current implementation of the Ollama LLM and chat model classes in the LangChain codebase, you can indeed implement async methods such as _astream and _agenerate. Here's a sample code snippet on how you can do it:

# Additions to ChatOllama in libs/langchain/langchain/chat_models/ollama.py

class ChatOllama(BaseChatModel, _OllamaCommon):
    # ... existing code ...

    async def _astream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> AsyncIterator[ChatGenerationChunk]:
        prompt = self._format_messages_as_text(messages)
        async for stream_resp in self._acreate_stream(prompt, stop, **kwargs):
            if stream_resp:
                chunk = _stream_response_to_chat_generation_chunk(stream_resp)
                yield chunk
                if run_manager:
                    run_manager.on_llm_new_token(
                        chunk.text,
                        verbose=self.verbose,
                    )

    async def _agenerate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        prompt = self._format_messages_as_text(messages)
        final_chunk = await super()._astream_with_aggregation(
            prompt, stop=stop, run_manager=run_manager, verbose=self.verbose, **kwargs
        )
        chat_generation = ChatGeneration(
            message=AIMessage(content=final_chunk.text),
            generation_info=final_chunk.generation_info,
        )
        return ChatResult(generations=[chat_generation])

Please note that you'll also need async counterparts of the _create_stream and _stream_with_aggregation helpers (referred to above as _acreate_stream and _astream_with_aggregation); a sketch of the first follows. Also, the run_manager.on_llm_new_token callback should be able to handle being called from async code.
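For illustration, here is a minimal sketch of what an _acreate_stream coroutine could look like, assuming an aiohttp dependency and Ollama's newline-delimited JSON /api/generate streaming endpoint. The helper name, the payload layout, and the default base_url mirror the synchronous _create_stream but are illustrative rather than the exact LangChain implementation:

from typing import Any, AsyncIterator, List, Optional

import aiohttp


class _OllamaCommonAsyncSketch:
    """Illustrative mixin: an async counterpart to _create_stream."""

    base_url: str = "http://localhost:11434"

    @property
    def _default_params(self) -> dict:
        # Placeholder for the model parameters the real class already builds.
        return {"model": "llama2"}

    async def _acreate_stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        **kwargs: Any,
    ) -> AsyncIterator[str]:
        """Yield raw JSON lines streamed from Ollama's /api/generate endpoint."""
        payload = {"prompt": prompt, **self._default_params, **kwargs}
        if stop is not None:
            payload["stop"] = stop
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.base_url}/api/generate", json=payload
            ) as response:
                response.raise_for_status()
                # Ollama streams one JSON object per line until it reports done.
                async for line in response.content:
                    text = line.decode("utf-8").strip()
                    if text:
                        yield text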

For reference, you can check out these similar solved issues in the LangChain repository:

  1. MultiRetrievalQAChain async mode
  2. OpenAIFunctionsAgent | Streaming Bug

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@valentimarco
Author

@dosu-bot the callback must be AsyncCallbackManagerForLLMRun, and I need to implement a streaming bool attribute. Can you give me an example?


dosubot bot commented Nov 13, 2023

🤖

To implement async methods in the Ollama LLM and chat model classes of the LangChain Python framework, you can use Python's built-in async and await keywords. Here's how you can add _agenerate and _astream counterparts to the existing _generate and _stream methods, wired to AsyncCallbackManagerForLLMRun:

class ChatOllama(BaseChatModel, _OllamaCommon):
    # ...

    async def _agenerate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
        """Call out to Ollama's generate endpoint asynchronously.

        Args:
            messages: The list of base messages to pass into the model.
            stop: Optional list of stop words to use when generating.

        Returns:
            Chat generations from the model
        """

        prompt = self._format_messages_as_text(messages)
        final_chunk = await super()._astream_with_aggregation(
            prompt, stop=stop, run_manager=run_manager, verbose=self.verbose, **kwargs
        )
        chat_generation = ChatGeneration(
            message=AIMessage(content=final_chunk.text),
            generation_info=final_chunk.generation_info,
        )
        return ChatResult(generations=[chat_generation])

    async def _astream(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> AsyncIterator[ChatGenerationChunk]:
        prompt = self._format_messages_as_text(messages)
        async for stream_resp in self._acreate_stream(prompt, stop, **kwargs):
            if stream_resp:
                chunk = _stream_response_to_chat_generation_chunk(stream_resp)
                yield chunk
                if run_manager:
                    await run_manager.on_llm_new_token(
                        chunk.text,
                        verbose=self.verbose,
                    )

Please note that the _stream_with_aggregation and _create_stream helpers also need async counterparts (_astream_with_aggregation and _acreate_stream) to work with the await and async for keywords, respectively; a sketch of the aggregation helper follows. The streaming bool attribute can be declared as a class field or simply passed through the **kwargs argument of these methods.
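For reference, below is a sketch of what the async aggregation helper could look like. It builds on the illustrative _acreate_stream coroutine from the earlier comment and assumes the GenerationChunk type and the _stream_response_to_generation_chunk helper from libs/langchain/langchain/llms/ollama.py; treat the names and import paths as a starting point rather than the exact implementation:

from typing import Any, List, Optional

from langchain.callbacks.manager import AsyncCallbackManagerForLLMRun
from langchain.llms.ollama import _OllamaCommon, _stream_response_to_generation_chunk
from langchain.schema.output import GenerationChunk


class _OllamaCommonWithAsync(_OllamaCommon):
    """Sketch: async counterpart of _stream_with_aggregation."""

    async def _astream_with_aggregation(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        verbose: bool = False,
        **kwargs: Any,
    ) -> GenerationChunk:
        final_chunk: Optional[GenerationChunk] = None
        # _acreate_stream is the hypothetical async helper sketched earlier.
        async for stream_resp in self._acreate_stream(prompt, stop, **kwargs):
            if stream_resp:
                chunk = _stream_response_to_generation_chunk(stream_resp)
                final_chunk = chunk if final_chunk is None else final_chunk + chunk
                if run_manager:
                    # AsyncCallbackManagerForLLMRun handlers are coroutines.
                    await run_manager.on_llm_new_token(chunk.text, verbose=verbose)
        if final_chunk is None:
            raise ValueError("No data received from the Ollama stream.")
        return final_chunk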

You can find more details about the structure of the Ollama LLM and chat model classes in the source files libs/langchain/langchain/chat_models/ollama.py and libs/langchain/langchain/llms/ollama.py.

The AsyncCallbackManagerForLLMRun is an asynchronous version of the CallbackManagerForLLMRun class in the LangChain framework. It manages callbacks for a Language Model (LLM) run and inherits from the AsyncRunManager and LLMManagerMixin classes. The AsyncCallbackManagerForLLMRun class has several asynchronous methods that are triggered on specific events during an LLM run. These methods include on_llm_new_token, on_llm_end, and on_llm_error. Each of these methods uses the ahandle_event function to handle the event and pass the relevant information to the registered handlers. You can find more details about this in the source code file libs/langchain/langchain/callbacks/manager.py.
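As a usage sketch, here is how an async callback handler could consume those events once the async methods above are in place. The handler subclass, the llama2 model name, and the assumption that ChatOllama gains a working agenerate path are illustrative:

import asyncio
from typing import Any

from langchain.callbacks.base import AsyncCallbackHandler
from langchain.chat_models import ChatOllama
from langchain.schema import HumanMessage


class PrintTokenHandler(AsyncCallbackHandler):
    """Minimal async handler: print each streamed token as it arrives."""

    async def on_llm_new_token(self, token: str, **kwargs: Any) -> None:
        print(token, end="", flush=True)


async def main() -> None:
    # Assumes a local Ollama server with the llama2 model pulled.
    chat = ChatOllama(model="llama2", callbacks=[PrintTokenHandler()])
    # Tokens only flow through the handler once _agenerate/_astream are implemented.
    result = await chat.agenerate([[HumanMessage(content="Hello!")]])
    print(result.generations[0][0].text)


asyncio.run(main())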

I hope this helps! If you have any further questions, feel free to ask.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.


dosubot bot commented Feb 12, 2024

Hi, @valentimarco,

I'm helping the LangChain team manage their backlog and am marking this issue as stale. The issue you raised requested the implementation of async methods in the Ollama LLM and chat model classes. Dosu-bot provided a detailed response with code snippets and explanations on how to implement async functionality using Python's built-in async and await keywords. Additionally, you sought further clarification on implementing the streaming bool attribute.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you!

dosubot added the stale label on Feb 12, 2024
@valentimarco
Author

Yes, I confirm.

dosubot removed the stale label on Feb 12, 2024
dosubot added the stale label on May 13, 2024
dosubot closed this as not planned on May 20, 2024
dosubot removed the stale label on May 20, 2024