
GPT4All chat error with async calls #5210

Closed

PiotrPmr opened this issue May 24, 2023 · 27 comments

Comments

@PiotrPmr

PiotrPmr commented May 24, 2023

Hi, I believe this issue is related to this one: #1372

I'm using GPT4All integration and get the following error after running ConversationalRetrievalChain with AsyncCallbackManager:
ERROR:root:Async generation not implemented for this LLM.
Changing to CallbackManager does not fix anything.

The issue is model-agnostic; I have tried both ggml-gpt4all-j-v1.3-groovy.bin and ggml-mpt-7b-base.bin. The LangChain version I'm using is 0.0.179. Any ideas on how this could be solved, or should we just wait for a new release that fixes it?

Suggestion:

Release a fix, similar to the one in #1372
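
For reference, a minimal sketch of the kind of call that raises this error (the model path and prompt are placeholders, not taken from my actual setup; any chain that hits the async path of GPT4All reproduces it):

import asyncio

from langchain.chains import LLMChain
from langchain.llms import GPT4All
from langchain.prompts import PromptTemplate

# Placeholder model path; point this at a locally downloaded GPT4All model.
llm = GPT4All(model="./models/ggml-gpt4all-j-v1.3-groovy.bin")
prompt = PromptTemplate(
    template="Question: {question}\nAnswer:", input_variables=["question"]
)
chain = LLMChain(llm=llm, prompt=prompt)

# The sync path (chain.run) works, but the async path raises
# "NotImplementedError: Async generation not implemented for this LLM"
# because GPT4All has no _acall/_agenerate at this LangChain version.
asyncio.run(chain.arun(question="What is LangChain?"))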

@poojatambe

@PiotrPmr Hi,
Has any solution been found for this issue?
I'm getting NotImplementedError: Async generation not implemented for this LLM with LangChain version 0.0.186 and a GPT4All model.

@ncfx

ncfx commented Jun 10, 2023

I'm having the same issue with another LlamaCpp LLM as well as a HuggingFaceHub LLM. I'm using LLMChain. Hoping someone can fix this!

@khaledadrani

having the same issue, +1

@khaledadrani

I managed to make a fix and will be opening a PR soon.

@suhailmalik07

@khaledadrani eagerly waiting for it.

@kesavazt

kesavazt commented Jun 19, 2023

@khaledadrani If you could describe your solution before making a pr that would be helpful. Thanks.

@khaledadrani

khaledadrani commented Jun 19, 2023

I am currently working through the requirements for my PR to be ready for review (formatting, linting, testing).
I will finish them ASAP; this is my first contribution ever to an open source project :)

What I did was implement _acall with async/await support. Should I add just one async test for this? Is that enough for it to be accepted? https://github.com/khaledadrani/langchain/blob/32a041b8a2a5a8a6db36592b501e4ce9d54c219b/tests/unit_tests/llms/fake_llm.py

Edit: I also need to add a test here: https://github.com/khaledadrani/langchain/blob/32a041b8a2a5a8a6db36592b501e4ce9d54c219b/tests/integration_tests/llms/test_gpt4all.py
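
For illustration, a minimal shape such an integration test might take (it requires pytest-asyncio and a locally downloaded model; the path below is a placeholder, not the actual test from the PR):

import pytest

from langchain.llms import GPT4All


@pytest.mark.asyncio
async def test_gpt4all_acall() -> None:
    # Placeholder model path; the test assumes a local GPT4All model file.
    llm = GPT4All(model="./models/ggml-gpt4all-j-v1.3-groovy.bin", n_predict=10)
    result = await llm.agenerate(["Say hello:"])
    assert isinstance(result.generations[0][0].text, str)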

@khaledadrani

Hello again. I read in the contribution document that it is possible to add a Jupyter notebook example (https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md); however, I am unable to find any notebooks in the repository. Can someone tell me where I should put the example notebook? Thanks!

@chrisedington

chrisedington commented Jul 11, 2023

Surprised there isn't more community presence on this issue given how popular GPT4All is; it would be great to see this merged. Thanks for the efforts @khaledadrani

@khaledadrani

I think someone already made an implementation but did not report it to this issue. Can anyone confirm that it works? (I noticed this while rebasing my fork: I found almost the same implementation.)

@diegovazquez

A workaround for using ConversationalRetrievalChain with LlamaCpp is to implement the _acall function. This has not been tested extensively.

from langchain.llms import LlamaCpp
from typing import Any, AsyncGenerator, Dict, List, Optional
from langchain.callbacks.manager import AsyncCallbackManagerForLLMRun

class LlamaCppAsync(LlamaCpp):
    async def _acall(
            self,
            prompt: str,
            stop: Optional[List[str]] = None,
            run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
            **kwargs: Any,
    ) -> str:
        """Asynchronous Call the Llama model and return the output.

        Args:
            prompt: The prompt to use for generation.
            stop: A list of strings to stop generation when encountered.

        Returns:
            The generated text.

        Example:
            .. code-block:: python

                from langchain.llms import LlamaCpp
                llm = LlamaCpp(model_path="/path/to/local/llama/model.bin")
                llm("This is a prompt.")
        """
        if self.streaming:
            # If streaming is enabled, we use the stream
            # method that yields as they are generated
            # and return the combined string from the first choice's text:
            combined_text_output = ""
            stream = self.stream_async(prompt=prompt, stop=stop, run_manager=run_manager)

            async for token in stream:
                combined_text_output += token["choices"][0]["text"]
            return combined_text_output
        else:
            params = self._get_parameters(stop)
            params = {**params, **kwargs}
            result = self.client(prompt=prompt, **params)
            return result["choices"][0]["text"]

    async def stream_async(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
    ) -> AsyncGenerator[Dict, None]:
        """Yields results objects as they are generated in real time.

        BETA: this is a beta feature while we figure out the right abstraction.
        Once that happens, this interface could change.

        It also calls the callback manager's on_llm_new_token event with
        similar parameters to the OpenAI LLM class method of the same name.

        Args:
            prompt: The prompts to pass into the model.
            stop: Optional list of stop words to use when generating.

        Returns:
            A generator representing the stream of tokens being generated.

        Yields:
            Dictionary-like objects containing a string token and metadata.
            See llama-cpp-python docs and below for more.

        Example:
            .. code-block:: python

                from langchain.llms import LlamaCpp
                llm = LlamaCpp(
                    model_path="/path/to/local/model.bin",
                    temperature = 0.5
                )
                async for chunk in llm.stream_async("Ask 'Hi, how are you?' like a pirate:'",
                        stop=["'","\n"]):
                    result = chunk["choices"][0]
                    print(result["text"], end='', flush=True)

        """
        params = self._get_parameters(stop)
        result = self.client(prompt=prompt, stream=True, **params)
        for chunk in result:
            token = chunk["choices"][0]["text"]
            log_probs = chunk["choices"][0].get("logprobs", None)
            if run_manager:
                await run_manager.on_llm_new_token(
                    token=token, verbose=self.verbose, log_probs=log_probs
                )
            yield chunk

Then change

    question_gen_llm = LlamaCpp(
        model_path=LLM_MODEL_PATH,
        n_ctx=2048,
        streaming=True,
        callback_manager=question_manager,
        verbose=True,
    )

To

    question_gen_llm = LlamaCppAsync(
        model_path=LLM_MODEL_PATH,
        n_ctx=2048,
        streaming=True,
        callback_manager=question_manager,
        verbose=True,
    )

@VladPrytula

have the same issue. Any updates on this?

@khaledadrani

@VladPrytula is it not fixed for GPT4ALL? Was I mistaken in my previous comment?

@VladPrytula

It is not fixed. I have added async manually to the class, and it kind of works, but I don't like the result: it is effectively in compliance with the async interface, but it is not actually asynchronous.
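
For comparison, one way to keep the event loop free without a native async client is to push the blocking call onto a worker thread. A minimal sketch, assuming the stock GPT4All class and its _call signature (not a tested or official fix):

import asyncio
from functools import partial
from typing import Any, List, Optional

from langchain.callbacks.manager import AsyncCallbackManagerForLLMRun
from langchain.llms import GPT4All


class ThreadedAGPT4All(GPT4All):
    """Hypothetical wrapper: satisfies the async interface by running the
    blocking _call on a worker thread so the event loop is not stalled."""

    async def _acall(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        loop = asyncio.get_running_loop()
        # The async run_manager is not forwarded to the sync _call here;
        # streaming token callbacks would need extra plumbing to cross
        # the thread boundary.
        func = partial(self._call, prompt, stop, None, **kwargs)
        return await loop.run_in_executor(None, func)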

@Mabenan

Mabenan commented Jul 22, 2023

For GPT4All you can use this class in your projects:

from langchain.llms import GPT4All
from functools import partial
from typing import Any, List, Optional
from langchain.callbacks.manager import AsyncCallbackManagerForLLMRun
from langchain.llms.utils import enforce_stop_tokens

class AGPT4All(GPT4All):
    async def _acall(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        text_callback = None
        if run_manager:
            text_callback = partial(run_manager.on_llm_new_token, verbose=self.verbose)
        text = ""
        params = {**self._default_params(), **kwargs}
        for token in self.client.generate(prompt, streaming=True, **params):
            if text_callback:
                await text_callback(token)
            text += token
        if stop is not None:
            text = enforce_stop_tokens(text, stop)
        return text

@auxon

auxon commented Aug 4, 2023

@Mabenan I kind of got this working, but I am not sure I am using AsyncCallbackManagerForLLMRun correctly. Do you have an example of how to instantiate it properly to use AGPT4All?

@Mabenan

Mabenan commented Aug 4, 2023

@auxon I use it in the following way:

import asyncio
from datetime import datetime

from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# modelpath, properties, threads, and data are defined elsewhere in my application.
history = ConversationBufferMemory(ai_prefix="### Assistant", human_prefix="### Human")
template = """
            {history}
            ### Human: {input}
            ### Assistant:"""
prompt = PromptTemplate(template=template, input_variables=["history", "input"])
streaminCallback = AsyncIteratorCallbackHandler()
llmObj = AGPT4All(model=modelpath, verbose=False, allow_download=True,
                  temp=properties["temp"],
                  top_k=properties["top_k"],
                  top_p=properties["top_p"],
                  repeat_penalty=properties["repeat_penalty"],
                  repeat_last_n=properties["repeat_last_n"],
                  n_predict=properties["n_predict"],
                  n_batch=properties["n_batch"],
                  callbacks=[streaminCallback],
                  n_threads=threads,
                  streaming=True)
history.load_memory_variables({})
chain = ConversationChain(prompt=prompt, llm=llmObj, memory=history)
asyncio.create_task(chain.apredict(input=data["prompt"]))
start = datetime.now()
tokenCount = 0
compResp = ""
# This loop runs inside an async generator in my app, hence the yield.
async for respEntry in streaminCallback.aiter():
    now = datetime.now()
    diff = now - start
    tokenCount += 1
    print("Tokens per Second: " + str(tokenCount / diff.total_seconds()))
    compResp = compResp + respEntry
    yield respEntry

@auxon

auxon commented Aug 4, 2023

@Mabenan Thanks!

@varshasathya

varshasathya commented Aug 22, 2023

Hi,
When I try to arun the chain below using SagemakerEndpoint, I'm receiving the following error.
chain = LLMChain(llm = SagemakerEndpoint(endpoint_name=llm_ENDPOINT,region_name=REGION_NAME, content_handler=content_handler), prompt=prompt)

NotImplementedError: Async generation not implemented for this LLM.

Are async calls available for SagemakerEndpoint? If not, is there a workaround for this?

Thanks in advance.
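
Not an official answer, but the same thread-executor pattern sketched earlier should apply to SagemakerEndpoint as well. A minimal sketch (the subclass name and wiring are assumptions, not a tested fix):

import asyncio
from functools import partial
from typing import Any, List, Optional

from langchain.callbacks.manager import AsyncCallbackManagerForLLMRun
from langchain.llms import SagemakerEndpoint


class AsyncSagemakerEndpoint(SagemakerEndpoint):
    """Hypothetical subclass: runs the blocking _call on a worker thread so
    chain.arun / chain.ainvoke no longer raise NotImplementedError."""

    async def _acall(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[AsyncCallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        loop = asyncio.get_running_loop()
        # The async run_manager is not forwarded to the sync _call here;
        # streaming callbacks would need extra plumbing.
        func = partial(self._call, prompt, stop, None, **kwargs)
        return await loop.run_in_executor(None, func)

Then build the chain with AsyncSagemakerEndpoint(endpoint_name=llm_ENDPOINT, region_name=REGION_NAME, content_handler=content_handler) in place of SagemakerEndpoint and keep the rest of the code unchanged.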


dosubot bot commented Nov 21, 2023

Hi, @PiotrPmr! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you are experiencing an error with async calls in the GPT4All chat integration. It seems that other users have also reported the same issue and are waiting for a fix. User @khaledadrani has mentioned that they have made a fix and will be making a pull request soon. Additionally, user @Mabenan has provided a workaround for using GPT4All with async calls.

Before we close this issue, we wanted to check if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain community! Let us know if you have any further questions or concerns.

@dosubot dosubot bot added the "stale" label (Issue has not had recent activity or appears to be solved; stale issues will be automatically closed) on Nov 21, 2023
@khaledadrani

Hello, it has been a long time. Was this fixed or not? Otherwise, I will return to it ASAP.

@dosubot dosubot bot removed the "stale" label on Nov 28, 2023

dosubot bot commented Nov 28, 2023

@baskaryan Could you please help @PiotrPmr with the issue they mentioned? They are still experiencing an error with async calls in the GPT4All chat integration and would appreciate your assistance. Thank you!

@khaledadrani

As promised, here is the fix for this issue: #14495. It obviously still needs review.

@charlod
Contributor

charlod commented Mar 1, 2024

Hello, are there any updates on this? I see that #14495 is still open. Thank you!

@khaledadrani

@charlod Is the current implementation of ainvoke (or acall, which is going to be deprecated) not working for you?

model_response = await qa.ainvoke(
    input={"query": "what is Python?"}
)

@charlod
Contributor

charlod commented Mar 7, 2024

I've just tested with the current ainvoke implementation, and it works. Thanks again!

@baskaryan
Collaborator

> I've just tested with the current ainvoke implementation, and it works. Thanks again!

closing!
