
on_llm_new_token event broken in langchain_openai when streaming #19185

Closed
theobjectivedad opened this issue Mar 16, 2024 · 2 comments
Labels
🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
Ɑ: models (Related to LLMs or chat model modules)
🔌: openai (Primarily related to OpenAI integrations)

Comments

@theobjectivedad

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Current implementation:

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        params = {**self._invocation_params, **kwargs, "stream": True}
        self.get_sub_prompts(params, [prompt], stop)  # this mutates params
        for stream_resp in self.client.create(prompt=prompt, **params):
            if not isinstance(stream_resp, dict):
                stream_resp = stream_resp.model_dump()
            chunk = _stream_response_to_generation_chunk(stream_resp)
            yield chunk  # note: the chunk is yielded before the callback below runs
            if run_manager:
                run_manager.on_llm_new_token(
                    chunk.text,
                    chunk=chunk,
                    verbose=self.verbose,
                    logprobs=(
                        chunk.generation_info["logprobs"]
                        if chunk.generation_info
                        else None
                    ),
                )

I believe this change would correct the issue and produce the intended behavior:

    def _stream(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> Iterator[GenerationChunk]:
        params = {**self._invocation_params, **kwargs, "stream": True}
        self.get_sub_prompts(params, [prompt], stop)  # this mutates params
        for stream_resp in self.client.create(prompt=prompt, **params):
            if not isinstance(stream_resp, dict):
                stream_resp = stream_resp.model_dump()
            chunk = _stream_response_to_generation_chunk(stream_resp)
            
            # fire the callback for this chunk before yielding it
            if run_manager:
                run_manager.on_llm_new_token(
                    chunk.text,
                    chunk=chunk,
                    verbose=self.verbose,
                    logprobs=(
                        chunk.generation_info["logprobs"]
                        if chunk.generation_info
                        else None
                    ),
                )
            yield chunk

Error Message and Stack Trace (if applicable)

No response

Description

When streaming via langchain_openai.llms.base.BaseOpenAI._stream, the chunk is yielded before the run manager event is triggered. This makes it impossible for a callback's on_llm_new_token method to fire until the full response is received.
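
For context on why the ordering matters, here is a minimal sketch of the kind of client code that depends on on_llm_new_token firing as each chunk arrives. The handler, model name, and prompt are placeholders chosen for illustration, not part of the issue:

    from langchain_core.callbacks import BaseCallbackHandler
    from langchain_openai import OpenAI


    class TokenPrinter(BaseCallbackHandler):
        """Print each token as soon as the run manager reports it."""

        def on_llm_new_token(self, token: str, **kwargs) -> None:
            print(token, end="", flush=True)


    # gpt-3.5-turbo-instruct is just an example completion model
    llm = OpenAI(model="gpt-3.5-turbo-instruct", streaming=True, callbacks=[TokenPrinter()])
    for _chunk in llm.stream("Tell me a short joke"):
        pass  # tokens should print incrementally via the callback while streaming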

System Info

System Information
------------------
> OS:  Linux
> OS Version:  #21~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb  9 13:32:52 UTC 2
> Python Version:  3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]

Package Information
-------------------
> langchain_core: 0.1.30
> langchain: 0.1.11
> langchain_community: 0.0.27
> langsmith: 0.1.23
> langchain_openai: 0.0.8
> langchain_text_splitters: 0.0.1

Packages not installed (Not Necessarily a Problem)
--------------------------------------------------
The following packages were not found:

> langgraph
> langserve
@dosubot dosubot bot added the Ɑ: models, 🔌: openai, and 🤖:bug labels Mar 16, 2024
@theobjectivedad (Author) commented Mar 16, 2024

Additionally, it would be good to pass stream_resp to the callback. This would allow clients to differentiate between multiple responses when n > 1. For example:

def _stream(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> Iterator[GenerationChunk]:
    params = {**self._invocation_params, **kwargs, "stream": True}
    self.get_sub_prompts(params, [prompt], stop)  # this mutates params
    for stream_resp in self.client.create(prompt=prompt, **params):
        if not isinstance(stream_resp, dict):
            stream_resp = stream_resp.model_dump()

        chunk = _stream_response_to_generation_chunk(stream_resp)

        if run_manager:
            run_manager.on_llm_new_token(
                chunk.text,
                chunk=chunk,
                verbose=self.verbose,
                logprobs=(
                    chunk.generation_info["logprobs"] if chunk.generation_info else None
                ),
                stream_resp=stream_resp,
            )

        yield chunk

I've verified against the OpenAI streaming response that it supports multiple generations (note the index field in the choices below):

data: {"id":"chatcmpl-93M3I2vZ1GbBKwQ4Pl2z71bxLKT67","object":"chat.completion.chunk","created":1710586780,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f2ebda25a","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":{"content":[]},"finish_reason":null}]}

data: {"id":"chatcmpl-93M3I2vZ1GbBKwQ4Pl2z71bxLKT67","object":"chat.completion.chunk","created":1710586780,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f2ebda25a","choices":[{"index":0,"delta":{"content":"Hello"},"logprobs":{"content":[{"token":"Hello","logprob":-0.0062218173,"bytes":[72,101,108,108,111],"top_logprobs":[]}]},"finish_reason":null}]}

data: {"id":"chatcmpl-93M3I2vZ1GbBKwQ4Pl2z71bxLKT67","object":"chat.completion.chunk","created":1710586780,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f2ebda25a","choices":[{"index":1,"delta":{"role":"assistant","content":""},"logprobs":{"content":[]},"finish_reason":null}]}

data: {"id":"chatcmpl-93M3I2vZ1GbBKwQ4Pl2z71bxLKT67","object":"chat.completion.chunk","created":1710586780,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f2ebda25a","choices":[{"index":1,"delta":{"content":"Hello"},"logprobs":{"content":[{"token":"Hello","logprob":-0.0062218173,"bytes":[72,101,108,108,111],"top_logprobs":[]}]},"finish_reason":null}]}

@sepiatone (Contributor) commented Mar 23, 2024

The problem of yielding before calling the run manager has been fixed by #18269. (I don't know enough to comment on the other enhancement you proposed.)

@dosubot dosubot bot added the stale label (Issue has not had recent activity or appears to be solved) Jun 22, 2024
@dosubot dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jun 29, 2024
@dosubot dosubot bot removed the stale label Jun 29, 2024