Callback for on_llm_new_token should be invoked before the token is yielded by the model #16913
Comments
🤖 It's great to see your interest in improving the streaming functionality in LangChain for chat models and language models. Your proposed goal and acceptance criteria for the PR are clear and well defined. If you have a solution in mind and are planning to work on a fix, we encourage you to go ahead and create a pull request; your contribution will be valuable in enhancing the LangChain platform. Thank you for your willingness to improve the codebase! If you want to continue the conversation, start your reply with @dosu-bot.
…angchain-ai#16986) - **Description:** Invoke callback prior to yielding token in stream and astream methods for ChatMistralAI. - **Issue:** langchain-ai#16913
### This pull request makes the following changes: * Fixed issue #16913: updated the Google GenAI chat_models.py code to ensure the callback is called before the token is yielded. --------- Co-authored-by: Erick Friis <erick@langchain.dev>
…ai#17346) **Description:** Invoke callback prior to yielding token in stream method for watsonx. **Issue:** [Callback for on_llm_new_token should be invoked before the token is yielded by the model langchain-ai#16913](langchain-ai#16913) Co-authored-by: Robby <h0rv@users.noreply.github.com>
…ai#17348) **Description:** Invoke callback prior to yielding token in stream method for Ollama. **Issue:** [Callback for on_llm_new_token should be invoked before the token is yielded by the model langchain-ai#16913](langchain-ai#16913) Co-authored-by: Robby <h0rv@users.noreply.github.com>
langchain-ai#17625) **Description**: Invoke callback prior to yielding token in stream method for watsonx. **Issue**: langchain-ai#16913
…gingFaceEndpoint (#20366) - [x] **PR title**: community[patch]: Invoke callback prior to yielding token fix for HuggingFaceEndpoint - [x] **PR message**: - **Description:** Invoke callback prior to yielding token in stream method in community HuggingFaceEndpoint - **Issue:** #16913 - **Dependencies:** None - **Twitter handle:** @bolun_zhang --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>
…mafile (#20365) - [x] **PR title**: community[patch]: Invoke callback prior to yielding token fix for Llamafile - [x] **PR message**: - **Description:** Invoke callback prior to yielding token in stream method in community llamafile.py - **Issue:** #16913 - **Dependencies:** None - **Twitter handle:** @bolun_zhang
…fra] (#20427) - [x] **PR title**: community[patch]: Invoke callback prior to yielding token fix for [DeepInfra] - [x] **PR message**: - **Description:** Invoke callback prior to yielding token in stream method in [DeepInfra] - **Issue:** #16913 - **Dependencies:** None - **Twitter handle:** @bolun_zhang
…gFaceTextGenInference] (#20426) - [x] **PR title**: community[patch]: Invoke callback prior to yielding token fix for [HuggingFaceTextGenInference] - [x] **PR message**: - **Description:** Invoke callback prior to yielding token in stream method in [HuggingFaceTextGenInference] - **Issue:** #16913 - **Dependencies:** None - **Twitter handle:** @bolun_zhang --------- Co-authored-by: Chester Curme <chester.curme@gmail.com>
Privileged issue
Issue Content
Goal
Improve streaming in LangChain for chat models / language models.
Background
Many chat and language models implement a streaming mode in which they stream tokens one at a time.
LangChain has a callback system that is useful for logging and for important APIs like `stream`, `stream_log` and `stream_events`.
Currently, many models incorrectly yield the token (chat generation) before invoking the callback, so callback handlers see each token only after it has already reached the consumer.
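The ordering issue can be illustrated with a minimal, self-contained sketch (invented names; this is not LangChain code):

```python
from typing import Callable, Iterator, List

def stream_wrong(tokens: List[str], on_new_token: Callable[[str], None]) -> Iterator[str]:
    """Incorrect: the consumer receives each token before the callback fires."""
    for tok in tokens:
        yield tok           # consumer sees the token first...
        on_new_token(tok)   # ...and the callback is notified late

def stream_right(tokens: List[str], on_new_token: Callable[[str], None]) -> Iterator[str]:
    """Correct: the callback observes each token before it is yielded."""
    for tok in tokens:
        on_new_token(tok)   # callback fires first
        yield tok           # then the consumer receives the token

def collect(stream_fn) -> List[str]:
    """Record the interleaving of callback invocations and yields."""
    events: List[str] = []
    for tok in stream_fn(["a", "b"], lambda t: events.append(f"cb:{t}")):
        events.append(f"yield:{tok}")
    return events

print(collect(stream_wrong))  # ['yield:a', 'cb:a', 'yield:b', 'cb:b']
print(collect(stream_right))  # ['cb:a', 'yield:a', 'cb:b', 'yield:b']
```

With the incorrect ordering, any handler wired into the callback system (logging, event streaming, etc.) observes a token only after the consumer has already received it; firing the callback first keeps both views in sync.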
Acceptance criteria
For a PR to be accepted and merged, the PR should invoke the `on_llm_new_token` callback before the token is yielded, in both the model's `stream` and (where implemented) `astream` methods.
Example PR
Here is an example PR that shows the fix for the OpenAI chat model:
#16909
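The same ordering applies on the async path. Here is a hypothetical sketch (names such as `astream_fixed` and `on_new_token` are invented for illustration; this is not the actual PR diff) in which the callback is awaited before each token is yielded:

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List

async def astream_fixed(
    tokens: List[str],
    on_new_token: Callable[[str], Awaitable[None]],
) -> AsyncIterator[str]:
    for tok in tokens:
        await on_new_token(tok)  # callback completes before the consumer sees the token
        yield tok

async def main() -> List[str]:
    events: List[str] = []

    async def cb(tok: str) -> None:
        events.append(f"cb:{tok}")

    async for tok in astream_fixed(["x", "y"], cb):
        events.append(f"yield:{tok}")
    return events

print(asyncio.run(main()))  # ['cb:x', 'yield:x', 'cb:y', 'yield:y']
```

Because the callback coroutine is awaited before the `yield`, slow handlers also apply natural backpressure instead of silently lagging behind the stream.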
Find models that need to be fixed
The easiest way to find places in the code that may need to be fixed is with `git grep`:

```shell
git grep -C 5 "\.on_llm_new"
```
Examine the output to determine whether the callback is called before the token is yielded (correct) or after (needs to be fixed).