Replacement Character(�) appears in multibyte text output from Google VertexAI Web #6501

@pokutuna

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Have the model stream a long text that contains multibyte characters.

import { VertexAI } from "@langchain/google-vertexai-web";

const langchainModel = new VertexAI({
  model: "gemini-1.5-pro-001",
  location: "us-central1",
});

// EN: List as many Japanese proverbs as possible.
const prompt = "日本のことわざをできるだけたくさん挙げて";

const stream = await langchainModel.stream(prompt);
const reader = stream.getReader();
let buf = "";
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += value;
}
console.log(buf);

This code can be executed by creating a service account key from the Google Cloud Console and running it with the following command:
$ GOOGLE_WEB_CREDENTIALS=$(cat ./key.json) npx tsx sample.ts

Error Message and Stack Trace (if applicable)

(No errors or stack traces occur)

Output Example: Includes Replacement Characters (�)

## ���本の諺 (ことわざ)  -  できるだけたくさん!

**一般的な知������������**

* 石の上にも三年 (いしのうえにもさんねん) - Perseverance will pay off.
* 七転び八起き (ななころびやおき) - Fall seven times, stand up eight.
* 継続は力なり (けいぞくはちからなり) -  Persistence is power.
* 急がば回れ (い��がばまわれ) - Haste makes waste.
* 井の中の蛙大海を知らず (いのなかのかわずたいかいをしらず) - A frog in a well knows nothing of the great ocean.
* 良���は���に苦し (りょうやくはくちにくい) -  Good medicine tastes bitter.
* 猿も木から落ちる (さるもきからおちる) - Even monkeys fall from trees.
* 転石苔を生ぜず (てんせきこけをしょうぜず) - A rolling stone gathers no moss.
* 覆水盆に返らず (ふくすいぼんにかえらず) - Spilled water will not return to the tray.
* 後生の祭り (ごしょうの�����り) - Too late for regrets.
* 習うより慣れろ (ならうよりなれろ) -  Experience is the best teacher.
* 鉄は熱いうちに打て (てつはあついうちにうて) - Strike while the iron is hot.

...

Description

This is the same issue as #5285.
While #5285 is about @langchain/google-vertexai, this issue also occurs in @langchain/google-vertexai-web.

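The problem occurs when a stream chunk boundary falls in the middle of a multibyte character, so the incomplete bytes on either side of the boundary show up as replacement characters (�).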
For detailed reasons, please refer to #5285.
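
For illustration, the effect can be reproduced without the library. This is a minimal sketch, not the code in @langchain/google-vertexai-web or the planned fix: it assumes the response arrives as `Uint8Array` chunks, and the helper names `decodeIndependently` and `decodeStreaming` are hypothetical. It uses only the standard `TextEncoder` and `TextDecoder` APIs.

```typescript
// Each of these Japanese characters is 3 bytes in UTF-8, so "日本" is 6 bytes.
const bytes = new TextEncoder().encode("日本");

// Simulate a stream whose chunk boundary falls inside the first character.
const chunks: Uint8Array[] = [bytes.slice(0, 2), bytes.slice(2)];

// Decoding every chunk on its own: the incomplete byte sequences around the
// boundary cannot be decoded and are replaced with U+FFFD (�).
function decodeIndependently(parts: Uint8Array[]): string {
  return parts.map((p) => new TextDecoder("utf-8").decode(p)).join("");
}

// Reusing one decoder with { stream: true }: incomplete trailing bytes are
// buffered and completed by the next chunk.
function decodeStreaming(parts: Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8");
  let out = "";
  for (const p of parts) {
    out += decoder.decode(p, { stream: true });
  }
  out += decoder.decode(); // flush any bytes still buffered
  return out;
}

const broken = decodeIndependently(chunks);
const intact = decodeStreaming(chunks);

console.log(broken.includes("\uFFFD")); // replacement characters are present
console.log(intact === "日本"); // the original text is preserved
```

Keeping a single decoder alive for the whole stream is one common way to avoid the problem; buffering bytes until a complete character is available is another.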

I will submit a Pull Request with the fix shortly.

System Info

  • macOS
  • node v20.12.2
  • langchain versions
    $ npm list --depth=1 | grep langchain
    ├─┬ @langchain/google-vertexai-web@0.0.25
    │ ├── @langchain/core@0.2.23
    │ └── @langchain/google-webauth@0.0.25
    ├─┬ @langchain/google-vertexai@0.0.25
    │ ├── @langchain/core@0.2.23 deduped
    │ └── @langchain/google-gauth@0.0.25
    ├─┬ langchain@0.2.15
    │ ├── UNMET OPTIONAL DEPENDENCY @langchain/anthropic@*
    │ ├── UNMET OPTIONAL DEPENDENCY @langchain/aws@*
    │ ├── UNMET OPTIONAL DEPENDENCY @langchain/cohere@*
    │ ├── UNMET OPTIONAL DEPENDENCY @langchain/community@*
    │ ├── @langchain/core@0.2.23 deduped
    │ ├── UNMET OPTIONAL DEPENDENCY @langchain/google-genai@*
    │ ├── @langchain/google-vertexai@0.0.25 deduped
    │ ├── UNMET OPTIONAL DEPENDENCY @langchain/groq@*
    │ ├── UNMET OPTIONAL DEPENDENCY @langchain/mistralai@*
    │ ├── UNMET OPTIONAL DEPENDENCY @langchain/ollama@*
    │ ├── @langchain/openai@0.2.6
    │ ├── @langchain/textsplitters@0.0.3
    

Labels: auto:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
