
[BUG] Exception when running an OpenAI streaming model with complex tool parameters #601

Closed
Crokoking opened this issue Feb 4, 2024 · 4 comments · Fixed by #918
Labels
bug Something isn't working P2 High priority

Comments


Crokoking commented Feb 4, 2024

Describe the bug
When using the OpenAI streaming model with tool parameters more complex than a string,
the token-estimation system throws an exception. This seems to be caused by the default JSON parser being hard-coded to decode Map.class as Map<String, String>, instead of following the default Gson behavior.
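The distinction described above can be demonstrated with Gson alone, independent of LangChain4j. This is a minimal sketch (class name `GsonMapDemo` is mine, not from the library) showing that a raw `Map.class` tolerates array values while a hard-coded `Map<String, String>` does not:

```java
import com.google.gson.Gson;
import com.google.gson.JsonSyntaxException;
import com.google.gson.reflect.TypeToken;

import java.util.List;
import java.util.Map;

public class GsonMapDemo {

    public static void main(String[] args) {
        String json = "{\"items\": [\"a\", \"b\"]}";
        Gson gson = new Gson();

        // Default Gson behavior: a raw Map.class keeps JSON arrays as Lists
        // and nested objects as Maps.
        Map<?, ?> lenient = gson.fromJson(json, Map.class);
        System.out.println(lenient.get("items") instanceof List); // true

        // Forcing Map<String, String> makes Gson try to read the array value
        // as a string, producing the same "Expected a string but was
        // BEGIN_ARRAY" failure seen in the stack trace below.
        try {
            gson.fromJson(json, new TypeToken<Map<String, String>>() {}.getType());
        } catch (JsonSyntaxException e) {
            System.out.println("JsonSyntaxException: " + e.getMessage());
        }
    }
}
```

A tool schema whose properties contain `"items": { ... }` or `"required": [ ... ]` falls into exactly the second case once the codec restricts the value type.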

Log and Stack trace

Exception in thread "main" java.util.concurrent.CompletionException: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
	at java.base/java.util.concurrent.CompletableFuture.reportJoin(CompletableFuture.java:413)
	at java.base/java.util.concurrent.CompletableFuture.join(CompletableFuture.java:2118)
	at ca.codebuddy.demoapp.Main.main(Main.java:68)
Caused by: com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
	at com.google.gson.Gson.fromJson(Gson.java:1238)
	at com.google.gson.Gson.fromJson(Gson.java:1137)
	at com.google.gson.Gson.fromJson(Gson.java:1047)
	at com.google.gson.Gson.fromJson(Gson.java:1014)
	at dev.langchain4j.internal.GsonJsonCodec.fromJson(GsonJsonCodec.java:64)
	at dev.langchain4j.internal.Json.fromJson(Json.java:66)
	at dev.langchain4j.model.openai.OpenAiTokenizer.countArguments(OpenAiTokenizer.java:334)
	at dev.langchain4j.model.openai.OpenAiTokenizer.estimateTokenCountInToolExecutionRequests(OpenAiTokenizer.java:266)
	at dev.langchain4j.model.openai.OpenAiTokenizer.estimateTokenCountInForcefulToolExecutionRequest(OpenAiTokenizer.java:314)
	at dev.langchain4j.model.openai.OpenAiStreamingResponseBuilder.tokenUsage(OpenAiStreamingResponseBuilder.java:192)
	at dev.langchain4j.model.openai.OpenAiStreamingResponseBuilder.build(OpenAiStreamingResponseBuilder.java:167)
	at dev.langchain4j.model.openai.OpenAiStreamingChatModel.lambda$generate$2(OpenAiStreamingChatModel.java:158)
	at dev.ai4j.openai4j.StreamingRequestExecutor$2.onEvent(StreamingRequestExecutor.java:170)
	at okhttp3.internal.sse.RealEventSource.onEvent(RealEventSource.kt:101)
	at okhttp3.internal.sse.ServerSentEventReader.completeEvent(ServerSentEventReader.kt:108)
	at okhttp3.internal.sse.ServerSentEventReader.processNextEvent(ServerSentEventReader.kt:52)
	at okhttp3.internal.sse.RealEventSource.processResponse(RealEventSource.kt:75)
	at okhttp3.internal.sse.RealEventSource.onResponse(RealEventSource.kt:46)
	at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:519)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.IllegalStateException: Expected a string but was BEGIN_ARRAY at line 1 column 10 path $.
	at com.google.gson.stream.JsonReader.nextString(JsonReader.java:836)
	at com.google.gson.internal.bind.TypeAdapters$15.read(TypeAdapters.java:421)
	at com.google.gson.internal.bind.TypeAdapters$15.read(TypeAdapters.java:409)
	at com.google.gson.internal.bind.TypeAdapterRuntimeTypeWrapper.read(TypeAdapterRuntimeTypeWrapper.java:40)
	at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:186)
	at com.google.gson.internal.bind.MapTypeAdapterFactory$Adapter.read(MapTypeAdapterFactory.java:144)
	at com.google.gson.Gson.fromJson(Gson.java:1227)
	... 21 more

To Reproduce

public static void main(String[] args) {
    OpenAiStreamingChatModel chatModel = OpenAiStreamingChatModel.builder()
        .apiKey(System.getenv("OPENAI_API_KEY"))
        .modelName("gpt-3.5-turbo-0125")
        .build();

    Gson gson = new Gson();
    Map<String, Map<String, Object>> properties = gson.fromJson("""
        {
            "list": {
                "type": "array",
                "items": {
                  "type": "string"
                },
                "description": "The output list"
            }
        }
        """, Map.class);

    ToolParameters params = ToolParameters.builder()
        .required(Collections.singletonList("list"))
        .properties(properties)
        .build();
    ToolSpecification spec = ToolSpecification.builder()
        .name("list")
        .description("List of US presidents")
        .parameters(params)
        .build();
    CompletableFuture<Response<AiMessage>> future = new CompletableFuture<>();

    List<ChatMessage> messages = Collections.singletonList(new UserMessage("List the US presidents"));
    chatModel.generate(messages, spec, new StreamingResponseHandler<>() {

        @Override
        public void onNext(String s) {
            System.out.println("Next: " + s);
        }

        @Override
        public void onError(Throwable throwable) {
            future.completeExceptionally(throwable);
        }

        @Override
        public void onComplete(Response<AiMessage> response) {
            future.complete(response);
        }
    });

    final Response<AiMessage> response = future.join();
    System.out.println(response.content().toolExecutionRequests().get(0).arguments());
}

Expected behavior
There should not be an exception; it works fine with the non-streaming model.

Please complete the following information:

  • LangChain4j version: 0.26.1
  • Java version: 17
  • Spring Boot version (if applicable): none

Additional context
Replacing the tokenizer with one that uses base Gson fixed the issue for me, but that is probably not the "proper" solution.

@Crokoking Crokoking added the bug Something isn't working label Feb 4, 2024
langchain4j (Owner) commented

@Crokoking thanks a lot for reporting!

cslcsl490 commented

Has this been fixed? @langchain4j


deepakn27 commented Apr 5, 2024

Can you help me with the code below? I am getting the same error. What should I change?

How can I replace the tokenizer? ("Replacing the tokenizer with one that uses base Gson fixed the issue for me but that is probably not the 'proper' solution.")

@Bean
Tokenizer tokenizer() {
    return new OpenAiTokenizer(MODEL_NAME);
}

@Bean
CommandLineRunner ingestDocsForLangChain(
        EmbeddingModel embeddingModel,
        EmbeddingStore embeddingStore,
        Tokenizer tokenizer,
        ResourceLoader resourceLoader
) throws IOException {
    return args -> {
        Resource resource = resourceLoader.getResource("classpath:service.txt");
        var service = loadDocument(resource.getFile().toPath(), new TextDocumentParser());

        DocumentSplitter documentSplitter = DocumentSplitters.recursive(200, 0, tokenizer);

        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(documentSplitter)
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();
        ingestor.ingest(List.of(service));
    };
}

Crokoking (Author) commented

I made a copy of the OpenAiTokenizer class, added a gson field, and replaced the two calls to Json.fromJson() with calls to gson.fromJson(). Then I just set that new class as the tokenizer when creating my ChatModel.
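For readers who don't want to copy the whole class, the essential substitution is small. A minimal sketch, assuming only Gson on the classpath (the class name `LenientArguments` is hypothetical, not part of LangChain4j):

```java
import com.google.gson.Gson;

import java.util.Map;

final class LenientArguments {

    private static final Gson GSON = new Gson();

    // Parses a tool-call arguments JSON string without restricting the map's
    // value type, so array- and object-valued parameters deserialize cleanly.
    static Map<?, ?> parse(String argumentsJson) {
        return GSON.fromJson(argumentsJson, Map.class);
    }
}
```

In the workaround described above, the copied tokenizer's calls to Json.fromJson(arguments, Map.class) are replaced by an unrestricted parse like this one.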

@langchain4j langchain4j added the P2 High priority label Apr 10, 2024
langchain4j added a commit that referenced this issue Apr 16, 2024
… JSON (#918)

## Context
Fixes #601

## Change
Do not restrict Map key/value types when deserializing from JSON

## Checklist
Before submitting this PR, please check the following points:
- [X] I have added unit and integration tests for my change
- [X] All unit and integration tests in the module I have added/changed
are green
- [X] All unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules are green
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
(only when a new module is added)

## Checklist for adding new embedding store integration
- [ ] I have added a {NameOfIntegration}EmbeddingStoreIT that extends
from either EmbeddingStoreIT or EmbeddingStoreWithFilteringIT