
[BUG] LanguageModelQueryRouter.parse(String choices) throws NumberFormatException when no results are found #588

Closed
stephanj opened this issue Feb 1, 2024 · 7 comments · Fixed by #593
Labels
bug Something isn't working

Comments

@stephanj

stephanj commented Feb 1, 2024


Describe the bug
When LanguageModelQueryRouter gets a question it cannot route, the LLM replies with an explanation in prose, and the router then tries to parse that String as an int:

    protected Collection<ContentRetriever> parse(String choices) {
        return stream(choices.split(","))
                .map(String::trim)
                .map(Integer::parseInt)  // <== throws NumberFormatException when the LLM answers in prose
                .map(idToRetriever::get)
                .collect(toList());
    }
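A minimal, self-contained reproduction of the failure, assuming retrievers can be modelled as plain `String`s so it runs without LangChain4j (the class name `ParseRepro` and the ids/names in the map are illustrative only). It applies the same stream pipeline to a well-formed reply and to a prose reply like the one in the stack trace below:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in: String replaces ContentRetriever for this demo.
class ParseRepro {

    static final Map<Integer, String> idToRetriever = Map.of(1, "docs", 2, "web");

    // Same pipeline as the parse(...) method quoted above.
    static Collection<String> parse(String choices) {
        return Arrays.stream(choices.split(","))
                .map(String::trim)
                .map(Integer::parseInt)      // fails on any non-numeric token
                .map(idToRetriever::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse("1, 2")); // prints [docs, web]
        try {
            // A prose reply: a single token that is not a number.
            parse("There is not enough information provided to determine the most suitable data source(s)");
        } catch (NumberFormatException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```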

Either filter out tokens that are not purely digits, or add some other check "higher up":

protected Collection<ContentRetriever> parse(String choices) {
    return Arrays.stream(choices.split(","))
            .map(String::trim)
            .filter(s -> s.matches("\\d{1,9}")) // keep only purely numeric tokens short enough to fit in an int
            .map(Integer::parseInt)
            .map(idToRetriever::get)
            .filter(Objects::nonNull)           // drop ids that do not map to any retriever
            .collect(Collectors.toList());
}
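The digit-filter idea as a runnable sketch, again with `String` standing in for `ContentRetriever` (an assumption for the demo; `TolerantParse` is a hypothetical name). It also drops ids the model invents that have no retriever, so a reply with no usable number yields an empty list instead of an exception:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Sketch only: String replaces ContentRetriever so the class runs standalone.
class TolerantParse {

    static final Map<Integer, String> idToRetriever = Map.of(1, "docs", 2, "web");

    static Collection<String> parse(String choices) {
        return Arrays.stream(choices.split(","))
                .map(String::trim)
                .filter(s -> s.matches("\\d{1,9}")) // skip prose; at most 9 digits so parseInt cannot overflow
                .map(Integer::parseInt)
                .map(idToRetriever::get)
                .filter(Objects::nonNull)           // skip ids with no matching retriever
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse("1, 2"));                            // prints [docs, web]
        System.out.println(parse("2, 7"));                            // prints [web]
        System.out.println(parse("There is not enough information")); // prints []
    }
}
```

An empty result is a clean signal that the model could not decide, which is where a fallback behavior can take over.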

OR

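protected Collection<ContentRetriever> parse(String choices) {
    return Arrays.stream(choices.split(","))
            .map(String::trim)
            .map(s -> {
                try {
                    int id = Integer.parseInt(s);
                    return Optional.ofNullable(idToRetriever.get(id)); // empty if the id is unknown
                } catch (NumberFormatException e) {
                    return Optional.<ContentRetriever>empty(); // non-numeric token, e.g. an explanation in prose
                }
            })
            .flatMap(Optional::stream) // Java 9+: unwrap, dropping empty Optionals (keeps the original return type)
            .collect(Collectors.toList());
}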

Setup used:

ScoringModel scoringModel = CohereScoringModel.withApiKey(System.getenv("COHERE_API_KEY"));

ContentAggregator contentAggregator = ReRankingContentAggregator.builder()
    .scoringModel(scoringModel)
    .minScore(0.8)
    .build();

// chatModel and retriever are defined elsewhere in the application
QueryRouter queryRouter = new LanguageModelQueryRouter(chatModel, retriever);
RetrievalAugmentor retrievalAugmentor = DefaultRetrievalAugmentor.builder()
    .contentAggregator(contentAggregator)
    .queryRouter(queryRouter)
    .build();

Log and Stack trace

java.util.concurrent.CompletionException: java.lang.NumberFormatException: For input string: "There is not enough information provided to determine the most suitable data source(s) for the user query "Question which has no answer"."
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320) ~[na:na]
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]
Caused by: java.lang.NumberFormatException: For input string: "There is not enough information provided to determine the most suitable data source(s) for the user query "What about breakfast?"."
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) ~[na:na]
	at java.base/java.lang.Integer.parseInt(Integer.java:661) ~[na:na]
	at java.base/java.lang.Integer.parseInt(Integer.java:777) ~[na:na]
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[na:na]
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[na:na]
	at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[na:na]
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[na:na]
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na]
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[na:na]
	at dev.langchain4j.rag.query.router.LanguageModelQueryRouter.parse(LanguageModelQueryRouter.java:101) ~[langchain4j-core-0.26.1.jar:na]
	at dev.langchain4j.rag.query.router.LanguageModelQueryRouter.route(LanguageModelQueryRouter.java:86) ~[langchain4j-core-0.26.1.jar:na]
	at dev.langchain4j.rag.DefaultRetrievalAugmentor.lambda$null$0(DefaultRetrievalAugmentor.java:135) ~[langchain4j-core-0.26.1.jar:na]
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[na:na]
	... 3 common frames omitted

To Reproduce
Probably: ask the AI Service a question that none of the configured retrievers can answer, so the LLM replies with an explanation instead of retriever numbers.

Expected behavior
Something like "Sorry, I can't give you an answer to that question"

Please complete the following information:

  • LangChain4j version: 0.26.1
  • Java version: 21
  • Spring Boot version (if applicable): 3.2.1


stephanj added the bug (Something isn't working) label Feb 1, 2024
@langchain4j
Owner

@stephanj thank you so much for reporting! Going to fix this ASAP

@langchain4j
Owner

langchain4j commented Feb 2, 2024

@stephanj in this case, when the model cannot decide, I guess there should be a configurable fallback behavior:

  • route to all available retrievers
  • route to no retrievers - RAG flow stops, user message just goes to the LLM without extra context
  • retry with threats like "if you do not provide a number, my boss will fire me" 😆 <- probably a bad idea
  • force the LLM to choose one or more options via tools/json_mode (but there are no APIs for this at the moment; I planned to address this option later) <- not sure it is a good idea
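The list above could translate into a configurable fallback roughly like the following sketch. All names (`FallbackRouting`, `FallbackStrategy`, `route`) are hypothetical and not taken from LangChain4j or from #593; retrievers are again modelled as `String`s. It covers the first two options, plus a `FAIL` mode that keeps surfacing an error (with a clearer message than the current `NumberFormatException`); the retry and tools/json_mode options are left out:

```java
import java.util.Collection;
import java.util.List;

// Hypothetical sketch; requires Java 14+ for the switch expression.
class FallbackRouting {

    enum FallbackStrategy { ROUTE_TO_ALL, DO_NOT_ROUTE, FAIL }

    // parsed: retrievers the LLM selected (possibly empty after tolerant parsing)
    static Collection<String> route(Collection<String> parsed,
                                    Collection<String> allRetrievers,
                                    FallbackStrategy fallback) {
        if (!parsed.isEmpty()) {
            return parsed; // the model picked at least one valid retriever
        }
        return switch (fallback) {
            case ROUTE_TO_ALL -> allRetrievers; // query every retriever, rely on later filtering
            case DO_NOT_ROUTE -> List.of();     // skip RAG, the query goes to the LLM without extra context
            case FAIL -> throw new IllegalStateException("LLM did not select any retriever");
        };
    }

    public static void main(String[] args) {
        List<String> all = List.of("docs", "web");
        System.out.println(route(List.of(), all, FallbackStrategy.ROUTE_TO_ALL)); // prints [docs, web]
        System.out.println(route(List.of(), all, FallbackStrategy.DO_NOT_ROUTE)); // prints []
        System.out.println(route(List.of("web"), all, FallbackStrategy.FAIL));    // prints [web]
    }
}
```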

Any other ideas?
I guess "route to all available retrievers" is a good default behavior, but only when a reranker is configured. Even if it retrieves crap, the reranker can filter it out. But if there is no reranker, it's probably better not to route this query to any retriever at all and just skip RAG. WDYT?
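To illustrate the reranker argument, a sketch of "route to all, then filter by score". The `RerankFilter` class, the `Scored` record and the scores are hypothetical; in the reporter's setup the scores would come from the Cohere `ScoringModel`, and the threshold plays the role of `minScore(0.8)`. This is a simplification, not the actual `ReRankingContentAggregator` logic:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch (Java 16+ for records).
class RerankFilter {

    record Scored(String text, double score) {}

    // Keep only content at or above minScore, best first.
    static List<String> keepRelevant(List<Scored> retrieved, double minScore) {
        return retrieved.stream()
                .filter(s -> s.score() >= minScore)
                .sorted((a, b) -> Double.compare(b.score(), a.score()))
                .map(Scored::text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Scored> hits = List.of(
                new Scored("breakfast menu", 0.95),
                new Scored("unrelated FAQ", 0.30),
                new Scored("opening hours", 0.85));
        System.out.println(keepRelevant(hits, 0.8)); // prints [breakfast menu, opening hours]
    }
}
```

In this sketch, if nothing clears the threshold the list comes back empty, so the outcome is close to "do not route": the query reaches the LLM without extra context.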

@stephanj
Author

stephanj commented Feb 2, 2024

Maybe the different listed strategies should be configurable for the developer?

@langchain4j
Owner

@stephanj yes, of course! I was just wondering which one should be the default (without any specific configuration).

@stephanj
Author

stephanj commented Feb 2, 2024

Probably "Route all", because I would prefer an answer over no answer.

@langchain4j
Owner

Hmm. By "not route" I mean just not retrieving anything and passing the query directly to the LLM without any additional context.
I guess if the LLM cannot decide which retriever to route the given query to, chances are it is better to skip retrieval altogether than to retrieve garbage and stuff it into the prompt, which leads to increased cost and latency plus a higher chance of hallucinations? 🤔

@langchain4j
Owner

@stephanj FYI: #593
