[BUG] DocumentSplitter's maxOverlapSize parameter sometimes fails #460

Closed
xbzhang1994 opened this issue Jan 8, 2024 · 2 comments
Labels
bug Something isn't working

xbzhang1994 commented Jan 8, 2024

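DocumentSplitter's maxOverlapSize parameter is sometimes ignored. When every sentence of a segment fits within maxOverlapSize, the loop below finishes without reaching the else branch, and the method returns an empty string instead of the overlap it has built.
Look at this method: dev.langchain4j.data.document.splitter.HierarchicalDocumentSplitter#overlapFrom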

```java
private String overlapFrom(String segmentText) {
    if (maxOverlapSize == 0) {
        return "";
    }

    SegmentBuilder overlapBuilder = new SegmentBuilder(maxOverlapSize, this::sizeOf, joinDelimiter());

    String[] sentences = new DocumentBySentenceSplitter(1, 0, null, null).split(segmentText);
    for (int i = sentences.length - 1; i >= 0; i--) {
        String part = sentences[i];
        if (overlapBuilder.hasSpaceFor(part)) {
            overlapBuilder.prepend(part);
        } else {
            return overlapBuilder.build();
        }
    }

    return "";
}
```

Should the last line return the built overlap instead, like this?

```java
private String overlapFrom(String segmentText) {
    if (maxOverlapSize == 0) {
        return "";
    }

    SegmentBuilder overlapBuilder = new SegmentBuilder(maxOverlapSize, this::sizeOf, joinDelimiter());

    String[] sentences = new DocumentBySentenceSplitter(1, 0, null, null).split(segmentText);
    for (int i = sentences.length - 1; i >= 0; i--) {
        String part = sentences[i];
        if (overlapBuilder.hasSpaceFor(part)) {
            overlapBuilder.prepend(part);
        } else {
            return overlapBuilder.build();
        }
    }

    // Proposed change: return what was collected instead of "".
    return overlapBuilder.build();
}
```
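
To make the difference easier to see, here is a minimal standalone sketch of the same loop. It is not LangChain4j code: the class `OverlapSketch`, the regex-based `sentences` helper, character-count sizing, and the single-space join are all assumptions standing in for `SegmentBuilder`, `DocumentBySentenceSplitter`, `sizeOf`, and `joinDelimiter()`. The `fixed` flag switches only the final return statement.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for overlapFrom (assumptions: size = character count,
// parts joined with one space, naive sentence splitting). Not library code.
class OverlapSketch {

    // Hypothetical splitter: breaks after '.', '!' or '?' followed by whitespace.
    static String[] sentences(String text) {
        return text.trim().split("(?<=[.!?])\\s+");
    }

    // Walks the sentences from last to first and keeps those that still fit
    // within maxOverlapSize. 'fixed' selects the proposed final return.
    static String overlap(String segmentText, int maxOverlapSize, boolean fixed) {
        if (maxOverlapSize == 0) {
            return "";
        }
        List<String> parts = new ArrayList<>(); // kept in document order
        int size = 0;                           // length of the joined parts so far
        String[] sentences = sentences(segmentText);
        for (int i = sentences.length - 1; i >= 0; i--) {
            String part = sentences[i];
            // One extra character for the joining space once something is collected.
            int needed = parts.isEmpty() ? part.length() : size + 1 + part.length();
            if (needed <= maxOverlapSize) {
                parts.add(0, part);             // prepend
                size = needed;
            } else {
                return String.join(" ", parts); // overflow: return what fits
            }
        }
        // Reached only when every sentence fit.
        return fixed ? String.join(" ", parts) : "";
    }

    public static void main(String[] args) {
        String segment = "First sentence. Second sentence. Third sentence.";
        System.out.println("[" + overlap(segment, 20, false) + "]");
        System.out.println("[" + overlap(segment, 100, false) + "]");
        System.out.println("[" + overlap(segment, 100, true) + "]");
    }
}
```

Under these assumptions, both variants agree when the limit cuts the segment off (a limit of 20 keeps only the last sentence), but with a limit of 100 the original final return yields an empty overlap while the proposed one yields the whole segment.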

- LangChain4j version: e.g. 0.24.0

xbzhang1994 added the bug label Jan 8, 2024

xbzhang1994 (Author) commented

I'm not sure this is correct, but it matches my understanding of the maxOverlapSize parameter.

langchain4j (Owner) commented

@18856317221 thank you for reporting! It is indeed a bug; here is a fix: #464
