TokenTextSplitter Builder pattern fails when only some parameters are set #3166

@ChoHadam

Bug description

The TokenTextSplitter's Builder pattern fails when only some parameters are set (for example, only chunkSize). I think the cause is that the Builder's fields are initialized with Java's default values (0 for integers, false for booleans) rather than with the default constants defined in the TokenTextSplitter class, so every parameter that is not set explicitly ends up as 0 or false.
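
The suspected mechanism can be illustrated without Spring AI on the classpath. The sketch below is a hypothetical stand-in, not the library's source: `SplitterConfig`, `ZeroInitBuilder`, and `chunkWords` are invented names, and the loop splits on words rather than tokens. It assumes the splitter stops producing chunks once a maximum chunk count is reached and then emits whatever text is left over, which would explain a single oversized chunk when that maximum stays at 0.

```kotlin
// Hypothetical stand-in for the splitter's configuration; not the Spring AI source.
data class SplitterConfig(
    val chunkSize: Int,
    val minChunkSizeChars: Int,
    val minChunkLengthToEmbed: Int,
    val maxNumChunks: Int,
    val keepSeparator: Boolean,
)

// Builder whose fields start at the JVM zero values, as the report suspects.
class ZeroInitBuilder {
    private var chunkSize: Int = 0
    private var minChunkSizeChars: Int = 0
    private var minChunkLengthToEmbed: Int = 0
    private var maxNumChunks: Int = 0
    private var keepSeparator: Boolean = false

    fun withChunkSize(value: Int) = apply { chunkSize = value }
    fun withMaxNumChunks(value: Int) = apply { maxNumChunks = value }

    fun build() = SplitterConfig(
        chunkSize, minChunkSizeChars, minChunkLengthToEmbed, maxNumChunks, keepSeparator
    )
}

// Simplified word-based chunker (assumption: the real splitter works on tokens).
fun chunkWords(text: String, config: SplitterConfig): List<String> {
    val words = text.split(" ").filter { it.isNotEmpty() }.toMutableList()
    val chunks = mutableListOf<String>()
    // With maxNumChunks == 0 this loop body never runs.
    while (words.isNotEmpty() && chunks.size < config.maxNumChunks) {
        val take = minOf(config.chunkSize, words.size).coerceAtLeast(1)
        val head = words.subList(0, take)
        chunks += head.joinToString(" ")
        head.clear() // drops the consumed words from the backing list
    }
    // Any leftover text is emitted as one final chunk.
    if (words.isNotEmpty()) chunks += words.joinToString(" ")
    return chunks
}
```

In this sketch, `ZeroInitBuilder().withChunkSize(3).build()` leaves `maxNumChunks` at 0, so `chunkWords` returns the whole input as one chunk. Calling `withMaxNumChunks` as well makes the same input split as expected.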

Environment

  • Spring AI version: 1.0.0-M6 (through latest 1.0.0-M8)
  • Java version: 17

Steps to reproduce

  1. Build a TokenTextSplitter with the Builder pattern, setting only one parameter:

     val splitter = TokenTextSplitter.builder()
         .withChunkSize(100)  // Only setting chunk size
         .build()

  2. Use the splitter to chunk a text that should produce multiple chunks.
  3. Observe that only a single chunk is returned containing the entire text, regardless of the chunkSize value.

Expected behavior

When using the Builder pattern with only specific parameters set (e.g., chunkSize), the splitter should properly chunk the text according to the specified parameter, while using the default values defined in the TokenTextSplitter class for the unspecified parameters.
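
One way to get this behavior is to initialize each Builder field from the class-level default constant, so that a setter only overrides the value it is called for. The following is a minimal sketch of that idea, not the actual Spring AI implementation: `SplitterSettings`, its constant names, and the numeric defaults are illustrative placeholders. The real values are whatever the TokenTextSplitter class defines.

```kotlin
// Hypothetical sketch; names and default values are assumptions, not the Spring AI source.
class SplitterSettings private constructor(
    val chunkSize: Int,
    val minChunkSizeChars: Int,
    val minChunkLengthToEmbed: Int,
    val maxNumChunks: Int,
    val keepSeparator: Boolean,
) {
    class Builder {
        // Each field starts from the class-level default instead of 0 / false.
        private var chunkSize = DEFAULT_CHUNK_SIZE
        private var minChunkSizeChars = DEFAULT_MIN_CHUNK_SIZE_CHARS
        private var minChunkLengthToEmbed = DEFAULT_MIN_CHUNK_LENGTH_TO_EMBED
        private var maxNumChunks = DEFAULT_MAX_NUM_CHUNKS
        private var keepSeparator = DEFAULT_KEEP_SEPARATOR

        fun withChunkSize(value: Int) = apply { chunkSize = value }
        fun withMinChunkSizeChars(value: Int) = apply { minChunkSizeChars = value }
        fun withMinChunkLengthToEmbed(value: Int) = apply { minChunkLengthToEmbed = value }
        fun withMaxNumChunks(value: Int) = apply { maxNumChunks = value }
        fun withKeepSeparator(value: Boolean) = apply { keepSeparator = value }

        fun build() = SplitterSettings(
            chunkSize, minChunkSizeChars, minChunkLengthToEmbed, maxNumChunks, keepSeparator
        )
    }

    companion object {
        // Placeholder defaults for illustration only.
        const val DEFAULT_CHUNK_SIZE = 800
        const val DEFAULT_MIN_CHUNK_SIZE_CHARS = 350
        const val DEFAULT_MIN_CHUNK_LENGTH_TO_EMBED = 5
        const val DEFAULT_MAX_NUM_CHUNKS = 10000
        const val DEFAULT_KEEP_SEPARATOR = true

        fun builder() = Builder()
    }
}
```

With this shape, `SplitterSettings.builder().withChunkSize(100).build()` carries the requested chunk size while every other field keeps its declared default.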

Minimal Complete Reproducible example

import org.junit.jupiter.api.Test
import org.springframework.ai.document.Document
import org.springframework.ai.transformer.splitter.TokenTextSplitter
import kotlin.test.assertTrue


class TokenTextSplitterTest {
    @Test
    fun testBuilderWithOnlyChunkSize() {
        val splitter = TokenTextSplitter.builder()
            .withChunkSize(10) // Even with a small chunk size that should produce multiple chunks
            .build()

        val text = "This is a sample text for testing chunking functionality. " +
                "It should be split into chunks based on the specified parameters. " +
                "Let's see if it works correctly when only chunkSize is specified."

        // Try to split the text
        val chunks = splitter.split(
            Document.builder()
                .text(text)
                .build()
        )

        // This assertion fails - we expect multiple chunks but only get one
        assertTrue(chunks.size > 1)
    }
}

Labels: ETL, bug
