Closed
Description
Bug description
The TokenTextSplitter builder fails when only some parameters are set (for example, only chunkSize). I think the cause is that the Builder's fields are initialized with Java's default values (0 for integers, false for booleans) rather than with the default constants defined in the TokenTextSplitter class.
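To illustrate the suspected cause, here is a minimal, self-contained sketch. It is not the Spring AI source: `SplitterConfigSketch`, `BuggyBuilder`, `FixedBuilder`, and the constant values are hypothetical stand-ins chosen for illustration. It only shows how builder fields without initializers silently fall back to 0 and false, and how initializing them from the class constants avoids that.

```java
// Hypothetical stand-in for a splitter configuration; NOT the Spring AI source.
final class SplitterConfigSketch {

    // Illustrative default constants (assumed values, for demonstration only).
    static final int DEFAULT_CHUNK_SIZE = 800;
    static final int DEFAULT_MIN_CHUNK_SIZE_CHARS = 350;
    static final boolean DEFAULT_KEEP_SEPARATOR = true;

    final int chunkSize;
    final int minChunkSizeChars;
    final boolean keepSeparator;

    private SplitterConfigSketch(int chunkSize, int minChunkSizeChars, boolean keepSeparator) {
        this.chunkSize = chunkSize;
        this.minChunkSizeChars = minChunkSizeChars;
        this.keepSeparator = keepSeparator;
    }

    // Suspected buggy pattern: fields have no initializers, so Java assigns
    // 0 to the ints and false to the boolean.
    static final class BuggyBuilder {
        private int chunkSize;
        private int minChunkSizeChars;
        private boolean keepSeparator;

        BuggyBuilder withChunkSize(int chunkSize) {
            this.chunkSize = chunkSize;
            return this;
        }

        SplitterConfigSketch build() {
            return new SplitterConfigSketch(chunkSize, minChunkSizeChars, keepSeparator);
        }
    }

    // Possible fix: initialize every builder field from the class constants.
    static final class FixedBuilder {
        private int chunkSize = DEFAULT_CHUNK_SIZE;
        private int minChunkSizeChars = DEFAULT_MIN_CHUNK_SIZE_CHARS;
        private boolean keepSeparator = DEFAULT_KEEP_SEPARATOR;

        FixedBuilder withChunkSize(int chunkSize) {
            this.chunkSize = chunkSize;
            return this;
        }

        SplitterConfigSketch build() {
            return new SplitterConfigSketch(chunkSize, minChunkSizeChars, keepSeparator);
        }
    }

    public static void main(String[] args) {
        SplitterConfigSketch buggy = new BuggyBuilder().withChunkSize(100).build();
        SplitterConfigSketch fixed = new FixedBuilder().withChunkSize(100).build();
        // Unset parameters differ: zero/false in the buggy variant, constants in the fixed one.
        System.out.println("buggy minChunkSizeChars = " + buggy.minChunkSizeChars);
        System.out.println("fixed minChunkSizeChars = " + fixed.minChunkSizeChars);
    }
}
```

If the real Builder follows the first pattern, every parameter that is not set explicitly would reach the splitter as 0 or false, which could explain why the text is never split as expected.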
Environment
- Spring AI version: 1.0.0-M6 (through latest 1.0.0-M8)
- Java version: 17
Steps to reproduce
- Use the TokenTextSplitter's Builder pattern, setting only one parameter:
    val splitter = TokenTextSplitter.builder()
        .withChunkSize(100) // Only setting chunk size
        .build()
- Try to use the splitter to chunk a text that should produce multiple chunks.
- Observe that only a single chunk is returned containing the entire text, regardless of the chunkSize value.
Expected behavior
When using the Builder pattern with only specific parameters set (e.g., chunkSize), the splitter should properly chunk the text according to the specified parameter, while using the default values defined in the TokenTextSplitter class for the unspecified parameters.
Minimal Complete Reproducible example
import org.junit.jupiter.api.Test
import org.springframework.ai.document.Document
import org.springframework.ai.transformer.splitter.TokenTextSplitter
import kotlin.test.assertTrue

class TokenTextSplitterTest {

    @Test
    fun testBuilderWithOnlyChunkSize() {
        val splitter = TokenTextSplitter.builder()
            .withChunkSize(10) // Even with a small chunk size that should produce multiple chunks
            .build()

        val text = "This is a sample text for testing chunking functionality. " +
            "It should be split into chunks based on the specified parameters. " +
            "Let's see if it works correctly when only chunkSize is specified."

        // Try to split the text
        val chunks = splitter.split(
            Document.builder()
                .text(text)
                .build()
        )

        // This assertion fails - we expect multiple chunks but only get one
        assertTrue(chunks.size > 1)
    }
}