Skip to content

Conversation

sachinML
Copy link
Contributor

Refs #30200

  • Use split_documents(docs) after header-based splitting (preserves per-section metadata; overlap applied per document).
  • Overlap appears only when a single section exceeds chunk_size.
  • Overlap does not cross section/document boundaries.
  • Consider strip_headers=True to avoid a tiny header-only chunk; keep "" as a fallback separator if text lacks newlines/spaces.

@github-actions github-actions bot added langchain For docs changes to LangChain python For content related to the Python version of LangChain projects oss labels Oct 21, 2025
@mdrxy mdrxy merged commit 8790e49 into langchain-ai:main Oct 21, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

langchain For docs changes to LangChain oss python For content related to the Python version of LangChain projects

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants