# How to split by character

This is the simplest method. It splits text on a given character sequence, which defaults to `"\n\n"`, and measures chunk length in characters.

1. How the text is split: by a single separator string.
2. How the chunk size is measured: by number of characters.
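
The two points above can be illustrated with a minimal sketch. This is a simplified illustration, not the `CharacterTextSplitter` implementation: `splitByCharacter` is a hypothetical helper, it ignores chunk overlap, and it assumes chunk length is the string's `.length`.

```typescript
// Simplified illustration only: not the CharacterTextSplitter implementation.
// `splitByCharacter` is a hypothetical helper; chunk overlap is ignored.
function splitByCharacter(
  text: string,
  separator: string = "\n\n",
  chunkSize: number = 1000,
): string[] {
  // Step 1: cut the text at every occurrence of the separator.
  const pieces = text.split(separator).filter((piece) => piece.length > 0);

  // Step 2: greedily merge neighbouring pieces while the merged chunk
  // stays within chunkSize, measured in characters (string length).
  const chunks: string[] = [];
  let current = "";
  for (const piece of pieces) {
    const candidate = current === "" ? piece : current + separator + piece;
    if (candidate.length <= chunkSize || current === "") {
      current = candidate;
    } else {
      chunks.push(current);
      current = piece;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

const sample = "aaaa\n\nbbbb\n\ncccc";
const sampleChunks = splitByCharacter(sample, "\n\n", 10);
console.log(sampleChunks);
```

In this sketch the separator is the only place a cut can happen, so a single piece longer than `chunkSize` is returned whole rather than broken further.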

To obtain the string content directly, use `.splitText`.

To create LangChain [Document](https://api.js.langchain.com/classes/langchain_core_documents.Document.html) objects (e.g., for use in downstream tasks), use `.createDocuments`.

```typescript
import { CharacterTextSplitter } from "@langchain/textsplitters";

// Load an example document
const stateOfTheUnion = await Deno.readTextFile("../../../../examples/state_of_the_union.txt");

const textSplitter = new CharacterTextSplitter({
    separator: "\n\n",
    chunkSize: 1000,
    chunkOverlap: 200,
});
const texts = await textSplitter.createDocuments([stateOfTheUnion]);
console.log(texts[0]);
```

```text
Document {
  pageContent: "Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and th"... 839 more characters,
  metadata: { loc: { lines: { from: 1, to: 17 } } }
}
```


Use `.createDocuments` to propagate metadata associated with each document to the output chunks:

```typescript
const metadatas = [{ document: 1 }, { document: 2 }];
const documents = await textSplitter.createDocuments(
    [stateOfTheUnion, stateOfTheUnion], metadatas
);
console.log(documents[0]);
```

```text
Document {
  pageContent: "Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and th"... 839 more characters,
  metadata: { document: 1, loc: { lines: { from: 1, to: 17 } } }
}
```
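
To make the pairing explicit, here is a rough sketch of how per-text metadata could be copied onto every chunk derived from that text. The `ChunkDocument` type and `attachMetadata` helper are hypothetical names used for illustration, not part of the library, and the sketch leaves out the `loc` line information that appears in the real output.

```typescript
// Hypothetical shapes for illustration; the real Document class lives in @langchain/core.
type Metadata = Record<string, unknown>;
interface ChunkDocument {
  pageContent: string;
  metadata: Metadata;
}

// `chunksPerText[i]` holds the chunks produced from the i-th input text,
// and `metadatas[i]` is the metadata supplied for that same text.
function attachMetadata(
  chunksPerText: string[][],
  metadatas: Metadata[] = [],
): ChunkDocument[] {
  const result: ChunkDocument[] = [];
  chunksPerText.forEach((chunks, i) => {
    for (const chunk of chunks) {
      // Copy the metadata so chunks do not share one mutable object.
      result.push({ pageContent: chunk, metadata: { ...(metadatas[i] ?? {}) } });
    }
  });
  return result;
}

const exampleDocs = attachMetadata(
  [["first chunk", "second chunk"], ["third chunk"]],
  [{ document: 1 }, { document: 2 }],
);
console.log(exampleDocs.map((d) => d.metadata));
```

Metadata is matched to texts by position, so the first two example chunks carry `{ document: 1 }` and the third carries `{ document: 2 }`.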


Use `.splitText` to obtain the string content directly:

```typescript
(await textSplitter.splitText(stateOfTheUnion))[0]
```

```text
"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and th"... 839 more characters
```