TXT

Plain Text

Plain text is a limited, yet universal and robust format for storing textual content.

Many of the formats supported by bconv are technically plain-text files (e.g. PubTator, BioC JSON, CoNLL), but use some mark-up to denote document structure, metadata, or annotations. The txt format, however, holds only the contents of a document in plain text, precluding the encoding of metadata and annotations, and supporting document structure only to a very limited extent.

The txt.json format is a simple wrapper for the txt format. It allows representing multiple documents in a single file and supports a document ID.

Examples

`txt` (single-doc)

Lidocaine-induced cardiac asystole.

Intravenous administration of a single 50-mg bolus of lidocaine in a 67-year-old man ...

→ Full example

`txt.json` (multi-doc)

[
  {
    "id": "354896",
    "text": "Lidocaine-induced cardiac asystole.\n\nIntravenous administration of ..."
  }
]

→ Full example
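
The wrapper structure shown above can be reproduced with the standard library alone. The following is a minimal sketch, not bconv's own implementation: the helper names `wrap_as_txt_json` and `unwrap_txt_json` are hypothetical, and only the `id` and `text` keys that appear in the example are assumed.

```python
import json


def wrap_as_txt_json(docs):
    """Serialise (doc_id, text) pairs as a txt.json-style list of objects.

    Hypothetical helper for illustration; only the "id" and "text" keys
    from the example above are assumed.
    """
    records = [{"id": doc_id, "text": text} for doc_id, text in docs]
    return json.dumps(records, indent=2, ensure_ascii=False)


def unwrap_txt_json(payload):
    """Read (doc_id, text) pairs back from a txt.json-style string."""
    return [(rec["id"], rec["text"]) for rec in json.loads(payload)]


docs = [
    ("354896",
     "Lidocaine-induced cardiac asystole.\n\nIntravenous administration of ..."),
]
payload = wrap_as_txt_json(docs)
print(payload)
```

Because the text is stored as a JSON string, blank lines and line breaks inside a document survive as `\n` escapes, so several documents can share one file without ambiguity.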

Sources

The Wikipedia articles on text files and plain text as a format provide information and further reading about many aspects of the format.

Notes

Document structure: Plain-text files are interpreted as a single document. Blank lines are interpreted as section boundaries, unless the single_section option is set, in which case the entire text is read as a single section. With the sentence_split option, line breaks are treated as sentence boundaries: they are interpreted as such when loading (in which case bconv attempts no further sentence splitting) and inserted as such when exporting. Multiple documents per file can only be represented in the txt.json format.
Metadata: The filename (if available) is used as a fallback for inferring the document ID, if none was provided to the load() call.
Whitespace: Line breaks may be indicative of document structure, depending on the options single_section and sentence_split, as described above. When serialising text alongside stand-off annotations (e.g. bionlp), do not use the sentence_split option, as it is not guaranteed to preserve character offsets.
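
The interplay of blank lines, line breaks, and the two options described in these notes can be sketched in plain Python. This is an illustration of the described behaviour under simplifying assumptions, not bconv's actual code; the function name `split_plain_text` and the nested-list return shape are hypothetical.

```python
import re


def split_plain_text(text, single_section=False, sentence_split=False):
    """Return a list of sections, each a list of text units.

    Hypothetical illustration of the options described above.
    """
    if single_section:
        # Conflate all content into one section.
        sections = [text]
    else:
        # Blank lines (possibly containing whitespace) separate sections.
        sections = [s for s in re.split(r"\n\s*\n", text) if s.strip()]
    if sentence_split:
        # Each non-empty line is taken as one given sentence.
        return [[ln for ln in s.split("\n") if ln.strip()] for s in sections]
    # Otherwise each section is kept whole; sentence splitting is left
    # to a later step.
    return [[s] for s in sections]


sample = "Title line.\n\nFirst sentence.\nSecond sentence."
for options in ({}, {"sentence_split": True},
                {"single_section": True, "sentence_split": True}):
    print(options, split_plain_text(sample, **options))
```

In this sketch, discarding empty lines when sentence_split is set changes the amount of whitespace between units, which gives an intuition for why character offsets may shift when the option is combined with stand-off annotations.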

Loaders

`TXTLoader`

Properties

fmt	`txt`
native type	Document
lazy loading	no
supports text	yes
supports annotations	no
stream type	text

Options

name	type	default	purpose
single_section	bool	`False`	Conflate all content into a single section
sentence_split	bool	`False`	Interpret line breaks as given sentence boundaries

`TXTJSONLoader`

Properties

fmt	`txt.json`
native type	Collection
lazy loading	no
supports text	yes
supports annotations	no
stream type	text

Options

name	type	default	purpose
single_section	bool	`False`	Conflate all content into a single section
sentence_split	bool	`False`	Interpret line breaks as given sentence boundaries

Exporters

`TXTFormatter`

Properties

fmt	`txt`
supports text	yes
supports annotations	no
stream type	text

Options

`TXTJSONFormatter`

Properties

fmt	`txt.json`
supports text	yes
supports annotations	no
stream type	text

Options
