plainize: Light but powerful HTML to plain text conversion

Convert an HTML string into plain text.

Caveats

This is still very basic, but it can easily be improved upon. I'm implementing features as I need them, or as I receive patches. Some limitations that come to mind:
- Whitespace isn't always converted consistently.
- Targets of links aren't converted
- Lists aren't converted nicely
- Preformatted text isn't kept preformatted
- Tables aren't converted nicely
- ...
There aren't any options yet. This isn't by design; I just haven't needed them so far.
It currently doesn't work on Node.js out of the box. Making it work is easy (it's done for the unit tests), but since html-to-text doesn't have any problems under Node.js, I haven't seen the need so far.

I'm happy to accept patches that improve on any of this.

Motivation

Most solutions found online for converting HTML to plain text simply use regular expressions to strip tags. This doesn't always work correctly, and even when it does, it only yields very basic results (inconsistent whitespace, content that shouldn't be included, ...). Moreover, such a solution is fundamentally hard to improve.

Some other implementations use a combination of the browser's textContent and innerHTML DOM properties. Because the availability and behavior of these properties vary, these solutions give unreliable results across browsers.

html-to-text uses a custom HTML parser to convert to plain text. This works very well, but the major downside is that it has a lot of heavy dependencies, weighing in at more than 200kb of uncompressed JavaScript (±80kb compressed), which isn't convenient in browser environments.

How it works

Plainize uses the browser's DOM implementation to parse the HTML, and then iterates over the parsed DOM tree to convert individual elements into plain text.
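
The approach described above can be illustrated with a minimal sketch. This is not plainize's actual source: the names nodeToText and sketchPlainize, and the choice of which elements to skip or treat as blocks, are assumptions made for illustration. It assumes a browser environment where DOMParser is available.

```javascript
// Sketch only: NOT plainize's actual implementation. All names are hypothetical.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Assumption: elements whose content should never end up in the output.
const SKIPPED = new Set(['SCRIPT', 'STYLE', 'HEAD']);
// Assumption: elements that should sit on their own line.
const BLOCKS = new Set(['P', 'DIV', 'BR', 'LI', 'H1', 'H2', 'H3']);

// Recursively convert a DOM node (and its children) into plain text.
function nodeToText(node) {
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  // Ignore comments and other non-element nodes, and skipped elements.
  if (node.nodeType !== ELEMENT_NODE || SKIPPED.has(node.tagName)) {
    return '';
  }
  let text = '';
  for (const child of Array.from(node.childNodes)) {
    text += nodeToText(child);
  }
  return BLOCKS.has(node.tagName) ? `\n${text}\n` : text;
}

// Let the browser parse the HTML, then walk the resulting tree.
function sketchPlainize(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // Collapse the runs of newlines produced by adjacent block elements.
  return nodeToText(doc.body).replace(/\n{2,}/g, '\n').trim();
}
```

Because the browser does the parsing, the sketch needs no parser dependency of its own; the per-element handling in nodeToText is where conversions for links, lists, or tables could be added.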

Installation

yarn add plainize

or

npm install plainize

Usage

import plainize from 'plainize';

// Prints 'This is bold text.'
console.log(plainize('This is <b>bold</b> text.'));
