Minimal RegEx-based JavaScript library for converting HTML to Text

jsHtmlToText is a basic regex-based HTML stripper.

How it works

  1. Removes all linebreaks, script tags, styles, comments and the DOCTYPE.
  2. Replaces most block tags with double newlines.
  3. Replaces some block tags with a single newline.
  4. Replaces <br> with a single newline.
  5. Removes all HTML tags left after steps 1-4.
  6. Ensures there are no more than two linebreaks in a row, removes linebreaks at the start and end of the string, trims trailing whitespace, and decodes HTML entities such as &nbsp;.
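
The six steps can be sketched roughly as follows. This is only an illustration of the approach, not the code in jsHtmlToText.js: the function name `htmlToTextSketch`, the choice of which tags count as "most" and "some" block tags, and the short entity table are all assumptions made for the example.

```javascript
// Hypothetical sketch of the six steps; NOT the actual jsHtmlToText source.
function htmlToTextSketch(html) {
  var text = String(html);

  // Step 1: drop linebreaks, scripts, styles, comments and the DOCTYPE.
  // Assumption: linebreaks become a space so adjacent words do not merge.
  text = text
    .replace(/[\r\n]+/g, ' ')
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '')
    .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, '')
    .replace(/<!--[\s\S]*?-->/g, '')
    .replace(/<!DOCTYPE[^>]*>/gi, '');

  // Step 2: closing tags of "most" block elements become a double newline.
  // The tag list here is an assumed subset chosen for illustration.
  text = text.replace(/<\/(p|div|h[1-6]|ul|ol|table|blockquote|pre)\s*>/gi, '\n\n');

  // Step 3: closing tags of "some" block elements become a single newline.
  text = text.replace(/<\/(li|tr|dt|dd)\s*>/gi, '\n');

  // Step 4: <br> becomes a single newline.
  text = text.replace(/<br\s*\/?>/gi, '\n');

  // Step 5: remove every tag that is still left.
  text = text.replace(/<[^>]+>/g, '');

  // Step 6a: tidy whitespace and cap consecutive linebreaks at two.
  text = text
    .replace(/[ \t]+\n/g, '\n')
    .replace(/\n{3,}/g, '\n\n')
    .replace(/^\n+/, '')
    .replace(/\s+$/, '');

  // Step 6b: decode a few common entities in a single pass, so that
  // "&amp;lt;" yields "&lt;" rather than "<".
  var entities = {
    '&nbsp;': ' ', '&lt;': '<', '&gt;': '>',
    '&quot;': '"', '&#39;': "'", '&amp;': '&'
  };
  return text.replace(/&(nbsp|lt|gt|quot|#39|amp);/g, function (match) {
    return entities[match];
  });
}

var sample = '<h1>Title</h1><p>Hello<br>world &amp; friends</p>' +
             '<ul><li>one</li><li>two</li></ul>';
console.log(htmlToTextSketch(sample));
```

Entities are decoded last in this sketch so that escaped markup such as `&lt;b&gt;` is kept as literal text instead of being stripped as a tag in step 5.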

Extensions

You can customize the formatting by providing hooks for the pre-processing, tag-replacing and post-processing stages. See tests/extensions.coffee for examples of hooks.
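
To make the three hook points concrete, here is a standalone sketch of the pattern. It does not reproduce the library's real hook API, which is documented only through tests/extensions.coffee: the function `convertWithHooks` and the option names `preProcess`, `tagReplacements` and `postProcess` are hypothetical names invented for this illustration.

```javascript
// Hypothetical illustration of the three hook points. The names used here
// are assumptions for this example, NOT the actual jsHtmlToText API.
function convertWithHooks(html, hooks) {
  hooks = hooks || {};
  var text = String(html);

  // Pre-processing hook: sees the raw HTML before any tag is replaced.
  if (typeof hooks.preProcess === 'function') {
    text = hooks.preProcess(text);
  }

  // Tag-replacing hooks: caller-supplied [pattern, replacement] pairs run
  // before the generic rules, so they can claim tags they care about.
  (hooks.tagReplacements || []).forEach(function (pair) {
    text = text.replace(pair[0], pair[1]);
  });

  // Generic rules (heavily reduced for the sketch): <br> to newline,
  // then strip every remaining tag.
  text = text.replace(/<br\s*\/?>/gi, '\n').replace(/<[^>]+>/g, '');

  // Post-processing hook: sees the final plain text.
  if (typeof hooks.postProcess === 'function') {
    text = hooks.postProcess(text);
  }
  return text;
}

// Usage: render links as "label (url)" and trim the result.
var result = convertWithHooks(
  '<p>See <a href="http://example.com">docs</a><br>now</p>',
  {
    tagReplacements: [
      [/<a\b[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, '$2 ($1)']
    ],
    postProcess: function (text) { return text.trim(); }
  }
);
console.log(result);
```

Running caller-supplied replacements before the generic tag stripping is the design choice that lets a hook preserve information, such as a link target, that would otherwise be discarded.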

History

It was created for a custom WebTV widget framework, which did not support other existing implementations.

Use it when:

  • you need a minimal implementation
  • you don't have access to a proper JavaScript implementation (it was originally written for Konqueror)
  • there are no strong security requirements