Fast Haskell tagsoup parser
This branch is 6 commits ahead of and 22 commits behind vshabanov:master.
Text/HTML/TagSoup/Fast
.gitignore
LICENSE
README.md
Setup.hs
fast-tagsoup-utf8-only.cabal

README.md

fast-tagsoup

Fast Haskell tagsoup parser.

Parsing speeds of 20-200 MB/s were observed.

It works only with strict ByteStrings.

This library is intended to be used in conjunction with the original tagsoup package:

import Text.HTML.TagSoup hiding (parseTags, renderTags)
import Text.HTML.TagSoup.Fast.Utf8Only

Besides being fast, fast-tagsoup correctly handles HTML <script> and <style> tags and converts tags to lower case. This fork purposefully removes support for parsing non-UTF-8 documents to avoid a dependency on text-icu. If you need to handle other encodings, refer to the original package: http://hackage.haskell.org/package/fast-tagsoup
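
As a rough sketch of how the two imports might be combined, the program below reads HTML from standard input and prints the href of every link. It assumes that Text.HTML.TagSoup.Fast.Utf8Only exports parseTags :: ByteString -> [Tag ByteString], as the original fast-tagsoup module does; this fork's actual exports are not confirmed here. The helper linkTargets is a hypothetical name introduced only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as B
-- Tag type and constructors come from the original tagsoup package.
import Text.HTML.TagSoup hiding (parseTags, renderTags)
-- ASSUMPTION: this module exports
--   parseTags :: B.ByteString -> [Tag B.ByteString]
-- mirroring Text.HTML.TagSoup.Fast in the original fast-tagsoup.
import Text.HTML.TagSoup.Fast.Utf8Only

-- | Hypothetical helper: collect the href attribute of every <a> tag.
-- Matching on the lower-case literal "a" relies on the parser
-- converting tags to lower case.
linkTargets :: [Tag B.ByteString] -> [B.ByteString]
linkTargets tags =
  [ href
  | TagOpen "a" attrs <- tags
  , Just href <- [lookup "href" attrs]
  ]

main :: IO ()
main = do
  html <- B.getContents            -- strict ByteString, as the parser requires
  mapM_ B.putStrLn (linkTargets (parseTags html))
```

Because linkTargets works on the plain Tag type, it can be exercised with hand-built tags and does not depend on which parser produced them.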

This parser is used in production in the BazQux Reader feeds and comments crawler.