Haskell library for extracting fragments from an HTML document using CSS selectors

TransversingCSS - Extract fragments of an HTML document using CSS selectors

The tagline says it all: you pass in a CSS query and an HTML string, and you get back a list of the HTML fragments that match that query. Only a subset of the CSS selector spec is supported for now; full CSS3 support is planned.

For now you can select:

  • By tag name: table td a
  • By class names: .container .content
  • By ID: #oneId
  • By attribute: [hasIt] [exact=match] [contains*=text] [starts^=with] [ends$=with]
  • Union: a, span, p
  • Immediate children: div > p
  • Get jiggy with it: div[data-attr=yeah] > .mon, div, #oneThing
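
To make the attribute forms in the list above concrete, here is a minimal sketch of what each one tests. It assumes the usual CSS meaning of the operators. The names AttrTest, Attrs and matchesAttr are hypothetical, made up for illustration; they are not this library's types or functions.

```haskell
import Data.List (isInfixOf, isPrefixOf, isSuffixOf)

-- Hypothetical model of the attribute selectors listed above.
-- These names are illustrative only, not the library's own types.
data AttrTest
  = HasAttr String            -- [hasIt]
  | Exact String String       -- [exact=match]
  | Contains String String    -- [contains*=text]
  | StartsWith String String  -- [starts^=with]
  | EndsWith String String    -- [ends$=with]
  deriving (Show, Eq)

-- An element's attributes as name/value pairs.
type Attrs = [(String, String)]

-- Does the attribute list satisfy the given test?
matchesAttr :: Attrs -> AttrTest -> Bool
matchesAttr attrs test = case test of
  HasAttr name      -> name `elem` map fst attrs
  Exact name v      -> check name (== v)
  Contains name v   -> check name (v `isInfixOf`)
  StartsWith name v -> check name (v `isPrefixOf`)
  EndsWith name v   -> check name (v `isSuffixOf`)
  where
    -- Apply a predicate to the attribute's value; a missing attribute fails.
    check name p = maybe False p (lookup name attrs)
```

For example, matchesAttr [("href", "https://example.com/a.pdf")] (EndsWith "href" ".pdf") evaluates to True, while the same test against an element with no href attribute evaluates to False.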

This module was originally conceived as part of my web application testing library, but it may also be useful for people doing web scraping.

Example usage:

Given HTML:

    <title>a title</title>
    <h1>Hello</h1>
      <a class="foo big">one</a>
      <li>The First</li>
      <li>The <a class="big bar">Second</a></li>

You can do:

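import Text.XML.HXT.TransversingCSS

main :: IO ()
main = do
  -- Read the HTML document shown above from disk.
  html <- readFile "the_html_above.html"

  -- Left means the CSS query failed to parse; Right holds the matching fragments.
  case findBySelector "a.big, h1" html of
    Left _parseError -> putStrLn "Your CSS query was malformed."
    Right results -> print results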

And your result would be:

["<a class=\"foo big\">one</a>", "<a class=\"big bar\">Second</a>", "<h1>Hello</h1>"]
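
Both anchors match a.big even though neither has "big" as its whole class attribute. Assuming the library follows standard CSS semantics, a class selector matches any one of the whitespace-separated names in the attribute. A minimal sketch of that rule, where hasClass is a hypothetical helper and not part of this library's API:

```haskell
-- Hypothetical helper, not part of the library's API:
-- a class selector matches when its name is one of the
-- whitespace-separated tokens of the class attribute.
hasClass :: String -> String -> Bool
hasClass name classAttr = name `elem` words classAttr
```

So hasClass "big" "foo big" and hasClass "big" "big bar" are both True, while hasClass "big" "bigger" is False because "bigger" is a different token.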