i5m edited this page Sep 13, 2010 · 8 revisions

A Fast, Enjoyable HTML Parser for Ruby

Hpricot is a very flexible HTML parser, based on Tanaka Akira’s HTree and John Resig’s jQuery, but with the scanner recoded in C. I’ve borrowed (what I believe to be) the best ideas from these wares to make Hpricot heaps of fun to use.

 # Load the Family Guy home page (needs the hpricot and open-uri libraries)
 require "hpricot"
 require "open-uri"
 # URI.open is required on Ruby 3.0+; older Rubies also accepted plain open(url)
 doc = Hpricot(URI.open("http://www.fox.com/familyguy/index.htm"))
 # change the CSS class on every <ul> matching ul.site-nav
 (doc/"ul.site-nav").set("class", "new-site-nav")
 # remove the element whose id is "header"
 (doc/"#header").remove
 # print the altered HTML
 puts doc
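The Fox address in the example above may no longer serve that page, so here is a minimal offline sketch of the same calls run against an inline HTML string. It assumes the hpricot gem is installed; the markup is made up for illustration. It also uses `Elements#attr` with a single argument, which reads the attribute from the first matched element.

 require "hpricot"

 # Hypothetical page markup standing in for the remote document
 html = <<-HTML
 <html><body>
   <div id="header">Logo and banner</div>
   <ul class="site-nav"><li>Home</li><li>Episodes</li></ul>
 </body></html>
 HTML

 doc = Hpricot(html)

 # Rename the class on every matching <ul> (set applies to all matches)
 (doc/"ul.site-nav").set("class", "new-site-nav")

 # Drop the header element from the parsed tree
 (doc/"#header").remove

 # With only a key, attr reads the attribute of the first matched element
 nav_class = (doc/"ul").attr("class")
 puts nav_class
 puts doc.to_html

Parsing from a string works because Hpricot() accepts either an IO (such as the one returned by open-uri) or a plain String, so the same search and edit calls apply to both.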

A Proper Start

Further Information

  • See hpricot.com for interactive demos
  • Hpricot mailing list: send an email to hpricot@librelist.com for information