Home

tenderlove edited this page · 19 revisions

Nokogiri

Nokogiri is a simple HTML/XML parser with much of its interface borrowed from Hpricot. It parses and searches with libxml2, a C library, so it is very fast.

Learn how to Generate HTML.

Here is how to parse HTML:


require 'nokogiri'

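# Sample document (the link URL is a placeholder)
doc = Nokogiri::HTML.parse(<<-eohtml)
<html>
<head>
<title>Hello World</title>
</head>
<body>
<h1>This is an awesome document</h1>
<p>
I am a paragraph <a href="http://example.com/">I am a link</a>
</p>
</body>
</html>
eohtml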

# Search for nodes by CSS
doc.css('p > a').each do |a_tag|
  puts a_tag.content
end

# Search for nodes by XPath
doc.xpath('//p/a').each do |a_tag|
  puts a_tag.content
end

# Or mix and match
doc.search('//p/a', 'p > a').each do |a_tag|
  puts a_tag.content
end

# Find attributes and their values
doc.search('a').first['href']
