Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser with XPath and CSS selector support.
Latest commit 334aec6 (May 28, 2016) by @flavorjones: update CHANGELOG for libxml 2.9.4 and libxslt 1.1.29





Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors.


  • XML/HTML DOM parser which handles broken HTML
  • XML/HTML SAX parser
  • XML/HTML Push parser
  • XPath 1.0 support for document searching
  • CSS3 selector support for document searching
  • XML/HTML builder
  • XSLT transformer

Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.


If this doesn't work:

gem install nokogiri

then please start troubleshooting here:

There are currently 1,237 Stack Overflow questions about Nokogiri installation. The vast majority of them are out of date and therefore incorrect. Please do not use Stack Overflow.

Instead, tell us when the above instructions don't work for you. This allows us to both help you directly and improve the documentation.

Binary packages

Binary packages are available for some distributions.


There are open-source tutorials (to which we invite contributions!) here:


Nokogiri is a large library, but here is example usage for parsing and examining a document:

#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document.
# The URL is missing from this copy of the example; pass the address of the page to fetch.
doc = Nokogiri::HTML(open(''))

puts "### Search for nodes by css"
doc.css('nav li a', 'article h2').each do |link|
  puts link.content
end

puts "### Search for nodes by xpath"
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

puts "### Or mix and match."
doc.search('nav li a', '//article//h2').each do |link|
  puts link.content
end


  • Ruby 1.9.3 or higher, including any development packages necessary to compile native extensions.

  • In Nokogiri 1.6.0 and later, libxml2 and libxslt are bundled with the gem, but if you want to use the system versions:


Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return a string containing markup (like to_xml, to_html and inner_html) will return a string encoded like the source document.


Some documents declare one encoding, but actually use a different one. In these cases, which encoding should the parser choose?

Data is just a stream of bytes. Humans add meaning to that stream. Any particular set of bytes could be valid characters in multiple encodings, so detecting encoding with 100% accuracy is not possible. libxml2 does its best, but it can't be right all the time.

If you want Nokogiri to handle the document encoding properly, your best bet is to explicitly set the encoding. Here is an example of explicitly setting the encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')


  bundle install
  bundle exec rake


MIT. See the LICENSE.txt file.
