Nokogiri (鋸) is a Rubygem providing HTML, XML, SAX, and Reader parsers with XPath and CSS selector support.



Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors.




  • XML/HTML DOM parser which handles broken HTML
  • XML/HTML SAX parser
  • XML/HTML Push parser
  • XPath 1.0 support for document searching
  • CSS3 selector support for document searching
  • XML/HTML builder
  • XSLT transformer

Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.


Installation

If this doesn't work:

gem install nokogiri

then please start troubleshooting here:

There are currently 1,237 Stack Overflow questions about Nokogiri installation. The vast majority of them are out of date and therefore incorrect. Please do not use Stack Overflow.

Instead, tell us when the above instructions don't work for you. This allows us to both help you directly and improve the documentation.

Binary packages

Binary packages are available for some distributions.


There are open-source tutorials (to which we invite contributions!) here:

Consider subscribing to Tidelift, which provides license assurances and timely security notifications for your open source dependencies, including Nokogiri. Tidelift subscriptions also help the Nokogiri maintainers fund our automated testing, which in turn allows us to ship releases, bugfixes, and security updates more often.

Security and Vulnerability Reporting

Please report vulnerabilities at

Full information and description of our security policy is in


Nokogiri is a large library, but here is an example of parsing and searching a document:

#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document (pass the URL of the page to fetch to `open`)
doc = Nokogiri::HTML(open(''))

puts "### Search for nodes by css"
doc.css('nav li a', 'article h2').each do |link|
  puts link.content
end

puts "### Search for nodes by xpath"
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

puts "### Or mix and match."
doc.search('nav li a', '//article//h2').each do |link|
  puts link.content
end


Requirements

  • Ruby 2.3.0 or higher, including any development packages necessary to compile native extensions.

  • In Nokogiri 1.6.0 and later, libxml2 and libxslt are bundled with the gem, but if you want to use the system versions:

    • First, check out the long list of fixes and changes between releases before deciding to use any version older than the one bundled with Nokogiri.

    • At install time, set the environment variable NOKOGIRI_USE_SYSTEM_LIBRARIES or pass the --use-system-libraries argument. (See for specifics.)

    • libxml2 >=2.6.21 with iconv support (libxml2-dev/-devel is also required)

    • libxslt, built with and supported by the given libxml2 (libxslt-dev/-devel is also required)


Encoding

Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return a string containing markup (like to_xml, to_html and inner_html) will return a string encoded like the source document.


Some documents declare one encoding, but actually use a different one. In these cases, which encoding should the parser choose?

Data is just a stream of bytes. Humans add meaning to that stream. Any particular set of bytes could be valid characters in multiple encodings, so detecting encoding with 100% accuracy is not possible. libxml2 does its best, but it can't be right all the time.

If you want Nokogiri to handle the document encoding properly, your best bet is to explicitly set the encoding. Here is an example of explicitly setting the encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')


Development

To install the development dependencies and run the default Rake task from a checkout:

  bundle install
  bundle exec rake

Code of Conduct

We've adopted the Contributor Covenant code of conduct, which you can read in full in


License

This project is licensed under the terms of the MIT license.

See this license at