Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser with XPath and CSS selector support.

Nokogiri

Description

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors.

XML is like violence - if it doesn’t solve your problems, you are not using enough of it.

Features

  • XML/HTML DOM parser which handles broken HTML
  • XML/HTML SAX parser
  • XML/HTML Push parser
  • XPath 1.0 support for document searching
  • CSS3 selector support for document searching
  • XML/HTML builder
  • XSLT transformer

Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.

Installation

If this doesn't work:

gem install nokogiri

then please start troubleshooting here:

http://www.nokogiri.org/tutorials/installing_nokogiri.html

There are currently 1,237 Stack Overflow questions about Nokogiri installation. The vast majority of them are out of date and therefore incorrect. Please do not use Stack Overflow.

Instead, tell us when the above instructions don't work for you. This allows us to both help you directly and improve the documentation.

Binary packages

Binary packages are available for some distributions.

Support

There are open-source tutorials (to which we invite contributions!) here: http://nokogiri.org/tutorials

Synopsis

Nokogiri is a large library, but here is example usage for parsing and examining a document:

  require 'nokogiri'
  require 'open-uri'

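  # Fetch and parse HTML document.
  # On Ruby 2.5 and later, call URI.open instead: Kernel#open no longer
  # handles URLs as of Ruby 3.0.
  doc = Nokogiri::HTML(open('http://www.nokogiri.org/tutorials/installing_nokogiri.html'))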

  ####
  # Search for nodes by css
  doc.css('nav ul.menu li a').each do |link|
    puts link.content
  end

  ####
  # Search for nodes by xpath
  doc.xpath('//h2 | //h3').each do |heading|
    puts heading.content
  end

  ####
  # Or mix and match.
  doc.search('code.sh', '//h2').each do |node|
    puts node.content
  end

Requirements

  • Ruby 1.9.3 or higher, including any development packages necessary to compile native extensions.

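  • In Nokogiri 1.6.0 and later, libxml2 and libxslt are bundled with the gem. To build against the system versions instead, pass the --use-system-libraries option at install time (gem install nokogiri -- --use-system-libraries); see the installation tutorial for details.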

Encoding

Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return a string containing markup (like to_xml, to_html and inner_html) will return a string encoded like the source document.

WARNING

Some documents declare one encoding, but actually use a different one. In these cases, which encoding should the parser choose?

Data is just a stream of bytes. Humans add meaning to that stream. Any particular set of bytes could be valid characters in multiple encodings, so detecting encoding with 100% accuracy is not possible. libxml2 does its best, but it can't be right all the time.

If you want Nokogiri to handle the document encoding properly, your best bet is to explicitly set the encoding. Here is an example of explicitly setting the encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')

Development

  bundle install
  bundle exec rake

License

MIT. See the LICENSE.txt file.
