The libxml gem provides Ruby language bindings for GNOME's Libxml2 XML toolkit. It is free software, released under the MIT License.
We think libxml-ruby is the best XML library for Ruby because:
Speed - Its much faster than REXML and Hpricot
Features - It provides an amazing number of featues
Conformance - It passes all 1800+ tests from the OASIS XML Tests Suite
libxml-ruby requires Ruby 1.8.4 or higher. It is dependent on the following libraries to function properly:
libm (math routines: very standard)
If you are running Linux or Unix you'll need a C compiler so the extension can be compiled when it is installed. If you are running Windows, then install the Windows specific RubyGem which includes an already built extension.
The easiest way to install libxml-ruby is via Ruby Gems. To install:
gem install libxml-ruby
If you are running Windows, make sure to install the Win32 RubyGem which includes an already built binary file. The binary is built against libxml2 version 2.7.2 and iconv version 1.11. Both of these are also included as pre-built binaries, and should be put either in the libxml/lib directory or on the Windows path. Due to a bug in ruby-gems, you cannot install the gem to a path that contains spaces (see rubyforge.org/tracker/?func=detail&aid=23003&group_id=126&atid=577).
The Windows binaries are built with MingW and include libxml-ruby, libxml2 and iconv. The gem also includes a Microsoft VC++ 2008 solution. If you wish to run a debug version of libxml-ruby on Windows, then it is highly recommended you use VC++.
Using libxml is easy. First decide what parser you want to use:
Generally you'll want to use the LibXML::XML::Parser which provides a tree based API.
For larger documents that don't fit into memory, or if you prefer an input based API, use the LibXML::XML::Reader.
To parse HTML files use LibXML::XML::HTMLParser.
If you are masochistic, then use the LibXML::XML::SaxParser, which provides a callback API.
Once you have chosen a parser, choose a datasource. Libxml can parse files, strings, URIs and IO streams. For each data source you can specify an LibXML::XML::Encoding, a base uri and various parser options. For more information, refer the LibXML::XML::Parser.document, LibXML::XML::Parser.file, LibXML::XML::Parser.io or LibXML:::XML::Parser.string methods (the same methods are defined on all four parser classes).
Beyond the basics of parsing and processing XML and HTML documents, libxml provides a wealth of additional functionality.
Most commonly, you'll want to use its LibXML::XML::XPath support, which makes it easy to find data inside a XML document. Although not as popular, LibXML::XML::XPointer provides another API for finding data inside an XML document.
Often times you'll need to validate data before processing it. For example, if you accept user generated content submitted over the Web, you'll want to verify that it does not contain malicious code such as embedded scripts. This can be done using libxml's powerful set of validators:
Relax Schemas (LibXML::XML::RelaxNG)
XML Schema (LibXML::XML::Schema)
Finally, if you'd like to use XSL Transformations to process data, then install the libxslt gem which is available at rubyforge.org/projects/libxsl/.
For in-depth information about using libxml-ruby please refer to its online Rdoc documentation.
All libxml classes are in the LibXML::XML module. The easiest way to use libxml is to require 'xml'. This will mixin the LibXML module into the global namespace, allowing you to write code like this:
require 'xml' document = XML::Document.new
However, when creating an application or library you plan to redistribute, it is best to not add the LibXML module to the global namespace, in which case you can either write your code like this:
require 'libxml' document = LibXML::XML::Document.new
Or you can utilize a namespace for you own work and include LibXML into it. For example:
require 'libxml' mdoule MyApplication include LibXML class MyClass def some_method document = XML::Document.new end end end
For simplicity's sake, the documentation uses the xml module in its examples.
In addition to being feature rich and conformation, the main reason people use libxml-ruby is for performance. Here are the results of a couple simple benchmarks recently blogged about on the Web (you can find them in the benchmark directory of the libxml distribution).
user system total real libxml 0.032000 0.000000 0.032000 ( 0.031000) Hpricot 0.640000 0.031000 0.671000 ( 0.890000) REXML 1.813000 0.047000 1.860000 ( 2.031000)
user system total real libxml 0.641000 0.031000 0.672000 ( 0.672000) hpricot 5.359000 0.062000 5.421000 ( 5.516000) rexml 22.859000 0.047000 22.906000 ( 23.203000)
Documentation is provided via rdoc. To generate the documentation, run the the command 'rake doc'. libxml-ruby's online documentation is generated using Hanna. To use hanna:
gem install mislav-hanna rake rdoc RDOCOPT="-S -T hanna"
Note that older versions of Rdoc, which ship with Ruby 1.8.x, will report a number of errors. To avoid them, install Rdoc 2.1 or higher from RubyForge (rdoc.rubyforge.org/). Once you have installed the gem, you'll have to disable the version of Rdoc that Ruby 1.8.x includes. An easy way to do that is rename the directory uby/lib/ruby/1.8/rdoc to ruby/lib/ruby/1.8/rdoc_old.
If you have any questions about using libxml-ruby, please send them to firstname.lastname@example.org. If you have found any bugs in libxml-devel, or have developed new patches, please submit them to Ruby Forge at rubyforge.org/tracker/?group_id=494.
See LICENSE for license information.