Cheat sheet

Bill Davenport edited this page Jan 22, 2018 · 23 revisions

A digest of most of the methods documented at nokogiri.org. Reading the source can help, too.

Topics not covered: RelaxNG validation and Builder. See also: http://cheat.errtheblog.com/s/nokogiri

Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return XML (like to_xml, to_html and inner_html) will return a string encoded like the source document.

More Resources

Creating and working with Documents

Nokogiri::HTML::Document Nokogiri::XML::Document

  doc = Nokogiri(string_or_io) # Nokogiri will try to guess what type of document you are attempting to parse
  doc = Nokogiri::HTML(string_or_io) # [, url, encoding, options, &block]
  doc = Nokogiri::XML(string_or_io) # [, url, encoding, options, &block]
    # set options with block {|config| config.noblanks.noent.noerror.strict }
    # OR with a bitmask {|config| config.options = Nokogiri::XML::ParseOptions::NOBLANKS | Nokogiri::XML::ParseOptions::NOENT}
    # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/ParseOptions.html
  # doc = Nokogiri.parse(...)
  # doc = Nokogiri::XML.parse(...) #shortcut to Nokogiri::XML::Document.parse
  # doc = Nokogiri::HTML.parse(...) #shortcut to Nokogiri::HTML::Document.parse

  # document namespaces
  doc.collect_namespaces
  doc.remove_namespaces!
  doc.namespaces
  
  # shortcuts for creating new nodes
  doc.create_cdata(string, &block)
  doc.create_comment(string, &block)
  doc.create_element(name, *args, &block) # Create an element
      doc.create_element "div" # <div></div>
      doc.create_element "div", :class => "container" # <div class='container'></div>
      doc.create_element "div", "contents" # <div>contents</div>
      doc.create_element "div", "contents", :class => "container" # <div class='container'>contents</div>
      doc.create_element("div") { |node| node['class'] = "container" } # <div class='container'></div>
  doc.create_entity
  doc.create_text_node(string, &block)
  
  doc.root
  doc.root=node
  
  # A document is a Node, so see working_with_a_node

Working with Fragments

Nokogiri::XML::DocumentFragment Nokogiri::HTML::DocumentFragment

Generally speaking, unless you expect to have a DOCTYPE and a single root node, you don’t have a document, you have a fragment. For HTML, another rule of thumb is that documents have html and body tags, and fragments usually do not.

A fragment is a Node, but is not a Document. If you need to call methods that are only available on Document, like create_element, call fragment.document.create_element.

  fragment = Nokogiri::XML.fragment(string)
  fragment = Nokogiri::HTML.fragment(string, encoding = nil)
  # Note: Searching a fragment relative to the document root with xpath 
  # will probably not return what you expect. You should search relative to 
  # the current context instead. e.g.
  fragment.xpath('//*').size #=> 0
  fragment.xpath('.//*').size #=> 229

Working with a Nokogiri::XML::Node

  node = Nokogiri::XML::Node.new('name', document) # initialize a new node
  node = document.create_element('name') # shortcut
  
  node.document
  
  node.name # alias of node.node_name
  node.name= # alias of node.node_name=
  
  node.read_only?
  node.blank?
  
  # Type of Node
  node.type # alias of node.node_type
  node.cdata? # type == CDATA_SECTION_NODE
  node.comment? # type == COMMENT_NODE
  node.element? # type == ELEMENT_NODE alias node.elem? 
  node.fragment? # type == DOCUMENT_FRAG_NODE (Document fragment node)
  node.html? # type == HTML_DOCUMENT_NODE
  node.text? # type == TEXT_NODE
  node.xml? # type == DOCUMENT_NODE (Document node type)
  # other types not covered by a convenience method
    # ATTRIBUTE_DECL: Attribute declaration type
    # ATTRIBUTE_NODE: Attribute node type
    # DOCB_DOCUMENT_NODE: DOCB document node type
    # DOCUMENT_TYPE_NODE: Document type node type
    # DTD_NODE: DTD node type
    # ELEMENT_DECL: Element declaration type
    # ENTITY_DECL: Entity declaration type
    # ENTITY_NODE: Entity node type
    # ENTITY_REF_NODE: Entity reference node type
    # NAMESPACE_DECL: Namespace declaration type
    # NOTATION_NODE: Notation node type
    # PI_NODE: PI node type
    # XINCLUDE_END: XInclude end type
    # XINCLUDE_START: XInclude start type
  
  # Attributes, like a hash that maps string keys to string values
  node['src'] # aliases: node.get_attribute, node.attr.
  node['src'] = 'value' # alias node.set_attribute
  node.key?('src') # alias node.has_attribute?
  node.keys 
  node.values
  node.delete('src') # alias of node.remove_attribute
  node.each { |attr_name, attr_value| }
  # Node includes Enumerable, which works on these attribute names and values
  
  # Attribute Nodes
  node.attribute('src') # Get the attribute node with name src
    # Returns a Nokogiri::XML::Attr, a subclass of Nokogiri::XML::Node
    # that provides +.content=+ and +.value=+ to modify the attribute value
  node.attribute_nodes # returns an array of this Node's attributes as Attr objects.
  node.attribute_with_ns('src', 'namespace') # Get the attribute node with name and namespace
  node.attributes # Returns a hash containing the node's attributes. 
    # The key is the attribute name without any namespace, 
    # the value is a Nokogiri::XML::Attr representing the attribute. 
    # If you need to distinguish attributes with the same name, but with different namespaces, use #attribute_nodes instead.
  
  
  
  
  # Traversing / Modifying
  # +node_or_tags+ can be a Node, a DocumentFragment, a NodeSet, or a string containing markup.
  ## Self
  node.traverse {|node| } # yields all children and self to a block, _recursively_.
  node.remove # alias of node.unlink # Unlink this node from its current context.
  node.replace(node_or_tags)
    # Replace this Node with +node_or_tags+.
    # Returns the reparented node (if +node_or_tags+ is a Node), 
    #   or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string).
  node.swap(node_or_tags) # like +replace+, but returns self to support chaining
  ## Siblings
  node.next # alias of node.next_sibling # Returns the next sibling node
  node.next=(node_or_tags) # alias of node.add_next_sibling 
    # Inserts node_or_tags after this node (as a sibling).
    # Returns the reparented node (if +node_or_tags+ is a Node)
    #   or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string).
  node.after(node_or_tags) # like +next=+, but returns self to support chaining
  node.next_element # Returns the next Nokogiri::XML::Element sibling node.
  node.previous # alias of node.previous_sibling # Returns the previous sibling node
  node.previous=(node_or_tags) # alias of node.add_previous_sibling
    # Inserts node_or_tags before this node (as a sibling).
    # Returns the reparented node (if +node_or_tags+ is a Node)
    #   or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string.)
  node.before(node_or_tags) # just like +previous=+, but returns self to support chaining
  node.previous_element # Returns the previous Nokogiri::XML::Element sibling node.
  ## Parent
  node.parent
  node.parent=(node)
  ## Children
  node.child # returns a Node
  node.children # Get the list of children of this node as a NodeSet
  node.children=(node_or_tags)
    # Set the inner html for this Node
    # Returns the reparented node (if +node_or_tags+ is a Node), 
    #   or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string).
  node.elements # alias: node.element_children # Get the list of child Elements of this node as a NodeSet.
  node.add_child(node_or_tags)
    # Add +node_or_tags+ as a child of this Node.
    # Returns the reparented node (if +node_or_tags+ is a Node), 
    #   or returns a NodeSet (if +node_or_tags+ is a DocumentFragment, NodeSet, or string.)
  node << node_or_tags # like above, but returns self to support chaining, e.g. root << child1 << child2
  node.first_element_child # Returns the first child node of this node that is an element.
  node.last_element_child # Returns the last child node of this node that is an element.
  ## Content / Children
  node.content # aliases node.text node.inner_text node.to_str
  node.content=(string) # Set the Node's content to a Text node containing +string+. The string gets XML escaped, and will not be interpreted as markup.
  node.inner_html # (*args) children.map { |x| x.to_html(*args) }.join
  node.inner_html=(node_or_tags)
    # Sets the inner html of this Node to +node_or_tags+
    # Returns self.
    # Also see related method +children=+
  
  
  
  
  
  ## Searching below (see Working with a Nodeset below)
  # see docs for namespace bindings, variable bindings, and custom xpath functions via a handler class
  node.search(*paths) # alias: node / path # paths can be XPath or CSS
  node.at(*paths) # alias node % path # Search for the first occurrence of path. Returns nil if nothing is found, otherwise a Node. (like search(path, ns).first)
  node.xpath(*paths) # search for XPath queries
  node.at_xpath(*paths) # like xpath(*paths).first
  node.css(*rules) # search for CSS rules
  node.at_css(*rules) # like css(*rules).first
  node > selector # Search this node's immediate children using a CSS selector
  
  
  # Searching above
  node.ancestors # list of ancestor nodes, closest to furthest, as a NodeSet.
  node.ancestors(selector) # ancestors that match the selector
  
    
  # Where am I?
  node.path # Returns the path associated with this Node
  node.css_path # Get the path to this node as a CSS expression
  node.matches?(selector) # does this node match this selector?
  node.line # line number from input
  node.pointer_id # internal pointer number
  
  # Namespaces
  node.add_namespace(prefix, href) # alias of node.add_namespace_definition
    # Adds a namespace definition with prefix using href value. The result is as
    # if parsed XML for this node had included an attribute
    # 'xmlns:prefix=value'. A default namespace for this node (“xmlns=”) can be
    # added by passing 'nil' for prefix. Namespaces added this way will not show
    # up in #attributes, but they will be included as an xmlns attribute when
    # the node is serialized to XML.
  node.default_namespace=(url)
    # Adds a default namespace supplied as a string url href, to self. The
    # consequence is as an xmlns attribute with supplied argument were present
    # in parsed XML. A default namespace set with this method will not show up
    # in #attributes, but when this node is serialized to XML an “xmlns”
    # attribute will appear. See also #namespace and #namespace=
  node.namespace #   returns the default namespace set on this node (as with an “xmlns=” attribute), as a Namespace object.
  node.namespace=(ns)
    # Set the default namespace on this node (as would be defined with an
    # “xmlns=” attribute in XML source), as a Namespace object ns . Note that a
    # Namespace added this way will NOT be serialized as an xmlns attribute for
    # this node. You probably want #default_namespace= instead, or perhaps
    # #add_namespace_definition with a nil prefix argument.
  node.namespace_definitions
    # returns namespaces defined on self element directly, as an array of
    # Namespace objects. Includes both a default namespace (as in “xmlns=”), and
    # prefixed namespaces (as in “xmlns:prefix=”).
  node.namespace_scopes
    # returns namespaces in scope for self – those defined on self element
    # directly or any ancestor node – as an array of Namespace objects. Default
    # namespaces (“xmlns=” style) for self are included in this array; Default
    # namespaces for ancestors, however, are not. See also #namespaces
  node.namespaced_key?(attribute, namespace)
    # Returns true if attribute is set with namespace
  node.namespaces # Returns a Hash of {prefix => value} for all namespaces on this node and its ancestors.
    # This method returns the same namespaces as #namespace_scopes.
    # 
    # Returns namespaces in scope for self – those defined on self element
    # directly or any ancestor node – as a Hash of attribute-name/value pairs.
    # Note that the keys in this hash are the XML attributes that would be used to
    # define this namespace, such as “xmlns:prefix”, not just the prefix.
    # Default namespace set on self will be included with key “xmlns”. However,
    # default namespaces set on ancestor will NOT be, even if self has no
    # explicit default namespace.
  # see also attribute_with_ns


  # Rubyisms
  node <=> another_node # Compare two Node objects with respect to their Document. Nodes from different documents cannot be compared.
    # uses xmlXPathCmpNodes "Compare two nodes w.r.t document order"
  node == another_node # compares pointer_id
  node.clone # alias node.dup # Copy this node. An optional depth may be passed in, but it defaults to a deep copy. 0 is a shallow copy, 1 is a deep copy.

  # Visitor pattern
  node.accept(visitor) # calls visitor.visit(self)
  
  # Write it out (sorted from most flexible/hardest to use to least flexible/easiest to use)
  node.write_to(io, *options)
    # Write Node to +io+ with +options+. +options+ modify the output of
    # this method.  Valid options are:
    #
    # * +:encoding+ for changing the encoding
    # * +:indent_text+ the indentation text, defaults to one space
    # * +:indent+ the number of +:indent_text+ to use, defaults to 2
    # * +:save_with+ a combination of SaveOptions constants.
      # SaveOptions
        # AS_BUILDER: Save builder created document
        # AS_HTML: Save as HTML
        # AS_XHTML: Save as XHTML
        # AS_XML: Save as XML
        # DEFAULT_HTML: the default for HTML document
        # DEFAULT_XHTML: the default for XHTML document
        # DEFAULT_XML: the default for XML documents
        # FORMAT: Format serialized xml
        # NO_DECLARATION: Do not include declarations
        # NO_EMPTY_TAGS: Do not include empty tags
        # NO_XHTML: Do not save XHTML
    # e.g. node.write_to(io, :encoding => 'UTF-8', :indent => 2)
  node.write_html_to(io, options={}) # uses write_to with :save_with => DEFAULT_HTML option (libxml2.6 does dump_html)
  node.write_xhtml_to(io, options={}) # uses write_to with :save_with => DEFAULT_XHTML option (libxml2.6 does dump_html)
  node.write_xml_to(io, options={}) # uses write_to with :save_with => DEFAULT_XML option
  node.serialize # Serialize Node to a string using +options+, provided as a hash or block. Uses write_to (via StringIO)
    # node.serialize(:encoding => 'UTF-8', :save_with => FORMAT | AS_XML)
    # node.serialize(:encoding => 'UTF-8') do |config|
    #   config.format.as_xml
    # end
  node.to_html(options={}) # serializes with :save_with => DEFAULT_HTML option (libxml2.6 does dump_html)
  node.to_xhtml(options={}) # serializes with :save_with => DEFAULT_XHTML option (libxml2.6 does dump_html)
  node.to_xml(options={}) # serializes with :save_with => DEFAULT_XML option
  node.to_s # document.xml? ? to_xml : to_html

  node.inspect
  node.pretty_print(pp) # to enhance pp

  # Utility
  node.encode_special_chars(str) # Encodes special characters :P
  node.fragment(tags) # Create a DocumentFragment containing tags that is relative to this context node.
  node.parse(string_or_io, options={})
    # Parse +string_or_io+ as a document fragment within the context of
    # *this* node.  Returns a XML::NodeSet containing the nodes parsed from
    # +string_or_io+.
  
  # External subsets, like DTD declarations
  node.create_external_subset(name, external_id, system_id)
  node.create_internal_subset(name, external_id, system_id)
  node.external_subset
  node.internal_subset
  
  # Other:
  node.description # Fetch the Nokogiri::HTML::ElementDescription for this node. Returns nil on XML documents and on unknown tags.
    # e.g. if node is an <img> tag: Nokogiri::HTML::ElementDescription['img'] #=> #<Nokogiri::HTML::ElementDescription: img embedded image >
  node.decorate! # Decorate this node with the decorators set up in this node's Document. Used internally to provide Slop support and Hpricot compatibility via Nokogiri::Hpricot
  node.do_xinclude # options as a block or hash
    # Do xinclude substitution on the subtree below node. If given a block, a
    # Nokogiri::XML::ParseOptions object initialized from +options+, will be
    # passed to it, allowing more convenient modification of the parser options.

Working with a Nokogiri::XML::NodeSet

  nodes = Nokogiri::XML::NodeSet.new(document, list=[])
  
  # Set operations
  nodes | other_nodeset # UNION, i.e. merging the sets, returning a new set
  nodes + other_nodeset # UNION, i.e. merging the sets, returning a new set
  nodes & other_nodeset # INTERSECTION # i.e. return a new NodeSet with the common nodes only
  nodes - other_nodeset # DIFFERENCE Returns a new NodeSet containing the nodes in this NodeSet that aren't in other_nodeset
  nodes.include?(node)
  nodes.empty?
  nodes.length # alias nodes.size
  nodes.delete(node) # Delete node from the Nodeset, if it is a member. Returns the deleted node if found, otherwise returns nil.

  # List operations (includes Enumerable)
  nodes.each {|node| }
  nodes.first
  nodes.last
  nodes.reverse # Returns a new NodeSet containing all the nodes in the NodeSet in reverse order
  nodes.index(node) # returns the numeric index or nil
  nodes[3] # element at index 3
  nodes[3,4] # return a NodeSet of size 4, starting at index 3
  nodes[3..6] # or return a NodeSet using a range of indexes
  # alias nodes.slice
  nodes.pop # Removes the last element from set and returns it, or nil if the set is empty
  nodes.push(node) # alias nodes << node # Append node to the NodeSet.
  nodes.shift # Returns the first element of the NodeSet and removes it. Returns nil if the set is empty.
  nodes.filter(expr) # Filter this list for nodes that match expr. Judging by the implementation below (Enumerable#find_all), this probably returns an Array rather than a NodeSet.
    # find_all { |node| node.matches?(expr) }
  
  nodes.children # Returns a new NodeSet containing all the children of all the nodes in the NodeSet
  
  # Content
  nodes.inner_html(*args) # Get the inner html of all contained Node objects
  nodes.inner_text # alias nodes.text
  
  # Convenience modifiers
  nodes.remove # alias of nodes.unlink # Unlink this NodeSet and all Node objects it contains from their current context.
  nodes.wrap("<div class='container'></div>") # wrap new xml around EACH NODE in a Nodeset
  nodes.before(datum) # Insert datum before the first Node in this NodeSet # e.g. first.before(datum)
  nodes.after(datum) # Insert datum after the last Node in this NodeSet # e.g. last.after(datum)
  nodes.attr(key, value) # set the attribute key to value on all Node objects in the NodeSet
  nodes.attr(key) { |node| 'value' } # set the attribute key to the result of the block on all Node objects in the NodeSet
    # alias nodes.attribute, nodes.set
  nodes.remove_attr(name) # removes the attribute from all nodes in the nodeset
  nodes.add_class(name) # Append the class attribute name to all Node objects in the NodeSet.
  nodes.remove_class(name = nil) # if nil, removes the class attribute from all nodes in the nodeset
  
  # Searching
  nodes.search(*paths) # alias nodes / path
  nodes.at(*paths) # alias nodes % path
  nodes.xpath(*paths)
  nodes.at_xpath(*paths)
  nodes.css(*rules)
  nodes.at_css(*rules)
  nodes > selector # Search this NodeSet's nodes' immediate children using CSS selector selector
  
  # Writing out
  nodes.to_a # alias nodes.to_ary # Return this list as an Array
  nodes.to_html(*args)
  nodes.to_s
  nodes.to_xhtml(*args)
  nodes.to_xml(*args)
  
  # Rubyisms
  nodes == nodes # Two NodeSets are equal if they contain the same number of elements and if each element is equal to the corresponding element in the other NodeSet
  nodes.dup # Duplicate this node set
  nodes.inspect

Miscellany

  nc = Nokogiri::HTML::NamedCharacters # a Nokogiri::HTML::EntityLookup
  nc[key] # like nc.get(key).try(:value) # e.g. nc['gt'] (62) or nc['rsquo'] (8217)
  nc.get(key) # returns a Nokogiri::HTML::EntityDescription
    # e.g. nc.get('rsquo') #=>  #<struct Nokogiri::HTML::EntityDescription value=8217, name="rsquo", description="right single quotation mark, U+2019 ISOnum">
  
  # Adding a Processing Instruction (like <?xml-stylesheet?>)
  # Nokogiri::XML::ProcessingInstruction http://nokogiri.org/tutorials/modifying_an_html_xml_document.html
  pi = Nokogiri::XML::ProcessingInstruction.new(doc, "xml-stylesheet",'type="text/xsl" href="foo.xsl"')
  doc.root.add_previous_sibling(pi)

Reader parsers

Reader parsers can be used to parse very large XML documents quickly without the need to load the entire document into memory or write a SAX document parser. The reader makes each node in the XML document available exactly once, only moving forward, like a cursor.

  reader = Nokogiri::XML::Reader(string_or_io)
    # attrs
    # .encoding
    # .errors
    # .source

  # Reading
  reader.each {|node|  } # node and reader are the same object. shortcut for while(node = self.read) yield(node); end;
  reader.read # Move the Reader forward through the XML document.

  node.name
  node.local_name

  # Attributes
  node.attribute('src')
  node.attribute_at(1)
  node.attribute_count
  node.attribute_nodes
  node.attributes
  node.attributes?

  # Content
  node.empty_element?
  node.self_closing?
  node.value # Get the text value of the node if present as a utf-8 encoded string. Does NOT advance the reader.
  node.value? # Does this node have a text value?
  node.inner_xml # Read the contents of the current node, including child nodes and markup into a utf-8 encoded string. Does NOT advance the reader
  node.outer_xml # Does NOT advance the reader

  node.base_uri # Get the xml:base of the node
  node.default? # Was an attribute generated from the default value in the DTD or schema?
  node.depth

  # Namespaces and the rest
  node.namespace_uri # Get the URI defining the namespace associated with the node
  node.namespaces # Get a hash of namespaces for this Node
  node.prefix # Get the shorthand reference to the namespace associated with the node.
  node.xml_version # Get the XML version of the document being read
  node.lang # Get the xml:lang scope within which the node resides.
  node.node_type
    # one of 
    # TYPE_ATTRIBUTE
    # TYPE_CDATA
    # TYPE_COMMENT
    # TYPE_DOCUMENT
    # TYPE_DOCUMENT_FRAGMENT
    # TYPE_DOCUMENT_TYPE
    # TYPE_ELEMENT
    # TYPE_END_ELEMENT
    # TYPE_END_ENTITY
    # TYPE_ENTITY
    # TYPE_ENTITY_REFERENCE
    # TYPE_NONE
    # TYPE_NOTATION
    # TYPE_PROCESSING_INSTRUCTION
    # TYPE_SIGNIFICANT_WHITESPACE
    # TYPE_TEXT
    # TYPE_WHITESPACE
    # TYPE_XML_DECLARATION
  node.state # Get the state of the reader

XSD Validation

XSD XSD::XMLParser XSD::XMLParser::Nokogiri

  xsd = Nokogiri::XML::Schema(string_or_io_to_schema_file)
  doc = Nokogiri::XML(File.read(PO_XML_FILE))
  
  xsd.valid?(doc) # => true/false
   
  xsd.validate(doc) # returns an array of SyntaxError objects
  xsd.validate(doc).each do |syntax_error|
    syntax_error.error?
    syntax_error.fatal?
    syntax_error.none?
    syntax_error.to_s
    syntax_error.warning?
    
    # undocumented attributes (all read-only)
    syntax_error.code
    syntax_error.column
    syntax_error.domain
    syntax_error.file
    syntax_error.int1
    syntax_error.level
    syntax_error.line
    syntax_error.str1
    syntax_error.str2
    syntax_error.str3
  end
  
  
  # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Schema.html
  # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/AttributeDecl.html
  # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/DTD.html
  # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/ElementDecl.html
  # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/ElementContent.html
  # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/EntityDecl.html
  # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/EntityReference.html
  
  doc.validate # validate it against its DTD, if it has one

CSS Parsing

Nokogiri::CSS Nokogiri::CSS::Node Nokogiri::CSS::Parser Nokogiri::CSS::SyntaxError Nokogiri::CSS::Tokenizer Nokogiri::CSS::Tokenizer::ScanError

  # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/CSS.html
  Nokogiri::CSS.parse('selector') # => returns an AST
  Nokogiri::CSS.xpath_for('selector', options={})
  
  # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/CSS/Node.html
    # attr: type, value
    #methods
    # accept(visitor)
    # find_by_type
    # new
    # preprocess!
    # to_a
    # to_type
    # to_xpath
  # http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/CSS/Parser.html # a Racc generated Parser

XSLT Transformation

Nokogiri::XSLT Nokogiri::XSLT::Stylesheet

  doc   = Nokogiri::XML(File.read('some_file.xml'))
  xslt  = Nokogiri::XSLT(File.read('some_transformer.xslt'))
  puts xslt.transform(doc) # [, xslt_parameters]
  #   xslt.serialize(doc) # to an XML string
  #   xslt.apply_to(doc, params=[]) # equivalent to xslt.serialize(xslt.transform(doc, params))

SAX Parsing

Event-driven XML parsing, appropriate for reading very large XML files without reading the entire document into memory. The best documentation is in this file.

# Document template
# Define any or all of these methods to get their notifications:
# Your document doesn't have to subclass Nokogiri::XML::SAX::Document, 
# doing so just saves you from having to define all the sax methods, 
# rather than the few you need.
class MyDocument < Nokogiri::XML::SAX::Document
  def xmldecl(version, encoding, standalone)
  end
  def start_document
  end
  def end_document
  end
  def start_element(name, attrs = [])
  end
  def end_element(name)
  end
  def start_element_namespace(name, attrs = [], prefix = nil, uri = nil, ns = [])
  end
  def end_element_namespace(name, prefix = nil, uri = nil)
  end
  def characters(string)
  end
  def comment(string)
  end
  def warning(string)
  end
  def error(string)
  end
  def cdata_block(string)
  end
end

# Standard Parser
parser = Nokogiri::XML::SAX::Parser.new(MyDocument.new) # [, encoding = 'UTF-8']
# A block can be passed to the parse methods to get the ParserContext before parsing, but you probably don't need that
parser.parse(string_or_io)
parser.parse_io(io) # [, encoding = "ASCII"]
parser.parse_file(filename)
parser.parse_memory(string)

# If you want HTML correction features, instantiate this parser instead
parser = Nokogiri::HTML::SAX::Parser.new(MyDocument.new)

(If you're a weirdo,) you can stream the XML manually using Nokogiri::XML::SAX::PushParser. The best documentation is this file.

Slop decorator (Don’t use this)

The ::Slop decorator implements method_missing such that methods may be used instead of CSS or XPath. See the bottom of this page: Nokogiri.Slop, Nokogiri::XML::Document#slop!, Nokogiri::Decorators::Slop.

doc = Nokogiri::Slop(string_or_io)
doc = Nokogiri(string_or_io).slop!
doc = Nokogiri::HTML(string_or_io).slop!
doc = Nokogiri::XML(string_or_io).slop!

doc = Nokogiri::Slop(<<-eohtml)
  <html>
    <body>
      <p>first</p>
      <p>second</p>
    </body>
  </html>
eohtml
assert_equal('second', doc.html.body.p[1].text)


doc = Nokogiri::Slop <<-EOXML
<employees>
  <employee status="active">
    <fullname>Dean Martin</fullname>
  </employee>
  <employee status="inactive">
    <fullname>Jerry Lewis</fullname>
  </employee>
</employees>
EOXML

# navigate!
doc.employees.employee.last.fullname.content # => "Jerry Lewis"

# access node attributes!
doc.employees.employee.first["status"] # => "active"

# use some xpath!
doc.employees.employee("[@status='active']").fullname.content # => "Dean Martin"
doc.employees.employee(:xpath => "@status='active'").fullname.content # => "Dean Martin"

# use some css!
doc.employees.employee("[status='active']").fullname.content # => "Dean Martin"
doc.employees.employee(:css => "[status='active']").fullname.content # => "Dean Martin"