Lesson: Define a Basic Terminology

Brian Maddy edited this page Nov 11, 2013 · 5 revisions

This tutorial is known to work with om version 3.0.4.
Please update this wiki to reflect any other versions that have been tested.

Goals

  • Define a simple OM Terminology for XML metadata
  • Create OM Documents based on your Terminology
  • Create and update XML nodes using the OM Terminology
  • Inspect OM Documents to find out what XPath queries are being used for a given Term
  • Use OM's API to access the underlying Nokogiri Document and the Nodesets it returns from XPath queries

Explanation

Steps

Step 1: Think about what the XML is going to look like

For this first example we want to model simple, flat XML. Let's say the root node of our XML documents is called fields and we have elements for title and author.

<fields>
  <title>ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.</title>
  <author>Horn, Zoia</author>
</fields>

Note that we do not have any namespaces, attributes on elements, schema declarations, or any other joyful XML features. OM does provide ways to handle these, but it does not require them. We will look at each of those separately in other lessons.

Step 2: Define the Terminology

Create a file called book_metadata.rb and paste the following code into it:

require "om"
class BookMetadata 
  # This include statement adds the behaviors of an OM Document to your class
  include OM::XML::Document

  set_terminology do |t|
    t.root(path: "fields")
    t.title
    t.author
  end

  # This method is called when you create new XML documents from scratch.
  # It must return a Nokogiri::XML::Document. Other than that, you can make
  # your "default" documents look however you want.
  def self.xml_template
    Nokogiri::XML.parse("<fields/>")
  end
end

Step 3: Create an OM Document based on your Terminology

Open an irb (Interactive Ruby) console. Rather than calling irb directly on the command line, use Bundler so that your dependencies are handled predictably.

bundle console
require "./book_metadata"
newdoc = BookMetadata.new
puts newdoc.to_xml
<?xml version="1.0"?>
<fields/> 

Now you have an empty OM document that was initialized using the BookMetadata.xml_template method you defined.

Step 4: Use the Terminology to modify the XML Document and Render it as XML

Because this Document is a BookMetadata object, you can use the Terminology to set and retrieve the values of the Terms you've defined.

newdoc.author = "Horn, Zoia"
 => "Horn, Zoia" 
newdoc.title = "ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know."
 => "ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know." 
puts newdoc.to_xml
<?xml version="1.0"?>
<fields>
  <author>Horn, Zoia</author>
  <title>ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.</title>
</fields>

As you can see, calling .to_xml returns an XML document with the title and author set to the values you provided.

OM also makes it easy to update these elements. Assigning an array to a term sets multiple values for it:

newdoc.author = ["Horn, Zoia", "Hypatia"]
 => ["Horn, Zoia", "Hypatia"] 
puts newdoc.to_xml
<?xml version="1.0"?>
<fields>
  <author>Horn, Zoia</author>
  <title>ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.</title>
  <author>Hypatia</author>
</fields>
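
The ordering in that output, where the second author lands after the title, is what you would expect if existing elements are updated in place and any extra values are appended to the end of the parent. The following is only a rough sketch of that idea, not OM's actual implementation. It uses REXML rather than Nokogiri so that it runs without extra gems, and the set_values helper is hypothetical.

```ruby
require "rexml/document"

# Hypothetical helper (not part of OM): write `values` into the elements
# named `name`, reusing existing elements in document order and appending
# a new element to the root for each extra value.
# Surplus existing elements are left alone to keep the sketch short.
def set_values(doc, name, values)
  existing = REXML::XPath.match(doc, "//#{name}")
  Array(values).each_with_index do |value, i|
    node = existing[i] || doc.root.add_element(name)
    node.text = value
  end
end

doc = REXML::Document.new("<fields/>")
set_values(doc, "author", "Horn, Zoia")
set_values(doc, "title", "ZOIA!")
set_values(doc, "author", ["Horn, Zoia", "Hypatia"])

# Element names in document order, and the author values the query finds.
names = doc.root.elements.to_a.map(&:name)
authors = REXML::XPath.match(doc, "//author").map(&:text)

puts names.inspect
puts authors.inspect
```

The first author element is reused and only the second one is new, so it is appended after the title.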

Step 5: Access the Underlying Nokogiri Document and Stored XPath Queries

Each OM Document you create is essentially a wrapper around a Nokogiri Document, and the Document's Terminology is essentially a structure that remembers XPath queries for you. To access the inner Nokogiri Document, call .ng_xml on the OM Document. To get the stored XPath query for a term, call .xpath on that term.

Since OM simply runs XPath queries against that underlying Nokogiri document, you don't need to do anything to keep the OM Document in sync with the Nokogiri Document. You can use the Nokogiri API to make any changes you want to the Nokogiri Document and the OM Document will reflect those changes.

newdoc.title.xpath
 => "//title" 
newdoc.author.xpath
 => "//author"
newdoc.ng_xml
 => #<Nokogiri::XML::Document:0x80776da4 name="document" children=[#<Nokogiri::XML::Element:0x8090bbb0 name="fields" children=[#<Nokogiri::XML::Element:0x80818e74 name="author" children=[#<Nokogiri::XML::Text:0x80573868 "Horn, Zoia">]>, #<Nokogiri::XML::Element:0x804a22e0 name="title" children=[#<Nokogiri::XML::Text:0x805795c4 "ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.">]>]>]> 
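
To see why no syncing is needed, it can help to picture the wrapper as a table of XPath strings that are run against the live document on every read. The sketch below illustrates that idea under stated assumptions. TinyDocument, TERMS, xpath_for, and values are hypothetical names, not OM's API, and REXML stands in for Nokogiri so the example runs with no extra gems.

```ruby
require "rexml/document"

# Hypothetical stand-in for an OM Document (not OM's implementation):
# a parsed document plus a table mapping term names to XPath strings.
class TinyDocument
  TERMS = { title: "//title", author: "//author" }.freeze

  # Plays the role of ng_xml: the underlying parsed document.
  attr_reader :xml

  def initialize(source)
    @xml = REXML::Document.new(source)
  end

  # Return the stored XPath query for a term.
  def xpath_for(term)
    TERMS.fetch(term)
  end

  # Run the term's query against the live document on every call,
  # so nothing is cached and nothing can go stale.
  def values(term)
    REXML::XPath.match(@xml, xpath_for(term)).map(&:text)
  end
end

doc = TinyDocument.new("<fields><author>Horn, Zoia</author></fields>")
before = doc.values(:author)

# Change the underlying document directly, bypassing the wrapper.
doc.xml.root.add_element("author").text = "Hypatia"
after = doc.values(:author)

puts doc.xpath_for(:author)
puts before.inspect
puts after.inspect
```

Because values re-runs the query each time, the element added through the underlying document shows up on the next read.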

Step 6: Retrieve the Nokogiri Nodeset returned by a Term's XPath query

When you access a Term's values, OM runs an XPath query for you and returns the values of the XML nodes that the query matched. If you want the Nokogiri Nodeset itself instead of the values from those nodes, call .nodeset on the term.

newdoc.author.nodeset
 => [#<Nokogiri::XML::Element:0x80818e74 name="author" children=[#<Nokogiri::XML::Text:0x80573868 "Horn, Zoia">]>] 
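
The difference between values and nodes matters when you need more than the text, for example the parent element or an attribute. Here is a minimal sketch of that distinction. It uses REXML in place of Nokogiri so it runs without extra gems, and the role attribute is made up for illustration.

```ruby
require "rexml/document"

xml = REXML::Document.new(
  "<fields><author>Horn, Zoia</author><author>Hypatia</author></fields>"
)

# The node set: the element objects that the query matched.
nodes = REXML::XPath.match(xml, "//author")

# The values: the text content pulled out of those nodes.
values = nodes.map(&:text)

# Nodes expose structure that plain strings cannot, such as the parent element.
parent_names = nodes.map { |node| node.parent.name }.uniq

# Nodes can also be modified in place. "role" is a hypothetical attribute.
nodes.first.add_attribute("role", "primary")
role = xml.root.elements["author"].attributes["role"]

puts values.inspect
puts parent_names.inspect
puts role
```

Changing a node changes the document it belongs to, which is why the attribute is visible when the first author is looked up again from the root.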

Next Step

Go on to Lesson: Define a Terminology with a nested hierarchy of Terms or return to the Tame your XML with OM page.