
Lesson: Define a Terminology with a Nested Hierarchy of Terms

jcoyne edited this page Oct 22, 2014 · 16 revisions

This lesson is known to work with om version 3.0.4.
Please update this wiki to reflect any other versions that have been tested.

Goals

Explanation

Steps

Step 1: Think about what the XML is going to look like

Most XML is not flat. It has hierarchies of nodes nested in semantically relevant ways. For example, we might group a person's given name, family name, and role within a <name> node:

<fields>
  <title>ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.</title>
  <name>
    <givenName>Zoia</givenName>
    <familyName>Horn</familyName>
    <role>
      <text>Author</text>
      <code>AUT</code>
    </role>
  </name>
</fields>

Step 2: Define the Terminology

Now create a file called fancy_book_metadata.rb and paste the following code into it:

require "om"
class FancyBookMetadata
  include OM::XML::Document

  set_terminology do |t|
    t.root(path: "fields")
    t.title
    
    # The trailing underscore is purely to avoid namespace conflicts with existing Ruby methods; the term is still accessed as name.
    t.name_ {
      t.family_name(path: "familyName")
      t.given_name(path: "givenName")
      t.role {
        t.text
        t.code
      }
    }
  end
  
  # This method is called when you create new XML documents from scratch.
  # It must return a Nokogiri::XML::Document. Other than that, your "default" documents can look however you want.
  def self.xml_template
    Nokogiri::XML.parse("<fields/>")
  end
end

Note that we are using the :path option to define Terms with names like family_name that correspond to XML elements with names like familyName. This lets you keep consistent method names that follow Ruby conventions (or whatever conventions you prefer), even though the actual element names in your XML can vary widely.
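As a rough mental model, you can treat each nested Term as one step in an XPath. The :path option supplies the element name for that step when it differs from the Term name. This is not OM's actual internals. The TERMS hash and xpath_for helper below are hypothetical, and they sketch that resolution in plain Ruby for the terminology above:

```ruby
# Hypothetical, simplified model of the FancyBookMetadata terminology.
# A term only needs :path when its element name differs from the term name.
TERMS = {
  title: {},
  name: {
    children: {
      family_name: { path: "familyName" },
      given_name:  { path: "givenName" },
      role: { children: { text: {}, code: {} } }
    }
  }
}.freeze

# Hypothetical helper: resolve a chain of term names (e.g. :name, :given_name)
# to an XPath under the <fields> root, using each term's :path when present
# and falling back to the term name otherwise.
def xpath_for(*term_names)
  terms = TERMS
  steps = term_names.map do |term_name|
    term = terms.fetch(term_name) { raise ArgumentError, "unknown term: #{term_name}" }
    terms = term.fetch(:children, {})
    term.fetch(:path, term_name.to_s)
  end
  "/fields/" + steps.join("/")
end

puts xpath_for(:name, :given_name)   # prints /fields/name/givenName
puts xpath_for(:name, :role, :code)  # prints /fields/name/role/code
```

The Ruby-style name (given_name) is what you call. The :path value (givenName) is what ends up in the XML.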

Step 3: Create an OM Document based on your Terminology

Restart the console

bundle console

Require the FancyBookMetadata class definition.

require "./fancy_book_metadata"
fancybook = FancyBookMetadata.new
puts fancybook.to_xml
<?xml version="1.0"?>
<fields/> 

Now you can use the terms to edit the XML. Call the Terms according to how they are nested in the Terminology. For example, in this Terminology we have nested given_name inside name, so you can call .name.given_name on your document.

fancybook.name.given_name = "Zoia"
 => "Zoia" 
fancybook.name.family_name = "Horn"
 => "Horn" 
fancybook.name.role.text = "author"
 => "author" 
fancybook.name.role.code = "AUT"
 => "AUT" 
puts fancybook.to_xml
<?xml version="1.0"?>
<fields>
  <name><givenName>Zoia</givenName><familyName>Horn</familyName><role><text>author</text><code>AUT</code></role></name>
</fields>
 => nil 

Notice that we never had to explicitly create the <role> node before inserting the <text> and <code> nodes into it. OM handled that for us.
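You can picture the automatic creation of <role> as a find-or-create walk down the element path. The sketch below is not OM's implementation. It swaps in Ruby's REXML library so it runs without OM or Nokogiri. The helpers first_child and find_or_create_path are hypothetical names used only for illustration:

```ruby
require "rexml/document"

# Hypothetical helper (not part of OM): return the first child element of
# `parent` named `name`, or nil if there is none.
def first_child(parent, name)
  parent.children.find { |c| c.is_a?(REXML::Element) && c.name == name }
end

# Hypothetical helper (not part of OM): walk a list of element names below
# `parent`, appending any element that is missing, and return the deepest one.
def find_or_create_path(parent, names)
  names.reduce(parent) { |node, name| first_child(node, name) || node.add_element(name) }
end

doc = REXML::Document.new("<fields/>")
find_or_create_path(doc.root, %w[name givenName]).text = "Zoia"
find_or_create_path(doc.root, %w[name familyName]).text = "Horn"
# <role> does not exist yet; it is created on the way down to <text>.
find_or_create_path(doc.root, %w[name role text]).text = "author"
find_or_create_path(doc.root, %w[name role code]).text = "AUT"

puts doc.to_s
```

Each step reuses an existing child when one is present. As a result, all four values land inside a single <name> element, as in the OM output above.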

Step 4: Handling Multiple Trees of Nodes

Say we have two people to record: an author and a contributor. We should create two <name> nodes, each with its own <givenName>, <familyName>, and <role>:

<fields>
  <title>ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.</title>
  <name>
    <givenName>Zoia</givenName>
    <familyName>Horn</familyName>
    <role>
      <text>Author</text>
      <code>AUT</code>
    </role>
  </name>
  <name>
    <givenName>Julius</givenName>
    <familyName>Caesar</familyName>
    <role>
      <text>Contributor</text>
      <code>CON</code>
    </role>
  </name>
</fields>

How do we prevent everything from getting bunched up into a single <name> node, like this?

<fields>
  <title>ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.</title>
  <name>
    <givenName>Zoia</givenName>
    <givenName>Julius</givenName>
    <familyName>Horn</familyName>
    <familyName>Caesar</familyName>
    <role>
      <text>Author</text>
      <code>AUT</code>
    </role>
    <role>
      <text>Contributor</text>
      <code>CON</code>
    </role>
  </name>
</fields>

The answer is to explicitly address the node you want to create, read, or update by passing its index to the Term. These indexes start at 0, like Arrays in Ruby, Java, C, etc., so fancybook.name(1) addresses the second <name> entry in the document, not the first.

fancybook.name(1).family_name = "Caesar"
 => "Caesar" 
fancybook.name(1).given_name = "Julius"
 => "Julius" 
fancybook.name(1).role.text = "Contributor"
 => "Contributor" 
fancybook.name(1).role.code = "CON"
 => "CON" 
puts fancybook.to_xml
<?xml version="1.0"?>
<fields>
  <name><givenName>Zoia</givenName><familyName>Horn</familyName><role><text>author</text><code>AUT</code></role></name>
  <name><familyName>Caesar</familyName><givenName>Julius</givenName><role><text>Contributor</text><code>CON</code></role></name>
</fields>
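Conceptually, the index picks the nth matching element instead of always the first. Note that these indexes are zero-based while XPath positions are one-based, so fancybook.name(1) roughly corresponds to the XPath /fields/name[2]. The sketch below again uses REXML in place of OM. Its helpers (children_named, nth_child, set_value) are hypothetical. One assumption should be flagged: when the requested index is past the end, this sketch appends elements until that index exists, and OM may handle that case differently.

```ruby
require "rexml/document"

# Hypothetical helper: child elements of `parent` named `name`, in document order.
def children_named(parent, name)
  parent.children.select { |c| c.is_a?(REXML::Element) && c.name == name }
end

# Hypothetical helper: return the element at zero-based `index` among the
# `name` children of `parent`, appending new elements until that index exists.
def nth_child(parent, name, index)
  parent.add_element(name) while children_named(parent, name).size <= index
  children_named(parent, name)[index]
end

# Hypothetical helper: each step is a [element_name, index] pair, so
# [["name", 1], ["givenName", 0]] means "first givenName of the second name".
def set_value(root, steps, value)
  node = steps.reduce(root) { |n, (name, index)| nth_child(n, name, index) }
  node.text = value
end

doc = REXML::Document.new("<fields/>")
set_value(doc.root, [["name", 0], ["givenName", 0]], "Zoia")
set_value(doc.root, [["name", 0], ["familyName", 0]], "Horn")
set_value(doc.root, [["name", 1], ["familyName", 0]], "Caesar")
set_value(doc.root, [["name", 1], ["givenName", 0]], "Julius")

names = children_named(doc.root, "name")
puts names.size  # prints 2
```

As in the OM output above, child elements appear in the order they were assigned, which is why <familyName> precedes <givenName> in the second <name>.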

Step 5: How to work with "parent" elements

Say I want to iterate over all of the names in a document and then do something to their child nodes. It is tempting to write fancybook.name.each do |name| ..., but that won't work: when you use a Term to access elements, OM returns the values of the elements rather than the elements themselves. So fancybook.name returns an array of Strings, one per <name> node, each made by concatenating the values of that node's children.

fancybook.name
 => ["ZoiaHornauthor", "CaesarJuliusContributor"] 

That's not useful for the kind of task we're handling here. The solution is to iterate over the Nokogiri NodeSet, which you get by calling .nodeset on the Term. You can then use anything from the Nokogiri API to navigate through the nodeset and update nodes.

fancybook.name.nodeset.each {|namenode| puts "Node: "; puts namenode.inspect}
Node: 
#<Nokogiri::XML::Element:0x80a0c2a8 name="name" children=[#<Nokogiri::XML::Text:0x8085fc34 "">, #<Nokogiri::XML::Element:0x809d0690 name="givenName" children=[#<Nokogiri::XML::Text:0x8085f284 "Zoia">]>, #<Nokogiri::XML::Element:0x8043ce7c name="familyName" children=[#<Nokogiri::XML::Text:0x80861df4 "Horn">]>, #<Nokogiri::XML::Element:0x8082799c name="role" children=[#<Nokogiri::XML::Text:0x808663e0 "author">]>]>
Node: 
#<Nokogiri::XML::Element:0x809ec4bc name="name" children=[#<Nokogiri::XML::Text:0x80a81058 "">, #<Nokogiri::XML::Element:0x809f3758 name="familyName" children=[#<Nokogiri::XML::Text:0x80aae530 "Caesar">]>, #<Nokogiri::XML::Element:0x80a06934 name="givenName" children=[#<Nokogiri::XML::Text:0x80aade3c "Julius">]>, #<Nokogiri::XML::Element:0x80a0c7bc name="role" children=[#<Nokogiri::XML::Text:0x80ab2c34 "Contributor">]>]>
 => 0 
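The same element-level iteration can be shown without OM. This sketch substitutes Ruby's REXML for Nokogiri. It parses a copy of the two-name document, with roles omitted for brevity, and reads and updates the children of each <name> element. With a Nokogiri node from .nodeset you would typically do the equivalent with methods such as at_xpath.

```ruby
require "rexml/document"

# A copy of the two-name document from Step 4 (roles omitted), parsed directly.
xml = <<~XML
  <fields>
    <name><givenName>Zoia</givenName><familyName>Horn</familyName></name>
    <name><familyName>Caesar</familyName><givenName>Julius</givenName></name>
  </fields>
XML
doc = REXML::Document.new(xml)

# Iterate over the <name> elements themselves rather than their string values,
# so each child can be read or updated individually.
REXML::XPath.each(doc, "/fields/name") do |name_node|
  family = name_node.elements["familyName"]
  given  = name_node.elements["givenName"]
  puts "#{family.text}, #{given.text}"
  family.text = family.text.upcase  # update a child node in place
end
```

Running it prints "Horn, Zoia" and then "Caesar, Julius". It also leaves both <familyName> values upcased in the parsed document.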

Next Step

Go on to Lesson: Make Terms that reference attributes on XML elements or return to the Tame your XML with OM page.