RDF::RDFa reader/writer

RDFa parser for RDF.rb.

DESCRIPTION

RDF::RDFa is an RDFa reader and writer for Ruby using the RDF.rb library suite.

FEATURES

RDF::RDFa parses RDFa into statements or triples.

  • Fully compliant RDFa 1.1 parser.
  • Template-based Writer to generate XHTML+RDFa.
    • Writer uses user-replaceable Haml-based templates to generate RDFa.
  • Uses Nokogiri for parsing HTML/SVG if available; falls back to REXML otherwise (and for JRuby).
  • RDFa tests use SPARQL for most tests due to Rasqal limitations. Other tests compare directly against N-triples.

Install with 'gem install rdf-rdfa'

Important changes from previous versions

RDFa is an evolving standard, undergoing some substantial recent changes partly due to perceived competition with Microdata. As a result, the RDF Webapps working group is currently looking at changes in the processing model for RDFa. These changes are now being tracked in {RDF::RDFa::Reader}:

RDFa 1.1 Lite

This version fully supports the limited syntax of RDFa Lite 1.1. This includes the ability to use @property exclusively.

Remove RDFa Profiles

RDFa Profiles were a mechanism added to allow groups of terms and prefixes to be defined in an external resource and loaded to affect the processing of an RDFa document. This introduced a problem for some implementations needing to perform a cross-origin GET in order to retrieve the profiles. The working group elected to drop support for user-defined RDFa Profiles (the default profiles defined by RDFa Core and host languages still apply) and replace it with an inference regime using vocabularies. Parsing of @profile has been removed from this version.

Vocabulary Expansion

One of the issues with vocabularies is that they discourage re-use of existing vocabularies when terms from several vocabularies are used at the same time. As it is common (and encouraged) for RDF vocabularies to form sub-class and/or sub-property relationships with well-defined vocabularies, the RDFa vocabulary expansion mechanism takes advantage of this.

As an optional part of RDFa processing, an RDFa processor will perform limited OWL 2 RL Profile entailment, specifically rules prp-eqp1, prp-eqp2, cax-sco, cax-eqc1, and cax-eqc2. This causes the super-classes and equivalent classes of type IRIs, and the equivalent properties of property IRIs, to be added to the output graph.

{RDF::RDFa::Reader} implements this using the #expand method, which looks for rdfa:usesVocabulary properties within the output graph and performs such expansion. See an example in the usage section.
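To make the entailment step more concrete, here is a minimal sketch of the cax-sco rule in plain Ruby. It does not use RDF.rb or the reader's #expand method: triples are modelled as three-element arrays of strings, and the method name expand_types and the ex:Musician class are hypothetical, chosen only for illustration.

```ruby
require 'set'

# Sketch of cax-sco: if (x rdf:type C1) and (C1 rdfs:subClassOf C2),
# then infer (x rdf:type C2). Repeats until nothing new is inferred.
# Triples are plain [subject, predicate, object] arrays (an assumption
# made for this example, not the RDF.rb data model).
def expand_types(data, schema, type: "rdf:type", sub_class_of: "rdfs:subClassOf")
  # Map each class to its directly declared super-classes.
  supers = Hash.new { |hash, key| hash[key] = [] }
  schema.each { |s, p, o| supers[s] << o if p == sub_class_of }

  result = Set.new(data)
  queue  = data.select { |_, p, _| p == type }
  until queue.empty?
    subject, _, klass = queue.shift
    supers[klass].each do |parent|
      inferred = [subject, type, parent]
      # Set#add? returns nil when the triple is already present.
      queue << inferred if result.add?(inferred)
    end
  end
  result
end

schema = [
  ["ex:Musician", "rdfs:subClassOf", "foaf:Person"],
  ["foaf:Person", "rdfs:subClassOf", "foaf:Agent"]
]
data = [["ex:bob", "rdf:type", "ex:Musician"]]

expanded = expand_types(data, schema)
expanded.each { |triple| puts triple.join(" ") }
```

The work queue keeps the closure transitive: a type inferred in one step is itself checked for further super-classes.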

RDF Collections (lists)

One significant RDF feature missing from RDFa was support for ordered collections, or lists. RDF supports this with the special properties rdf:first and rdf:rest and the resource rdf:nil, but other RDF languages have first-class support for this concept. For example, in Turtle, a list can be defined as follows:

[ a schema:MusicPlaylist;
  schema:name "Classic Rock Playlist";
  schema:numTracks 5;
  schema:tracks (
    [ a schema:MusicRecording; schema:name "Sweet Home Alabama";       schema:byArtist "Lynard Skynard"]
    [ a schema:MusicRecording; schema:name "Shook you all Night Long"; schema:byArtist "AC/DC"]
    [ a schema:MusicRecording; schema:name "Sharp Dressed Man";        schema:byArtist "ZZ Top"]
    [ a schema:MusicRecording; schema:name "Old Time Rock and Roll";   schema:byArtist "Bob Seger"]
    [ a schema:MusicRecording; schema:name "Hurt So Good";             schema:byArtist "John Cougar"]
  )
]

defines a playlist with an ordered set of tracks. RDFa adds the @inlist attribute, which is used to identify values (object or literal) that are to be placed in a list. The same playlist might be defined in RDFa as follows:

<div vocab="http://schema.org/" typeof="MusicPlaylist">
  <span property="name">Classic Rock Playlist</span>
  <meta property="numTracks" content="5"/>

  <div rel="tracks" inlist="">
    <div typeof="MusicRecording">
      1.<span property="name">Sweet Home Alabama</span> -
      <span property="byArtist">Lynard Skynard</span>
    </div>

    <div typeof="MusicRecording">
      2.<span property="name">Shook you all Night Long</span> -
      <span property="byArtist">AC/DC</span>
    </div>

    <div typeof="MusicRecording">
      3.<span property="name">Sharp Dressed Man</span> -
      <span property="byArtist">ZZ Top</span>
    </div>

    <div typeof="MusicRecording">
      4.<span property="name">Old Time Rock and Roll</span>
      <span property="byArtist">Bob Seger</span>
    </div>

    <div typeof="MusicRecording">
      5.<span property="name">Hurt So Good</span>
      <span property="byArtist">John Cougar</span>
    </div>
  </div>
</div>

This basically does the same thing, but places each track in an rdf:List in the defined order.
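For readers unfamiliar with how an rdf:List is represented, the following plain-Ruby sketch expands an ordered array into rdf:first/rdf:rest triples terminated by rdf:nil. It is only an illustration of the list structure, not the reader's implementation; the method name list_triples and the blank node labels (_:l0, _:l1, ...) are hypothetical.

```ruby
# Expand an ordered array of values into the triples of an rdf:List.
# Triples are plain [subject, predicate, object] arrays and the blank
# node labels are made up for this example.
def list_triples(subject, predicate, values)
  # An empty list is represented by rdf:nil alone.
  return [[subject, predicate, "rdf:nil"]] if values.empty?

  nodes   = values.each_index.map { |i| "_:l#{i}" }
  triples = [[subject, predicate, nodes.first]]
  values.each_with_index do |value, i|
    triples << [nodes[i], "rdf:first", value]
    # The last node points at rdf:nil instead of another list node.
    triples << [nodes[i], "rdf:rest", nodes[i + 1] || "rdf:nil"]
  end
  triples
end

tracks  = ["Sweet Home Alabama", "Sharp Dressed Man"]
triples = list_triples("_:playlist", "schema:tracks", tracks)
triples.each { |triple| puts triple.join(" ") }
```

Each list node carries one value (rdf:first) and a link to the next node (rdf:rest), which is what preserves the order of the tracks.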

Property relations

The @property attribute has been updated to allow for creating URI references as well as object literals.

  1. If an element contains @property but no @rel, @datatype or @content, and it contains a resource attribute (such as @href, @src, or @resource), generate an IRI object. Furthermore, sub-elements do not chain, i.e., the subject in effect when the @property is processed is also in effect for sub-elements.
  2. Otherwise, generate a literal as before.

For example:

<a vocab="http://schema.org/" property="url" href="http://example.com">
  <span property="title">NBA Eastern Conference ...</span>
</a>

results in

<> schema:url <http://example.com>;
   schema:title "NBA Eastern Conference".
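The rule above can be sketched as a small decision function. This is a simplified illustration in plain Ruby, assuming an element's attributes are available as a Hash keyed by attribute name; the method name property_object_kind is hypothetical and the sketch covers only the attributes named in the rule.

```ruby
# Decide what kind of object @property produces for an element.
# `attrs` is a Hash of attribute name => value (an assumption made
# for this example; the reader works on parsed XML/HTML nodes).
def property_object_kind(attrs)
  resource_attrs = %w[href src resource]
  blockers       = %w[rel datatype content]

  has_resource = resource_attrs.any? { |name| attrs.key?(name) }
  blocked      = blockers.any? { |name| attrs.key?(name) }

  # An IRI object needs a resource attribute and none of the blockers;
  # everything else produces a literal, as before.
  has_resource && !blocked ? :iri : :literal
end

link  = { "property" => "url", "href" => "http://example.com" }
title = { "property" => "title" }

puts property_object_kind(link)
puts property_object_kind(title)
```

In the example above, the a element falls into the first case and the nested span into the second.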

Magnetic @about/@typeof

The @typeof attribute has changed; previously, it always created a new subject, using a resource from @about, @resource and so forth. This has long been a source of errors for people using RDFa. The new rules cause @typeof to bind to a subject if used with @about; otherwise, it binds to an object, whether used alone or in combination with some other resource attribute (such as @href, @src or @resource).

For example:

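<div typeof="foaf:Person" about="http://greggkellogg.net/foaf#me">
  <p property="foaf:name">Gregg Kellogg</p>
  <a rel="foaf:knows" typeof="foaf:Person" href="http://manu.sporny.org/#this">
    <span property="foaf:name">Manu Sporny</span>
  </a>
</div>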

results in

<http://greggkellogg.net/foaf#me> a foaf:Person;
  foaf:name "Gregg Kellogg";
  foaf:knows <http://manu.sporny.org/#this> .
<http://manu.sporny.org/#this> a foaf:Person;
  foaf:name "Manu Sporny" .

Note that if the explicit @href is not present, i.e.,

<div typeof="foaf:Person" about="http://greggkellogg.net/foaf#me">
  <p property="foaf:name">Gregg Kellogg</p>
  <a rel="foaf:knows" typeof="foaf:Person">
    <span property="foaf:name">Manu Sporny</span>
  </a>
</div>

this results in

<http://greggkellogg.net/foaf#me> a foaf:Person;
  foaf:name "Gregg Kellogg";
  foaf:knows [ 
        a foaf:Person;
        foaf:name "Manu Sporny" 
  ].

Property chaining

If used without @rel, but with @typeof and a resource attribute, @property will cause chaining to another object just like @rel. The effect of this and other changes is to allow pretty much all RDFa to be marked up using just @property; @rel/@rev is no longer required. However, @rel and @rev have useful features that @property does not, so it's worth keeping them in your toolkit.

Support for HTML5 time element

The time element allows the creation of a datatyped-literal based on the lexical scope of either the @datetime attribute, or the element content. We parse it according to xsd:date, xsd:time, xsd:dateTime, xsd:gYear, xsd:gYearMonth, and xsd:duration. If it matches none of these, a plain literal is emitted.

The time element is described in the WHATWG version of the HTML5 spec. This is related to RDFa ISSUE-97.
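As a rough illustration of the datatype selection described above, the sketch below matches a lexical value against simplified patterns for each datatype and returns nil when a plain literal should be emitted. The regular expressions are assumptions written for this example and are looser than the XSD lexical forms; the method name time_datatype is hypothetical and this is not how the reader is implemented.

```ruby
# Pick an XSD datatype for the lexical value of a time element.
# Returns the datatype name, or nil when a plain literal applies.
# The patterns are simplified approximations of the XSD lexical forms.
def time_datatype(lexical)
  tz = '(?:Z|[+-]\d{2}:\d{2})?' # optional timezone suffix
  patterns = {
    "xsd:dateTime"   => /\A-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?#{tz}\z/,
    "xsd:date"       => /\A-?\d{4,}-\d{2}-\d{2}#{tz}\z/,
    "xsd:time"       => /\A\d{2}:\d{2}:\d{2}(?:\.\d+)?#{tz}\z/,
    "xsd:gYearMonth" => /\A-?\d{4,}-\d{2}#{tz}\z/,
    "xsd:gYear"      => /\A-?\d{4,}#{tz}\z/,
    "xsd:duration"   => /\A-?P(?=.)(?:\d+Y)?(?:\d+M)?(?:\d+D)?(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+(?:\.\d+)?S)?)?\z/
  }
  # Hash#find yields [name, pattern] pairs in insertion order.
  match = patterns.find { |_, pattern| pattern.match?(lexical) }
  match && match.first
end

["2012-03-18", "2012-03-18T00:00:00Z", "P2D", "last Tuesday"].each do |value|
  puts "#{value.inspect} => #{time_datatype(value).inspect}"
end
```

A value such as "last Tuesday" matches none of the patterns, which corresponds to the plain-literal fallback.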

Support for HTML5 data element

The data element is an alternate way of adding data, using the @value attribute. It is similar to the meta element.

The data element is described in the WHATWG version of the HTML5 spec. This is related to RDFa ISSUE-113.

Usage

Reading RDF data in the RDFa format

graph = RDF::Graph.load("etc/doap.html", :format => :rdfa)

Reading RDF data with vocabulary expansion

graph = RDF::Graph.load("etc/doap.html", :format => :rdfa, :vocab_expansion => true)

or

graph = RDF::RDFa::Reader.open("etc/doap.html").expand

Reading Processor Graph

graph = RDF::Graph.load("etc/doap.html", :format => :rdfa, :rdfagraph => :processor)

Reading Both Processor and Output Graphs

graph = RDF::Graph.load("etc/doap.html", :format => :rdfa, :rdfagraph => [:output, :processor])

Writing RDF data using the XHTML+RDFa format

require 'rdf/rdfa'

RDF::RDFa::Writer.open("etc/doap.html") do |writer|
  writer << graph
end

Note that prefixes may be chained between Reader and Writer, so that the Writer will use the same prefix definitions found during parsing:

prefixes = {}
graph = RDF::Graph.load("etc/doap.html", :prefixes => prefixes)
puts graph.dump(:rdfa, :prefixes => prefixes)

Template-based Writer

The RDFa writer uses Haml templates for code generation. This allows fully customizable RDFa output in a variety of host languages. The default template generates human readable HTML5 output. A minimal template generates HTML, which is not intended for human consumption.

To specify an alternative Haml template, consider the following:

require 'rdf/rdfa'

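# Writer.buffer yields the writer to a block and returns the serialized output as a String
html = RDF::RDFa::Writer.buffer(:haml => RDF::RDFa::Writer::MIN_HAML) do |writer|
  writer << graph
end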

The template hash defines four Haml templates:

  • doc: Document Template, takes an ordered list of subjects and yields each one to be rendered. From {RDF::RDFa::Writer#render_document}:

    {include:RDF::RDFa::Writer#render_document}

    This template takes locals lang, prefix, base, title in addition to subjects to create output similar to the following:

    <!DOCTYPE html>
    <html prefix='xhv: http://www.w3.org/1999/xhtml/vocab#' xmlns='http://www.w3.org/1999/xhtml'>
      <head>
        <base href="http://example/">
        <title>Document Title</title>
      </head>
      <body>
        ...
      </body>
    </html>
    

    Options passed to the Writer are used to supply lang and base locals. prefix is generated based upon prefixes found from the default profiles, as well as those provided by a previous Reader. title is taken from the first top-level subject having an appropriate title property (as defined by the heading_predicates option).

  • subject: Subject Template, takes a subject and an ordered list of predicates and yields each predicate to be rendered. From {RDF::RDFa::Writer#render_subject}:

    {include:RDF::RDFa::Writer#render_subject}

    The template takes locals rel and typeof in addition to predicates and subject to create output similar to the following:

    <div resource="http://example/">
      ...
    </div>
    

    Note that if typeof is defined, in this template, it will generate a textual description.

  • property_value: Property Value Template, used for predicates having a single value; takes a predicate, and a single-valued Array of objects. From {RDF::RDFa::Writer#render_property}:

    {include:RDF::RDFa::Writer#render_property}

    In addition to predicate and objects, the template takes inlist to indicate that the property is part of an rdf:List.

    Also, if the predicate is identified as a heading predicate (via :heading_predicates option), it will generate a heading element, and may use the value as the document title.

    Each object is yielded to the calling block, and the result is rendered, unless nil. Otherwise, rendering depends on the type of object. This is useful for recursive document descriptions.

    Creates output similar to the following:

    <div class='property'>
      <span class='label'>
        xhv:alternate
      </span>
      <a property='xhv:alternate' href='http://rdfa.info/feed/'>http://rdfa.info/feed/</a>
    </div>
    

    Note the use of methods defined in {RDF::RDFa::Writer} useful in rendering the output.

  • property_values: Similar to property_value, but for predicates having more than one value. Locals are identical to property_value, but objects is expected to have more than one value. Described further in {RDF::RDFa::Writer#render_property}.

    In this case, an unordered list is used for output. Creates output similar to the following:

    <div class='property'>
      <span class='label'>
        xhv:bookmark
      </span>
      <ul rel='xhv:bookmark'>
        <li>
          <a href='http://rdfa.info/2009/12/12/oreilly-catalog-uses-rdfa/'>
            http://rdfa.info/2009/12/12/oreilly-catalog-uses-rdfa/
          </a>
        </li>
        <li>
          <a href='http://rdfa.info/2010/05/31/new-rdfa-checker/'>
            http://rdfa.info/2010/05/31/new-rdfa-checker/
          </a>
        </li>
      </ul>
    </div>
    

    If property_values does not exist, repeated values will be replicated using property_value.

  • Type-specific templates. To simplify generation of different output types, the template may contain elements indexed by a URI. When a subject with an rdf:type matching that URI is found, subsequent Haml definitions will be taken from the associated Hash. For example:

    {
      :doc => "...",
      :subject => "...",
      :property_value => "...",
      :property_values => "...",
      RDF::URI("http://schema.org/Person") => {
        :subject => "...",
        :property_value => "...",
        :property_values => "..."
      }
    }
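The lookup implied by type-specific templates can be sketched as follows. This is a plain-Ruby illustration only: it uses String keys in place of RDF::URI so that it runs without RDF.rb, the method name template_for is hypothetical, and falling back to the top-level template when the type-specific Hash lacks an entry is an assumption rather than documented Writer behavior.

```ruby
# Pick the template source for `name` (:subject, :property_value, ...),
# preferring a type-specific Hash when one of the subject's types has
# an entry for that name. Falls back to the top-level template
# (the fallback is an assumption made for this sketch).
def template_for(templates, name, types)
  types.each do |type|
    specific = templates[type]
    return specific[name] if specific.is_a?(Hash) && specific.key?(name)
  end
  templates[name]
end

templates = {
  :subject        => "default subject template",
  :property_value => "default property template",
  "http://schema.org/Person" => { :subject => "person subject template" }
}

person_types = ["http://schema.org/Person"]
puts template_for(templates, :subject, person_types)
puts template_for(templates, :property_value, person_types)
puts template_for(templates, :subject, [])
```

Keeping the type-specific entries in the same Hash as the defaults means a single :haml option can carry both.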

Dependencies

Documentation

Full documentation available on Rubydoc.info

Principal Classes

  • {RDF::RDFa::Format}
    • {RDF::RDFa::HTML} Asserts :html format, text/html mime-type and .html file extension.
    • {RDF::RDFa::XHTML} Asserts :xhtml format, application/xhtml+xml mime-type and .xhtml file extension.
    • {RDF::RDFa::SVG} Asserts :svg format, image/svg+xml mime-type and .svg file extension.
  • {RDF::RDFa::Reader}
    • {RDF::RDFa::Reader::Nokogiri}
    • {RDF::RDFa::Reader::REXML}
  • {RDF::RDFa::Profile}
  • {RDF::RDFa::Writer}

Additional vocabularies

  • {RDF::PTR}
  • {RDF::RDFA}
  • {RDF::XHV}
  • {RDF::XML}
  • {RDF::XSI}

TODO

  • Add support for LibXML and REXML bindings, and use the best available
  • Consider a SAX-based parser for improved performance

Resources

Author

Contributors

Contributing

  • Do your best to adhere to the existing coding conventions and idioms.
  • Don't use hard tabs, and don't leave trailing whitespace on any line.
  • Do document every method you add using YARD annotations. Read the tutorial or just look at the existing code for examples.
  • Don't touch the .gemspec, VERSION or AUTHORS files. If you need to change them, do so on your private branch only.
  • Do feel free to add yourself to the CREDITS file and the corresponding list in the README. Alphabetical order applies.
  • Do note that in order for us to merge any non-trivial changes (as a rule of thumb, additions larger than about 15 lines of code), we need an explicit public domain dedication on record from you.

License

This is free and unencumbered public domain software. For more information, see http://unlicense.org/ or the accompanying {file:UNLICENSE} file.

FEEDBACK
