Machine Readable Serialization With Atom

jrochkind edited this page Feb 1, 2013 · 26 revisions

We provide a standard way to serialize a BentoSearch::Results set of BentoSearch::ResultItems to a machine-readable serialization based on the Atom format.

The Atom response isn't just for 'atom feed reader' software, although it can be used that way. It has been enhanced with elements from other vocabularies, to serve as a machine-readable serialization that expresses nearly every part of a BentoSearch::ResultItem. The Atom-based serialization can be used to provide an API response for bento_search powered results from your app.

You can serialize any BentoSearch::Results set to enhanced Atom using the bento_search/atom_results.atom.builder view template. Atom requires a feed name and author, so you need to provide them, or some rather dumb defaults will be used. You might typically add an atom response format to a Rails controller action that's already providing an HTML results page, like so:

# ...
respond_to do |format|
  format.html # default view
  format.atom do
    # Render the shared bento_search template, passing feed-level values as locals
    render(:template => "bento_search/atom_results",
           :locals   => {
             :atom_results     => @results,
             :feed_name        => "Acme results",
             :feed_author_name => "MyCorp"
           })
  end
end

The resulting serialization is Atom, although it has some idiosyncrasies. It's also been enhanced with some opensearch elements with metadata about the result set itself. Each atom:entry, representing an individual item, has been enhanced with elements from the prism, dcterms, and bibo (just one at the moment!) vocabularies/namespaces, as well as elements from atom itself. We attempt to completely express everything modeled in a BentoSearch::ResultItem.

Experience shows that even if one tried to stick to the letter of the specs of those namespaces, the consumer would still need to discover and code for application-specific choices and idiosyncrasies. With this in mind, we have also sometimes been willing to violate the letter or spirit of some of the related vocabularies/specs, when complying completely would have been too expensive without clear benefit to clients, or even counter-productive to clients. So it goes.

Here is a sample serialized enhanced atom result.

Developers of consumer software are also encouraged to check the source at ./app/views/bento_search/atom_results.atom.builder and ./app/views/bento_search/_atom_item.atom.builder as the last word on implementation details.
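As a rough illustration of what a consumer might look like, here is a minimal sketch that reads entries from an enhanced Atom response using REXML. The sample XML is hand-written for this example and is not real bento_search output; the prism namespace URI in particular is an assumption, so check the namespaces your actual feed declares. The helper names (`text_at`, `parse_entries`) are hypothetical.

```ruby
require "rexml/document"

# Hand-written sample for illustration only, not real bento_search output.
# The prism namespace URI is an assumption; use whatever your feed declares.
SAMPLE_FEED = <<~XML
  <feed xmlns="http://www.w3.org/2005/Atom"
        xmlns:prism="http://prismstandard.org/namespaces/basic/2.1/"
        xmlns:dcterms="http://purl.org/dc/terms/">
    <title>Acme results</title>
    <entry>
      <title>An Example Article</title>
      <link rel="alternate" href="http://example.org/article/1"/>
      <prism:doi>10.1109/MIC.2005.74</prism:doi>
      <dcterms:type vocabulary="http://schema.org/">http://schema.org/Article</dcterms:type>
      <dcterms:type>Journal Article</dcterms:type>
    </entry>
    <entry>
      <title>An Entry With Few Fields</title>
    </entry>
  </feed>
XML

# Prefix-to-namespace mapping used for XPath lookups.
NS = {
  "atom"    => "http://www.w3.org/2005/Atom",
  "prism"   => "http://prismstandard.org/namespaces/basic/2.1/",
  "dcterms" => "http://purl.org/dc/terms/"
}.freeze

# Text of the first element matching path, or nil if it is absent.
def text_at(node, path)
  REXML::XPath.first(node, path, NS)&.text
end

# Returns one hash per atom:entry. Most fields are optional, so values may be nil.
def parse_entries(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//atom:entry", NS).map do |entry|
    types  = REXML::XPath.match(entry, "dcterms:type", NS)
    # No vocabulary attribute: an uncontrolled label meant for display.
    label  = types.find { |t| t.attributes["vocabulary"].nil? }
    schema = types.find { |t| t.attributes["vocabulary"] == "http://schema.org/" }
    link   = REXML::XPath.first(entry, "atom:link[@rel='alternate']", NS)
    {
      title:       text_at(entry, "atom:title"),
      doi:         text_at(entry, "prism:doi"),
      type_label:  label&.text,
      schema_type: schema&.text,
      url:         link && link.attributes["href"]
    }
  end
end

parse_entries(SAMPLE_FEED).each { |e| puts e.inspect }
```

Because almost every entry-level element is optional, the sketch returns nil for anything missing rather than raising; a real client would decide per field how to handle absence.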

Notes/Docs

Here are some notes to be aware of when using the Atom-based serialization as fully expressive machine-readable results:

  • There is limited feed-level metadata provided via opensearch namespaced elements, principally opensearch:totalItems.

    • There is other feed-level metadata which can usefully be provided in atom/opensearch/etc, but there was no good way to do it in bento_search without knowing the details of the host app. If you'd like, you can write your own Rails view template replacing bento_search/atom_results to provide more complete feed-level metadata, while still re-using the bento_search/atom_item partial for rendering the individual item/entries. For instance, you could add atom:link elements with rel values self, alternate, prev, next, first, last, or search (pointing to an opensearch description), or an opensearch:Query element.
    • An atom:id element is required by Atom at the feed level; the built-in implementation just uses the current application request URL that is delivering the atom results.
  • The updated element is required by Atom at both feed and entry level, but bento_search (and its target search engines) don't really track updated_at, so it will always be filled out with the current timestamp, sorry.

  • Unique atom:id elements, with URI values, are required by the Atom standard. bento_search will try to construct a basic opaque non-resolvable unique identifier URI by default, using engine_id, entry unique_id, and application base URL. In some cases it won't be able to do so, and will leave out <id>, violating the standard. If you'd like to take control of this to ensure an <id> is present to your specifications, including resolvability in your app, simply configure a decorator for the bento_search engine which provides a custom uri_identifier method.

  • Serial publication info is expressed using the prism vocabulary: prism:coverDate in yyyy-mm-dd format (or just yyyy), as well as prism:volume, prism:number, prism:startingPage, prism:endingPage, prism:issn, prism:isbn.

    • prism:doi is a bare DOI without any kind of URI encapsulation, such as 10.1109/MIC.2005.74 (the prism standard is a bit confusing on this; actually doi might technically not be part of the prism version whose namespace we're using, sorry).
  • almost every entry-level element is optional in the ResultItem model, so may or may not be present in the serialization.

  • The atom:summary can be plain text or html, marked as specified in the atom spec with attribute type="text" or html. If html, it may include <b class="bento_search_highlight"> tags demarcating search-in-context highlighting. (highlighting of title not currently available in atom response. do you need it?)

  • For some dcterms elements we provide a custom 'vocabulary' attribute specifying the vocabulary. For instance, one or more dcterms:type or dcterms:language elements may be provided, specifying values according to different vocabularies, where available. An element with no vocabulary attribute is generally an uncontrolled label suitable for presenting directly to the user (usually in English). The dcterms:type vocabulary='http://purl.org/NET/bento_search/ontology' element uses bento_search's internal vocabulary.

    <dcterms:type vocabulary="http://schema.org/">http://schema.org/Article</dcterms:type>
    <dcterms:type>Journal Article</dcterms:type>
    <dcterms:type vocabulary="http://purl.org/NET/bento_search/ontology">Article</dcterms:type>
    <dcterms:language vocabulary="http://dbpedia.org/resource/ISO_639-1">es</dcterms:language>
    <dcterms:language vocabulary="http://dbpedia.org/resource/ISO_639-3">spa</dcterms:language>
    <dcterms:language>Spanish</dcterms:language>
  • the dcterms:type element with no vocabulary attribute is often a label passed directly from the underlying search service.

  • The only element we're currently using from bibo is bibo:oclcnum . Most of our search engines provide oclcnums rarely, but if one is present, that's where it will be serialized.

  • ResultItem#link (main link, only URL in model) and ResultItem#other_links are included if present (as added/modified by your local decorators).

    • if a main #link is present, it'll be included as atom:link rel="alternate". If it's not present, there will be no atom:link rel="alternate", violating the Atom standard in some cases.
    • #other_links by default are rel=related, but can specify their own rel, their own content type, and will have title filled out with their BentoSearch::Link#label.
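
For the atom:id note above, a decorator providing a custom uri_identifier might look roughly like the following. This is a standalone sketch, not bento_search's actual decorator API: `FakeItem`, `ResolvableIdDecorator`, `BASE_URL`, and the `/records/...` URL layout are all hypothetical, and `SimpleDelegator` is used only so the example runs on its own. In a real app you would build on whatever decorator base class bento_search provides and register it through the engine configuration, neither of which is shown here.

```ruby
require "delegate"
require "uri"

# Stand-in for a BentoSearch::ResultItem so the sketch is self-contained.
FakeItem = Struct.new(:engine_id, :unique_id, :title)

# Hypothetical decorator: wraps an item and adds a resolvable uri_identifier.
class ResolvableIdDecorator < SimpleDelegator
  # Assumed base URL of the host app; replace with your own.
  BASE_URL = "http://example.org"

  # Returns a URI string for use as the entry's id, or nil when the item
  # has no unique_id to build one from.
  def uri_identifier
    return nil if unique_id.nil? || unique_id.to_s.empty?

    escaped = URI.encode_www_form_component(unique_id.to_s)
    "#{BASE_URL}/records/#{engine_id}/#{escaped}"
  end
end

item = ResolvableIdDecorator.new(FakeItem.new("scopus", "doc/42", "A Title"))
puts item.uri_identifier
```

All other methods still reach the wrapped item through delegation, so the decorator only changes how the identifier is built.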