Content Semantics for a Massively Distributed API
PMP is designed to aggregate content from hundreds of publishers running dozens of very different content management systems, and to push that content to a large variety of target platforms (web, mobile, etc.). This is a challenging task, and its complexity should not be underestimated.
Reaching uniform agreement on standards in a massively distributed ecosystem is hard. For PMP, it is also counter-productive, since every publisher has its own requirements and edge cases that matter to that publisher but not necessarily to others. An attempt to solve the distribution problem using a highly centralized, consensus-based approach will hit what Jeff Eaton calls a Platypus Problem: inexplicable, emergent complexity.
PMP avoided creating a new “monster” standard by instead adopting the following principles:
The goal was in no way to come up with rigid, centrally-standardized content type definitions that all publishers would have to use. Rather, a framework was created to allow each participant to tune their level of participation in any standard, and independently launch new standards when necessary.
The framework implemented by PMP strongly encourages local innovation through decentralization. It promotes building vibrant communities that prefer collaboration around similar.
Profiles: Extension Points.
While Collection.doc+JSON is flexible enough to accommodate most news content types, it is clearly too loosely defined to be sufficient in and of itself. Any specific application will need something that allows tailoring Collection.doc+JSON with additional semantics. In the Hypermedia world, the standard that allows such tailoring is the Profile link relation type.
Profiles and Media Types
A media type defines both the semantics and the serialization of a specific type of content. In many cases, media types have some extensibility or openness built-in, so that specific instances of the media type can layer additional semantics on top of the media type's foundations. In this case, a profile is the appropriate mechanism to signal that the original semantics and processing model of the media type still applies, but that an additional processing model can be used to extract additional semantics. This is in contrast to a new media type, which instead of just adding processing rules and semantics, in most cases defines a complete set of processing rules and semantics.
As an example, XHTML is not a profile of XML but a new media type because it introduces a complete new perspective of the underlying XML structures, and from the XHTML point of view, exposing the raw XML is not all that useful for clients. However, hCard (see Section 5.1) is a profile of (X)HTML because it adds processing rules that allow a client to extract additional semantics from a representation, without changing any of the processing rules and semantics of (X)HTML itself.
While the line between a media type and a profile might not always be easy to draw, the intention of profiles is not to replace media types, but to add a more lightweight and runtime-capable mechanism that allows servers and clients to be more explicit in how a specific instance of a media type represents concepts that are not defined by the media type itself, but by additional conventions (the profile processing rules and semantics).
A profile link defines additional semantics for a message body of a Collection Document and uniquely identifies the sub-type of a document.
Collection.Doc is a generic media type intended to standardize solutions for common requirements in content web APIs. Most applications will, however, need various document sub-types that define additional document-type-specific semantics.
The href attribute of a profile link MUST be a valid URI that uniquely identifies the sub-type of a document. The URI SHOULD be dereferenceable and SHOULD point to a document explaining additional semantics.
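As a minimal sketch of the MUST rule above, the following reads the profile links from a document and checks that each href is an absolute URI. The document layout used here (a top-level `links` object holding a `profile` list of objects with an `href` key) and the function names are assumptions made for illustration, not something this page specifies. The check itself is a rough approximation of "valid URI".

```python
from urllib.parse import urlparse


def profile_hrefs(doc: dict) -> list[str]:
    """Return the href of every profile link in a document.

    Assumed layout: doc["links"]["profile"] is a list of {"href": ...} objects.
    """
    links = doc.get("links", {}).get("profile", [])
    return [link["href"] for link in links if "href" in link]


def is_absolute_uri(href: str) -> bool:
    """Approximate validity check: the URI has a scheme and something after it."""
    parts = urlparse(href)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


# Hypothetical story document with a single profile link.
story = {"links": {"profile": [{"href": "https://example.org/profiles/story"}]}}
all_valid = all(is_absolute_uri(href) for href in profile_hrefs(story))
```

A non-dereferenceable identifier such as `urn:example:story` also passes this check, which matches the rule that dereferenceability is only a SHOULD.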
Depending on the application's needs, the dereferenceable URI of the profile link may point to any web document that can serve as human-centric or machine-centric documentation. Examples include another Collection Document, an ALPS document, or even a PDF.
Profiles MAY be made inheritable using the "extends" link relation allowing re-use of profile definitions and collaboration around profile definitions. Profiles are used solely via linking. There is no requirement for a central registry of profiles. An index of profiles can be created for discoverability, but innovation around profiles is intentionally decentralized.
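Profile inheritance through "extends" can be pictured as following links from one profile to the next. The sketch below assumes that each profile document carries at most one `extends` link under a top-level `links` object, and that `fetch` is a caller-supplied function returning the profile document for a URI. Both are illustrative assumptions, and the example URIs are hypothetical.

```python
from typing import Callable


def profile_chain(href: str, fetch: Callable[[str], dict]) -> list[str]:
    """Follow "extends" links from a profile up to its root.

    Returns the profile URIs from most specific to most generic.
    Raises ValueError if the chain loops back on itself.
    """
    chain: list[str] = []
    seen: set[str] = set()
    current = href
    while current is not None:
        if current in seen:
            raise ValueError(f"cyclic 'extends' chain at {current}")
        seen.add(current)
        chain.append(current)
        doc = fetch(current)
        extends = doc.get("links", {}).get("extends", [])
        # Assumption: single inheritance, so only the first link is followed.
        current = extends[0]["href"] if extends else None
    return chain


# In-memory stand-in for dereferencing profile URIs over HTTP.
BASE = "https://example.org/profiles/base"
STORY = "https://example.org/profiles/story"
LONGFORM = "https://example.org/profiles/longform"
profiles = {
    BASE: {"links": {}},
    STORY: {"links": {"extends": [{"href": BASE}]}},
    LONGFORM: {"links": {"extends": [{"href": STORY}]}},
}
chain = profile_chain(LONGFORM, profiles.__getitem__)
```

A client that understands only the most generic profile in the chain can still process the document at that level, which is the practical benefit of extending an existing profile.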
Recursive Media Type
The Collection.doc+JSON media type is recursive: items that are part of the main document are fully qualified documents themselves. This allows PMP to treat 'document assets' as top-level documents. Specifically, audio, video and image assets in a story document are themselves documents as well. Such a uniform approach allows for a great deal of flexibility and, among other things, is great for implementing a flexible content/assets rights management.
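Because every item is itself a document, one recursive function can visit a story and all of its assets uniformly. This is a sketch only: it assumes nested documents live in a top-level `items` list and carry a `title` under `attributes`, neither of which is confirmed by this page.

```python
from typing import Iterator


def walk(doc: dict, depth: int = 0) -> Iterator[tuple[int, dict]]:
    """Yield (depth, document) for a document and all nested item documents."""
    yield depth, doc
    for item in doc.get("items", []):
        # Each item is a full document, so the same function applies to it.
        yield from walk(item, depth + 1)


# Hypothetical story with an audio asset and an image asset.
story = {
    "attributes": {"title": "Story"},
    "items": [
        {"attributes": {"title": "Audio"}},
        {"attributes": {"title": "Image"}, "items": []},
    ],
}
visited = [(depth, doc["attributes"]["title"]) for depth, doc in walk(story)]
```

Code that applies per-document rules, such as rights checks, can run inside this loop without distinguishing between a story and its assets.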
PMP ships with a set of Profile definitions for some baseline document types (content types): Story, Media etc.
Enabling Local Innovation
While the baseline profiles defined by PMP will hopefully be useful in a large number of scenarios and will serve as good references for defining future profiles, we do not imagine that they will satisfy 100% of the needs of all publishers in a network as large as PMP.
Here's the winner: any publisher or a group of publishers in PMP can define and immediately start using new Profile (document type) definitions, completely independently and without any centralized or bureaucratic decision-making process!
Profile definitions in PMP are 100% self-service, and the only rule is: "Don't do evil". With a lot of flexibility comes a lot of responsibility. Profile definition is a fully self-service process in order to foster local innovation, not to create chaos. Publishers are strongly encouraged not to re-invent the wheel and to try to create profiles that can serve a large group of publishers. The greater the number of publishers that use a specific profile, and of API clients that can understand it, the more useful the profile is for everybody.
IMPORTANT: When defining a new profile, publishers are strongly encouraged to extend the Base Content Profile. Extending the Base Content Profile is the easiest way to comply with some light system-wide conventions, and it is intentionally generic enough that extending it should not limit a publisher's business needs. The API client reference implementation is guaranteed to understand the semantics of the Base Content Profile. The same is true for any certified API client and for PMP Search, so by extending the base profile, even brand-new profiles will have properties that any compatible API client can recognize, and they will be immediately search-compatible.
What Makes a Profile?
A profile can be described as additional semantics that can be used to process a resource representation, such as constraints, conventions, extensions, or any other aspects that do not alter the basic media type semantics.
The additional semantics can be:
- Custom properties in the
- Additional/custom link relation types in the
In PMP and Collection.doc+JSON, a proper profile definition is a document of
Per the IETF spec, a profile's href attribute does not have to be a dereferenceable link. According to the Profile spec:
the target URI does not necessarily have to identify a dereferencable resource (or even use a dereferencable URI scheme), and clients can treat the occurrence of a specific URI in the same way as an XML namespace URI and invoke specific behavior based on the assumption that a specific profile target URI signals that a resource representation follows a specific profile. ~ http://tools.ietf.org/html/draft-wilde-profile-link-04#section-1
The Collection.doc+JSON media type tightens this requirement: a profile's href attribute must point to a dereferenceable URL that returns a Collection.doc+JSON document which has profile='profile'. To learn more about Profile definitions, see the Profile profile wiki page.
PMP makes use of schemas to aid in validating published content. The JSON Schema specification is the accepted standard for schemas.
Schemas are just like any other Collection.doc+JSON document, but with a "schema" attribute that holds the JSON Schema object. To learn more about how to define schemas for PMP, see the [Schema profile wiki page](https://github.com/publicmediaplatform/pmpdocs/wiki/Schema-profile).
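To give a feel for what validation against such a schema involves, here is a deliberately small, standard-library-only sketch. It is not a JSON Schema implementation: it checks only `required` and the simple `type` of listed `properties`, and a real deployment would use a full JSON Schema validator. Placing the schema object at the top-level `schema` key of the schema document, and validating a flat set of properties against it, are assumptions made for the example.

```python
# Subset of JSON Schema type names mapped to Python types.
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "object": dict,
    "array": list,
    "boolean": bool,
}


def matches_type(value, type_name) -> bool:
    """Check a value against one simple JSON Schema type name."""
    if not isinstance(type_name, str) or type_name not in JSON_TYPES:
        return True  # absent or unsupported type: not checked in this sketch
    if isinstance(value, bool):
        return type_name == "boolean"  # bool is an int in Python; keep them apart
    return isinstance(value, JSON_TYPES[type_name])


def check_against_schema(instance: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the subset checks passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in instance:
            problems.append(f"missing required property: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name in instance and not matches_type(instance[name], rule.get("type")):
            problems.append(f"wrong type for property: {name}")
    return problems


# Hypothetical schema document: an ordinary document plus a "schema" attribute.
schema_doc = {
    "schema": {
        "type": "object",
        "required": ["title"],
        "properties": {"title": {"type": "string"}, "duration": {"type": "number"}},
    }
}
ok = check_against_schema({"title": "Morning news", "duration": 12}, schema_doc["schema"])
bad = check_against_schema({"duration": "long"}, schema_doc["schema"])
```

Returning a list of problems rather than raising on the first one lets a publisher see every issue with a document in a single pass.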