
Options for describing APIs associated with Datasets #2

Open
nickevansuk opened this issue Nov 2, 2019 · 7 comments

Comments

@nickevansuk
Contributor

nickevansuk commented Nov 2, 2019

Options for describing APIs associated with Datasets

This proposal includes details of several alternative mechanisms for describing Web APIs related to open datasets. The focus of this discussion is on describing the Open Booking API.

schema:Action

Previous design iterations of this specification suggested the use of Action and a way of specifying various endpoints, as below. The idea was that each endpoint would indicate the next endpoint in the flow to call. As the number of specification features increased, it became increasingly apparent that this dynamic approach significantly increased implementation complexity. Additionally, in practice, implementers were unlikely to blindly follow dynamic URLs provided by API endpoints during each booking operation, for security reasons.

To maintain flexibility in the endpoint paths, dynamic URLs must therefore be declared in a discovery document (e.g. under /.well-known/, similar to OpenID Connect Discovery 1.0) or within the dataset site using potentialAction.

One dataset-site-based discovery approach would look as below. Given the number of endpoints, this stretches Action beyond its intended purpose and essentially creates our own API description language, when many good alternatives (such as OpenAPI) already exist. Note that as this approach is not supported by Google for SEO, we would also likely be required to provide a JSON-LD WebAPI in the markup of the dataset site.

{
  "@type": "Dataset",
  ...
  "potentialAction": {
    "@type": "OpenBookingAction",
    "target": [
      {
        "@type": "EntryPoint",
        "urlTemplate": "https://example.com/api/order-quote-templates/{uuid}",
        "encodingType": ["application/vnd.openactive+jsonld; model=2.0, booking=1.0"],
        "httpMethod": "PUT"
      },
      {
        "@type": "EntryPoint",
        "urlTemplate": "https://example.com/api/order-quotes/{uuid}",
        "encodingType": ["application/vnd.openactive+jsonld; model=2.0, booking=1.0"],
        "httpMethod": "PUT"
      },
      {
        "@type": "EntryPoint",
        "urlTemplate": "https://example.com/api/orders/{uuid}",
        "encodingType": ["application/vnd.openactive+jsonld; model=2.0, booking=1.0"],
        "httpMethod": "PUT"
      },
      {
        "@type": "EntryPoint",
        "urlTemplate": "https://example.com/api/orders/{uuid}",
        "encodingType": ["application/vnd.openactive+jsonld; model=2.0, booking=1.0"],
        "httpMethod": "PATCH"
      },
      {
        "@type": "EntryPoint",
        "urlTemplate": "https://example.com/api/orders/{uuid}",
        "encodingType": ["application/vnd.openactive+jsonld; model=2.0, booking=1.0"],
        "httpMethod": "DELETE"
      },
      ...
    ],
    "supportingData": {
      "@type": "DataFeed",
      "distribution": [
        {
          "@type": "DataDownload",
          "name": "Order",
          "additionalType": "https://schema.org/Order",
          "encodingFormat": ["application/vnd.openactive.booking+json; version=1.0"],
          "contentUrl": "https://example.com/api/feeds/offers"
        }
      ]
    }
  }
}
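To illustrate the extra work this design pushes onto clients, here is a minimal Python sketch of how a broker might resolve one endpoint from the `potentialAction.target` list above: it filters the `EntryPoint`s by HTTP method and by a fragment of the URL template, then fills in `{uuid}`. The function names, the matching rule and the trimmed-down input are hypothetical, for illustration only.

```python
from urllib.parse import quote


def find_entry_point(action: dict, method: str, template_fragment: str) -> dict:
    """Return the single EntryPoint whose httpMethod matches and whose
    urlTemplate contains template_fragment (a hypothetical matching rule)."""
    matches = [
        ep for ep in action["target"]
        if ep["httpMethod"] == method and template_fragment in ep["urlTemplate"]
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected one {method} endpoint matching {template_fragment!r}, found {len(matches)}"
        )
    return matches[0]


def expand(url_template: str, uuid: str) -> str:
    # Substitute {uuid}, percent-encoding the value as a simple URI Template expansion would.
    return url_template.replace("{uuid}", quote(uuid, safe=""))


action = {
    "@type": "OpenBookingAction",
    "target": [
        {"@type": "EntryPoint", "urlTemplate": "https://example.com/api/order-quotes/{uuid}", "httpMethod": "PUT"},
        {"@type": "EntryPoint", "urlTemplate": "https://example.com/api/orders/{uuid}", "httpMethod": "PUT"},
        {"@type": "EntryPoint", "urlTemplate": "https://example.com/api/orders/{uuid}", "httpMethod": "DELETE"},
    ],
}

ep = find_entry_point(action, "DELETE", "/orders/")
print(expand(ep["urlTemplate"], "8f2d4c1e-0000-4000-8000-000000000001"))
```

The markup itself does not say which `PUT` is which step in the flow, so the matching rule has to be hard-coded in every client, which echoes the concern that this amounts to a bespoke API description language.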

schema:WebAPI

We have contributed to the current discussion on WebAPI, based on the WADG0001 WebAPI type extension specification with the additional suggested amendments applied, and on the new Data Catalog Vocabulary (DCAT) Version 2; that discussion has suggested bringing WebAPI more in line with DCAT v2. This would effectively make WebAPI a superset of the DCAT v2 functionality, which is also useful for simplifying implementation.

The focus here is on an OpenAPI / Swagger definition of the endpoints available at a specific base URL. If this Swagger document were maintained centrally by OpenActive, the implementer would be able to vary the base URL, but the names of the endpoints would be fixed. This lends itself much less to a "discovery-based" approach than the Hydra and Action alternatives above.

{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  ...
  "accessService": {
    "@type": "WebAPI",
    "name": "Google Knowledge Graph Search API",
    "description": "The Knowledge Graph Search API lets you find entities in the Google Knowledge Graph. The API uses standard schema.org types and is compliant with the JSON-LD specification.",
    "documentation": "https://developers.google.com/knowledge-graph/",
    "termsOfService": "https://developers.google.com/knowledge-graph/terms",
    "logo": "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png",
    "license": "https://creativecommons.org/licenses/by/3.0/",
    "provider": {
      "@type": "Organization",
      "name": "Google Inc.",
      "contactPoint": [
        {
          "@type": "ContactPoint",
          "name": "Google",
          "url": "https://google.com"
        }
      ]
    },
    "version": [
      "1.0.0"
    ],
    "endpointURL": [
      {
        "@type": "EntryPoint",
        "url": "https://example.com/api/openbooking/",
        "contentType": "application/json"
      }
    ],
    "apiTransport": "HTTPS",
    "conformsTo": [
      "https://www.openactive.io/open-booking-api/1.0/#core",
      "https://www.openactive.io/open-booking-api/1.0/#attendee-details",
      "https://www.openactive.io/open-booking-api/1.0/#approval-flow"
    ],
    "endpointDescription": [
      {
        "@type": "EntryPoint",
        "contentType": "application/vnd.oai.openapi+json;version=2.0",
        "url": "https://www.openactive.io/open-booking-api/1.0/swagger.json"
      }
    ],
    "potentialAction": [
      {
        "@type": "ConsumeAction",
        "name": "API Client Registration",
        "target": "https://exampleforms.com/get-me-an-api-access-key"
      }
    ]
  }
}
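A consumer of this markup would typically look for the `endpointDescription` entry whose `contentType` identifies an OpenAPI document. A minimal Python sketch, assuming `endpointDescription` is either a single `EntryPoint` or a list of them as in the example above; the helper name is hypothetical.

```python
from typing import Optional

# Media type used for the OpenAPI (JSON) document in the example above.
OPENAPI_MEDIA_TYPE = "application/vnd.oai.openapi+json"


def openapi_description_url(web_api: dict) -> Optional[str]:
    """Return the url of the first endpointDescription entry whose contentType
    is the OpenAPI JSON media type, or None if there is no such entry."""
    descriptions = web_api.get("endpointDescription", [])
    if isinstance(descriptions, dict):  # accept a single EntryPoint as well as a list
        descriptions = [descriptions]
    for entry in descriptions:
        if not isinstance(entry, dict):  # a bare URL string carries no contentType
            continue
        # Compare the media type only, ignoring parameters such as ";version=2.0".
        media_type = entry.get("contentType", "").split(";")[0].strip().lower()
        if media_type == OPENAPI_MEDIA_TYPE:
            return entry.get("url")
    return None


web_api = {
    "@type": "WebAPI",
    "endpointDescription": [
        {
            "@type": "EntryPoint",
            "contentType": "application/vnd.oai.openapi+json;version=2.0",
            "url": "https://www.openactive.io/open-booking-api/1.0/swagger.json",
        }
    ],
}
print(openapi_description_url(web_api))
```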

DCAT v2

The recently published Data Catalog Vocabulary (DCAT) Version 2 appears to cover OpenActive's own discovery requirements comprehensively and seems to have been designed with standards-compliant APIs in mind, though it lacks some WebAPI properties that could still be useful for SEO. We have suggested ways in which WebAPI could align more closely with DCAT v2, to combine the strengths of both.

This approach is in line with the WebAPI suggestion: a fixed Swagger definition of the endpoint paths, available at a variable base URL. The implementer of a particular standard specification can therefore vary the base URL, but the names of the endpoints are fixed, unless they were to create a new amended copy of the Swagger document, which would be inefficient and overly complex compared to the Action approaches above.

Using DCAT v2:

  • dcat:accessService ordinarily links a Distribution to a DataService, so we would need to repurpose it here to link the Dataset to the DataService.
  • dct:conformsTo can be used to reference the specific implementation of the specification, and could be used to indicate supported profiles too. The example below includes profiles only, as these also indicate the version of the specification.
  • dct:accessRights describes the terms of access.
  • dcat:endpointDescription can be used to contain the OpenAPI / Swagger definition of the API, describing individual endpoints relative to the specified base URL.
  • dcat:endpointURL describes "the root location or primary endpoint of the service", which recommends that a base URL be specified.
  • dcat:landingPage gives the URL at which access to the API can be obtained, e.g. via a web form (note that dcat:accessURL is not appropriate for dcat:DataService, as it corresponds to the property chain dcat:accessService/dcat:endpointURL).

Example

{
  "@type": "Dataset",
  ...
  "dcat:accessService": {
    "@type": "dcat:DataService",
    "dct:title": "Open Booking API for Acme Leisure",
    "dct:description": "An API that provides access to make bookings for sessions and facilities, conforming to Open Booking API 1.0.",
    "dcat:landingPage": "https://exampleforms.com/get-me-an-api-access-key",
    "dcat:endpointDescription": "https://www.openactive.io/open-booking-api/1.0/swagger.json",
    "dct:conformsTo": [
      "https://www.openactive.io/open-booking-api/1.0/#core",
      "https://www.openactive.io/open-booking-api/1.0/#attendee-details",
      "https://www.openactive.io/open-booking-api/1.0/#approval-flow"
    ],
    "dcat:endpointURL": "https://example.com/api/openbooking/"
  }
}
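The keys in this example are compact IRIs, so a consumer needs to know what the `dcat:` and `dct:` prefixes stand for (in real JSON-LD they would be declared in an `@context`). A minimal Python sketch of that expansion, assuming only these two prefixes; it is a simplified illustration, not a JSON-LD processor, and the helper names are hypothetical.

```python
# Namespace IRIs for the two prefixes used in the DCAT example.
PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def expand_term(term: str) -> str:
    """Expand a compact IRI such as 'dcat:endpointURL'; leave unprefixed terms
    (e.g. '@type') and unknown prefixes unchanged."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term


def expand_keys(node):
    """Recursively expand prefixed keys, and prefixed @type values, in parsed JSON."""
    if isinstance(node, list):
        return [expand_keys(item) for item in node]
    if isinstance(node, dict):
        expanded = {}
        for key, value in node.items():
            if key == "@type" and isinstance(value, str):
                expanded[key] = expand_term(value)
            else:
                expanded[expand_term(key)] = expand_keys(value)
        return expanded
    return node  # strings, numbers, etc. are left as they are


service = {
    "@type": "dcat:DataService",
    "dct:title": "Open Booking API for Acme Leisure",
    "dcat:endpointURL": "https://example.com/api/openbooking/",
}
for key, value in expand_keys(service).items():
    print(key, value)
```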

Also note that the publisher, keyword and language properties could be duplicated from the Dataset to aid SEO.

SEO Note

To ensure we create maximum exposure for Open Booking API implementations in terms of SEO and data catalogues, we will likely want to include JSON-LD WebAPI in the markup, as well as DCAT v2 terms embedded in the HTML using RDFa, as the current Dataset Sites do.

Gaps still to be filled

Note the above does not allow for referencing Dataset / WebAPI certifications, or for stating profiles of opportunity data.

OpenActive Certified / Open Data Certificates

For describing certificates relating to a Dataset, for example "Open Data Certificate" or "OpenActive Certified", could we propose to schema.org that they add schema:Dataset to the domain of schema:hasCredential? Could we then also propose renaming schema:educationalLevel to a superseding, more generic pending:credentialLevel?

Example below:

{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "name" : "British Cycling Let's Ride Sessions",
  "url": "http://data.letsride.co.uk/",
  ...
  "hasCredential": {
    "@type": "EducationalOccupationalCredential",
    "name": "Open Data Certificate - Silver level",
    "description": "Open Data Certificate is a free online tool developed and maintained by the Open Data Institute, to assess and recognise the sustainable publication of quality open data. It assesses the legal, practical, technical and social aspects of publishing open data using best practice guidance. This data has achieved Silver level on 15 July 2016, which means extra effort went in to support and encourage feedback from people who use this open data.",
    "url": "https://certificates.theodi.org/en/datasets/214126/certificate",
    "dateCreated": "2016-07-04",
    "dateModified": "2016-07-15",
    "validFor": "P5Y",
    "credentialLevel": {
      "@type": "DefinedTerm",
      "name": "Silver level",
      "description": "This data has achieved Silver level which means extra effort went in to support and encourage feedback from people who use this open data.",
      "termCode": "SILVER"
    },
    "credentialCategory": {
      "@type": "DefinedTerm",
      "name": "Open Data Certificate",
      "description": "Open Data Certificate is a free online tool developed and maintained by the Open Data Institute, to assess and recognise the sustainable publication of quality open data. It assesses the legal, practical, technical and social aspects of publishing open data using best practice guidance.",
      "termCode": "ODI-CERT"
    },
    "recognizedBy": {
      "@type": "Organization",
      "name": "Open Data Institute",
      "url": "https://theodi.org/"
    }
  }
}
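The example does not say what `validFor` is measured from. As a minimal Python sketch, assuming the period runs from `dateModified` (the date the Silver level was achieved) and that only whole-year ISO 8601 durations such as `P5Y` need handling, a consumer could compute when the credential lapses like this; the helper names are hypothetical.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Add whole years, clamping 29 February to 28 February in non-leap years."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def expiry_date(credential: dict) -> date:
    """Expiry of a credential, ASSUMING validFor is a whole-year duration
    (e.g. 'P5Y') counted from dateModified."""
    duration = credential["validFor"]
    if not (duration.startswith("P") and duration.endswith("Y")):
        raise ValueError(f"only whole-year durations are handled here: {duration!r}")
    years = int(duration[1:-1])
    start = date.fromisoformat(credential["dateModified"])
    return add_years(start, years)


credential = {"dateModified": "2016-07-15", "validFor": "P5Y"}
print(expiry_date(credential))  # 2021-07-15
```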

Would we also need schema:hasCredential to apply to the WebAPI? Do we have separate certification for discovery (i.e. data quality) and booking (i.e. capability)?

Data quality profiles for datasets

Could we use the same DCAT approach for Dataset, to include dct:conformsTo? If so, we would recommend to schema.org that they include schema:Dataset in the domain of pending:conformsTo (assuming they accept it for WebAPI).

{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  ...
  "conformsTo": [
    "https://www.openactive.io/modelling-opportunity-data/2.0/#core-sessions"
  ]
}

Questions

Thoughts very welcome on the following:

Discovery vs defined endpoints with a base URL

The initial thoughts gathered from the OpenActive W3C Community Group were that allowing flexibility for the name of each endpoint within a Web API to vary per-implementation, and for such endpoints to be discoverable, would be good practice and avoid being overly prescriptive for implementers. This follows patterns such as OpenID Connect Discovery 1.0 and OAuth 2.0 Authorization Server Metadata, that allow each endpoint URL to be fully configurable.

Implementation experience has shown that, in practice, for the Open Booking API such flexibility (i) is not desired by booking system implementers, who often request a recommended naming approach; and (ii) complicates tooling and client design, as an extra level of indirection and flexibility needs to exist in the data contract.

Prescribed endpoint paths would mean that we can use schema:WebAPI and dcat:DataService with e.g. a standard OpenAPI / Swagger document, rather than discovery documents that need to be interrogated and a client configured (automatically or manually) for each booking system a broker connects to.

For booking system providers: does anyone have an objection to Open Booking 1.0 using prescribed endpoint paths within a base URL, to simplify implementation?

Beta implementation

Given the current inconsistent state of the extensions to schema:WebAPI, and the additional work required for both implementers and tooling to allow for varying endpoint URLs, the simplest approach for early implementations of the Open Booking API appears to be to use the more developed DCAT v2 approach, using terms from the WebAPI where they have already been promoted to pending.schema.org. Hence this proposal recommends that the following be used for beta implementations in the first instance:

{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  ...
  "accessService": {
    "@type": "WebAPI",
    "name": "Google Knowledge Graph Search API",
    "description": "The Knowledge Graph Search API lets you find entities in the Google Knowledge Graph. The API uses standard schema.org types and is compliant with the JSON-LD specification.",
    "documentation": "https://developers.google.com/knowledge-graph/",
    "termsOfService": "https://developers.google.com/knowledge-graph/terms",
    "endpointURL": "https://example.com/api/openbooking/",
    "conformsTo": [
      "https://www.openactive.io/open-booking-api/1.0/#core-sessions",
      "https://www.openactive.io/open-booking-api/1.0/#core-facilities",
      "https://www.openactive.io/open-booking-api/1.0/#core-courses",
      "https://www.openactive.io/open-booking-api/1.0/#attendee-details",
      "https://www.openactive.io/open-booking-api/1.0/#approval-flow"
    ],
    "endpointDescription": "https://www.openactive.io/open-booking-api/1.0/swagger.json",
    "landingPage": "https://exampleforms.com/get-me-an-api-access-key"
  }
}

This metadata can easily be expanded into the proposed full schema:WebAPI once agreed, by updating the library that does the transformation - without any additional work required by implementers.
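The transformation could be as small as lifting the flat beta properties into the richer structure of the fuller WebAPI example earlier in this issue. A minimal Python sketch; the function name and the target shape (EntryPoint wrappers for `endpointURL` and `endpointDescription`, `landingPage` mapped to a `ConsumeAction`) are assumptions modelled on that earlier example, not an agreed mapping.

```python
import copy

# Assumed contentType for the Swagger document, copied from the fuller WebAPI example.
OPENAPI_CONTENT_TYPE = "application/vnd.oai.openapi+json;version=2.0"


def expand_beta_web_api(beta: dict) -> dict:
    """Convert the flat beta accessService into the fuller WebAPI shape
    (hypothetical mapping; the input dict is not modified)."""
    full = copy.deepcopy(beta)
    if isinstance(full.get("endpointURL"), str):
        full["endpointURL"] = [{
            "@type": "EntryPoint",
            "url": full["endpointURL"],
            "contentType": "application/json",
        }]
    if isinstance(full.get("endpointDescription"), str):
        full["endpointDescription"] = [{
            "@type": "EntryPoint",
            "contentType": OPENAPI_CONTENT_TYPE,
            "url": full["endpointDescription"],
        }]
    landing_page = full.pop("landingPage", None)
    if landing_page:
        full["potentialAction"] = [{
            "@type": "ConsumeAction",
            "name": "API Client Registration",
            "target": landing_page,
        }]
    return full


beta = {
    "@type": "WebAPI",
    "endpointURL": "https://example.com/api/openbooking/",
    "endpointDescription": "https://www.openactive.io/open-booking-api/1.0/swagger.json",
    "landingPage": "https://exampleforms.com/get-me-an-api-access-key",
}
print(expand_beta_web_api(beta))
```

Because implementers only supply the flat beta properties, any later change to this mapping is made in one place rather than in every implementation.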

@nickevansuk
Contributor Author

nickevansuk commented Nov 2, 2019

Question for booking system developers: does anyone have any objection to using prescribed endpoint paths within a base URL for Open Booking 1.0, to simplify implementation?

So, for example, you would be able to customise the first part of the URL (the base URL, https://example.com/api/openbooking/), but not the second part (the endpoint path, offers/{uuid}):
https://example.com/api/openbooking/offers/{uuid}

This approach, as well as being in line with schema.org, would greatly simplify API discovery and implementation, and has been applied in CR2 of the Open Booking API.
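With a prescribed path under a configurable base URL, the client simply resolves the fixed path against the base. One practical detail, sketched here in Python with the standard library's `urljoin`: the base URL needs a trailing slash, otherwise its last segment is replaced rather than extended. The helper name `offer_url` is hypothetical.

```python
from urllib.parse import quote, urljoin


def offer_url(base_url: str, offer_id: str) -> str:
    """Resolve the prescribed path 'offers/{uuid}' against an implementer's base URL."""
    if not base_url.endswith("/"):
        # Without a trailing slash, urljoin would replace the last segment
        # ("openbooking") instead of appending to it.
        base_url += "/"
    return urljoin(base_url, "offers/" + quote(offer_id, safe=""))


print(urljoin("https://example.com/api/openbooking", "offers/123"))  # https://example.com/api/offers/123
print(offer_url("https://example.com/api/openbooking", "123"))  # https://example.com/api/openbooking/offers/123
```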

@nickevansuk
Contributor Author

nickevansuk commented Jul 6, 2020

To evolve the above proposal based on further developments, both in conformance certification, and in WebAPI discussion:

  • The base URL is generally the preferred approach, and is consistent with the WebAPI discussions.
  • schema:hasCredential can be used to link the Dataset to the ConformanceCertificate, which may be either centrally hosted or self-hosted (noting that the certification covers the whole dataset, which includes the accessService, rather than just the API), while also allowing other certificates to be included via EducationalOccupationalCredential (as the "Open Data Certificate" example above illustrates).
  • conformsTo should list only the specification versions to which the API conforms (we already have schemaVersion, which covers the model version for the Dataset), and the ConformanceCertificate can cover the specific features that are implemented.
  • The case of endpointURL should be updated to match the conventions of schema.org (endpointUrl) rather than those of DCAT v2.

{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  ...
  "hasCredential": "https://openactive.io/openactive-test-suite/example-output/controlled/certification/",
  "accessService": {
    "@type": "WebAPI",
    "name": "Google Knowledge Graph Search API",
    "description": "The Knowledge Graph Search API lets you find entities in the Google Knowledge Graph. The API uses standard schema.org types and is compliant with the JSON-LD specification.",
    "documentation": "https://developers.google.com/knowledge-graph/",
    "termsOfService": "https://developers.google.com/knowledge-graph/terms",
    "endpointUrl": "https://example.com/api/openbooking/",
    "conformsTo": [
      "https://www.openactive.io/open-booking-api/1.0/"
    ],
    "endpointDescription": "https://www.openactive.io/open-booking-api/1.0/swagger.json",
    "landingPage": "https://exampleforms.com/get-me-an-api-access-key"
  }
}
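Since earlier drafts (and DCAT v2) use `endpointURL`, a consumer may want to accept both spellings during the transition. A minimal Python sketch, assuming the value is either a plain URL string, an `EntryPoint` object with a `url` property, or a list of such objects, as in the examples in this thread; the helper name is hypothetical.

```python
from typing import Optional


def endpoint_url(access_service: dict) -> Optional[str]:
    """Return the API base URL, preferring the schema.org-style 'endpointUrl'
    and falling back to the DCAT-style 'endpointURL'."""
    for key in ("endpointUrl", "endpointURL"):
        value = access_service.get(key)
        if isinstance(value, list):  # e.g. a list of EntryPoint objects
            value = value[0] if value else None
        if isinstance(value, dict):  # an EntryPoint object
            value = value.get("url")
        if value:
            return value
    return None


print(endpoint_url({"endpointUrl": "https://example.com/api/openbooking/"}))
```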

@nickevansuk
Contributor Author

nickevansuk commented Jul 6, 2020

Additionally, regarding the Data Quality Profile for datasets (the final gap referenced above), there is an important distinction to make:

  • The spec conformance certification, which is about the implementing system’s capabilities, can be controlled and assured by that system, and is likely to change only with a new version release of the system.
  • The Data Quality Profile is about the data within the system i.e. what the customers of the system have used it for. This cannot be assured or controlled by the system, and is likely to change dynamically over time.

Hence, as the Data Quality Profile is derived from the live data, it should not be part of the dataset metadata; instead, it should be provided by a tool that consumes the dataset, referenced from the documentation.
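To make the distinction concrete: a data quality measure has to be computed from the items currently in the feed, so it can change whenever customers edit their data. A minimal Python sketch of such a consuming tool, assuming RPDE-style feed items where live items carry a `data` object and deleted items do not; the profile (a plain list of required property names), the sample items and the function name are all hypothetical.

```python
def profile_coverage(items: list, required_properties: list) -> float:
    """Fraction of live feed items whose data object has a non-empty value for
    every property in a (hypothetical) data quality profile."""
    live = [item["data"] for item in items if item.get("data")]  # deleted items carry no data
    if not live:
        return 0.0
    passing = sum(
        1 for data in live
        if all(data.get(prop) not in (None, "", [], {}) for prop in required_properties)
    )
    return passing / len(live)


items = [
    {"state": "updated", "kind": "SessionSeries", "id": "1",
     "data": {"name": "Yoga", "location": {"name": "Acme Leisure Centre"}}},
    {"state": "updated", "kind": "SessionSeries", "id": "2",
     "data": {"name": "Spin"}},
    {"state": "deleted", "kind": "SessionSeries", "id": "3"},
]
print(profile_coverage(items, ["name", "location"]))  # 0.5
```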

This potentially puts it out of scope for the Dataset API Discovery spec.

@nickevansuk
Copy link
Contributor Author

nickevansuk commented Jul 6, 2020

Alternatively it might be preferable to have hasCredential specified on the BookingService, as it's the product that is certified rather than the specific Dataset.

This would force custom-built systems to include a bookingService, which might duplicate the information from the publisher property, however this would also make the name of the custom-built system explicit, and make the modelling of the "system" vs. the "publisher" consistent.

In a similar way, even if e.g. dateCreated and dateModified are the same timestamp, it is still correct to model them as data duplicated across both.

This also makes the proposed oa:awardedTo the inverse property of schema:hasCredential.

{
  "@context": "http://schema.org/",
  "@type": "Dataset",
  ...
  "bookingService": {
    "@type": "BookingService",
    "name": "AcmeBooker",
    "softwareVersion": "2.0",
    "url": "https://acmebooker.example.com/",
    "hasCredential": "https://openactive.io/openactive-test-suite/example-output/controlled/certification/"
  },
  "accessService": {
    "@type": "WebAPI",
    "name": "Google Knowledge Graph Search API",
    "description": "The Knowledge Graph Search API lets you find entities in the Google Knowledge Graph. The API uses standard schema.org types and is compliant with the JSON-LD specification.",
    "documentation": "https://developers.google.com/knowledge-graph/",
    "termsOfService": "https://developers.google.com/knowledge-graph/terms",
    "endpointUrl": "https://example.com/api/openbooking/",
    "conformsTo": [
      "https://www.openactive.io/open-booking-api/1.0/"
    ],
    "endpointDescription": "https://www.openactive.io/open-booking-api/1.0/swagger.json",
    "landingPage": "https://exampleforms.com/get-me-an-api-access-key"
  }
}
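During a transition between the two placements, a consumer might need to look for the certificate in either place. A minimal Python sketch, assuming `hasCredential` is a URL string as in the examples above; the returned record only illustrates the proposed `oa:awardedTo` inverse relationship and is not a defined format, and the helper name and sample dataset name are hypothetical.

```python
from typing import Optional


def credential_holder(dataset: dict) -> Optional[dict]:
    """Return {'@id': <certificate URL>, 'awardedTo': <holder name>}, looking for
    hasCredential on bookingService first (this proposal) and then on the
    Dataset itself (the earlier proposal)."""
    service = dataset.get("bookingService") or {}
    for holder in (service, dataset):
        credential = holder.get("hasCredential")
        if credential:
            return {"@id": credential, "awardedTo": holder.get("name")}
    return None


dataset = {
    "@type": "Dataset",
    "name": "Acme Leisure Sessions",
    "bookingService": {
        "@type": "BookingService",
        "name": "AcmeBooker",
        "hasCredential": "https://openactive.io/openactive-test-suite/example-output/controlled/certification/",
    },
}
print(credential_holder(dataset))
```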

@nickevansuk
Contributor Author

Also note the response here: webapi-discovery/rfcs#11, which lends further support to the base URI approach referenced in an earlier comment on #2.

@nickevansuk
Contributor Author

nickevansuk commented Oct 13, 2020

Furthermore, in favour of the base URI approach (instead of URI discovery for every endpoint), note the recently released RFC 8820 (June 2020), which replaces RFC 7320 (July 2014) by the same author and updates its guidance as follows:

... specifications MUST NOT constrain, or define the structure or the semantics for any path component.

is replaced with

Specifications MUST NOT define a fixed prefix for their URI paths -- for example, "/myapp" -- unless allowed by the scheme definition.

Note that this does not apply to Applications defining a structure of a URI's path "under" a resource controlled by the server. Because the prefix is under control of the party deploying the Application, collisions and rigidity are avoided, and the risk of erroneous client assumptions is reduced.
For example, an Application might define "app_root" as a deployment-controlled URI prefix. Application-defined resources might then be assumed to be present at "{app_root}/foo" and "{app_root}/bar".

This explicitly supports the use of Base URIs when defining APIs, which fits with the approach of OpenAPI 3.
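One practical consequence of a deployment-controlled prefix is that a client can refuse to call any URL that does not sit under the configured base, which addresses the security concern raised at the start of this issue about following URLs supplied by API endpoints. A minimal Python sketch; the function name is hypothetical and the check is deliberately strict (same scheme, host and port, with a path under the base path after removing dot segments).

```python
import posixpath
from urllib.parse import urlsplit


def is_under_app_root(url: str, app_root: str) -> bool:
    """True if url uses the same scheme, host and port as app_root and its path,
    after removing '.' and '..' segments, lies under app_root's path.
    Percent-encoded dot segments are not decoded here."""
    target, root = urlsplit(url), urlsplit(app_root)
    root_path = root.path if root.path.endswith("/") else root.path + "/"
    target_path = posixpath.normpath(target.path or "/")
    return (
        target.scheme == root.scheme
        and target.hostname == root.hostname
        and target.port == root.port
        and (target_path.rstrip("/") + "/").startswith(root_path)
    )


app_root = "https://example.com/api/openbooking/"
print(is_under_app_root("https://example.com/api/openbooking/orders/123", app_root))
print(is_under_app_root("https://example.com/api/openbooking/../admin", app_root))
```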

@thill-odi
Contributor

Current approach is to offload this into endpointDescription.url, which should point to Swagger documents or similar. For discussion.
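Under this approach a client combines the implementer's `endpointUrl` with the paths listed in the shared OpenAPI / Swagger document. A minimal Python sketch, assuming the document has already been fetched and parsed into a dict, and that the implementer's `endpointUrl` takes the place of any base URL the shared document declares; the inline document and its paths are hypothetical, not the actual Open Booking API definition. OpenAPI path keys begin with `/`, so they are appended by string concatenation rather than `urljoin`, which would discard the base path.

```python
# HTTP methods that may appear as operation keys in an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def endpoint_urls(endpoint_url: str, openapi_doc: dict) -> dict:
    """Map each (METHOD, path template) in the OpenAPI 'paths' object to a
    full URL template under endpoint_url."""
    base = endpoint_url.rstrip("/")
    result = {}
    for path, operations in openapi_doc.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:  # skip non-operation keys such as 'parameters'
                result[(method.upper(), path)] = base + path
    return result


openapi_doc = {  # hypothetical excerpt
    "paths": {
        "/orders/{uuid}": {"put": {}, "patch": {}, "delete": {}},
        "/order-quotes/{uuid}": {"put": {}},
    }
}
for (method, path), url in sorted(endpoint_urls("https://example.com/api/openbooking/", openapi_doc).items()):
    print(method, url)
```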
