GA4GH Authentication and Authorization Infrastructure (AAI) OpenID Connect Profile (DRAFT RFC)


Version | Date | Editor | Notes
0.91 | 2017- | Craig Voisin | Added terminology links
0.9 | 2017- | Mikael Linden, Craig Voisin, David Bernick | Initial working version

Abstract

This specification profiles the OpenID Connect protocol to provide a federated (multilateral) authentication and authorisation infrastructure for greater interoperability between Genomics institutions in a manner specifically applicable to (but not limited to) the sharing of restricted datasets.

In particular, this specification introduces a JWT syntax for an access token. The syntax enables an OIDC provider (called an OIDC broker) to embed some key claims in the access token, and enables a downstream access token consumer (called a Claims Clearinghouse) to locate the OIDC broker’s userinfo endpoint in order to request the rest of the claims. This specification is intended to be used together with other specifications that define the syntax and semantics of the claims exchanged.

Introduction

This document profiles the use of OpenID Connect (OIDC) servers to authenticate the identity of researchers who wish to access clinical and genomic resources from data holders adhering to GA4GH standards, and to enable data holders to obtain security-related attributes of those researchers. It is intended to be endorsed as a GA4GH standard, implemented by GA4GH Driver Projects, and shared broadly.

To help assure the authenticity of identities used to access data from GA4GH Driver Projects, and other projects that adopt GA4GH standards, the Data Use and Researcher Identity (DURI) Work Stream is in the process of developing a standard of “claims”. This standard assumes that some claims provided by brokers described in this document will conform to the DURI researcher-identity policy and standard. This standard does NOT assume that GA4GH Claims will be the only ones used.

In this standard, we aim to develop an approach that enables data holders’ systems to recognize and accept identities from multiple brokers -- allowing for a federated approach. An organization can still use this spec without supporting multiple brokers, though in that case it will find that it is simply using a prescriptive version of OIDC.

Requirements Notation and Conventions

This specification inherits terminology from the OpenID Connect and the OAuth 2.0 Framework (RFC 6749) specifications.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this specification are to be interpreted as described in RFC2119.

Terminology

Claim source service -- a service that manages claims and delivers them to OIDC identity brokers. For instance, a data owner.

Identity provider (IdP) service - a service that provides users with an identity, authenticates it, and provides claims to a broker using standard protocols such as OpenID Connect, SAML or other federation protocols. Examples: eduGAIN, Google Identity, Facebook, NIH ERACommons. IdPs MAY be claim sources.

OIDC Identity Broker service (aka “identity broker”, sometimes called an “IdP proxy”) - An OIDC Provider service that authenticates a user (potentially by an Identity Provider), collects their claims from internal and/or upstream claim sources and issues conformant OIDC claims to be consumed by Claim Clearinghouses. Brokers may also be Claim Clearinghouses of other upstream Brokers (i.e. create a chain of brokers like in the Flow of Claims diagram).

OIDC Claim Clearinghouse service (aka “Claim Clearinghouse” aka “claim consumer”) - A consumer of Identity Broker claims (an OIDC Relying Party or a service downstream) that makes an authorization decision at least in part based on inspecting GA4GH claims and allows access to a specific set of underlying resources in the target environment or platform. This abstraction allows for a variety of models for how systems consume these claims in order to provide access to resources. Access can be granted by either issuing new access tokens for downstream services (i.e. the Claim Clearinghouse may act like an authorization server) or by providing access to the underlying resources directly (i.e. the Claim Clearinghouse may act like a resource server). Some Claim Clearinghouses may issue access tokens that contain a new set of GA4GH claims and/or a subset of GA4GH claims that they received for downstream consumption.

Data Holder - An organization that protects a specific set of data. It holds data (or a copy of it) and respects and enforces the data owner's decisions on who can access it. A data owner can also be a data holder. Data holders run an OIDC Claim Clearinghouse Server at a minimum.

Data Owner - An organization that manages data and, in that role, has capacity to decide who can access it. For instance, a Data Access Committee. A Data owner is likely to be a claim source.

Relevant Specifications

OIDC Spec - Authorization Code Flow and Implicit Flow will generate id_tokens and access_tokens from the OIDC Broker.

JWT - Both the access_token and the resultant claims are in a JWT format. Specific implementations MAY extend this structure with their own service-specific response names as top-level members of this JSON object. Recommended “extensions” are in the Permissions section. The JWT specified here follows JWS headers specification. https://tools.ietf.org/html/rfc7515

JWS - The specific JWT to use for this spec.

Transport Layer Security (TLS, RFC 5246). Information passed among clients, Applications, Brokers, and Claim Clearinghouses MUST be protected using TLS.

OIDC Discovery

Flow of Claims

[Diagram: Flow of Claims]

Note: the above diagram shows how claims flow from a Claim Source (e.g. database) to a Claim Clearinghouse that uses them. This does not label all of the Relying Party relationships along this chain, where each recipient in the chain is typically -- but not always -- the relying party of the auth flow that fetches the claims from upstream.

Profile Requirements

Client/Application Conformance (user-Agent/Relying Party)

  1. Confidential clients MUST implement OIDC Authorization Code Flow (with Confidential Client) http://openid.net/specs/openid-connect-basic-1_0.html

  2. Public Clients MAY implement OIDC Implicit Flow (http://openid.net/specs/openid-connect-implicit-1_0.html)

    1. MUST use “id_token token” response_type for authentication.
  3. Conform to revocation requirements.

  4. Protection of Confidential Information

    1. Sensitive information (e.g. client secrets, authorization codes, id_tokens, access_tokens) will be passed over HTTP and MUST be protected using transport layer security (TLS).

    2. All responses that contain tokens, secrets, or other sensitive information MUST include the following HTTP response header fields and values:

      1. Cache-Control: no-store

      2. Pragma: no-cache

  5. An application MAY choose to use multiple access tokens coming from a set of Brokers to get access to all the resources an application may need.
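As a minimal sketch of the response-header requirement in item 4 above, the helper below adds the two cache directives to a header mapping before a token-bearing response is sent. The function name and the plain-dict representation of headers are assumptions made for illustration; a real service would set these through its web framework.

```python
def with_no_cache_headers(headers):
    """Return a copy of `headers` with the two required cache directives set.

    `headers` is a plain dict here (an assumption for this sketch).
    """
    protected = dict(headers)
    protected["Cache-Control"] = "no-store"
    protected["Pragma"] = "no-cache"
    return protected


# Example: headers for a response that carries tokens or secrets.
token_response_headers = with_no_cache_headers({"Content-Type": "application/json"})
```

Applying the helper at the single point where token responses are built makes it harder to forget the directives on a newly added endpoint.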

Conformance for Brokers

  1. Identity Brokers operate downstream from IdPs or provide their own IdP service. They issue id_tokens and access_tokens (and potentially refresh tokens) for consumption within the GA4GH compliant environment.

    1. A broker MUST issue both id_tokens and access_tokens.

      1. Brokers SHOULD issue tokens as JWTs in GA4GH-specified format
    2. Access_tokens MUST be in JWT

      1. Access tokens for GA4GH use MUST be in this format.

      2. MAY have a limited set of claims with a larger list of claims accessed in /userinfo

      3. Broker SHOULD include a “ga4gh_userinfo_claims” claim as an array of string claim names that can be retrieved via /userinfo in the GA4GH-specified format, or include the empty list if there are no further claims.

  2. Broker MUST support OIDC Discovery spec

    1. MUST include and support proper Metadata (i.e. it must have a jwks_uri, as required, that is reachable by a Claim Clearinghouse)
  3. Broker MUST support public-facing /userinfo endpoint

    1. When presented with a valid access token /userinfo MAY return claims in a specified JWT format

    2. MAY implement the OIDC claims request parameter on /userinfo to subset which claim information will be returned. If the Broker does not support the OIDC claims request parameter, then the Broker MUST NOT include claims_parameter_supported in the discovery service and all claims for the provided scopes eligible for release to the requestor MUST be returned.
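The discovery requirements above can be sketched as a small check that a Claim Clearinghouse might run against a Broker's parsed discovery document. This is illustrative only: the function names and the example host `broker.example.org` are hypothetical, and treating `userinfo_endpoint` as a required https URL is an assumption of this sketch (the profile itself names only `jwks_uri` explicitly).

```python
def discovery_url(issuer):
    """Append the well-known path to an issuer URL, tolerating a trailing slash."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def check_broker_metadata(metadata):
    """Return a list of problems found in a parsed discovery document (a dict)."""
    problems = []
    for field in ("issuer", "jwks_uri", "userinfo_endpoint"):
        value = metadata.get(field)
        if not isinstance(value, str) or not value.startswith("https://"):
            problems.append(field + " missing or not an https URL")
    return problems


def supports_claims_parameter(metadata):
    """True only if the Broker advertises the OIDC claims request parameter."""
    return metadata.get("claims_parameter_supported") is True


# Hypothetical discovery document for illustration.
sample_metadata = {
    "issuer": "https://broker.example.org/",
    "jwks_uri": "https://broker.example.org/jwks",
    "userinfo_endpoint": "https://broker.example.org/userinfo",
}
```

When `supports_claims_parameter` is false, a caller should expect /userinfo to return every claim eligible for release under the granted scopes rather than a requested subset.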

Conformance for Claim Clearinghouses (consuming Access Tokens to give access to data)

  1. Claim Clearinghouses MUST trust at least one OIDC Identity Broker.

    1. Claim Clearinghouses MAY trust more than one Broker
  2. Claim Clearinghouses MUST either check the validity of the JWT or treat the token as opaque.

    1. If treating the token as a JWT a Claim Clearinghouse MUST

      1. Check the Token’s signature via JWKS or having stored the public key

        1. A metadata URL (.well-known URL) SHOULD be used here to use the jwks_uri parameter.
      2. Check iss attribute to ensure a trusted broker has generated the token

      3. Check exp to ensure the token has not expired

      4. MAY additionally check aud to make sure Relying Party is trusted (client_id).

    2. If treating the token as opaque, a Claim Clearinghouse MUST know in advance where to find the corresponding /userinfo endpoint. This limits the ability to accept tokens from multiple OIDC brokers.

  3. Claim Clearinghouse or downstream applications MAY use /userinfo (derived from the access_token JWT’s iss) to request more claims and MAY make use of the OIDC claims request parameter to subset which claims are requested.

    1. Claim Clearinghouses or downstream applications MAY check for a “ga4gh_userinfo_claims” claim for a list of additional claims in the GA4GH-specified format that may assist in making an access decision.
  4. A Claim Clearinghouse service can itself be a broker, in which case it would follow the Conformance for Brokers section.
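The iss, exp and aud checks listed above can be sketched as follows. This sketch deliberately leaves out signature verification: it assumes the claims dict was produced by a JOSE library that has already verified the signature against the Broker's JWKS. The function name, return shape and parameter names are hypothetical.

```python
import time


def check_token_claims(claims, trusted_issuers, client_id=None, now=None):
    """Apply the iss, exp and (optional) aud checks to already-verified claims.

    Returns (True, "ok") or (False, reason). Signature verification is
    assumed to have happened before this function is called.
    """
    now = time.time() if now is None else now
    # iss: the token must come from a Broker this Clearinghouse trusts.
    if claims.get("iss") not in trusted_issuers:
        return False, "untrusted issuer"
    # exp: the token must not have expired.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False, "token expired or exp missing"
    # aud: optional check that the relying party (client_id) is listed.
    if client_id is not None:
        aud = claims.get("aud", [])
        audiences = [aud] if isinstance(aud, str) else list(aud)
        if client_id not in audiences:
            return False, "audience mismatch"
    return True, "ok"
```

Passing `client_id=None` skips the audience check, which mirrors its MAY status in the list above.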

GA4GH JWT Format

A well-formed JSON Web Token (JWT) consists of three concatenated Base64url-encoded strings, separated by dots (.). The three sections are: header, payload and signature. The access token and the JWT with full claims use the same format, though the JWT with the full claims will have extended claims. These JWTs follow https://tools.ietf.org/html/rfc7515 (JWS).
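To make the three-segment structure concrete, here is a minimal sketch that splits a compact token into its header, payload and signature using only the Python standard library. It does not verify the signature, so it is suitable for inspection only. The helper names and the sample token contents are hypothetical.

```python
import base64
import json


def b64url_encode(raw):
    """Base64url-encode bytes without padding, as used in JWS compact form."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(segment):
    """Decode a base64url segment, restoring the padding that JWS strips."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def split_jwt(token):
    """Return (header dict, payload dict, signature bytes) WITHOUT verifying."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a compact JWS token has exactly three segments")
    header = json.loads(b64url_decode(parts[0]))
    payload = json.loads(b64url_decode(parts[1]))
    signature = b64url_decode(parts[2])
    return header, payload, signature


# Hypothetical, unsigned sample built only to exercise the parser.
sample_header = {"typ": "JWT", "alg": "RS256", "kid": "xxxxx"}
sample_payload = {"iss": "https://broker.example.org/", "scope": "openid ga4gh"}
sample_token = ".".join([
    b64url_encode(json.dumps(sample_header).encode("utf-8")),
    b64url_encode(json.dumps(sample_payload).encode("utf-8")),
    b64url_encode(b"not-a-real-signature"),
])
header, payload, signature = split_jwt(sample_token)
```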

This profile is agnostic to the format of the id_token.

Access_token issued by broker

Header:

{
 "typ": "JWT",
 "alg": "RS256",
 "kid": "xxxxx"
}

Payload:

{
 "iss": "https://<issuer website>/",
 "sub": "<someone@someone.com>",
 "idp": "google",
 "aud": [
  "client_id",
  "client_id2"
 ],
 "iat": 1553545136,
 "exp": 1553631536,
 "scope": "openid <ga4gh-spec-scopes>",
 "ga4gh_userinfo_claims": ["claim_name_1", "claim_name_2.substructure_name"],
 <ga4gh-spec-claims>
}
  • iss: MUST be able to be appended with .well-known/openid-configuration to get spec of broker.

  • sub: authenticated user unique identifier

  • idp: (optional) SHOULD contain the IdP the user authenticated with, such as “Google”. This value does not have to be unique; it is informational and can be used if a data owner or data holder needs it.

  • aud: MUST contain the OAuth Client ID of the relying party. MAY contain other strings or identifiers as well.

  • iat: time issued

  • exp: expiration time

  • scope: scopes verified. Must include “openid”. Will also include any <ga4gh-spec-scopes> needed for the GA4GH compliant environment (e.g. “ga4gh” is the scope for RI claims).

  • ga4gh_userinfo_claims: A list of OIDC claim names that are available from the /userinfo endpoint and whose content is missing from, or incomplete in, this access token. For complex OIDC claims with substructure, a dot-notation MAY be used to more precisely indicate which sub-claims contain more information within the /userinfo endpoint. Non-normative examples include: [“ga4gh”] : indicates that some RI claims are available beyond what is included in the access token but does not indicate which ones. [“ga4gh.ControlledAccessGrants”, “ga4gh.AffiliationAndRoles”] : indicates that only those two specific RI claims that exist within the “ga4gh” claim object would have additional content not included within the access token.

  • <ga4gh-spec-claims>: Claims included as part of a GA4GH standard specification based on the scopes provided. This content MAY be incomplete (i.e. a subset of data elements) and more may be fetched as indicated by ga4gh_userinfo_claims. A non-normative example of <ga4gh-spec-claims> is: "ga4gh": {ga4gh claims}
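One possible reading of the ga4gh_userinfo_claims dot-notation is sketched below: a Claim Clearinghouse asks whether a claim path it cares about might have more content at /userinfo. The prefix-matching rule (a listed "ga4gh" covers every sub-claim, and a listed sub-claim also flags its parent object) is an assumption of this sketch, not something the profile states, and the function name is hypothetical.

```python
def more_at_userinfo(userinfo_claims, claim_path):
    """True if `claim_path` (dot-notation) may have more content at /userinfo.

    Assumed rule: two paths match when one is a prefix of the other,
    compared component by component.
    """
    wanted = claim_path.split(".")
    for entry in userinfo_claims:
        listed = entry.split(".")
        shorter = min(len(listed), len(wanted))
        if listed[:shorter] == wanted[:shorter]:
            return True
    return False
```

With an empty list the function always returns False, matching the case where the Broker signals that there are no further claims to fetch.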

Claims sent to Data Holder by a Broker via /userinfo

Only the GA4GH claims strictly need to be as prescribed here. Refer to the OIDC Spec for more information.

{
 "iss": "https://<issuer website>/",
 "sub": "<someone@someone.com>",
 "idp": "google",
 "aud": [
  "client_id",
  "client_id2"
 ],
 "iat": 1553545136,
 "exp": 1553631536,
 <ga4gh-spec-claims>
}
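A request that would yield a response like the one above can be sketched with the standard library. The sketch only builds the request and does not send it. The access token is presented as a Bearer credential in the Authorization header, as in standard OIDC UserInfo requests. The function name is hypothetical, and the endpoint is assumed to come from the Broker's discovery document.

```python
import urllib.request


def build_userinfo_request(userinfo_endpoint, access_token):
    """Build (but do not send) a GET request carrying the token as a Bearer credential."""
    if not userinfo_endpoint.startswith("https://"):
        # The profile requires TLS for all traffic that carries tokens.
        raise ValueError("userinfo endpoint must use https")
    return urllib.request.Request(
        userinfo_endpoint,
        headers={"Authorization": "Bearer " + access_token},
        method="GET",
    )


# Hypothetical endpoint and token value for illustration.
request = build_userinfo_request("https://broker.example.org/userinfo", "abc123")
```

Sending the request (for example with `urllib.request.urlopen`) and parsing the body is left out so that the sketch stays free of network access.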

As a non-normative example, a valid <ga4gh-spec-claims> entry would be:

"ga4gh": {ga4gh claims}

Authorization/Claims

User attributes and claims are being developed in the GA4GH Researcher Identity Claims document by the DURI work stream.

Token Revocation

Claim Authority Revokes Claim

Given that claims can cause downstream access tokens to be minted by Claim Clearinghouses and such downstream access tokens may have little knowledge or no connectivity to sources of claims, it can be challenging to build robust revocation capabilities across highly federated and loosely coupled systems. During the lifetime of the downstream access token, some systems may require that claims are no longer inspected nor updated.

In the event that a claim’s authority revokes a claim within the claim’s source system, downstream Brokers, Claim Clearinghouses, and other Authorization or Resource Servers MUST at a minimum provide a means to limit the lifespan of any given access tokens generated as a result of claims. To achieve this goal, servers involved with access may employ one or more of the following options:

  1. Have each claim authorization be paired with an expiry date and a time of being issued. Expiry dates would require users to log in occasionally via an Identity Broker in order to refresh claims. On a refresh, expiry timestamps can be extended from what the previous claim may have indicated.

  2. Provide refresh tokens at every level in the system hierarchy and use short-lived access tokens. This may require all contributing systems to support OIDC offline access refresh tokens to deal with execution of processes where the user is no longer actively involved. In the event that refresh tokens experience errors, the systems involved must eventually revoke the ability for downstream access tokens to be replaced via refresh tokens (although some level of delay to reach out to a user to try to resolve the issue may be desirable).

  3. Provide some other means for downstream Claim Clearinghouses or other systems that create downstream access tokens to be informed of a material change in upstream claims such that action can be taken to revoke the token, revoke the refresh token, or revoke the access privileges associated with such tokens.
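As a minimal sketch of option 1 combined with short-lived tokens, a Claim Clearinghouse that mints downstream access tokens could cap each token's expiry at the earliest expiry of the claims it was derived from. The function name, the use of Unix timestamps in seconds and the choice to reject already-expired claims are assumptions of this sketch.

```python
def downstream_token_expiry(now, max_lifetime_seconds, claim_expiries):
    """Expiry for a downstream access token, never later than any source claim.

    `now` and `claim_expiries` are Unix timestamps in seconds (assumed).
    """
    candidates = [now + max_lifetime_seconds]
    candidates.extend(claim_expiries)
    expiry = min(candidates)
    if expiry <= now:
        # Refuse to mint a token from a claim that has already lapsed.
        raise ValueError("at least one claim has already expired")
    return expiry
```

For example, with `now=1000`, a one-hour maximum lifetime and claims expiring at 2000 and 9000, the downstream token would expire at 2000, so a revoked claim stops conferring access once its own expiry passes.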

Revoking Access from Bad Actors

In the event that a system detects that a user is misbehaving or has falsified claims despite previous assurances that access was appropriate, there MUST be a mechanism to withdraw access from existing tokens and update claims to prevent further tokens from being minted.

  1. Systems MUST have a means to revoke existing refresh tokens or remove permissions from access tokens that are sufficiently long-lived to warrant taking action.

  2. A process MUST exist, manual or automated, to eventually remove related claims from the claim issuer’s repository.

Limited Damage of Leaked Tokens

In order to limit damage of leaked tokens, systems MUST provide all of the following:

  1. Be able to leverage mechanisms in place for revoking claims to also limit exposure of leaked tokens.

  2. Follow best practices for the safekeeping of refresh tokens or longer lived tokens (should longer lived tokens be needed).

  3. Limit the lifetime of refresh tokens or long-lived keys so that, once it elapses, either an auth challenge occurs or the refresh token simply fails to generate more access tokens.

Appendix

Examples of broker technologies

Examples of suites that provide both functionalities in a single package are: Auth0.com, Keycloak (open source), Hydra (open source), OpenAM, Okta, Globus Auth, Gen3 Fence, ELIXIR, NIH/VDS, AWS Cognito.

Why Brokers?

We have found that there are widely used Identity Providers (IdP) such as Google Authentication. These authentication mechanisms provide no authorization information (custom claims or scopes) but are so pervasive at the institution level that they cannot be ignored. The use of “brokers” and “clearinghouses” enables “inserting” information into the usual OIDC flow so that Google identities can be used but claims and scopes can be customized.

For instance, if a stack is using just Google Auth, it can confirm some semblance of identity, but the Google IdP gives no ability to insert claims into the tokens it returns. This is true of many social logins and even institutional SAML like ERACommons. Brokers like Auth0, Keycloak and others support any number of IdPs and also give the stack owner the ability to insert claims into the token that can be used by decision-making systems downstream.

Here is a diagram of a full-broker: https://www.lucidchart.com/invitations/accept/68f3089b-0c9b-4e64-acd2-abffae3c0c43. This is one possible way to use this spec.

[Diagram: full-broker flow]

In this diagram, the Data Owner Claim Clearinghouse, the Data Holder Claim Clearinghouse and the Identity Broker are all different entities. However, in many cases, we expect the OIDC Broker and Data Owner to be the same entity and even be operated in the same OIDC stack.

Examples of implementations that provide both Identity Brokering and Data Owner Claim Clearinghouse services are: ELIXIR, Auth0, Keycloak, Globus Auth, Okta, Hydra, AWS Cognito. These can be Identity Brokers and/or Claim Clearinghouses. They are not usually used only for Claim consumption (a role akin to an OAuth2 Resource Server in many ways). NGINX and Apache both offer reverse proxies for “Claim Consumption Only” functionality -- https://github.com/zmartzone/lua-resty-openidc (with https://github.com/cdbattags/lua-resty-jwt) and https://github.com/zmartzone/mod_auth_openidc respectively.

Data holders and data owners should explore their options to decide what best fits their needs.

Services parties are responsible for providing

Data Holders:

Data holders are expected to protect their resources within a Claim Clearinghouse Server. These Servers should be able to get claims from one or more Brokers, with researcher authentication provided by one or more Identity Providers. Note: Most Claim Clearinghouses can provide access to resources based on information in Claims -- if not the Claim Clearinghouses themselves then in some downstream application that protects data.

Data Owners:

Data owners are expected to run an OIDC Broker that has a /userinfo endpoint. A valid access token from an OIDC Identity Broker trusted by the data owner can be used, and claims are then sent back to the user wishing to access data.

Data owners are not required to implement or operate an Identity Provider (though they may choose to do so) or an Identity Broker.

Data Owners may choose to operate an OIDC Claim Clearinghouse Server configured to consume access_tokens from an upstream Identity Broker and then hand out JWT claims to relying parties and other Claim Clearinghouses.

Some data owners will own the whole “chain”, providing all of the different kinds of brokers, and will also operate Claim Clearinghouses. For instance, NIH is a data owner and might provide Cloud Buckets and operate an IdP and Identity Broker to utilize ERACommons and other identity resources.

A Data Owner should be able to, based on an Identity from an Identity Provider, express some sort of permissions via the Clearinghouse Claims. It is the responsibility of the Data Owner to provide these permissions to their OIDC Claim Clearinghouse to be expressed as claims within a standard /userinfo process for downstream use.

It is possible that the IdPs might have special claims. The OIDC Claim Clearinghouse being operated by the Data Owner should be “looking” for those claims and incorporating them, if desired, into the claims that it eventually sends to the user.

A data owner is expected to maintain the operational security of their OIDC Claim Clearinghouse Server and hold it to the GA4GH spec for operational security. It is also acceptable to align the security to a known and accepted framework such as NIST-800-53, ISO-27001/ISO-27002.

Future topics to explore

https://openid.net/specs/openid-connect-federation-1_0.html - OIDC federation

Register the Claim - According to RFC 7519 (JSON Web Token) section 4.2 https://tools.ietf.org/html/rfc7519#section-4.2 claim names should be registered by IANA in its "JSON Web Token Claims" registry at https://www.iana.org/assignments/jwt/jwt.xml . Register GA4GH.
