Building Blocks and Schemas for GA4GH Implementations

GA4GH SchemaBlocks

A graph showing recommended basic objects and their relationships. The example attributes are placeholders for elements defined in the general schema description.

Please note: The content of this repository is being re-"invented" as a wider GA4GH cross-workstream initiative. This site should be considered a prototype for the new project ...

This repository contains schema "blocks" for the GA4GH project, in a collaborative effort between members of the Clinical and Phenotypic Data Capture (GA4GH::CP), Genomic Knowledge Standards (GA4GH::GKS), and Discovery work streams.

Such blocks can be

  • object prototypes
  • object relations
  • documentation of data formats and standards
  • ... and probably other "things" related to the building of APIs and resources related to GA4GH

The project does not intend to build a monolithic API, but rather to help exchange usable components for creating implementations.

Currently, this site contains only skeleton schema elements, derived from the original, then-monolithic GA4GH schema.

The primary documents are in the yaml directory, with JSON versions and examples extracted from them. The "readable" documentation is also created from the YAML files and can be accessed through the links below.

  • common (raw) object classes, which are used in the schemas themselves
  • biosample (raw) Most relevant "bio"data (such as diagnoses, phenotypes ...) is stored in the biosample object.
  • individual (raw) The individual object contains information which pertains to the whole biological entity biosamples are derived from (e.g. sex, heritable phenotypes...).
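The split between the two objects can be sketched as follows. This is only a minimal illustration, not the schema itself: the class layout and all attribute names (`individual_id`, `heritable_phenotypes`, `diagnoses`, and so on) are assumptions chosen for the example, and the authoritative definitions are the YAML files.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Individual:
    # Information that pertains to the whole biological entity.
    # Attribute names are hypothetical placeholders.
    id: str
    sex: str
    heritable_phenotypes: list = field(default_factory=list)


@dataclass
class Biosample:
    # Sample-level "bio" data; points back to the individual it was derived from.
    # Attribute names are hypothetical placeholders.
    id: str
    individual_id: str
    diagnoses: list = field(default_factory=list)
    phenotypes: list = field(default_factory=list)


individual = Individual(id="ind-001", sex="female")
biosample = Biosample(
    id="bs-001",
    individual_id=individual.id,
    diagnoses=["example diagnosis"],
)

# Serialize to JSON, mirroring the JSON versions derived from the YAML documents.
biosample_json = json.dumps(asdict(biosample), indent=2)
print(biosample_json)
```

Keeping the reference as a plain identifier, rather than nesting the individual inside the biosample, lets several biosamples share one individual record.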

The "genomic" parts of the schema recommendations do not yet represent authoritative recommendations of the GA4GH::GKS group, but rather reflect extended versions of the original, VCF-derived GA4GH schema. Examples of current use of this schema include the Beacon+ project, among others.

  • variant (raw) The variant object includes attributes and examples for both structural (DUP, DEL ...) and precise genome variants.
  • callset (raw) The callset object is for technical data and series information (e.g. the platform and analysis methods used). It is not strictly needed for querying combined variant + biosample aspects, since in the current implementation the variant object contains a reference to the biosample it was derived from.
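As a rough sketch of why the callset is optional for such queries, the following joins variants directly to biosamples through the reference stored on each variant. The records, the `biosample_id` and `variant_type` keys, and the helper function are hypothetical and only illustrate the idea; they are not taken from the schema files.

```python
# Hypothetical in-memory records; key names are assumptions for illustration.
biosamples = {
    "bs-001": {"id": "bs-001", "diagnosis": "example diagnosis A"},
    "bs-002": {"id": "bs-002", "diagnosis": "example diagnosis B"},
}

variants = [
    {"id": "var-1", "variant_type": "DUP", "biosample_id": "bs-001"},
    {"id": "var-2", "variant_type": "DEL", "biosample_id": "bs-002"},
    {"id": "var-3", "variant_type": "DEL", "biosample_id": "bs-001"},
]


def biosamples_with_variant_type(variant_type):
    """Return biosample records having at least one variant of the given type.

    The variant carries the biosample reference itself, so no callset
    lookup is involved in this join.
    """
    matching_ids = sorted(
        {v["biosample_id"] for v in variants if v["variant_type"] == variant_type}
    )
    return [biosamples[sample_id] for sample_id in matching_ids]


del_samples = biosamples_with_variant_type("DEL")
print([sample["id"] for sample in del_samples])
```

A callset record would still be the place to look up technical details, such as the platform, for any variant returned by a query like this.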