### Demonstration of Schema Registry - PEPHub

This is an initial exploration of the Schema Registry implementation at https://pephub-api-dev.databio.org/api/v1

All implementations are evolving and exploring different needs. All comments are for discussion purposes.

In [4]:
import json

import requests

def prettyprint(a_dict):
    print(json.dumps(a_dict, indent=3))

def printline(char="_"):
    print(char*80)

### Use the PEPHub implementation

In [25]:
base = "https://pephub-api-dev.databio.org/api/v1"

### Get namespaces

In [40]:
url = f"{base}/namespaces"
print(url)
response = requests.get(url)
try:
    # Keys this notebook expects to find in the response.
    namespaces = response.json()['namespaces']
    for namespace in namespaces:
        print(namespace['namespace_name'])
except Exception as err:
    # On a mismatch, report the error and show the payload actually returned.
    print(type(err))
    print("Error:", err)
    print("Response was:")
    prettyprint(response.json())


https://pephub-api-dev.databio.org/api/v1/namespaces
<class 'KeyError'>
Error: 'namespaces'
Response was:
{
   "pagination": {
      "page": 0,
      "page_size": 5,
      "total": 2
   },
   "results": [
      {
         "namespace": "namespace2",
         "number_of_projects": 0,
         "number_of_schemas": 3
      },
      {
         "namespace": "namespace1",
         "number_of_projects": 0,
         "number_of_schemas": 2
      }
   ]
}


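**Relation to spec:** <span style="color:red">modified response</span>. The payload carries a `results` list (one `namespace` field per entry) and a `pagination` block rather than the `namespaces` key the cell above looks up.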
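
For discussion, a minimal sketch of a client-side helper that tolerates both shapes: the `namespaces`/`namespace_name` keys the cell above looked up, and the `results`/`namespace` keys the server actually returned. The function name `namespace_names` is hypothetical, and the sample payload is copied from the output above so the sketch runs without a network call.

```python
def namespace_names(payload):
    """Return namespace names from either response shape."""
    if "namespaces" in payload:
        # Shape the notebook cell above expected.
        return [item["namespace_name"] for item in payload["namespaces"]]
    # Shape observed on the PEPHub dev server.
    return [item["namespace"] for item in payload.get("results", [])]


# Payload copied from the response printed above.
observed = {
    "pagination": {"page": 0, "page_size": 5, "total": 2},
    "results": [
        {"namespace": "namespace2", "number_of_projects": 0, "number_of_schemas": 3},
        {"namespace": "namespace1", "number_of_projects": 0, "number_of_schemas": 2},
    ],
}

print(namespace_names(observed))  # ['namespace2', 'namespace1']
```

Against the live server the same helper could be called as `namespace_names(requests.get(f"{base}/namespaces").json())`.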

### /schemas

In [41]:
endpoint = "/schemas"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
schemas = response.json()['schemas']
for schema in schemas:
    print(schema['schema_name'])

/schemas


KeyError: 'schemas'

In [42]:
prettyprint(response.json())

{
   "pagination": {
      "page": 0,
      "page_size": 100,
      "total": 5
   },
   "results": [
      {
         "namespace": "namespace1",
         "name": "2.0.0",
         "description": "",
         "maintainers": "Nathan",
         "lifecycle_stage": "",
         "latest_version": "1.0.0",
         "private": false,
         "last_update_date": "2025-03-18T20:27:01.669912Z"
      },
      {
         "namespace": "namespace1",
         "name": "2.1.0",
         "description": "",
         "maintainers": "Donald",
         "lifecycle_stage": "",
         "latest_version": "1.0.0",
         "private": false,
         "last_update_date": "2025-03-18T20:27:01.602433Z"
      },
      {
         "namespace": "namespace2",
         "name": "bedbuncher",
         "description": "",
         "maintainers": "John",
         "lifecycle_stage": "",
         "latest_version": "1.0.0",
         "private": false,
         "last_update_date": "2025-03-18T20:27:01.506602Z"
      },
      {
   

**Relation to spec:** <span style="color:red">modified response</span>

Currently the representation of a schema does not yet match the Schema Registry spec.
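
As a sketch of how a client might work with the representation as it currently stands, the helper below turns each entry of `results` into the `/schemas/{namespace}/{name}/versions/latest` path used later in this notebook. The helper name `latest_version_paths` is hypothetical, and the sample data is abbreviated from the response shown above.

```python
def latest_version_paths(payload):
    """Build a '/schemas/{namespace}/{name}/versions/latest' path per listed schema."""
    return [
        f"/schemas/{item['namespace']}/{item['name']}/versions/latest"
        for item in payload["results"]
    ]


# Two entries abbreviated from the /schemas response shown above.
listing = {
    "results": [
        {"namespace": "namespace1", "name": "2.0.0", "latest_version": "1.0.0"},
        {"namespace": "namespace2", "name": "bedbuncher", "latest_version": "1.0.0"},
    ],
}

for path in latest_version_paths(listing):
    print(path)
```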

The `pagination` block in the response suggests an implied need for pagination.

The implementation of pagination seems to be consistent with the [pagination guide recommendations](https://docs.google.com/document/d/1pu2icPEll-vueFcjCuUnjuAwEvtahOf_dyyaP5qqbCs/edit?tab=t.0#heading=h.ofkazvkuciju) recently issued by GA4GH TASC.



#### Using pagination
Pagination works per the TASC recommendation. A page size of 2 is used here only to check that pagination is working; real page sizes are likely to be larger.


In [43]:
endpoint = "/schemas?page_size=2"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas?page_size=2
{
   "pagination": {
      "page": 0,
      "page_size": 2,
      "total": 5
   },
   "results": [
      {
         "namespace": "namespace1",
         "name": "2.0.0",
         "description": "",
         "maintainers": "Nathan",
         "lifecycle_stage": "",
         "latest_version": "1.0.0",
         "private": false,
         "last_update_date": "2025-03-18T20:27:01.669912Z"
      },
      {
         "namespace": "namespace1",
         "name": "2.1.0",
         "description": "",
         "maintainers": "Donald",
         "lifecycle_stage": "",
         "latest_version": "1.0.0",
         "private": false,
         "last_update_date": "2025-03-18T20:27:01.602433Z"
      }
   ]
}


In [44]:
endpoint = "/schemas?page_size=2&page=1"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas?page_size=2&page=1
{
   "pagination": {
      "page": 1,
      "page_size": 2,
      "total": 5
   },
   "results": [
      {
         "namespace": "namespace2",
         "name": "bedbuncher",
         "description": "",
         "maintainers": "John",
         "lifecycle_stage": "",
         "latest_version": "1.0.0",
         "private": false,
         "last_update_date": "2025-03-18T20:27:01.506602Z"
      },
      {
         "namespace": "namespace2",
         "name": "bedboss",
         "description": "",
         "maintainers": "Teddy",
         "lifecycle_stage": "",
         "latest_version": "1.0.0",
         "private": false,
         "last_update_date": "2025-03-18T20:27:01.441300Z"
      }
   ]
}


In [45]:
endpoint = "/schemas?page_size=2&page=2"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas?page_size=2&page=2
{
   "pagination": {
      "page": 2,
      "page_size": 2,
      "total": 5
   },
   "results": [
      {
         "namespace": "namespace2",
         "name": "bedmaker",
         "description": "",
         "maintainers": "Teddy",
         "lifecycle_stage": "",
         "latest_version": "1.2.1",
         "private": false,
         "last_update_date": "2025-03-18T20:27:01.356336Z"
      }
   ]
}
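
The three requests above can be folded into a loop. The sketch below assumes only what the responses show: pages are numbered from 0, and `pagination` reports `page_size` and `total`. The names `iter_results` and `fake_fetch` are hypothetical; `fake_fetch` stands in for the server so the example runs offline, using the five schema names seen above.

```python
import math


def iter_results(fetch_page, page_size=2):
    """Yield every item of a paginated listing.

    fetch_page(page, page_size) is any callable returning the decoded JSON
    payload with 'pagination' and 'results' keys, as in the outputs above.
    """
    page = 0
    while True:
        payload = fetch_page(page, page_size)
        yield from payload["results"]
        info = payload["pagination"]
        # Pages are zero-based, so the last page index is ceil(total / size) - 1.
        last_page = math.ceil(info["total"] / info["page_size"]) - 1
        if page >= last_page:
            break
        page += 1


# Offline stand-in for the server, using the schema names shown above.
names = ["2.0.0", "2.1.0", "bedbuncher", "bedboss", "bedmaker"]


def fake_fetch(page, page_size):
    start = page * page_size
    return {
        "pagination": {"page": page, "page_size": page_size, "total": len(names)},
        "results": [{"name": n} for n in names[start:start + page_size]],
    }


all_names = [item["name"] for item in iter_results(fake_fetch, page_size=2)]
print(all_names)
```

Against the live server, `fetch_page` could be `lambda page, size: requests.get(f"{base}/schemas", params={"page": page, "page_size": size}).json()`.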


### Schemas in a namespace

In [21]:
endpoint = "/schemas/namespace2"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas/namespace2
{
   "pagination": {
      "page": 0,
      "page_size": 100,
      "total": 3
   },
   "results": [
      {
         "namespace": "namespace2",
         "name": "bedbuncher",
         "description": "",
         "maintainers": "Teddy",
         "lifecycle_stage": "",
         "private": false,
         "last_update_date": "2025-03-18T17:09:44.752795Z"
      },
      {
         "namespace": "namespace2",
         "name": "bedboss",
         "description": "",
         "maintainers": "Teddy",
         "lifecycle_stage": "",
         "private": false,
         "last_update_date": "2025-03-18T17:09:44.694312Z"
      },
      {
         "namespace": "namespace2",
         "name": "bedmaker",
         "description": "",
         "maintainers": "Teddy",
         "lifecycle_stage": "",
         "private": false,
         "last_update_date": "2025-03-18T17:09:44.590874Z"
      }
   ]
}


In [46]:
endpoint = "/schemas/namespace2/bedmaker/versions/latest"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas/namespace2/bedmaker/versions/latest
{
   "description": "bedmaker2.1.0",
   "properties": {
      "samples": {
         "type": "array",
         "items": {
            "type": "object",
            "properties": {
               "sample_name": {
                  "type": "string",
                  "description": "Name of the sample"
               },
               "input_file": {
                  "type": "string",
                  "description": "Absolute path to the input file"
               },
               "input_type": {
                  "type": "string",
                  "description": "file format",
                  "enum": [
                     "bigWig",
                     "bigBed",
                     "bed",
                     "wig",
                     "bedGraph"
                  ]
               },
               "genome": {
                  "type": "string",
                  "description": "organism genome code"
               },
               "

In [57]:
endpoint = "/schemas/namespace2/bedmaker"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas/namespace2/bedmaker
{
   "namespace": "namespace2",
   "name": "bedmaker",
   "description": "",
   "maintainers": "Teddy",
   "lifecycle_stage": "",
   "latest_version": "1.2.1",
   "private": false,
   "last_update_date": "2025-03-18T20:27:01.356336Z"
}


In [48]:
endpoint = "/schemas/namespace2/bedmaker/versions"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas/namespace2/bedmaker/versions
{
   "pagination": {
      "page": 0,
      "page_size": 100,
      "total": 2
   },
   "results": [
      {
         "namespace": "namespace2",
         "name": "bedmaker",
         "version": "1.2.1",
         "contributors": "Karlo",
         "release_notes": "First",
         "tags": {},
         "release_date": "2025-03-18T20:27:01.377386Z",
         "last_update_date": "2025-03-18T20:27:01.377386Z"
      },
      {
         "namespace": "namespace2",
         "name": "bedmaker",
         "version": "1.0.0",
         "contributors": "Teddy, John",
         "release_notes": "Initial release",
         "tags": {
            "maturity_level": "trial_use"
         },
         "release_date": "2025-03-18T20:27:01.377386Z",
         "last_update_date": "2025-03-18T20:27:01.377390Z"
      }
   ]
}
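
One way a client might cross-check `latest_version` against this listing is to pick the highest version itself. The sketch below assumes version strings are dot-separated integers, as in the two entries above; how the server decides which version is `latest` is not stated here. The names `version_key` and `newest_version` are hypothetical.

```python
def version_key(version):
    """Turn '1.2.1' into (1, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_version(payload):
    """Return the highest version string in a /versions response."""
    return max((item["version"] for item in payload["results"]), key=version_key)


# Abbreviated from the /schemas/namespace2/bedmaker/versions response above.
versions = {
    "results": [
        {"version": "1.2.1", "tags": {}},
        {"version": "1.0.0", "tags": {"maturity_level": "trial_use"}},
    ],
}

print(newest_version(versions))  # 1.2.1
```

Comparing integer tuples rather than raw strings keeps `1.10.0` ahead of `1.9.0`.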


### Get a specific schema

In [58]:
endpoint = "/schemas/namespace2/bedmaker/versions/1.2.1"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas/namespace2/bedmaker/versions/1.2.1
{
   "description": "bedmaker2.1.0",
   "properties": {
      "samples": {
         "type": "array",
         "items": {
            "type": "object",
            "properties": {
               "sample_name": {
                  "type": "string",
                  "description": "Name of the sample"
               },
               "input_file": {
                  "type": "string",
                  "description": "Absolute path to the input file"
               },
               "input_type": {
                  "type": "string",
                  "description": "file format",
                  "enum": [
                     "bigWig",
                     "bigBed",
                     "bed",
                     "wig",
                     "bedGraph"
                  ]
               },
               "genome": {
                  "type": "string",
                  "description": "organism genome code"
               },
               "f

In [52]:
endpoint = "/schemas/namespace2/bedmaker/versions/1.0.0"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas/namespace2/bedmaker/versions/1.0.0
{
   "description": "bedmaker PEP schema",
   "properties": {
      "samples": {
         "type": "array",
         "items": {
            "type": "object",
            "properties": {
               "sample_name": {
                  "type": "string",
                  "description": "name of the sample, which is the name of the output BED file"
               },
               "input_file_path": {
                  "type": "string",
                  "description": "absolute path the file to convert"
               },
               "output_bed_path": {
                  "type": "string",
                  "description": "absolute path the file to the output BED file (derived attribute)"
               },
               "output_bigbed_path": {
                  "type": "string",
                  "description": "absolute path the file to the output bigBed file (derived attribute)"
               },
               "genome": {
               

In [53]:
endpoint = "/schemas/namespace2/bedboss/versions"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas/namespace2/bedboss/versions
{
   "pagination": {
      "page": 0,
      "page_size": 100,
      "total": 1
   },
   "results": [
      {
         "namespace": "namespace2",
         "name": "bedboss",
         "version": "1.0.0",
         "contributors": "Teddy, John",
         "release_notes": "Initial release",
         "tags": {},
         "release_date": "2025-03-18T20:27:01.456797Z",
         "last_update_date": "2025-03-18T20:27:01.456801Z"
      }
   ]
}


In [54]:
endpoint = "/schemas/namespace2/bedbuncher/versions"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas/namespace2/bedbuncher/versions
{
   "pagination": {
      "page": 0,
      "page_size": 100,
      "total": 1
   },
   "results": [
      {
         "namespace": "namespace2",
         "name": "bedbuncher",
         "version": "1.0.0",
         "contributors": "Teddy, John",
         "release_notes": "Initial release",
         "tags": {},
         "release_date": "2025-03-18T20:27:01.522135Z",
         "last_update_date": "2025-03-18T20:27:01.522139Z"
      }
   ]
}


In [56]:
endpoint = "/schemas/namespace2/bedbuncher/versions/latest"
print(endpoint)
response = requests.get(f"{base}{endpoint}")
prettyprint(response.json())

/schemas/namespace2/bedbuncher/versions/latest
{
   "description": "bedbuncher PEP schema",
   "imports": [
      "http://schema.databio.org/pep/2.0.0.yaml"
   ],
   "properties": {
      "samples": {
         "type": "array",
         "items": {
            "type": "object",
            "properties": {
               "JSONquery_path": {
                  "type": "string",
                  "description": "path to the JSON file with the Elasticsearch query"
               },
               "bedset_name": {
                  "type": "string",
                  "pattern": "^\\S*$",
                  "description": "name of the bedset that will be created"
               },
               "bbconfig_path": {
                  "type": "string",
                  "description": "path to bedbase config file"
               }
            },
            "required": [
               "JSONquery_path",
               "bedset_name"
            ]
         }
      }
   },
   "required": [
      "samp