## Using a local GA4GH Data Connect server with dbGaP data

This notebook illustrates how Data Connect offers a practical, minimal approach to the following:
* Obtaining data in bulk
* Making the data available in a Dataframe suitable for analysis
* Selecting subsets of data according to attributes of subject and samples
* Providing descriptions of the data via schema
* Accessing related data elsewhere in BigQuery 

The examples work with a single dbGaP study - the Undiagnosed Disease Network study (phs001232). The Data Connect and FHIR servers used contain only simulated data.

A local Data Connect instance is used, so the examples will not run as shown without one. The main purpose of this notebook, however, is to illustrate Data Connect through the examples and their recorded output.

The examples in this notebook query a single study. The important matter of querying across multiple studies, within dbGaP or elsewhere, is not dealt with here. However, see the related notebooks in this folder, which explore how to use Data Connect with mappings to address the different codings or terms that may be used across studies.

The FASP library used below is at https://github.com/ga4gh/fasp-clients and may be installed as follows:
```
pip install git+https://github.com/ga4gh/fasp-clients
```

### Table listing
The following instantiates a Data Connect client and lists some of the tables available.

In [1]:
from fasp.search import DataConnectClient
cl = DataConnectClient('http://localhost:8089/', debug=False, row_limit=100000)
cl.list_tables('tutorial', verbose=False)

['tutorial.onek_genomes.bdc_1000genomes',
 'tutorial.onek_genomes.onek_drs',
 'tutorial.onek_genomes.onek_recal_variants_drs',
 'tutorial.onek_genomes.sra_drs_files',
 'tutorial.onek_genomes.sra_onek_drs',
 'tutorial.onek_genomes.ssd_drs',
 'tutorial.onek_genomes.ssd_drs_copy',
 'tutorial.onek_genomes.thousand_genomes_meta',
 'tutorial.scr_gecco_susceptibility.gecco_sample_acc',
 'tutorial.scr_gecco_susceptibility.phs001554_gecco_sddp_manifest',
 'tutorial.scr_gecco_susceptibility.sample_attributes_multi',
 'tutorial.scr_gecco_susceptibility.sample_multi',
 'tutorial.scr_gecco_susceptibility.sample_run_file',
 'tutorial.scr_gecco_susceptibility.sb_drs_index',
 'tutorial.scr_gecco_susceptibility.subject_multi',
 'tutorial.scr_gecco_susceptibility.subject_phenotypes_multi',
 'tutorial.scr_icac.sample_multi',
 'tutorial.scr_icac.sample_run_file',
 'tutorial.scr_udn_v5.sample_multi',
 'tutorial.scr_udn_v5.sample_run_file',
 'tutorial.scr_udn_v5.udn-pedigree-multi',
 'tutorial.scr_udn_v5.ud
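
The table names in the listing follow a three-part catalog.schema.table pattern. As a minimal sketch, assuming `list_tables` returns a plain Python list of such strings (as the output suggests), the names can be grouped by schema to find the tables of one study. The `group_by_schema` helper is hypothetical and not part of the FASP library.

```python
from collections import defaultdict


def group_by_schema(table_names):
    """Group fully qualified names (catalog.schema.table) by catalog.schema."""
    grouped = defaultdict(list)
    for name in table_names:
        # Split on the first two dots only, so the table part is kept whole
        catalog, schema, table = name.split('.', 2)
        grouped[f"{catalog}.{schema}"].append(table)
    return dict(grouped)


# A few names copied from the listing above
sample_tables = [
    'tutorial.onek_genomes.onek_drs',
    'tutorial.scr_udn_v5.sample_multi',
    'tutorial.scr_udn_v5.sample_run_file',
    'tutorial.scr_udn_v5.udn-pedigree-multi',
]
grouped = group_by_schema(sample_tables)
print(grouped['tutorial.scr_udn_v5'])
```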

### Download a table
If the aim is simply to provide access to and transfer of data, Data Connect allows a simple download given only the table name.

The get_data method wraps the Data Connect /table/\<tablename\>/data endpoint. See the [Data Connect Specification](https://github.com/ga4gh-discovery/data-connect/blob/develop/SPEC.md#table-discovery-and-browsing-examples).

We use Jupyter's ability to time this step.

The whole table is retrieved and loaded into a dataframe in about 3 seconds. No transformation of the data is needed, as it is returned in tabular JSON format, which is easily loaded into a dataframe.

In [2]:
%%time
df = cl.get_data('tutorial.scr_udn_v5.udn_phenotype_code', return_type='dataframe')
df

Retrieving the query
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________
____Page7_______________
____Page8_______________
____Page9_______________
CPU times: user 141 ms, sys: 24.4 ms, total: 165 ms
Wall time: 3.03 s


Unnamed: 0,dbgap_subject_id,subject_id,hpo_phenotypes,hpo_observed_status
0,1827049,7ef1f8d7-8020-4781-8a4f-b9cb96913744,HP:0001962,no
1,1827049,7ef1f8d7-8020-4781-8a4f-b9cb96913744,HP:0000765,no
2,1827049,7ef1f8d7-8020-4781-8a4f-b9cb96913744,HP:0000925,no
3,1827049,7ef1f8d7-8020-4781-8a4f-b9cb96913744,HP:0000927,no
4,1827049,7ef1f8d7-8020-4781-8a4f-b9cb96913744,HP:0001155,no
...,...,...,...,...
43658,4296808,4b5cc1f0-4bc8-4e9b-84b3-2b24cb1af42b,,
43659,4296959,72c30970-392c-4dca-94f5-661dcbc21dc5,,
43660,4296544,09d0c9f9-50ee-4c3c-89bf-f4510897c689,,
43661,4297305,cc8e04b7-6293-4d4e-8cab-9d988a8fdf6d,,


The conversion to a dataframe is performed in the DataConnectClient class, but it can also be shown as follows.

In [3]:
import pandas as pd
# return the data in json format
json_data = cl.get_data('tutorial.scr_udn_v5.udn_phenotype_code', return_type='json')
df2 = pd.DataFrame.from_dict(json_data)

Retrieving the query
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________
____Page7_______________
____Page8_______________
____Page9_______________


The JSON data is as follows. As a Python list of dictionaries, it can be loaded into a dataframe via a standard method.

In [5]:
json_data[0:3]

[{'dbgap_subject_id': 1827049,
  'subject_id': '7ef1f8d7-8020-4781-8a4f-b9cb96913744',
  'hpo_phenotypes': 'HP:0001962',
  'hpo_observed_status': 'no'},
 {'dbgap_subject_id': 1827049,
  'subject_id': '7ef1f8d7-8020-4781-8a4f-b9cb96913744',
  'hpo_phenotypes': 'HP:0000765',
  'hpo_observed_status': 'no'},
 {'dbgap_subject_id': 1827049,
  'subject_id': '7ef1f8d7-8020-4781-8a4f-b9cb96913744',
  'hpo_phenotypes': 'HP:0000925',
  'hpo_observed_status': 'no'}]
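
The page markers printed during retrieval reflect the paginated responses of the data endpoint. The following is a minimal sketch of how a client might follow that pagination. It assumes the response shape described in the Data Connect specification, with a `data` list and an optional `pagination.next_page_url`. The `collect_rows` function and the fake pages are hypothetical, and this is not the DataConnectClient implementation.

```python
def collect_rows(fetch_page, first_url):
    """Follow next_page_url links, accumulating the rows of every page.

    fetch_page is any callable that takes a URL and returns the parsed JSON page.
    """
    rows = []
    url = first_url
    while url:
        page = fetch_page(url)
        # A page may carry no rows yet still point to a further page
        rows.extend(page.get('data', []))
        url = (page.get('pagination') or {}).get('next_page_url')
    return rows


# Stand-in for HTTP requests: the URLs and rows here are made up for illustration
fake_pages = {
    'page1': {'data': [], 'pagination': {'next_page_url': 'page2'}},
    'page2': {'data': [{'hpo_phenotypes': 'HP:0001962'}],
              'pagination': {'next_page_url': 'page3'}},
    'page3': {'data': [{'hpo_phenotypes': 'HP:0000765'}]},
}
rows = collect_rows(fake_pages.__getitem__, 'page1')
print(len(rows))
```

Against a real server, `fetch_page` could be a small function that issues an HTTP GET and returns the decoded JSON body.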

### Access the same data via FHIR
The dbgap_fhir module used here comes from https://github.com/ncbi/DbGaP-FHIR-API-Docs/tree/production/jupyter

The module provides statistics about the amount of data retrieved and the time to do so.

In [6]:
from dbgap_fhir import DbGapFHIR

passport_path = '~/Downloads/task-specific-token.txt'
mf = DbGapFHIR('https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1'
           ,passport=passport_path
           ,debug=False)

In [7]:
%%time
conditions = mf.run_query("Condition")

Total  Resources: 40320
Total  Bytes: 39385045
Total  Pages: 404
Time elapsed 441.2335 seconds
CPU times: user 2.6 s, sys: 549 ms, total: 3.15 s
Wall time: 7min 21s


Examine an example Condition resource

In [8]:
conditions[4000]

{'resourceType': 'Condition',
 'id': 'cnd-2181175-278114-3551',
 'meta': {'versionId': '1',
  'lastUpdated': '2024-09-23T15:16:50.610-04:00',
  'source': '#EDQU8nruHSXHeWea',
  'security': [{'system': 'http://terminology.hl7.org/CodeSystem/v3-Confidentiality',
    'code': 'U',
    'display': 'unrestricted'}]},
 'verificationStatus': {'coding': [{'system': 'http://hl7.org/fhir/ValueSet/condition-ver-status',
    'code': 'confirmed'}]},
 'code': {'coding': [{'system': 'http://human-phenotype-ontology.org',
    'code': 'HP:0004322'}]},
 'subject': {'reference': 'Patient/2181175'}}

### Transform the results to a dataframe
The transformation to a dataframe is significantly more complicated. Specific nested items need to be identified and retrieved.

In [9]:
import pandas as pd

# Pull the nested items of interest out of each Condition resource
conditions_dict = []
for c in conditions:
    conditions_dict.append({
        # 'Patient/2181175' -> '2181175'
        "dbgap_subject_id": c['subject']['reference'].split('/')[1],
        "hpo_phenotypes": c['code']['coding'][0]['code'],
        "hpo_observed_status": c['verificationStatus']['coding'][0]['code'],
    })
fhir_df = pd.DataFrame.from_dict(conditions_dict)
fhir_df

Unnamed: 0,dbgap_subject_id,hpo_phenotypes,hpo_observed_status
0,2753986,HP:0002118,confirmed
1,2753986,HP:0002194,confirmed
2,2753986,HP:0002363,confirmed
3,2753986,HP:0002538,confirmed
4,2753986,HP:0010663,confirmed
...,...,...,...
40315,4296515,HP:0000717,confirmed
40316,4296515,HP:0001263,confirmed
40317,4296515,HP:0002500,confirmed
40318,4296515,HP:0000545,confirmed
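
The loop above assumes every Condition carries a subject, a code and a verification status. As a hedged sketch, a more defensive flattening function could return None for anything missing rather than raising an error. The `condition_to_row` name is hypothetical, and the sample resource is a trimmed copy of the Condition shown earlier.

```python
def condition_to_row(condition):
    """Flatten one FHIR Condition resource; missing items become None."""

    def first_code(element_name):
        # Return the code of the first coding of the named element, if any
        codings = condition.get(element_name, {}).get('coding', [])
        return codings[0].get('code') if codings else None

    reference = condition.get('subject', {}).get('reference', '')
    # 'Patient/2181175' -> '2181175'; an absent reference gives None
    subject_id = reference.split('/')[-1] or None
    return {
        'dbgap_subject_id': subject_id,
        'hpo_phenotypes': first_code('code'),
        'hpo_observed_status': first_code('verificationStatus'),
    }


example_condition = {
    'resourceType': 'Condition',
    'verificationStatus': {'coding': [{'code': 'confirmed'}]},
    'code': {'coding': [{'system': 'http://human-phenotype-ontology.org',
                         'code': 'HP:0004322'}]},
    'subject': {'reference': 'Patient/2181175'},
}
row = condition_to_row(example_condition)
print(row)
```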


Note that the two dataframes have essentially the same set of columns. The subject_id in the original udn_phenotype_code table is not in the Condition resource, as it is stored elsewhere in the dbGaP FHIR representation as an alternate id for a Patient.

The hpo_observed_status values are different because a standard terminology was used in place of the yes/no in the original.

There are fewer rows in the FHIR data. The difference is shown by the following counts by 'hpo_observed_status'.

In [10]:
df.groupby('hpo_observed_status').size()

hpo_observed_status
None     3343
no       4512
yes     35808
dtype: int64

In [11]:
fhir_df.groupby('hpo_observed_status').size()

hpo_observed_status
confirmed    35808
refuted       4512
dtype: int64
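
The counts line up if 'yes' corresponds to 'confirmed' and 'no' to 'refuted'. That mapping is inferred from the matching counts rather than taken from dbGaP documentation. As a small sketch under that assumption, rows in the Data Connect JSON form can be recounted using the FHIR terms. The sample rows are made up.

```python
from collections import Counter

# Assumed mapping, inferred from the matching counts in the two outputs
STATUS_MAP = {'yes': 'confirmed', 'no': 'refuted'}


def fhir_status_counts(rows):
    """Count rows by the FHIR status that corresponds to their yes/no value."""
    counts = Counter()
    for row in rows:
        status = STATUS_MAP.get(row.get('hpo_observed_status'))
        # Rows with neither yes nor no have no FHIR counterpart and are skipped
        if status is not None:
            counts[status] += 1
    return counts


sample_rows = [
    {'hpo_phenotypes': 'HP:0001962', 'hpo_observed_status': 'no'},
    {'hpo_phenotypes': 'HP:0001263', 'hpo_observed_status': 'yes'},
    {'hpo_phenotypes': 'HP:0000717', 'hpo_observed_status': 'yes'},
    {'hpo_phenotypes': None, 'hpo_observed_status': None},
]
counts = fhir_status_counts(sample_rows)
print(dict(counts))
```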

### What are the missing rows?
These rows were not loaded to FHIR as Conditions because no specific condition (hpo_phenotypes value) was identified.

In [12]:
cl.run_query('''select * from tutorial.scr_udn_v5.udn_phenotype_code
where hpo_observed_status = 'None' ''', return_type='dataframe')

Retrieving the query
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________
____Page7_______________
____Page8_______________


Unnamed: 0,dbgap_subject_id,subject_id,hpo_phenotypes,hpo_observed_status
0,1827062,960b9b8c-9f7a-4f41-be61-f3b603bb7076,,
1,1827014,509dcfa7-9140-4ac6-b8f5-72b0ecdc29e0,,
2,1827023,64e45a23-3141-4069-8ee4-cabd374d35b1,,
3,1782246,5f6e9231-dcd2-4336-b620-16bcd88d1479,,
4,1826979,08e47286-e4fe-4f16-acab-7dae368d1d67,,
...,...,...,...,...
3338,4296808,4b5cc1f0-4bc8-4e9b-84b3-2b24cb1af42b,,
3339,4296959,72c30970-392c-4dca-94f5-661dcbc21dc5,,
3340,4296544,09d0c9f9-50ee-4c3c-89bf-f4510897c689,,
3341,4297305,cc8e04b7-6293-4d4e-8cab-9d988a8fdf6d,,


### Do the same for the Subject Phenotypes table

In this case the data is more complex.

First, from Data Connect:

In [13]:
%%time
subject_df = cl.run_query('select * from tutorial.scr_udn_v5.udn_subject_phenotypes', return_type='dataframe')
subject_df

Retrieving the query
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________
____Page7_______________
CPU times: user 59.8 ms, sys: 8.85 ms, total: 68.7 ms
Wall time: 2.54 s


Unnamed: 0,dbgap_subject_id,subject_id,sex,race,ethnicity,primary_symptom_category,age_onset,affection_status
0,1827003,3c0558e0-c58d-490f-b9af-0cf426ff3e50,F,White,,17,0,A
1,1827081,c732713c-b08c-4ef7-b62b-4176bff922a8,F,White,,22,0,A
2,2181209,1dfd8fa6-0e97-48e3-b6b5-ae24cfed18ed,M,Asian,,,0,A
3,2181199,186824e9-1b16-4af7-a40b-f71b656295a5,F,White,,1,1,A
4,2181373,8fb2a9ef-550b-4c35-9f15-868e82c9f5d3,F,White,,,0,A
...,...,...,...,...,...,...,...,...
5133,3509963,1a9e67b6-ad21-4415-90c0-71402cd7d446,F,White,Unknown/Not Reported Ethnicity,,7,UK
5134,3667228,4ddb2839-46be-4030-af49-bd7ff8d504df,F,White,Unknown/Not Reported Ethnicity,,12,UK
5135,4297390,e1a32592-709c-4f1e-b946-7c95fe91dfbb,F,White,Unknown/Not Reported Ethnicity,,12,UK
5136,4297308,cd1602af-4b89-4c17-9681-d313c5392ffa,F,White,Unknown/Not Reported Ethnicity,12,0,UK


### And via FHIR

In this case each value in the original table is a separate FHIR observation resource. Each column of the original table is a variable identified by a phv variable id. We can query FHIR for observations with these ids.

It's beyond the immediate scope of this example to show how to obtain the variable ids. There are various approaches. Note below that Data Connect provides a method to obtain variable ids as part of the schema for a table.

Timings show that it takes significantly longer to retrieve the data via FHIR.

In [14]:
obs_codes = ['phv00278118.v5.p2',
             'phv00278119.v5.p2',
             'phv00278120.v5.p2',
             'phv00278121.v5.p2',
             'phv00278122.v5.p2']
query_string = f"Observation?code={','.join(obs_codes)}"
print(query_string)
print()
observations = mf.run_query(query_string)

Observation?code=phv00278118.v5.p2,phv00278119.v5.p2,phv00278120.v5.p2,phv00278121.v5.p2,phv00278122.v5.p2

Total  Resources: 25690
Total  Bytes: 24129385
Total  Pages: 257
Time elapsed 271.6424 seconds


### An example observation 

In [15]:
observations[4002]

{'resourceType': 'Observation',
 'id': '278121-2753816-0',
 'meta': {'versionId': '1',
  'lastUpdated': '2024-09-23T14:50:10.931-04:00',
  'source': '#rnrQ0dSn8j5N1MAH',
  'security': [{'system': 'http://terminology.hl7.org/CodeSystem/v3-Confidentiality',
    'code': 'U',
    'display': 'unrestricted'}]},
 'status': 'final',
 'code': {'coding': [{'system': 'https://dbgap-api.ncbi.nlm.nih.gov/fhir/x1/CodeSystem/DbGaPConcept-VariableAccession',
    'code': 'phv00278121.v5.p2',
    'display': 'AGE_ONSET'}]},
 'subject': {'reference': 'Patient/2753816'},
 'valueString': '15'}

### Transforming to a DataFrame

In this case assembling the dataframe (table) from the individual Observation resources is more complex. Each observation will become a cell of the dataframe.

Each row of the dataframe will be the observations for a given Patient.

In this case, we know from the data obtained above that there is only one value of each Observation for each patient. This simplifies the transformation, but that won't always be the case. For example, if the HPO phenotype data above were represented as observations, the rows could not be assembled based on dbgap_subject_id as below because there are multiple value pairs per subject.

Covering all possible options for transforming FHIR data to a dataframe suitable for analysis is not a simple challenge.

In [16]:
import pandas as pd

# Build one dict per patient, keyed by the variable's display name
patient_observations_dict = {}
for r in observations:

    subject_id = r['subject']['reference']
    obs_display_name = r['code']['coding'][0]['display']
    obs_code = r['code']['coding'][0]['code']
    # The value may be held in one of several value[x] elements
    if 'valueQuantity' in r:
        value_text = r['valueQuantity']['value']
    elif 'valueCodeableConcept' in r:
        value_text = r['valueCodeableConcept']['coding'][0]['display']
    elif 'valueString' in r:
        value_text = r['valueString']
    else:
        value_text = 'unknown'

    # A later observation for the same patient and variable overwrites an earlier one
    if subject_id not in patient_observations_dict:
        patient_observations_dict[subject_id] = {obs_display_name: value_text}
    else:
        patient_observations_dict[subject_id][obs_display_name] = value_text
pd.set_option("display.max_rows", 30, "display.max_columns", None)
patient_df = pd.DataFrame.from_dict(patient_observations_dict, orient='index')
display(patient_df)

Unnamed: 0,AFFECTION_STATUS,ETHNICITY,RACE,PRIMARY_SYMPTOM_CATEGORY,AGE_ONSET
Patient/2753407,UA,Hispanic or Latino,White,,76
Patient/2753413,UA,Not Hispanic or Latino,White,,15
Patient/1826993,A,Not Hispanic or Latino,White,10,34
Patient/1827029,A,Not Hispanic or Latino,White,,66
Patient/1827049,UA,Hispanic or Latino,White,,78
...,...,...,...,...,...
Patient/4296808,A,Not Hispanic or Latino,White,,30
Patient/4296544,UA,Unknown/Not Reported Ethnicity,White,20,88
Patient/4296730,A,,White,,16
Patient/4297305,A,Not Hispanic or Latino,White,,29
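
Where a patient may have more than one Observation for the same variable, one option is to collect the values in a list per cell and then check which cells are repeated before deciding how to shape the table. The sketch below illustrates that idea only. The function names and sample observations are hypothetical, and for brevity it reads `valueString` alone.

```python
from collections import defaultdict


def pivot_observations(observations):
    """Return {subject: {variable display name: [values]}} without overwriting."""
    table = defaultdict(lambda: defaultdict(list))
    for obs in observations:
        subject = obs['subject']['reference']
        column = obs['code']['coding'][0]['display']
        table[subject][column].append(obs.get('valueString'))
    return {subject: dict(columns) for subject, columns in table.items()}


def repeated_cells(table):
    """List the (subject, variable) pairs that hold more than one value."""
    return [(subject, column)
            for subject, columns in table.items()
            for column, values in columns.items()
            if len(values) > 1]


# Made-up observations: one patient has two values for the same variable
sample_observations = [
    {'subject': {'reference': 'Patient/1'},
     'code': {'coding': [{'display': 'AGE_ONSET'}]}, 'valueString': '15'},
    {'subject': {'reference': 'Patient/1'},
     'code': {'coding': [{'display': 'HPO'}]}, 'valueString': 'HP:0001263'},
    {'subject': {'reference': 'Patient/1'},
     'code': {'coding': [{'display': 'HPO'}]}, 'valueString': 'HP:0000717'},
]
table = pivot_observations(sample_observations)
print(repeated_cells(table))
```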


### Search/query capability
The above considered retrieval of the data in bulk for a single table of data.

Data Connect also provides the ability to run SQL queries against the tables.

In this example, subjects are selected for whom a specific phenotype (HPO code) was observed.

In [17]:
cl.run_query('''select * 
             from tutorial.scr_udn_v5.udn_phenotype_code
             where hpo_phenotypes = 'HP:0001263' and hpo_observed_status = 'yes' ''',
             return_type='dataframe')

Retrieving the query
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________
____Page7_______________
____Page8_______________
____Page9_______________
____Page10_______________


Unnamed: 0,dbgap_subject_id,subject_id,hpo_phenotypes,hpo_observed_status
0,1826993,274559bb-b938-471a-92bb-b39b773399c1,HP:0001263,yes
1,1827029,6b537dfc-6ad8-45ec-bd16-b0d7b58284e4,HP:0001263,yes
2,1827049,7ef1f8d7-8020-4781-8a4f-b9cb96913744,HP:0001263,yes
3,1827065,9a0686ba-15ad-4c5a-bc26-86a4ed74e46d,HP:0001263,yes
4,2181265,469290e0-d885-4b24-97eb-e2cf629a39f0,HP:0001263,yes
...,...,...,...,...
1427,4297136,9fbf41e6-2530-48e0-97f2-7a13472ffac9,HP:0001263,yes
1428,4296710,3221cedc-0980-444f-830d-e7c7b071fd8e,HP:0001263,yes
1429,4297160,a4b716f1-3b15-4be2-a58a-af441ccaddda,HP:0001263,yes
1430,4296911,65c733f2-d389-4467-ba83-cfd947cbe16e,HP:0001263,yes


### Queries can treat numbers as numbers

This example selects subjects where the age of onset of the primary symptom was greater than 11.

In [18]:

cl.run_query('''select * from tutorial.scr_udn_v5.udn_subject_phenotypes
where age_onset > 11 ''', return_type='dataframe')

Retrieving the query
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________


Unnamed: 0,dbgap_subject_id,subject_id,sex,race,ethnicity,primary_symptom_category,age_onset,affection_status
0,2753288,0a9cb7fe-9003-4b0b-9493-2eb155dd7c4d,F,White,,5,12,A
1,3040012,d3546678-c423-40f9-b84d-8d4f542f5e72,M,White,,,12,A
2,3039265,52ce04d1-1dc9-4850-b7fe-c0df458cd379,F,White,,1,12,A
3,3039769,a75358a3-bfb7-4a36-847d-5f4d83f56a43,F,White,,,32,A
4,3510369,6717dc25-bd18-42d6-8a95-9c2debf053ef,M,White,,,12,A
...,...,...,...,...,...,...,...,...
97,3039720,9e55b289-55ff-4fb4-b237-7b63fa957871,F,White,Unknown/Not Reported Ethnicity,2,17,A
98,3039867,b877251f-e074-45c7-8727-4451cc48224e,M,White,Unknown/Not Reported Ethnicity,12,17,UA
99,3039629,8e89658e-5eaf-47b7-b528-d2b325b4be1e,M,White,Unknown/Not Reported Ethnicity,,12,UA
100,3667228,4ddb2839-46be-4030-af49-bd7ff8d504df,F,White,Unknown/Not Reported Ethnicity,,12,UK
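
By contrast, the example FHIR Observation shown earlier carries the age of onset as a `valueString` ('15'), so a client working from the FHIR data would need to convert values before comparing them. Comparing the strings directly would be misleading, since `'9' > '11'` is true in Python. A minimal sketch of the client-side equivalent follows, with hypothetical helper names and made-up rows.

```python
def to_int(value):
    """Convert a string to an int, returning None if it is missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def onset_after(rows, threshold):
    """Select rows whose AGE_ONSET, read as a number, exceeds the threshold."""
    selected = []
    for row in rows:
        age = to_int(row.get('AGE_ONSET'))
        if age is not None and age > threshold:
            selected.append(row)
    return selected


# Made-up rows in the shape of the FHIR-derived table
sample_rows = [
    {'subject': 'Patient/1', 'AGE_ONSET': '15'},
    {'subject': 'Patient/2', 'AGE_ONSET': '9'},
    {'subject': 'Patient/3', 'AGE_ONSET': None},
    {'subject': 'Patient/4', 'AGE_ONSET': 'unknown'},
]
late_onset = onset_after(sample_rows, 11)
print([row['subject'] for row in late_onset])
```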


### What is the unit for age? Using the table schema
The query above raises a question: is the value 11 for age of onset in years, months, days or some other unit?

Data Connect allows a machine readable schema for each table to be provided as a JSON Schema. With an extension to provide the unit of measure we can see that the unit for age_onset is years.

Codings for enumerated values are also provided.

In this Data Connect implementation an extension was created to produce the schema automatically from information provided by dbGaP submitters. No additional curation was required. Data submitters are required to provide a data dictionary for each of their data tables which is then stored as XML. The extension transforms the XML data dictionary for each dbGaP table to a JSON schema.

The list_table_info method of the DataConnectClient class is a simple wrapper around the /table/\<tablename\>/info endpoint for the Data Connect service.

In [21]:
cl.list_table_info('tutorial.scr_udn_v5.udn_subject_phenotypes', verbose=True)

_Schema for tabletutorial.scr_udn_v5.udn_subject_phenotypes_
{
   "name": "tutorial.scr_udn_v5.udn_subject_phenotypes",
   "data_model": {
      "description": "",
      "$id": "dbgap:pht006079.v5",
      "properties": {
         "SUBJECT_ID": {
            "$id": "dbgap:phv00278116.v5",
            "description": "De-identified subject ID",
            "type": "string"
         },
         "SEX": {
            "$id": "dbgap:phv00278117.v5",
            "description": "Biological sex",
            "type": "encoded value",
            "oneOf": [
               {
                  "const": "F",
                  "title": "Female"
               },
               {
                  "const": "M",
                  "title": "Male"
               },
               {
                  "const": "O",
                  "title": "Other"
               }
            ]
         },
         "RACE": {
            "$id": "dbgap:phv00278118.v5",
            "description": "Race of subject",
          

<fasp.search.data_connect_client.SearchSchema at 0x168ce2400>
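
Because the schema is machine readable, it can be processed as well as read. The sketch below assumes the table info is available as a parsed Python dict with the structure printed above. How to obtain that dict from the returned SearchSchema object is not shown here. It collects the dbGaP variable id, the description and any enumerated codes for each column. The `summarize_columns` name is hypothetical, and the key used by the unit-of-measure extension is not visible in the truncated output, so it is left out.

```python
def summarize_columns(table_info):
    """Map each column to its variable id, description and enumerated codes."""
    summary = {}
    for name, prop in table_info['data_model']['properties'].items():
        summary[name] = {
            'variable_id': prop.get('$id'),
            'description': prop.get('description'),
            # Enumerated values appear as a oneOf list of const/title pairs
            'codes': {option['const']: option['title']
                      for option in prop.get('oneOf', [])},
        }
    return summary


# Fragment copied from the schema output above
table_info = {
    'name': 'tutorial.scr_udn_v5.udn_subject_phenotypes',
    'data_model': {
        '$id': 'dbgap:pht006079.v5',
        'properties': {
            'SUBJECT_ID': {'$id': 'dbgap:phv00278116.v5',
                           'description': 'De-identified subject ID',
                           'type': 'string'},
            'SEX': {'$id': 'dbgap:phv00278117.v5',
                    'description': 'Biological sex',
                    'type': 'encoded value',
                    'oneOf': [{'const': 'F', 'title': 'Female'},
                              {'const': 'M', 'title': 'Male'},
                              {'const': 'O', 'title': 'Other'}]},
        },
    },
}
summary = summarize_columns(table_info)
print(summary['SEX']['codes'])
```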

It is worth considering the FHIR syntax that would be necessary for equivalent queries. In some cases it will be relatively straightforward, e.g. where both attributes are in the same resource, as with the Condition example above. In other cases, e.g. where the cells of a table row are in separate resources, the FHIR syntax will be relatively challenging. Even where the 'join' of Observations exists the query syntax is not straightforward.
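
For the straightforward case, the phenotype query might be expressed as a FHIR search on Condition. The sketch below uses the standard FHIR search parameters `code` and `verification-status`. Whether the dbGaP FHIR server supports both, and whether it expects the values to be URL-encoded, are assumptions that have not been checked here. The `condition_search` helper is hypothetical.

```python
from urllib.parse import urlencode


def condition_search(hpo_code, verification_status):
    """Build a FHIR search string for Conditions with a given code and status."""
    params = {'code': hpo_code, 'verification-status': verification_status}
    # urlencode percent-encodes reserved characters such as the colon in HPO ids
    return 'Condition?' + urlencode(params)


query = condition_search('HP:0001263', 'confirmed')
print(query)
```

A string of this form could be passed to a FHIR client in the same way as the Observation query shown earlier. The age-of-onset query has no equally direct counterpart when the value is held as a string.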

### Query SRA 'metadata' for a UDN sample

Other data sources can also be made available through Data Connect from their original source.

It was straightforward to make the SRA metadata table that NCBI provides in BigQuery available via Data Connect.

In [24]:
cl.list_tables('nih_sra_datastore')

Retrieving the table list
____Page1_______________
nih_sra_datastore.sra.metadata
nih_sra_datastore.sra.metadata_clustered
nih_sra_datastore.sra_tax_analysis_tool.kmer
nih_sra_datastore.sra_tax_analysis_tool.tax_analysis
nih_sra_datastore.sra_tax_analysis_tool.tax_analysis_clustered
nih_sra_datastore.sra_tax_analysis_tool.tax_analysis_info
nih_sra_datastore.sra_tax_analysis_tool.taxonomy


['nih_sra_datastore.sra.metadata',
 'nih_sra_datastore.sra.metadata_clustered',
 'nih_sra_datastore.sra_tax_analysis_tool.kmer',
 'nih_sra_datastore.sra_tax_analysis_tool.tax_analysis',
 'nih_sra_datastore.sra_tax_analysis_tool.tax_analysis_clustered',
 'nih_sra_datastore.sra_tax_analysis_tool.tax_analysis_info',
 'nih_sra_datastore.sra_tax_analysis_tool.taxonomy']

#### Query run data for a UDN sample

In [114]:
query1 = '''SELECT assay_type, experiment, instrument, librarysource, avgspotlen
FROM nih_sra_datastore.sra.metadata
where acc='SRR8006118'
'''

res = cl.run_query(query1, return_type='dataframe')
res

Retrieving the query
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
____Page6_______________
____Page7_______________
____Page8_______________
____Page9_______________
____Page10_______________
____Page11_______________
____Page12_______________
____Page13_______________
____Page14_______________
____Page15_______________
____Page16_______________
____Page17_______________
____Page18_______________
____Page19_______________
____Page20_______________
____Page21_______________
____Page22_______________
____Page23_______________
____Page24_______________
____Page25_______________
____Page26_______________
____Page27_______________
____Page28_______________
____Page29_______________
____Page30_______________
____Page31_______________
____Page32_______________
____Page33_______________
____Page34_______________
____Page35_______________
____Page36_______________
____Page37_______________
____Page38_______________


Unnamed: 0,assay_type,experiment,instrument,librarysource,avgspotlen
0,WGS,SRX4836919,HiSeq X Ten,GENOMIC,302
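
When the accession comes from another table or from user input, it can be worth validating it before placing it in the SQL text. The sketch below assumes run accessions have the form SRR, ERR or DRR followed by digits. The `run_metadata_query` helper is hypothetical, and it only builds the query string, which would then be passed to `run_query`.

```python
import re

# Assumed form of a run accession: SRR, ERR or DRR followed by digits
RUN_ACCESSION = re.compile(r'[SED]RR\d+')


def run_metadata_query(accession, columns):
    """Build the SRA metadata query for one run, rejecting malformed accessions."""
    if not RUN_ACCESSION.fullmatch(accession):
        raise ValueError(f'not a run accession: {accession!r}')
    return (f"SELECT {', '.join(columns)}\n"
            "FROM nih_sra_datastore.sra.metadata\n"
            f"WHERE acc = '{accession}'")


query = run_metadata_query(
    'SRR8006118',
    ['assay_type', 'experiment', 'instrument', 'librarysource', 'avgspotlen'])
print(query)
```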


### Schema for SRA metadata

While no machine-readable schema was available for the SRA metadata table, it was straightforward to create one based on the information made available at:

https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud-based-metadata-table/

Data Connect provides a standard for providing schema for data (such as SRA) that is unlikely to be provided in FHIR.

In [69]:
cl.list_table_info('nih_sra_datastore.sra.metadata', verbose=True)

_Schema for tablenih_sra_datastore.sra.metadata_
{
   "name": "nih_sra_datastore.sra.metadata",
   "description": "Metadata Table (sra.metadata) contains information about the run and biological samples. Metadata Table (sra.metadata) contains information about the run and biological samples. The biological sample data is stored in two different columns.  See Record (array) column that you need to use the command UNNEST to query. See https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud-based-metadata-table/",
   "data_model": {
      "$id": "",
      "description": "Metadata Table (sra.metadata) contains information about the run and biological samples. Metadata Table (sra.metadata) contains information about the run and biological samples. The biological sample data is stored in two different columns.  See Record (array) column that you need to use the command UNNEST to query. See https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud-based-metadata-table/",
      "$schema": "http://json-schema.or

<fasp.search.data_connect_client.SearchSchema at 0x175860310>