## Comparisons of methods for searching FHIR

This GA4GH Search query searches a copy of the Patient resource from the NCPI FHIR Resource created under Project Forge. The attribute we want to query on, ethnicity, is stored as an extension, so a level of indirection is necessary to query on it. The value of ethnicity must then be unpacked from the extension.

In [6]:
import json

from fasp.search  import DataConnectClient

searchClient = DataConnectClient('https://ga4gh-search-adapter-presto-public.prod.dnastack.com')

query = """select id, patient from kidsfirst.ga4gh_tables.patient 
where json_extract_scalar(patient, '$.extension[0].url') = 
'http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity' 
limit 3"""
#TODO query on the value of ethnicity with AND

res = searchClient.runQuery(query)

for r in res:
    patient = r[1]
    print(patient['id'], patient['gender'])
    for e in patient['extension']:
        print (e['url'])
        print(e['extension'][0]['url'])
        vc = e['extension'][0]['valueCoding']
        print(vc['code'], vc['display'])

_Retrieving the query_
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
451133 male
http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity
ombCategory
2186-5 Not Hispanic or Latino
http://hl7.org/fhir/us/core/StructureDefinition/us-core-race
ombCategory
2106-3 White
451134 female
http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity
ombCategory
2135-2 Hispanic or Latino
http://hl7.org/fhir/us/core/StructureDefinition/us-core-race
ombCategory
2106-3 White
451135 female
http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity
ombCategory
2135-2 Hispanic or Latino
http://hl7.org/fhir/us/core/StructureDefinition/us-core-race
ombCategory
2106-3 White


We can query on the code for ethnicity. Note that we have to rely on the extensions being at a particular place in the array of extensions.
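
When unpacking results on the client side, the dependence on array position can be avoided by looking extensions up by their `url`. The following is a minimal sketch, not part of the original notebook; `find_extension` and `omb_category` are hypothetical helper names, and the sample patient is hand-written to mirror the structure seen in the output above, with race deliberately listed before ethnicity.

```python
ETHNICITY_URL = "http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity"
RACE_URL = "http://hl7.org/fhir/us/core/StructureDefinition/us-core-race"


def find_extension(node, url):
    """Return the first entry in node['extension'] whose 'url' matches, or None."""
    for ext in node.get("extension", []):
        if ext.get("url") == url:
            return ext
    return None


def omb_category(patient, url):
    """Return the ombCategory valueCoding nested under the extension at `url`, or None."""
    outer = find_extension(patient, url)
    if outer is None:
        return None
    inner = find_extension(outer, "ombCategory")
    return inner.get("valueCoding") if inner else None


# Hand-written sample (an assumption, not server output): race comes first here,
# so a positional lookup of extension[0] would return the wrong extension.
sample_patient = {
    "id": "example",
    "extension": [
        {"url": RACE_URL,
         "extension": [{"url": "ombCategory",
                        "valueCoding": {"code": "2106-3", "display": "White"}}]},
        {"url": ETHNICITY_URL,
         "extension": [{"url": "ombCategory",
                        "valueCoding": {"code": "2186-5",
                                        "display": "Not Hispanic or Latino"}}]},
    ],
}

ethnicity = omb_category(sample_patient, ETHNICITY_URL)
print(ethnicity["code"], ethnicity["display"])
```

This only helps when unpacking results; the SQL `where` clause itself still addresses `extension[0]` by position.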

In [41]:
import json

from fasp.search  import DataConnectClient

searchClient = DataConnectClient('https://ga4gh-search-adapter-presto-public.prod.dnastack.com')

query = """select id, patient from kidsfirst.ga4gh_tables.patient 
where json_extract_scalar(patient, '$.gender') = 'female'
and json_extract_scalar(patient, '$.extension[0].url') = 
'http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity' 
and json_extract_scalar(patient, '$.extension[0].extension[0].valueCoding.code') = '2135-2'
limit 20"""


res = searchClient.runQuery(query)

for r in res:
    patient = r[1]
    print(patient['id'], patient['gender'])
    for e in patient['extension']:
        print (e['url'])
        print(e['extension'][0]['url'])
        vc = e['extension'][0]['valueCoding']
        print(vc['code'], vc['display'])
    print('_______________________')

_Retrieving the query_
____Page1_______________
____Page2_______________
____Page3_______________
____Page4_______________
____Page5_______________
451134 female
http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity
ombCategory
2135-2 Hispanic or Latino
http://hl7.org/fhir/us/core/StructureDefinition/us-core-race
ombCategory
2106-3 White
_______________________
451135 female
http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity
ombCategory
2135-2 Hispanic or Latino
http://hl7.org/fhir/us/core/StructureDefinition/us-core-race
ombCategory
2106-3 White
_______________________
451136 female
http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity
ombCategory
2135-2 Hispanic or Latino
http://hl7.org/fhir/us/core/StructureDefinition/us-core-race
ombCategory
2106-3 White
_______________________
451165 female
http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity
ombCategory
2135-2 Hispanic or Latino
http://hl7.org/fhir/us/core/StructureDefinition/us

### Perform the same query directly via FHIR
Is it any easier to specify the query and to unpack the results?

This uses the NIH Cloud Platform Interoperability (NCPI) FHIR server directly.

Note: the file holding the cookie for the NCPI FHIR server should contain the following:
{"Cookie":"AWSELBAuthSessionCookie-0=your_cookie_here"}
Instructions for obtaining the cookie are at
https://github.com/NIH-NCPI/ncpi-api-fhir-service
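
A small check when loading the cookie file can catch a malformed file before any request is sent. This is a sketch under the assumption that the file holds a single JSON object with a `Cookie` key, as described above; `load_cookie_headers` is a hypothetical helper, and the demo writes a temporary file rather than touching a real key file.

```python
import json
import os
import tempfile


def load_cookie_headers(path):
    """Load the cookie JSON file and verify it contains a 'Cookie' entry."""
    with open(os.path.expanduser(path)) as f:
        headers = json.load(f)
    if "Cookie" not in headers:
        raise ValueError("expected a 'Cookie' key in {}".format(path))
    return headers


# Demo with a throwaway file holding the placeholder value from the text above
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"Cookie": "AWSELBAuthSessionCookie-0=your_cookie_here"}, tmp)

demo_headers = load_cookie_headers(tmp.name)
os.remove(tmp.name)
print(list(demo_headers.keys()))
```

The returned dict has the same shape as the `cookies` value passed as `extra_headers` in the cells below.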

First, a basic request to check that we can reach the FHIR server, followed by a query via the fhir-py library.

In [1]:
import requests
fhir_server = 'https://ncpi-api-fhir-service-dev.kidsfirstdrc.org'
x = requests.get(fhir_server)
print (x.cookies)

<RequestsCookieJar[<Cookie _csrf=eBViLgtdlKnrFdK6JUGYgAif for d3b-center.auth0.com/usernamepassword/login>]>


In [48]:
from fhirpy import SyncFHIRClient
import os
import json


endpoint = 'https://ncpi-api-fhir-service-dev.kidsfirstdrc.org'
full_cookie_path = os.path.expanduser('~/.keys/ncpi_fhir_cookie.json')
with open(full_cookie_path) as f:
        cookies = json.load(f)

client = SyncFHIRClient(endpoint, extra_headers=cookies)

# Search for patients by gender
resources = client.resources('Patient')  
resources = resources.search(gender='female').limit(1000)
patients = resources.fetch_all()
print("# of patients:{}".format(len(patients)))
print (type(patients[1]))

# of patients:3687
<class 'fhirpy.lib.SyncFHIRResource'>


Look at some of the details of patients

In [3]:
for p in patients[2:10]:
    print(json.dumps(p.serialize()))
    print('_____________________')

{"resourceType": "Patient", "id": "539318", "meta": {"versionId": "1", "lastUpdated": "2021-04-28T23:43:03.678+00:00", "source": "#Zpvw5NWtHnxvwxWr", "profile": ["http://hl7.org/fhir/StructureDefinition/Patient"]}, "identifier": [{"system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/", "value": "PT_RP789F44"}, {"system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/", "value": "?study_id=SD_7NQ9151J&external_id=BH3504_2"}, {"system": "urn:kids-first:unique-string", "value": "Patient|SD_7NQ9151J|BH3504_2"}], "gender": "female"}
_____________________
{"resourceType": "Patient", "id": "539310", "meta": {"versionId": "1", "lastUpdated": "2021-04-28T23:43:03.546+00:00", "source": "#R8H3r13WvVHAuDkR", "profile": ["http://hl7.org/fhir/StructureDefinition/Patient"]}, "identifier": [{"system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/", "value": "PT_XNXCHGGH"}, {"system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/", "value": "?study_i

Next, try to formulate the same query as above in fhir-py.

In [11]:
# Search for patients by race
resources = client.resources('Patient')  
resources = resources.search(race='White').limit(1000)
patients = resources.fetch_all()
print("# of patients:{}".format(len(patients)))

# of patients:0


In [46]:
# Search for patients by id
resources = client.resources('Patient')  
resources = resources.search(_id=539314).limit(1000)
patients = resources.fetch_all()
print("# of patients:{}".format(len(patients)))

# of patients:1


The objective is to query on attributes within a study and obtain a DRS id.

At this point we can't see what study a Patient is part of.

Can we get Research Studies?

In [45]:
resources = client.resources('ResearchStudy')
studies = resources.fetch_all()
for s in studies:
    # Serialize once; avoid the name `os`, which would shadow the os module
    study = s.serialize()
    print (study['identifier'][1]['value'])
    print (study['title'])
    print(json.dumps(study, indent=3))
    print('_'*40)

ResearchStudy|phs001987
Genome-wide Sequencing to Identify the Genes Responsible for Enchondromatoses and Related Malignant Tumors
{
   "resourceType": "ResearchStudy",
   "id": "539879",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:43:17.331+00:00",
      "source": "#3dDKp2NrF3FLtKGS",
      "profile": [
         "http://hl7.org/fhir/StructureDefinition/ResearchStudy"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/studies/",
         "value": "SD_7NQ9151J"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "ResearchStudy|phs001987"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/studies/",
         "value": "?external_id=phs001987&version=v1.p1"
      }
   ],
   "title": "Genome-wide Sequencing to Identify the Genes Responsible for Enchondromatoses and Related Malignant Tumors",
   "status": "completed"
}
____________________________

Query for the Familial Leukemia study, since I have access to it.

In [50]:
resources = client.resources('ResearchStudy')
resources = resources.search(_id=577137).limit(1000)
studies = resources.fetch_all()
print(len(studies))
studies[0].serialize()

1


{'resourceType': 'ResearchStudy',
 'id': '577137',
 'meta': {'versionId': '1',
  'lastUpdated': '2021-04-28T23:57:32.722+00:00',
  'source': '#pP70Q2ef3A0Tsaev',
  'profile': ['http://hl7.org/fhir/StructureDefinition/ResearchStudy']},
 'identifier': [{'system': 'https://kf-api-dataservice.kidsfirstdrc.org/studies/',
   'value': 'SD_W0V965XZ'},
  {'system': 'urn:kids-first:unique-string',
   'value': 'ResearchStudy|phs001738'},
  {'system': 'https://kf-api-dataservice.kidsfirstdrc.org/studies/',
   'value': '?external_id=phs001738&version=v1.p1'}],
 'title': 'Genomic Analysis of Familial Leukemia',
 'status': 'completed',
 'principalInvestigator': {'reference': 'PractitionerRole/575854'}}

How do we find the files in the study?

In [29]:
knownResources = ["Appointment","Account","Invoice","CatalogEntry","EventDefinition",
                  "DocumentManifest","MessageDefinition","Goal","MedicinalProductPackaged","Endpoint",
                  "EnrollmentRequest","Consent","CapabilityStatement","Measure","Medication","ResearchSubject",
                  "Subscription","DocumentReference","GraphDefinition","Parameters","CoverageEligibilityResponse",
                  "MeasureReport","PractitionerRole","SubstanceReferenceInformation","RelatedPerson",
                  "ServiceRequest","SupplyRequest","Practitioner","VerificationResult","SubstanceProtein",
                  "BodyStructure","Slot","Contract","Person","RiskAssessment","Group","PaymentNotice",
                  "ResearchDefinition","MedicinalProductManufactured","Organization","CareTeam","ImplementationGuide",
                  "ImagingStudy","FamilyMemberHistory","ChargeItem","ResearchElementDefinition","ObservationDefinition",
                  "Encounter","Substance","SubstanceSpecification","SearchParameter","ActivityDefinition",
                  "Communication","InsurancePlan","Linkage","SubstanceSourceMaterial","ImmunizationEvaluation",
                  "DeviceUseStatement","RequestGroup","DeviceRequest","MessageHeader","ImmunizationRecommendation",
                  "Provenance","Task","Questionnaire","ExplanationOfBenefit","MedicinalProductPharmaceutical","ResearchStudy",
                  "Specimen","AllergyIntolerance","CarePlan","StructureDefinition","ChargeItemDefinition","EpisodeOfCare",
                  "OperationOutcome","Procedure","List","ConceptMap","OperationDefinition","ValueSet","Immunization","MedicationRequest",
                  "EffectEvidenceSynthesis","BiologicallyDerivedProduct","Device","VisionPrescription","Media",
                  "MedicinalProductContraindication","EvidenceVariable","MolecularSequence","MedicinalProduct","DeviceMetric",
                  "CodeSystem","Flag","SubstanceNucleicAcid","RiskEvidenceSynthesis","AppointmentResponse","StructureMap",
                  "AdverseEvent","GuidanceResponse","Observation","MedicationAdministration","EnrollmentResponse",
                  "Binary","Library","MedicinalProductInteraction","MedicationStatement","CommunicationRequest","TestScript",
                  "Basic","SubstancePolymer","TestReport","ClaimResponse","MedicationDispense","DiagnosticReport",
                  "OrganizationAffiliation","HealthcareService","MedicinalProductIndication","NutritionOrder",
                  "TerminologyCapabilities","Evidence","AuditEvent","PaymentReconciliation","Condition",
                  "SpecimenDefinition","Composition","DetectedIssue","Bundle","CompartmentDefinition",
                  "MedicationKnowledge","MedicinalProductIngredient","Patient","Coverage","QuestionnaireResponse",
                  "CoverageEligibilityRequest","NamingSystem","MedicinalProductUndesirableEffect","ExampleScenario",
                  "Schedule","SupplyDelivery","ClinicalImpression","DeviceDefinition","PlanDefinition",
                  "MedicinalProductAuthorization","Claim","Location"]
for res in knownResources:
    resources = client.resources(res)
    resCount = resources.fetch_all()
    print(res, len(resCount))    

Appointment 0
Account 0
Invoice 0
CatalogEntry 0
EventDefinition 0
DocumentManifest 0
MessageDefinition 0
Goal 0
MedicinalProductPackaged 0
Endpoint 0
EnrollmentRequest 0
Consent 0
CapabilityStatement 0
Measure 0
Medication 0
ResearchSubject 11529
Subscription 0
DocumentReference 81538
GraphDefinition 0
Parameters 0
CoverageEligibilityResponse 0
MeasureReport 0
PractitionerRole 3
SubstanceReferenceInformation 0
RelatedPerson 0
ServiceRequest 0
SupplyRequest 0
Practitioner 4
VerificationResult 0
SubstanceProtein 0
BodyStructure 0
Slot 0
Contract 0
Person 0
RiskAssessment 0
Group 1614
PaymentNotice 0
ResearchDefinition 0
MedicinalProductManufactured 0
Organization 22
CareTeam 0
ImplementationGuide 0
ImagingStudy 0
FamilyMemberHistory 0
ChargeItem 0
ResearchElementDefinition 0
ObservationDefinition 0
Encounter 0
Substance 0
SubstanceSpecification 0
SearchParameter 1635
ActivityDefinition 0
Communication 0
InsurancePlan 0
Linkage 0
SubstanceSourceMaterial 0
ImmunizationEvaluation 0
DeviceU
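
Fetching every resource just to count it is slow for the larger types. The FHIR search API defines `_summary=count`, which asks the server for a Bundle carrying only a `total`. The sketch below builds such a URL and reads the total; whether this particular server honours the parameter is an assumption, `count_url` and `total_from_bundle` are hypothetical helpers, and the sample Bundle is hand-written using the Patient count from this notebook.

```python
from urllib.parse import urlencode


def count_url(base_url, resource_type):
    """Build a search URL that asks the server for a count only."""
    query = urlencode({"_summary": "count"})
    return "{}/{}?{}".format(base_url.rstrip("/"), resource_type, query)


def total_from_bundle(bundle):
    """Return Bundle.total, or None if the server did not supply one."""
    return bundle.get("total")


url = count_url("https://ncpi-api-fhir-service-dev.kidsfirstdrc.org", "Patient")
print(url)

# Hand-written stand-in for the server's response (not fetched here)
sample_bundle = {"resourceType": "Bundle", "type": "searchset", "total": 11538}
print(total_from_bundle(sample_bundle))
```

The URL would be requested with the same cookie header as the other calls.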

In [30]:
resources = client.resources('ResearchSubject').limit(1000)
subjects = resources.fetch_all()
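for s in subjects[2:10]:
    # Print the loop variable `s`; the original cell printed `p`, a Patient left over
    # from an earlier loop, which is why the output captured below is a Patient
    print(json.dumps(s.serialize()))
    print('_____________________')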

{"resourceType": "Patient", "id": "539336", "meta": {"versionId": "1", "lastUpdated": "2021-04-28T23:43:03.994+00:00", "source": "#8ZOQsZIeFG6OKfTP", "profile": ["http://hl7.org/fhir/StructureDefinition/Patient"]}, "extension": [{"url": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-race", "extension": [{"url": "ombCategory", "valueCoding": {"system": "urn:oid:2.16.840.1.113883.6.238", "code": "2106-3", "display": "White"}}, {"url": "text", "valueString": "White"}]}, {"url": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-ethnicity", "extension": [{"url": "ombCategory", "valueCoding": {"system": "urn:oid:2.16.840.1.113883.6.238", "code": "2186-5", "display": "Not Hispanic or Latino"}}, {"url": "text", "valueString": "Not Hispanic or Latino"}]}], "identifier": [{"system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/", "value": "PT_EDK17M1G"}, {"system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/", "value": "?study_id=SD_7NQ9151J&external_
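
In FHIR R4, a ResearchSubject links a patient to a study through its `individual` and `study` references, which is one way to see which study a Patient belongs to. The following is a minimal sketch assuming the server populates both references; `patients_by_study` is a hypothetical helper, and the sample dicts are hand-written with placeholder patient ids, not server output.

```python
def patients_by_study(research_subjects):
    """Group Patient references by the ResearchStudy reference they are enrolled in."""
    by_study = {}
    for rs in research_subjects:
        study_ref = rs.get("study", {}).get("reference")
        patient_ref = rs.get("individual", {}).get("reference")
        # Skip subjects that are missing either side of the link
        if study_ref and patient_ref:
            by_study.setdefault(study_ref, []).append(patient_ref)
    return by_study


# Hand-written sample ResearchSubject resources (placeholder patient ids)
sample_subjects = [
    {"resourceType": "ResearchSubject",
     "study": {"reference": "ResearchStudy/577137"},
     "individual": {"reference": "Patient/example-1"}},
    {"resourceType": "ResearchSubject",
     "study": {"reference": "ResearchStudy/577137"},
     "individual": {"reference": "Patient/example-2"}},
    {"resourceType": "ResearchSubject",
     "study": {"reference": "ResearchStudy/539879"},
     "individual": {"reference": "Patient/example-3"}},
]

for study_ref, patient_refs in patients_by_study(sample_subjects).items():
    print(study_ref, patient_refs)
```

With fhir-py, the input list could be built by calling `serialize()` on each fetched ResearchSubject.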

## Async client

In [9]:
import asyncio
from fhirpy import AsyncFHIRClient



endpoint = 'https://ncpi-api-fhir-service-dev.kidsfirstdrc.org'
full_cookie_path = os.path.expanduser('~/.keys/ncpi_fhir_cookie.json')
with open(full_cookie_path) as f:
        cookies = json.load(f)

# Create an instance
client = AsyncFHIRClient(endpoint, extra_headers=cookies)

# Search for patients
resources = client.resources('Patient')  # Return lazy search set
resources = resources.search(name='John').limit(10).sort('name')
#patients = await resources.fetch()  # Returns list of AsyncFHIRResource




In [10]:
resources

<AsyncFHIRSearchSet Patient?name=John&_count=10&_sort=name>

In [13]:
patients = resources.fetch()

In [14]:
patients

<coroutine object AsyncSearchSet.fetch at 0x11c7ff2c0>
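
The coroutine object above means the fetch has not run yet: an async call has to be awaited, or driven by an event loop, before it yields a result. The sketch below shows the general asyncio pattern with a stand-in coroutine (`fetch_stub` is hypothetical and contacts no server). In a plain script `asyncio.run` drives the coroutine; inside a notebook cell, where an event loop is typically already running, `await` would be used directly instead.

```python
import asyncio


async def fetch_stub():
    """Stand-in for an async fetch: yields control once, then returns a list."""
    await asyncio.sleep(0)
    return ["patient-a", "patient-b"]


# Calling the function only creates a coroutine object; nothing has executed yet
pending = fetch_stub()
is_coroutine = asyncio.iscoroutine(pending)
print(is_coroutine)

# Driving the coroutine to completion produces the actual result
result = asyncio.run(pending)
print(result)
```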

In [32]:
resCounts= {"Appointment":0,
"Account":0,
"Invoice":0,
"CatalogEntry":0,
"EventDefinition":0,
"DocumentManifest":0,
"MessageDefinition":0,
"Goal":0,
"MedicinalProductPackaged":0,
"Endpoint":0,
"EnrollmentRequest":0,
"Consent":0,
"CapabilityStatement":0,
"Measure":0,
"Medication":0,
"ResearchSubject":11529,
"Subscription":0,
"DocumentReference":81538,
"GraphDefinition":0,
"Parameters":0,
"CoverageEligibilityResponse":0,
"MeasureReport":0,
"PractitionerRole":3,
"SubstanceReferenceInformation":0,
"RelatedPerson":0,
"ServiceRequest":0,
"SupplyRequest":0,
"Practitioner":4,
"VerificationResult":0,
"SubstanceProtein":0,
"BodyStructure":0,
"Slot":0,
"Contract":0,
"Person":0,
"RiskAssessment":0,
"Group":1614,
"PaymentNotice":0,
"ResearchDefinition":0,
"MedicinalProductManufactured":0,
"Organization":22,
"CareTeam":0,
"ImplementationGuide":0,
"ImagingStudy":0,
"FamilyMemberHistory":0,
"ChargeItem":0,
"ResearchElementDefinition":0,
"ObservationDefinition":0,
"Encounter":0,
"Substance":0,
"SubstanceSpecification":0,
"SearchParameter":1635,
"ActivityDefinition":0,
"Communication":0,
"InsurancePlan":0,
"Linkage":0,
"SubstanceSourceMaterial":0,
"ImmunizationEvaluation":0,
"DeviceUseStatement":0,
"RequestGroup":0,
"DeviceRequest":0,
"MessageHeader":0,
"ImmunizationRecommendation":0,
"Provenance":0,
"Task":3212,
"Questionnaire":0,
"ExplanationOfBenefit":0,
"MedicinalProductPharmaceutical":0,
"ResearchStudy":7,
"Specimen":55715,
"AllergyIntolerance":0,
"CarePlan":0,
"StructureDefinition":664,
"ChargeItemDefinition":0,
"EpisodeOfCare":0,
"OperationOutcome":0,
"Procedure":0,
"List":0,
"ConceptMap":0,
"OperationDefinition":46,
"ValueSet":1329,
"Immunization":0,
"MedicationRequest":0,
"EffectEvidenceSynthesis":0,
"BiologicallyDerivedProduct":0,
"Device":0,
"VisionPrescription":0,
"Media":0,
"MedicinalProductContraindication":0,
"EvidenceVariable":0,
"MolecularSequence":0,
"MedicinalProduct":0,
"DeviceMetric":0,
"CodeSystem":1070,
"Flag":0,
"SubstanceNucleicAcid":0,
"RiskEvidenceSynthesis":0,
"AppointmentResponse":0,
"StructureMap":0,
"AdverseEvent":0,
"GuidanceResponse":0,
"Observation":7957,
"MedicationAdministration":0,
"EnrollmentResponse":0,
"Binary":0,
"Library":0,
"MedicinalProductInteraction":0,
"MedicationStatement":0,
"CommunicationRequest":0,
"TestScript":0,
"Basic":0,
"SubstancePolymer":0,
"TestReport":0,
"ClaimResponse":0,
"MedicationDispense":0,
"DiagnosticReport":5,
"OrganizationAffiliation":0,
"HealthcareService":0,
"MedicinalProductIndication":0,
"NutritionOrder":0,
"TerminologyCapabilities":0,
"Evidence":0,
"AuditEvent":0,
"PaymentReconciliation":0,
"Condition":120405,
"SpecimenDefinition":0,
"Composition":0,
"DetectedIssue":0,
"Bundle":0,
"CompartmentDefinition":5,
"MedicationKnowledge":0,
"MedicinalProductIngredient":0,
"Patient":11538,
"Coverage":0,
"QuestionnaireResponse":0,
"CoverageEligibilityRequest":0,
"NamingSystem":0,
"MedicinalProductUndesirableEffect":0,
"ExampleScenario":0,
"Schedule":0,
"SupplyDelivery":0,
"ClinicalImpression":0,
"DeviceDefinition":0,
"PlanDefinition":0,
"MedicinalProductAuthorization":0,
"Claim":0,
"Location":0}

# Keep only the resource types that actually have data on the server
populatedRes = {r: cnt for r, cnt in resCounts.items() if cnt > 0}
populatedRes
    

{'ResearchSubject': 11529,
 'DocumentReference': 81538,
 'PractitionerRole': 3,
 'Practitioner': 4,
 'Group': 1614,
 'Organization': 22,
 'SearchParameter': 1635,
 'Task': 3212,
 'ResearchStudy': 7,
 'Specimen': 55715,
 'StructureDefinition': 664,
 'OperationDefinition': 46,
 'ValueSet': 1329,
 'CodeSystem': 1070,
 'Observation': 7957,
 'DiagnosticReport': 5,
 'Condition': 120405,
 'CompartmentDefinition': 5,
 'Patient': 11538}

In [33]:
populatedRes = {'ResearchSubject': 11529,
 'DocumentReference': 81538,
 'PractitionerRole': 3,
 'Practitioner': 4,
 'Group': 1614,
 'Organization': 22,
 'SearchParameter': 1635,
 'Task': 3212,
 'ResearchStudy': 7,
 'Specimen': 55715,
 'StructureDefinition': 664,
 'OperationDefinition': 46,
 'ValueSet': 1329,
 'CodeSystem': 1070,
 'Observation': 7957,
 'DiagnosticReport': 5,
 'Condition': 120405,
 'CompartmentDefinition': 5,
 'Patient': 11538}

In [37]:
resources = client.resources('Condition').limit(10)
res = resources.fetch()
for r in res[0:10]:
    print(json.dumps(r.serialize(), indent=3))
    print('_____________________')

{
   "resourceType": "Condition",
   "id": "585974",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-30T08:27:05.630+00:00",
      "source": "#5fIyMrMSNitzbeFq",
      "profile": [
         "https://ncpi-fhir.github.io/ncpi-fhir-ig/StructureDefinition/disease"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/diagnoses/",
         "value": "DG_C11K6VF3"
      }
   ],
   "clinicalStatus": {
      "coding": [
         {
            "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
            "code": "active",
            "display": "Active"
         }
      ],
      "text": "Active"
   },
   "category": [
      {
         "coding": [
            {
               "system": "http://terminology.hl7.org/CodeSystem/condition-category",
               "code": "encounter-diagnosis",
               "display": "Encounter Diagnosis"
            }
         ]
      }
   ],
   "code": {
      "text": "Left 

In [41]:
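document_references = client.resources("DocumentReference")
# NOTE: search() returns a new search set; the result is not assigned here,
# so this profile filter is not applied to the fetch below.
document_references.search(_profile='http://fhir.ncpi-project-forge.io/StructureDefinition/ncpi-drs-document-reference')
# fetch() returns a single page of results, not the full set
drs_ids = document_references.fetch()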
print("# of ids:{}".format(len(drs_ids)))
for d in drs_ids[10:20]:
    print(json.dumps(d.serialize(), indent=3))
    print('_'*50)

# of ids:50
{
   "resourceType": "DocumentReference",
   "id": "767771",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-06-16T02:55:16.269+00:00",
      "source": "#WycIjWH5SzlroYos",
      "profile": [
         "https://ncpi-fhir.github.io/ncpi-fhir-ig/StructureDefinition/ncpi-drs-document-reference"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "GF_3JNFV96A"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "?study_id=SD_BHJXBDQK&external_id=s3://kf-study-us-east-1-prd-sd-bhjxbdqk/source/nantomics/e2824c3f-18c4-458e-bcc5-860dcc35dc4b.local.transcript.bam.bai"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "DocumentReference-SD_BHJXBDQK-s3://kf-study-us-east-1-prd-sd-bhjxbdqk/source/nantomics/e2824c3f-18c4-458e-bcc5-860dcc35dc4b.local.transcript.bam.bai"
      }
   ],
   "

In [None]:
How would we find all the DocumentReference resources for a particular patient (here, Patient 697194)?

In [98]:
resources = client.resources('DocumentReference')
resources = resources.search(subject='697194').limit(1000)
documents = resources.fetch_all()
print("# of documents:{}".format(len(documents)))

import pandas as pd

myDocs = []
for d in documents:
    djson = d.serialize()

    #print(json.dumps(djson, indent=3))
    #print(djson['subject'])
    # Pick out the Kids First unique-string identifier
    kfid = [did for did in djson['identifier'] if did['system']=='urn:kids-first:unique-string']
    myDocs.append({
        "kfid": kfid[0]['value'],
        "type": djson['type']['text'],
        "format": djson['content'][0]['format']['display']
        #"drs":djson['content'][0]['attachment']['url']
    })

    #print("kfid:{}".format(kfid[0]['value']))
    #print("type:{}".format(djson['type']['text']))
    #print("format:{}".format(djson['content'][0]['format']['display']))
    #print("{}".format(djson['content'][0]['attachment']['url']))
    #print('_'*50)


docsDF = pd.DataFrame(myDocs)
docsDF = docsDF.sort_values(["type", "format"])
from IPython.display import display, HTML
docsDF.style.set_properties(subset=['kfid'], **{'width': '12px'})
display (docsDF)

# of documents:38


| | kfid | type | format |
|---|---|---|---|
| 0 | DocumentReference-SD_BHJXBDQK-s3://kf-study-us... | Aligned Reads | bam |
| 5 | DocumentReference-SD_BHJXBDQK-s3://kf-study-us... | Aligned Reads | bam |
| 9 | DocumentReference-SD_BHJXBDQK-s3://kf-study-us... | Aligned Reads | bam |
| 19 | DocumentReference-SD_BHJXBDQK-s3://kf-study-us... | Aligned Reads | bam |
| 23 | DocumentReference-SD_BHJXBDQK-s3://kf-study-us... | Aligned Reads | bam |
| 7 | DocumentReference-SD_BHJXBDQK-s3://kf-study-us... | Aligned Reads | cram |
| 24 | DocumentReference-SD_BHJXBDQK-s3://kf-study-us... | Aligned Reads | cram |
| 29 | DocumentReference-SD_BHJXBDQK-s3://kf-study-us... | Aligned Reads | cram |
| 30 | DocumentReference-SD_BHJXBDQK-s3://kf-study-us... | Aligned Reads | cram |
| 6 | DocumentReference-SD_BHJXBDQK-s3://kf-study-us... | Aligned Reads Index | crai |


For these 38 files
Nothing tells us what the different bam files are.
Nothing tells us which crai applies to which cram.
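
One heuristic for matching index files to data files is to rely on file naming. The sketch below assumes an index file is named as the data file plus an index suffix (for example `x.cram.crai` or `x.bam.bai`, the latter pattern appearing in an identifier printed earlier). That convention is an assumption rather than something the FHIR resources state, and `pair_indexes` and the sample file names are hypothetical.

```python
INDEX_SUFFIXES = (".crai", ".bai")


def pair_indexes(file_names):
    """Map each data file name to its index file name when both are present."""
    names = set(file_names)
    pairs = {}
    for name in names:
        for suffix in INDEX_SUFFIXES:
            if name.endswith(suffix):
                # Strip the index suffix to get the candidate data file name
                data_name = name[: -len(suffix)]
                if data_name in names:
                    pairs[data_name] = name
    return pairs


# Hypothetical file names; real ones would come from the DocumentReference identifiers
sample_files = ["a.cram", "a.cram.crai", "b.bam", "b.bam.bai", "c.cram"]
index_pairs = pair_indexes(sample_files)
for data_name in sorted(index_pairs):
    print(data_name, "->", index_pairs[data_name])
```

Files without a matching index, such as `c.cram` in the sample, are simply left out of the mapping.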

Perhaps looking at samples for this subject might tell us that sequencing was done on more than one sample, e.g. tumor and normal.

How do we query for samples? We've seen that there are Specimen resources.

The following is a guess, based on what worked for DocumentReference.

In [99]:
resources = client.resources('Specimen')
resources = resources.search(subject='697194').limit(1000)
specimens = resources.fetch_all()
print("# of specimens:{}".format(len(specimens)))

# of specimens:17


OK! Let's look at the specimens



In [101]:
for s in specimens:
    print(json.dumps(s.serialize(), indent=3))
    print ('_'*50)

{
   "resourceType": "Specimen",
   "id": "763287",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-06-16T02:52:29.532+00:00",
      "source": "#AOCDAxfya2UvC0DR",
      "profile": [
         "http://hl7.org/fhir/StructureDefinition/Specimen"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/biospecimens/",
         "value": "BS_PEFRDKDZ"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/biospecimens/",
         "value": "?study_id=SD_BHJXBDQK&external_aliquot_id=739813"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "Specimen-SD_BHJXBDQK-739813"
      }
   ],
   "status": "available",
   "type": {
      "coding": [
         {
            "system": "http://terminology.hl7.org/CodeSystem/v2-0487",
            "code": "TISS",
            "display": "Tissue"
         }
      ],
      "text": "Solid Tissue"
   },
   "subject": {
      "reference": "Pa

It's hard to see what's going on there, so let's see if we can summarize. First, it looks like specimens may have been collected at more than one time. An extension is used to record age at collection. It does so in a complicated way: by defining an event (birth), a relationship ("after"), and an offset with a value and a unit. (Wouldn't a CDE for days after birth have been easier?)

In [112]:
for s in specimens:
    specimen = s.serialize()
    if '_collectedDateTime' in specimen['collection']:
        offsetDuration = specimen['collection']['_collectedDateTime']['extension'][0]['extension'][2]['valueDuration']
        print("Specimen {} collected at {} {}".format(specimen['id'], offsetDuration['value'],offsetDuration['unit']))
    else:
        print ("Specimen {} collected at unknown time".format(specimen['id']))
    

Specimen 763287 collected at 5407 days
Specimen 763285 collected at unknown time
Specimen 763266 collected at 5407 days
Specimen 763261 collected at 5407 days
Specimen 719528 collected at 5407 days
Specimen 719526 collected at 5407 days
Specimen 719525 collected at 5407 days
Specimen 719524 collected at 5407 days
Specimen 719522 collected at 5407 days
Specimen 719521 collected at 5407 days
Specimen 719519 collected at 5407 days
Specimen 719516 collected at 5407 days
Specimen 719514 collected at 5407 days
Specimen 719513 collected at 5407 days
Specimen 719510 collected at 5407 days
Specimen 719508 collected at 5407 days
Specimen 719507 collected at 5407 days


Apart from one specimen with an unknown collection time, all were collected on the same day.
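
The lookup above addresses the offset by fixed positions in two nested extension arrays. A less position-dependent approach is to search the nested extensions for whichever entry carries a `valueDuration`. This is a sketch only: `find_duration` and `collection_offset` are hypothetical helpers, and the `url` values in the sample are placeholders, since the real extension URLs are not shown in this notebook.

```python
def find_duration(node):
    """Depth-first search of nested 'extension' lists for the first valueDuration."""
    if "valueDuration" in node:
        return node["valueDuration"]
    for child in node.get("extension", []):
        found = find_duration(child)
        if found is not None:
            return found
    return None


def collection_offset(specimen):
    """Return the collection offset duration for a serialized Specimen, or None."""
    collected = specimen.get("collection", {}).get("_collectedDateTime")
    return find_duration(collected) if collected else None


# Hand-written sample; the 'url' values are placeholders, not the real extension URLs
sample_specimen = {
    "id": "example",
    "collection": {
        "_collectedDateTime": {
            "extension": [{
                "url": "placeholder-relative-date-time",
                "extension": [
                    {"url": "placeholder-event"},
                    {"url": "placeholder-relationship", "valueCode": "after"},
                    {"url": "placeholder-offset",
                     "valueDuration": {"value": 5407, "unit": "days"}},
                ],
            }]
        }
    },
}

offset = collection_offset(sample_specimen)
print(offset["value"], offset["unit"])
```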

What was collected?

In [126]:
specimenDetails = []
for s in specimens:
    specimen = s.serialize()
    specimenDetails.append ({
    "id": specimen['id'],
    "type": specimen['type']['text'],
    "site": specimen['collection']['bodySite']['text'],
    "method": specimen['collection']['method']['text']
    })
specimensDF = pd.DataFrame(specimenDetails)
specimensDF
    

| | id | type | site | method |
|---|---|---|---|---|
| 0 | 763287 | Solid Tissue | Frontal Lobe | Surgical Resections |
| 1 | 763285 | Not Reported | Not Reported | Not Reported |
| 2 | 763266 | Peripheral Whole Blood | Frontal Lobe | Surgical Resections |
| 3 | 763261 | Solid Tissue | Frontal Lobe | Surgical Resections |
| 4 | 719528 | Solid Tissue | Frontal Lobe | Surgical Resections |
| 5 | 719526 | Solid Tissue | Frontal Lobe | Surgical Resections |
| 6 | 719525 | Solid Tissue | Frontal Lobe | Surgical Resections |
| 7 | 719524 | Peripheral Whole Blood | Frontal Lobe | Surgical Resections |
| 8 | 719522 | Solid Tissue | Frontal Lobe | Surgical Resections |
| 9 | 719521 | Peripheral Whole Blood | Frontal Lobe | Surgical Resections |


That still doesn't tell us a lot. What was different about the solid tissue samples: tumor or normal? The multiple blood samples can't be distinguished either. There's also nothing linking the files to the specimens, so we can't tell which were used for the sequencing, etc.

The collection method and site are the same for the blood samples as for the tissue (Frontal Lobe, Surgical Resections). The blood may have been collected during the surgical procedure.

In [127]:
specimenDetails = []
for s in specimens:
    specimen = s.serialize()
    kfid = [did for did in specimen['identifier'] if did['system']=='urn:kids-first:unique-string']
    specimenDetails.append ({
    "id": specimen['id'],
    "urn:kids-first:unique-string": kfid[0]['value'],
    "type": specimen['type']['text'],
    "site": specimen['collection']['bodySite']['text'],
    "method": specimen['collection']['method']['text']
    })
specimensDF = pd.DataFrame(specimenDetails)
specimensDF = specimensDF.sort_values(["urn:kids-first:unique-string"])
specimensDF



| | id | urn:kids-first:unique-string | type | site | method |
|---|---|---|---|---|---|
| 5 | 719526 | Specimen-SD_BHJXBDQK-551153 | Solid Tissue | Frontal Lobe | Surgical Resections |
| 4 | 719528 | Specimen-SD_BHJXBDQK-551154 | Solid Tissue | Frontal Lobe | Surgical Resections |
| 13 | 719513 | Specimen-SD_BHJXBDQK-668381 | Solid Tissue | Frontal Lobe | Surgical Resections |
| 6 | 719525 | Specimen-SD_BHJXBDQK-668389 | Solid Tissue | Frontal Lobe | Surgical Resections |
| 16 | 719507 | Specimen-SD_BHJXBDQK-683272 | Peripheral Whole Blood | Frontal Lobe | Surgical Resections |
| 10 | 719519 | Specimen-SD_BHJXBDQK-683273 | Peripheral Whole Blood | Frontal Lobe | Surgical Resections |
| 15 | 719508 | Specimen-SD_BHJXBDQK-683274 | Peripheral Whole Blood | Frontal Lobe | Surgical Resections |
| 7 | 719524 | Specimen-SD_BHJXBDQK-683275 | Peripheral Whole Blood | Frontal Lobe | Surgical Resections |
| 9 | 719521 | Specimen-SD_BHJXBDQK-683373 | Peripheral Whole Blood | Frontal Lobe | Surgical Resections |
| 14 | 719510 | Specimen-SD_BHJXBDQK-689897 | Solid Tissue | Frontal Lobe | Surgical Resections |


In [39]:
resources = client.resources('ResearchStudy').include('ResearchStudy', 'ResearchSubject')
resources = resources.search(_id=577137).limit(10)
studies = resources.fetch_all()
studies[0].serialize()

{'resourceType': 'ResearchStudy',
 'id': '577137',
 'meta': {'versionId': '1',
  'lastUpdated': '2021-04-28T23:57:32.722+00:00',
  'source': '#pP70Q2ef3A0Tsaev',
  'profile': ['http://hl7.org/fhir/StructureDefinition/ResearchStudy']},
 'identifier': [{'system': 'https://kf-api-dataservice.kidsfirstdrc.org/studies/',
   'value': 'SD_W0V965XZ'},
  {'system': 'urn:kids-first:unique-string',
   'value': 'ResearchStudy|phs001738'},
  {'system': 'https://kf-api-dataservice.kidsfirstdrc.org/studies/',
   'value': '?external_id=phs001738&version=v1.p1'}],
 'title': 'Genomic Analysis of Familial Leukemia',
 'status': 'completed',
 'principalInvestigator': {'reference': 'PractitionerRole/575854'}}

What about Groups?

In [130]:
resources = client.resources('Group')
#resources = resources.search().limit(1000)
groups = resources.fetch_all()
print("# of groups:{}".format(len(groups)))
for g in groups[0:10]:
    print(json.dumps(g.serialize(), indent=3))
    print('_'*50)

# of groups:1614


In [131]:
for g in groups[0:10]:
    print(json.dumps(g.serialize(), indent=3))
    print('_'*50)

{
   "resourceType": "Group",
   "id": "539779",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:43:14.996+00:00",
      "source": "#kkL5Z5tBQUm4WaTH",
      "profile": [
         "http://hl7.org/fhir/StructureDefinition/Group"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/families/",
         "value": "FM_125JCWJG"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/families/",
         "value": "?study_id=SD_7NQ9151J&external_id=BH11885"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "Group|SD_7NQ9151J|BH11885"
      }
   ],
   "type": "person",
   "actual": true,
   "quantity": 3,
   "member": [
      {
         "entity": {
            "reference": "Patient/539568"
         }
      },
      {
         "entity": {
            "reference": "Patient/539570"
         }
      },
      {
         "entity": {
            "reference": "Patie

Judging by the identifier system (.../families/), these Groups are families; other kinds of Group may exist too.
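
One way to check the family reading is to look at the identifier system and list the members. This is a sketch only: the helper names are hypothetical, and the sample dict includes just the two member references that are visible in the truncated output above.

```python
def group_member_refs(group):
    """Return the entity references of all members of a serialized Group."""
    return [
        m["entity"]["reference"]
        for m in group.get("member", [])
        if "reference" in m.get("entity", {})
    ]


def looks_like_family(group):
    """True if any identifier system points at the Kids First families endpoint."""
    return any(
        "/families/" in ident.get("system", "")
        for ident in group.get("identifier", [])
    )


# Subset of the first Group shown above. The output reports quantity 3, but the
# third member reference is cut off in the output, so it is left out here.
group = {
    "resourceType": "Group",
    "id": "539779",
    "identifier": [
        {"system": "https://kf-api-dataservice.kidsfirstdrc.org/families/",
         "value": "FM_125JCWJG"},
        {"system": "urn:kids-first:unique-string",
         "value": "Group|SD_7NQ9151J|BH11885"},
    ],
    "type": "person",
    "quantity": 3,
    "member": [
        {"entity": {"reference": "Patient/539568"}},
        {"entity": {"reference": "Patient/539570"}},
    ],
}

members = group_member_refs(group)
is_family = looks_like_family(group)
print(is_family, members)
```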

In [132]:
resources = client.resources('Organization')
#resources = resources.search().limit(1000)
orgs = resources.fetch_all()
print("# of orgs:{}".format(len(orgs)))
for g in orgs:
    print(json.dumps(g.serialize(), indent=3))
    print('_'*50)

# of orgs:22
{
   "resourceType": "Organization",
   "id": "540165",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:43:24.434+00:00",
      "source": "#cJuRQrGtJwpVmuTd",
      "profile": [
         "http://hl7.org/fhir/StructureDefinition/Organization"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/sequencing-centers/",
         "value": "SC_X1N69WJM"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "Organization|HudsonAlpha Institute for Biotechnology"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/sequencing-centers/",
         "value": "?name=HudsonAlpha Institute for Biotechnology"
      }
   ],
   "name": "HudsonAlpha Institute for Biotechnology"
}
__________________________________________________
{
   "resourceType": "Organization",
   "id": "546124",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:45

In [133]:

resources = client.resources('CompartmentDefinition')
#resources = resources.search().limit(1000)
orgs = resources.fetch_all()
print("# of CompartmentDefinition:{}".format(len(orgs)))
for g in orgs[0:10]:
    print(json.dumps(g.serialize(), indent=3))
    print('_'*50)

# of CompartmentDefinition:5
{
   "resourceType": "CompartmentDefinition",
   "id": "device",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2020-06-18T20:06:33.892+00:00"
   },
   "url": "http://hl7.org/fhir/CompartmentDefinition/device",
   "version": "4.0.0",
   "name": "Base FHIR compartment definition for Device",
   "status": "draft",
   "experimental": true,
   "date": "2018-12-27T10:06:46-05:00",
   "publisher": "FHIR Project Team",
   "contact": [
      {
         "telecom": [
            {
               "system": "url",
               "value": "http://hl7.org/fhir"
            }
         ]
      }
   ],
   "description": "There is an instance of the device compartment for each Device resource, and the identity of the compartment is the same as the Device. The set of resources associated with a particular device",
   "code": "Device",
   "search": true,
   "resource": [
      {
         "code": "Account",
         "param": [
            "subject"
         ]
      

In [136]:

resources = client.resources('ResearchSubject')
#resources = resources.search().limit(1000)
orgs = resources.fetch_all()
print("# of ResearchSubject:{}".format(len(orgs)))
for g in orgs[100:110]:
    print(json.dumps(g.serialize(), indent=3))
    print('_'*50)

# of ResearchSubject:11529
{
   "resourceType": "ResearchSubject",
   "id": "539987",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:43:20.316+00:00",
      "source": "#Uh6JlKTONpL8Oje4",
      "profile": [
         "http://hl7.org/fhir/StructureDefinition/ResearchSubject"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/",
         "value": "PT_EW0R2AXZ"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/",
         "value": "?study_id=SD_7NQ9151J&external_id=BH10133_2"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "ResearchSubject|SD_7NQ9151J|BH10133_2"
      }
   ],
   "status": "off-study",
   "study": {
      "reference": "ResearchStudy/539879"
   },
   "individual": {
      "reference": "Patient/539435"
   }
}
__________________________________________________
{
   "resourceType": "ResearchSubject",
 

Back to the leukemia study

In [207]:
resources = client.resources('ResearchSubject')
studyid = 577137
resources = resources.search(study=studyid).limit(1000)
subjects = resources.fetch_all()
print("# of subjects in study {}:{}".format(studyid, len(subjects)))
for s in subjects[100:110]:
    print(json.dumps(s.serialize(), indent=3))
    print('_'*50)

# of subjects in study 577137:620
{
   "resourceType": "ResearchSubject",
   "id": "577638",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:57:46.317+00:00",
      "source": "#rCZyQOLV2r4FjIoV",
      "profile": [
         "http://hl7.org/fhir/StructureDefinition/ResearchSubject"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/",
         "value": "PT_ZBTJAWM3"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/",
         "value": "?study_id=SD_W0V965XZ&external_id=SJNORM057651"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "ResearchSubject|SD_W0V965XZ|SJNORM057651"
      }
   ],
   "status": "off-study",
   "study": {
      "reference": "ResearchStudy/577137"
   },
   "individual": {
      "reference": "Patient/576359"
   }
}
__________________________________________________
{
   "resourceType": "ResearchSu

In [204]:
def getFiles(client, subject_id):
    resources = client.resources('DocumentReference')
    resources = resources.search(subject=subject_id).limit(1000)
    documents = resources.fetch_all()
    print("# of documents for subject {} :{}".format(subject_id, len(documents)))

    myDocs = []
    for d in documents:
        djson = d.serialize()
        print(json.dumps(djson, indent=3))
        #kfid = [did for did in djson['identifier'] if did['system']=='urn:kids-first:unique-string']
        myDocs.append({

        "type":djson['type']['text'],
        "format":djson['content'][0]['format']['display']
        })

    docsDF = pd.DataFrame(myDocs)
    docsDF = docsDF.sort_values(["type", "format"])
    print(docsDF)
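
getFiles assumes every document has both a type text and a format on its first content entry. A later cell in this notebook falls back to 'none' when format is missing, so a more defensive summary might look like the sketch below. The function name summarize_document is hypothetical, and the sample documents are made-up minimal dicts, not server output.

```python
def summarize_document(djson):
    """Return the type and format of a serialized DocumentReference.

    Missing fields are reported as 'none' instead of raising KeyError.
    """
    content = djson.get("content") or [{}]
    fmt = content[0].get("format", {}).get("display", "none")
    doc_type = djson.get("type", {}).get("text", "none")
    return {"type": doc_type, "format": fmt}


# Made-up examples covering the complete, missing-format and missing-type cases.
sample_docs = [
    {"type": {"text": "Aligned Reads"},
     "content": [{"format": {"display": "cram"}}]},
    {"type": {"text": "Aligned Reads"}, "content": [{}]},
    {"content": [{"format": {"display": "bam"}}]},
]

# Same ordering as sort_values(["type", "format"]) in getFiles.
rows = sorted(
    (summarize_document(d) for d in sample_docs),
    key=lambda r: (r["type"], r["format"]),
)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame in the same way as myDocs.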

In [160]:
for s in subjects[100:110]:
    patient = s.serialize()['individual']['reference']
    getFiles(client, patient)
    print('_'*50)

# of documents for subject Patient/576359 :8
{
   "resourceType": "DocumentReference",
   "id": "579853",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:58:29.558+00:00",
      "source": "#JSVjLb0SqgVt45qn",
      "profile": [
         "https://ncpi-fhir.github.io/ncpi-fhir-ig/StructureDefinition/ncpi-drs-document-reference"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "GF_QVM2Y3KB"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "?study_id=SD_W0V965XZ&external_id=s3://kf-study-us-east-1-prd-sd-w0v965xz/harmonized-data/raw-gvcf/07f5cc15-199f-47a6-8f7b-1a899532d23c.g.vcf.gz.tbi"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "DocumentReference|SD_W0V965XZ|s3://kf-study-us-east-1-prd-sd-w0v965xz/harmonized-data/raw-gvcf/07f5cc15-199f-47a6-8f7b-1a899532d23c.g.vcf.g

# of documents for subject Patient/576338 :8
{
   "resourceType": "DocumentReference",
   "id": "579639",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:58:25.709+00:00",
      "source": "#zYfT9JC6i4LJfP9Y",
      "profile": [
         "https://ncpi-fhir.github.io/ncpi-fhir-ig/StructureDefinition/ncpi-drs-document-reference"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "GF_3DYW6YRK"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "?study_id=SD_W0V965XZ&external_id=s3://kf-study-us-east-1-prd-sd-w0v965xz/harmonized-data/aligned-reads/57870523-9934-4aa3-87a0-f6dd15e588bf.cram.crai"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "DocumentReference|SD_W0V965XZ|s3://kf-study-us-east-1-prd-sd-w0v965xz/harmonized-data/aligned-reads/57870523-9934-4aa3-87a0-f6dd15e588bf.

# of documents for subject Patient/576365 :8
{
   "resourceType": "DocumentReference",
   "id": "579884",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:58:30.718+00:00",
      "source": "#bIIkqxmlbybAq8zE",
      "profile": [
         "https://ncpi-fhir.github.io/ncpi-fhir-ig/StructureDefinition/ncpi-drs-document-reference"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "GF_9JXAVXC0"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "?study_id=SD_W0V965XZ&external_id=s3://kf-study-us-east-1-prd-sd-w0v965xz/harmonized-data/raw-gvcf/6c69d0d1-249e-4d4b-9624-461e9aa86219.g.vcf.gz.tbi"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "DocumentReference|SD_W0V965XZ|s3://kf-study-us-east-1-prd-sd-w0v965xz/harmonized-data/raw-gvcf/6c69d0d1-249e-4d4b-9624-461e9aa86219.g.vcf.g

# of documents for subject Patient/576341 :8
{
   "resourceType": "DocumentReference",
   "id": "579709",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:58:27.083+00:00",
      "source": "#IqySazj5zmKqt81z",
      "profile": [
         "https://ncpi-fhir.github.io/ncpi-fhir-ig/StructureDefinition/ncpi-drs-document-reference"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "GF_M6N4ZCF6"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "?study_id=SD_W0V965XZ&external_id=s3://kf-study-us-east-1-prd-sd-w0v965xz/harmonized-data/aligned-reads/eed3a9e3-8584-46c8-a260-fd6f66293ecc.cram.crai"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "DocumentReference|SD_W0V965XZ|s3://kf-study-us-east-1-prd-sd-w0v965xz/harmonized-data/aligned-reads/eed3a9e3-8584-46c8-a260-fd6f66293ecc.

# of documents for subject Patient/576346 :8
{
   "resourceType": "DocumentReference",
   "id": "579759",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-04-28T23:58:27.888+00:00",
      "source": "#dqOzFcMIBOWVMjzF",
      "profile": [
         "https://ncpi-fhir.github.io/ncpi-fhir-ig/StructureDefinition/ncpi-drs-document-reference"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "GF_KSFYTBP3"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "?study_id=SD_W0V965XZ&external_id=s3://kf-study-us-east-1-prd-sd-w0v965xz/source/hudsonalpha/haib17CGM4219/SL337039/SL337039.hard-filtered.vcf.gz"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "DocumentReference|SD_W0V965XZ|s3://kf-study-us-east-1-prd-sd-w0v965xz/source/hudsonalpha/haib17CGM4219/SL337039/SL337039.hard-filtered.vcf.gz"

Check for documents in BAM format.

In [169]:
resources = client.resources('DocumentReference')
resources = resources.search(format__text='bam').limit(1000)
documents = resources.fetch_all()
print("# of documents:{}".format(len(documents)))

import pandas as pd

print ("{}".format(len(documents)))
# Write one subject reference per line; the with block closes the file for us.
with open("bam_subjects.txt", "w") as file1:
    for d in documents:
        djson = d.serialize()
        #print(djson['subject'])
        file1.write(djson['subject']['reference']+'\n')




# of documents:4860
4860


In [170]:
resources = client.resources('DocumentReference')
documents = resources.fetch_all()
print("# of documents:{}".format(len(documents)))

import pandas as pd

# Write format and subject reference as tab-separated columns.
with open("all_docs.txt", "w") as file1:
    for d in documents:
        djson = d.serialize()
        # Not every document has a format on its first content entry.
        if 'format' in djson['content'][0]:
            fmt = djson['content'][0]['format']['display']
        else:
            fmt = 'none'
        file1.write("{}\t{}\n".format(fmt, djson['subject']['reference']))



# of documents:81538
81538
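
The tab-separated file written above can be tallied by format without going back to the server. The sketch below assumes the two-column layout used by that cell (format, then subject reference). The function name count_formats is hypothetical, and the sample rows are invented for illustration and written to a temporary file rather than to all_docs.txt.

```python
import collections
import os
import tempfile


def count_formats(path):
    """Count how many lines of a 'format<TAB>subject' file have each format."""
    counts = collections.Counter()
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            fmt, _subject = line.split("\t", 1)
            counts[fmt] += 1
    return counts


# Invented sample rows in the same layout as all_docs.txt.
sample_rows = [
    ("bam", "Patient/576359"),
    ("none", "Patient/576359"),
    ("bam", "Patient/576338"),
]

with tempfile.TemporaryDirectory() as tmpdir:
    sample_path = os.path.join(tmpdir, "sample_docs.txt")
    with open(sample_path, "w") as out:
        for fmt, subject in sample_rows:
            out.write("{}\t{}\n".format(fmt, subject))
    format_counts = count_formats(sample_path)

for fmt, n in format_counts.most_common():
    print(fmt, n)
```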


In [208]:
resources = client.resources('ResearchSubject')
studyid = 701322
resources = resources.search(study=studyid).limit(1000)
subjects = resources.fetch_all()
print("# of subjects in study {}:{}".format(studyid, len(subjects)))
for s in subjects[100:110]:
    print(json.dumps(s.serialize(), indent=3))
    print('_'*50)

# of subjects in study 701322:4170
{
   "resourceType": "ResearchSubject",
   "id": "705201",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-06-16T02:04:35.846+00:00",
      "source": "#DfT0vO78yvejA00Y",
      "profile": [
         "http://hl7.org/fhir/StructureDefinition/ResearchSubject"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/",
         "value": "PT_P9RT6693"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/participants/",
         "value": "?study_id=SD_BHJXBDQK&external_id=C743289"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "ResearchSubject-SD_BHJXBDQK-C743289"
      }
   ],
   "status": "off-study",
   "study": {
      "reference": "ResearchStudy/701322"
   },
   "individual": {
      "reference": "Patient/701050"
   }
}
__________________________________________________
{
   "resourceType": "ResearchSubject",
 

In [218]:
for s in subjects[100:110]:
    patient = s.serialize()['individual']['reference']
    getFiles2(client, patient, 'bam')
    print('_'*50)

# of documents for subject Patient/701050 :5
{
   "resourceType": "DocumentReference",
   "id": "807758",
   "meta": {
      "versionId": "1",
      "lastUpdated": "2021-06-16T03:15:59.337+00:00",
      "source": "#jtHMlNcsFzdcE4jY",
      "profile": [
         "https://ncpi-fhir.github.io/ncpi-fhir-ig/StructureDefinition/ncpi-drs-document-reference"
      ]
   },
   "identifier": [
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "GF_CAE2D8JR"
      },
      {
         "system": "https://kf-api-dataservice.kidsfirstdrc.org/genomic-files/",
         "value": "?study_id=SD_BHJXBDQK&external_id=s3://kf-study-us-east-1-prd-sd-bhjxbdqk/source/nantomics/58a077d6-e970-4074-a8bb-9d4ace3cddf0.local.transcript.bam"
      },
      {
         "system": "urn:kids-first:unique-string",
         "value": "DocumentReference-SD_BHJXBDQK-s3://kf-study-us-east-1-prd-sd-bhjxbdqk/source/nantomics/58a077d6-e970-4074-a8bb-9d4ace3cddf0.local.transcrip

# of documents for subject Patient/700996 :0
__________________________________________________
# of documents for subject Patient/701011 :0
__________________________________________________
# of documents for subject Patient/701030 :0
__________________________________________________
# of documents for subject Patient/701038 :0
__________________________________________________


In [217]:
def getFiles2(client, subject_id, file_type):
    resources = client.resources('DocumentReference')
    resources = resources.search(subject=subject_id, format__text=file_type).limit(1000)
    documents = resources.fetch_all()
    print("# of documents for subject {} :{}".format(subject_id, len(documents)))

    myDocs = []
    for d in documents:
        djson = d.serialize()
        print(json.dumps(djson, indent=3))
        #kfid = [did for did in djson['identifier'] if did['system']=='urn:kids-first:unique-string']
        myDocs.append({

        "type":djson['type']['text'],
        "format":djson['content'][0]['format']['display']
        })

    if len(myDocs) > 0:
        docsDF = pd.DataFrame(myDocs)
        docsDF = docsDF.sort_values(["type", "format"])
        print(docsDF)