### Comparison of file origin between CDA and PDC API

The [09CO022 Example - CDA v2 notebook](https://github.com/ianfore/cdatest/blob/main/v2tests/09CO022%20Example%20-%20CDA%20v2.ipynb) showed that CDA lists identical file ids for the tumor and normal samples of a subject, and that this holds for many studies. This seems odd: a key part of the study design is likely to be a comparison of tumor and normal, so one would expect at least some files to be specific to normal and some to tumor.

We should be able to use the PDC API to determine what the PDC says about these files, and confirm whether they derive from one specimen (aliquot) only, or from both tumor and normal.

First we'll define a function to process and summarize the files for a given study:

In [1]:
from cdapython import Q, columns, unique_terms

def processProject(projName):
    """Print a per-specimen file summary for one PDC subject of a project.

    Returns a dict of file id -> file label for the last specimen processed.
    """
    pq = Q('ResearchSubject.associated_project = "{}"'.format(projName))
    pr = pq.run(limit=2)
    lastFiles = None
    # Only the second of the two returned results (index 1) is examined
    for subject2 in pr[1]['ResearchSubject']:
        subid = subject2['identifier'][0]
        if subid['system'] == 'PDC':
            print('_'*50)
            print("Subject: {}:{}".format(subid['system'], subid['value']))
            print("Specimen count: {}".format(len(subject2['Specimen'])))
            lastFiles = None  # nothing to compare the first specimen with
            for s in subject2['Specimen']:
                print('_'*10)
                print('Specimen {}'.format(s['id']))
                print("Source material {}".format(s['source_material_type']))
                print("{} derived from {}".format(s['specimen_type'], s['derived_from_specimen']))
                print("files {}".format(len(s['File'])))
                # Map file id -> label; dict equality ignores ordering
                specimenFiles = {f['id']: f['label'] for f in s['File']}
                if specimenFiles == lastFiles:
                    print("Same file content as previous specimen")
                lastFiles = specimenFiles
    return lastFiles or {}
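
The check above compares each specimen's files only with those of the specimen printed just before it. A minimal sketch of a more general comparison follows, assuming each specimen is a dict shaped like the CDA records used above (an `id` plus a `File` list whose entries have an `id`). `compare_specimen_files` is a hypothetical helper, not part of `cdapython`, and the example records use made-up ids.

```python
def compare_specimen_files(specimens):
    """Return (shared, unique).

    shared: file ids present in every specimen.
    unique: dict of specimen id -> file ids found in that specimen only.
    """
    file_sets = {s['id']: {f['id'] for f in s['File']} for s in specimens}
    if not file_sets:
        return set(), {}
    shared = set.intersection(*file_sets.values())
    unique = {}
    for spec_id, files in file_sets.items():
        # Union of the files of all *other* specimens
        others = set().union(
            *(fs for sid, fs in file_sets.items() if sid != spec_id)
        )
        unique[spec_id] = files - others
    return shared, unique


# Illustrative records only (made-up ids, not real CDA data)
example = [
    {'id': 'tumor-1', 'File': [{'id': 'f1'}, {'id': 'f2'}]},
    {'id': 'normal-1', 'File': [{'id': 'f1'}, {'id': 'f3'}]},
]
shared, unique = compare_specimen_files(example)
print("shared:", shared)
print("unique:", unique)
```

If tumor and normal specimens really had files of their own, `unique` would be non-empty for them; an empty `unique` for every specimen corresponds to the "Same file content" situation.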

Now we can use that to look at what's going on in a number of studies.

### Study: Integrated Proteogenomic Characterization of HBV-related Hepatocellular carcinoma

In [2]:
file_ids = processProject('Integrated Proteogenomic Characterization of HBV-related Hepatocellular carcinoma')

Getting results from database

Total execution time: 17269 ms
__________________________________________________
Subject: PDC:7a59a31a-1168-11ea-9bfa-0a42f3c845fe
Specimen count: 4
__________
Specimen 7a5b4b48-1168-11ea-9bfa-0a42f3c845fe
Source material Primary Tumor
sample derived from initial specimen
files 281
__________
Specimen 7a5d6240-1168-11ea-9bfa-0a42f3c845fe
Source material Primary Tumor
aliquot derived from 7a5b4b48-1168-11ea-9bfa-0a42f3c845fe
files 281
Same file content as previous specimen
__________
Specimen 7a5b4c3a-1168-11ea-9bfa-0a42f3c845fe
Source material Solid Tissue Normal
sample derived from initial specimen
files 281
Same file content as previous specimen
__________
Specimen 7a5d6460-1168-11ea-9bfa-0a42f3c845fe
Source material Solid Tissue Normal
aliquot derived from 7a5b4c3a-1168-11ea-9bfa-0a42f3c845fe
files 281
Same file content as previous specimen


#### Define a function to use the PDC API to list some details about a file

In [3]:
# Get details about a single file
import requests
import json

def getPDCFileDetails(file_id):
    """Print the fraction, experiment type and linked aliquots PDC holds for a file."""
    # The URL for our API calls
    url = 'https://pdc.cancer.gov/graphql'

    # Query to get file metadata, including the aliquots linked to the file
    query = '{ fileMetadata(file_id: "' + file_id + '" acceptDUA: true) { ' + '''
        file_name
        file_size
        md5sum
        file_location
        file_submitter_id
        fraction_number
        experiment_type
        aliquots {
          aliquot_id
          aliquot_submitter_id
          sample_id
          sample_submitter_id
        }
      }
    }'''

    response = requests.post(url, json={'query': query})

    # If the request failed, raise an HTTPError carrying the status code and description
    response.raise_for_status()

    # The response was OK, so decode the returned JSON and take the first record
    jData = json.loads(response.content)
    fileMD = jData['data']['fileMetadata'][0]
    print("fraction:{} Expt type:{}".format(fileMD['fraction_number'], fileMD['experiment_type']))
    print("Linked aliquots:")
    for a in fileMD['aliquots']:
        print("sample_submitter_id:{} aliquot_id:{}".format(a['sample_submitter_id'], a['aliquot_id']))

Now test the function on a single file:

In [4]:
getPDCFileDetails("0431f842-6ef0-47c0-b5b2-e68d64cd9837")

fraction:12 Expt type:iTRAQ4
Linked aliquots:
sample_submitter_id:Tumor Pat64 aliquot_id:e8dbfb3d-693a-11ea-b1fd-0aad30af8a83
sample_submitter_id:Tumor Pat51 aliquot_id:e8dbb9bb-693a-11ea-b1fd-0aad30af8a83
sample_submitter_id:adjacent Pat64 aliquot_id:e8daf96a-693a-11ea-b1fd-0aad30af8a83
sample_submitter_id:adjacent Pat51 aliquot_id:e8dae938-693a-11ea-b1fd-0aad30af8a83
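
`getPDCFileDetails` prints what it finds rather than returning it, which makes the result hard to reuse. As a rough sketch, the parsing step could be separated into a function that takes an already-decoded response and returns a small summary. `summarize_file_metadata` is a hypothetical name; the payload below is hand-built to mirror the fields requested in the query, with two of the aliquots from the output above, and the value types are assumptions rather than something confirmed against the PDC API.

```python
def summarize_file_metadata(payload):
    """Extract fraction, experiment type and linked aliquots from a decoded
    fileMetadata response. Returns None if no record is present."""
    records = (payload.get('data') or {}).get('fileMetadata') or []
    if not records:
        return None
    md = records[0]
    return {
        'fraction_number': md.get('fraction_number'),
        'experiment_type': md.get('experiment_type'),
        # (sample_submitter_id, aliquot_id) pairs, in the order returned
        'aliquots': [
            (a.get('sample_submitter_id'), a.get('aliquot_id'))
            for a in (md.get('aliquots') or [])
        ],
    }


# Hand-built payload (assumed shape; values copied from the output above)
payload = {'data': {'fileMetadata': [{
    'fraction_number': '12',
    'experiment_type': 'iTRAQ4',
    'aliquots': [
        {'sample_submitter_id': 'Tumor Pat64',
         'aliquot_id': 'e8dbfb3d-693a-11ea-b1fd-0aad30af8a83'},
        {'sample_submitter_id': 'adjacent Pat64',
         'aliquot_id': 'e8daf96a-693a-11ea-b1fd-0aad30af8a83'},
    ],
}]}}

summary = summarize_file_metadata(payload)
print(summary['experiment_type'], len(summary['aliquots']))
```

Returning a dict also avoids an `IndexError` when `fileMetadata` comes back empty, which the indexing with `[0]` would otherwise raise.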


Now run it on every file that CDA listed for that subject in the study Integrated Proteogenomic Characterization of HBV-related Hepatocellular carcinoma:

In [5]:
# file_id/label avoid shadowing the built-in id()
for file_id, label in file_ids.items():
    print("File:{}".format(label))
    getPDCFileDetails(file_id)

File:20180226_HF_ZHW_total_liver24_F26.mzid.gz
fraction:26 Expt type:TMT11
Linked aliquots:
sample_submitter_id:MIX aliquot_id:7a5e18c9-1168-11ea-9bfa-0a42f3c845fe
sample_submitter_id:813 aliquot_id:7a5d6532-1168-11ea-9bfa-0a42f3c845fe
sample_submitter_id:814 aliquot_id:7a5d69a0-1168-11ea-9bfa-0a42f3c845fe
sample_submitter_id:745 aliquot_id:7a5d54ef-1168-11ea-9bfa-0a42f3c845fe
sample_submitter_id:746 aliquot_id:7a5d5802-1168-11ea-9bfa-0a42f3c845fe
sample_submitter_id:755 aliquot_id:7a5d590c-1168-11ea-9bfa-0a42f3c845fe
sample_submitter_id:756 aliquot_id:7a5d5b11-1168-11ea-9bfa-0a42f3c845fe
sample_submitter_id:777 aliquot_id:7a5d5d2e-1168-11ea-9bfa-0a42f3c845fe
sample_submitter_id:778 aliquot_id:7a5d613f-1168-11ea-9bfa-0a42f3c845fe
sample_submitter_id:785 aliquot_id:7a5d6240-1168-11ea-9bfa-0a42f3c845fe
sample_submitter_id:786 aliquot_id:7a5d6460-1168-11ea-9bfa-0a42f3c845fe
[Output truncated in this export. The remaining entries are for `.raw`, `.mzML.gz`, `.mzid.gz` and `.psm` files named `20180226_HF_ZHW_total_liver24_F*` and `20180614_fusion_zhw_TMT11_liver24_pho_F*`. Every entry that survives intact reports experiment type TMT11 and the same 11 linked aliquots as the first file above.]

In [24]:
### Georgetown Lung Cancer Proteomics Study

In [9]:
gtown_file_ids = processProject('Georgetown Lung Cancer Proteomics Study')

Getting results from database

Total execution time: 9691 ms
__________________________________________________
Subject: PDC:9e8e80e5-d732-11ea-b1fd-0aad30af8a83
Specimen count: 4
__________
Specimen 9e8e933e-d732-11ea-b1fd-0aad30af8a83
Source material Primary Tumor
sample derived from initial specimen
files 48
__________
Specimen 9e8eb67b-d732-11ea-b1fd-0aad30af8a83
Source material Primary Tumor
aliquot derived from 9e8e933e-d732-11ea-b1fd-0aad30af8a83
files 48
Same file content as previous specimen
__________
Specimen 9e8e95b1-d732-11ea-b1fd-0aad30af8a83
Source material Solid Tissue Normal
sample derived from initial specimen
files 48
Same file content as previous specimen
__________
Specimen 9e8eb77b-d732-11ea-b1fd-0aad30af8a83
Source material Solid Tissue Normal
aliquot derived from 9e8e95b1-d732-11ea-b1fd-0aad30af8a83
files 48
Same file content as previous specimen


In [10]:
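# enumerate replaces the manual counter; file_id avoids shadowing the built-in id()
for n, (file_id, lb) in enumerate(gtown_file_ids.items(), start=1):
    print(n,"_"*50)
    print("File:{}".format(lb))
    getPDCFileDetails(file_id)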

1 __________________________________________________
File:Ctrl_1-set_1-label_113-frac_2-F9.wiff
fraction:9 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8ebae6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000003-02 aliquot_id:9e8ebbc6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ReferenceMix aliquot_id:9e8ecf0b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-01 aliquot_id:9e8eb85a-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-02 aliquot_id:9e8eb932-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000010-01 aliquot_id:9e8ecc31-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-02 aliquot_id:9e8eb77b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-01 aliquot_id:9e8eb67b-d732-11ea-b1fd-0aad30af8a83
2 __________________________________________________
File:Ctrl_1-set_1-label_113-frac_2-F9.wiff.scan
fraction:9 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8eb

fraction:3 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8ebae6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000003-02 aliquot_id:9e8ebbc6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ReferenceMix aliquot_id:9e8ecf0b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-01 aliquot_id:9e8eb85a-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-02 aliquot_id:9e8eb932-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000010-01 aliquot_id:9e8ecc31-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-02 aliquot_id:9e8eb77b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-01 aliquot_id:9e8eb67b-d732-11ea-b1fd-0aad30af8a83
14 __________________________________________________
File:Ctrl_3-set_1-label_115-frac_1-F3.wiff.scan
fraction:3 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8ebae6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000003-02 aliquot_id:9e8ebbc6-d732-11e

fraction:24 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8ebae6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000003-02 aliquot_id:9e8ebbc6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ReferenceMix aliquot_id:9e8ecf0b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-01 aliquot_id:9e8eb85a-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-02 aliquot_id:9e8eb932-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000010-01 aliquot_id:9e8ecc31-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-02 aliquot_id:9e8eb77b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-01 aliquot_id:9e8eb67b-d732-11ea-b1fd-0aad30af8a83
25 __________________________________________________
File:Tumor_1-set_1-label_116-frac_1-F4.wiff
fraction:4 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8ebae6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000003-02 aliquot_id:9e8ebbc6-d732-11ea-b

fraction:21 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8ebae6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000003-02 aliquot_id:9e8ebbc6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ReferenceMix aliquot_id:9e8ecf0b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-01 aliquot_id:9e8eb85a-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-02 aliquot_id:9e8eb932-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000010-01 aliquot_id:9e8ecc31-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-02 aliquot_id:9e8eb77b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-01 aliquot_id:9e8eb67b-d732-11ea-b1fd-0aad30af8a83
37 __________________________________________________
File:Tumor_3-set_1-label_118-frac_1-F6.wiff
fraction:6 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8ebae6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000003-02 aliquot_id:9e8ebbc6-d732-11ea-b

fraction:23 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8ebae6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000003-02 aliquot_id:9e8ebbc6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ReferenceMix aliquot_id:9e8ecf0b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-01 aliquot_id:9e8eb85a-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-02 aliquot_id:9e8eb932-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000010-01 aliquot_id:9e8ecc31-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-02 aliquot_id:9e8eb77b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-01 aliquot_id:9e8eb67b-d732-11ea-b1fd-0aad30af8a83
48 __________________________________________________
File:Tumor_4-set_1-label_119-frac_3-F23.wiff.scan
fraction:23 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8ebae6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000003-02 aliquot_id:9e8ebbc6-d732

These examples show that identical file content across different specimens is sometimes to be expected.

Many PDC files likely do derive from multiple samples.

However, the issue at [cda-python/issues/99](https://github.com/CancerDataAggregator/cda-python/issues/99) leaves some doubt about whether this is always accurate, so some exploration is warranted. We want to be sure these relationships reflect real science rather than issues with data stewardship.

What is going on in the above example? Some possibilities come to mind:
* Were the samples multiplexed into a single sample before going into the mass spec?
* Or did they go through the mass spec separately, each producing its own file?
    * In that case we don't have those individual files.
* Or does the mass spec instrument software aggregate the data from separate samples/aliquots, so that we never see the individual specimen files?

I asked Chris, who replied:
>Yes, the samples are multiplexed into one plex and then injected into LC-MS/MS.  Multiplexing provides a means for relative quantitation.  The Expt type is iTRAQ8, which is the Applied BioSystems kit for combining 8 samples into 1 plex.  The kit has 8 tags, one for each sample going into the plex.  The tags are 8 isotopic versions of the same molecule, so their chromatographic properties should be the same.  Between the MS and MS steps of the pipeline, there is a fragmentation step where the ion breaks apart at its weakest link.  The weakest link is generally between the peptide and the tag.  In the second MS step, the tags should show up in the spectra next to each other (differing by 1 amu).  Taking the relative intensity of the tag peaks provides a decent measure of quantitation for the peptide from each sample.
> 
>The second channel in the plex is a ReferenceMix, which is likely a mixture of all samples in the experiment.  Running an aliquot of the ReferenceMix in every plex provides a measure of relative quantitation across the entire experiment.  It’s not sufficiently quantitative for a clinical assay, but it’s much better than comparing peak intensities from different MS runs.
> 
>This aspect of proteomics data does make it challenging to represent in something like CDA.  One file does contain information about peptide-ions from 8 different samples.
> 
>In terms of one file containing both the tumor-normal pair, yes, CPTAC aligns the plexes such that tumor-normal pairs are present in the same plex.

A key point here is that it is relative quantitation. That is important to the data model. We'll come back to that.
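
To make the idea of relative quantitation concrete, here is a minimal sketch of the calculation Chris describes: each channel's reporter-ion intensity is expressed relative to the ReferenceMix channel. The intensities, the channel labels and the choice of channel 114 as the reference are all invented for illustration; none of them come from the PDC files above.

```python
# Hypothetical reporter-ion intensities for one peptide in one plex.
# Labels and values are made up; only four channels are shown for brevity.
reporter_intensities = {
    "113": 1200.0,
    "114": 800.0,   # assumed here to be the ReferenceMix channel
    "115": 1600.0,
    "116": 400.0,
}
REFERENCE_CHANNEL = "114"


def relative_to_reference(intensities, reference_channel):
    """Return each channel's intensity divided by the reference channel's."""
    ref = intensities[reference_channel]
    if ref <= 0:
        raise ValueError("reference channel intensity must be positive")
    return {ch: value / ref for ch, value in intensities.items()}


ratios = relative_to_reference(reporter_intensities, REFERENCE_CHANNEL)
for channel, ratio in sorted(ratios.items()):
    print("channel {}: {:.2f} x reference".format(channel, ratio))
```

Because every plex carries an aliquot of the same reference, ratios like these can be compared across plexes in a way that raw peak intensities cannot.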

What's interesting is that these plexes contain more than just the tumor-normal pair.

What else is in them? In other words, what did the study designers build into their study data model?

Let's take a specific file from the Georgetown study as an example:
* id: 32192cf4-6494-11ea-b1fd-0aad30af8a83
* file: Ctrl_1-set_1-label_113-frac_2-F9.wiff

In [13]:
# file_id avoids shadowing the built-in id()
file_id = '32192cf4-6494-11ea-b1fd-0aad30af8a83'
getPDCFileDetails(file_id)

fraction:9 Expt type:iTRAQ8
Linked aliquots:
sample_submitter_id:ICBI-000003-01 aliquot_id:9e8ebae6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000003-02 aliquot_id:9e8ebbc6-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ReferenceMix aliquot_id:9e8ecf0b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-01 aliquot_id:9e8eb85a-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000002-02 aliquot_id:9e8eb932-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000010-01 aliquot_id:9e8ecc31-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-02 aliquot_id:9e8eb77b-d732-11ea-b1fd-0aad30af8a83
sample_submitter_id:ICBI-000001-01 aliquot_id:9e8eb67b-d732-11ea-b1fd-0aad30af8a83


In [17]:
aq_ids = ['9e8ebae6-d732-11ea-b1fd-0aad30af8a83',
'9e8ebbc6-d732-11ea-b1fd-0aad30af8a83',
'9e8ecf0b-d732-11ea-b1fd-0aad30af8a83',
'9e8eb85a-d732-11ea-b1fd-0aad30af8a83',
'9e8eb932-d732-11ea-b1fd-0aad30af8a83',
'9e8ecc31-d732-11ea-b1fd-0aad30af8a83',
'9e8eb77b-d732-11ea-b1fd-0aad30af8a83',
'9e8eb67b-d732-11ea-b1fd-0aad30af8a83']

In [23]:
# Build the IN list from aq_ids (defined above) instead of repeating the ids by hand
id_list = ",\n".join("'{}'".format(a) for a in aq_ids)
query1 = """SELECT sp.derived_from_subject, sp.source_material_type
from gdc-bq-sample.integration.all_v2 AS su,
unnest(ResearchSubject) AS rs,
unnest(Specimen) AS sp
where sp.id in
({})
""".format(id_list)

r1 = Q.sql(query1)
aqdata = r1[0]

from cda_funcs import qResultsToDF 
qResultsToDF(r1)

Unnamed: 0,derived_from_subject,source_material_type
0,ICBI-000010,Primary Tumor
1,ReferenceMix,Not Reported
2,ICBI-000002,Primary Tumor
3,ICBI-000002,Solid Tissue Normal
4,ICBI-000003,Primary Tumor
5,ICBI-000003,Solid Tissue Normal
6,ICBI-000001,Primary Tumor
7,ICBI-000001,Solid Tissue Normal


This 8-plex has three tumor-normal pairs where a comparison makes sense.
The ReferenceMix is in there for normalization in the way Chris suggested (scientific normalization, not the 3NF kind!).
What can we do with the tumor from ICBI-000010? There's no normal in this plex to compare it with.
I've stopped digging for now, but my guess is that another plex exists in which it sits alongside its normal partner specimen.
Does that give an additional way to normalize the relative values across plexes? Does one scatter individual samples like this across the plexes for that purpose?
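
The pairing can also be checked programmatically. The sketch below groups the rows of the query result above by subject and reports which subjects have both a tumor and a normal aliquot in this plex. The rows are copied from that result; `summarize_plex` is a hypothetical helper written for this illustration, not part of cdapython.

```python
from collections import defaultdict

# (derived_from_subject, source_material_type) rows copied from the query result above.
plex_rows = [
    ("ICBI-000010", "Primary Tumor"),
    ("ReferenceMix", "Not Reported"),
    ("ICBI-000002", "Primary Tumor"),
    ("ICBI-000002", "Solid Tissue Normal"),
    ("ICBI-000003", "Primary Tumor"),
    ("ICBI-000003", "Solid Tissue Normal"),
    ("ICBI-000001", "Primary Tumor"),
    ("ICBI-000001", "Solid Tissue Normal"),
]


def summarize_plex(rows):
    """Split subjects into those with a tumor-normal pair and those without."""
    by_subject = defaultdict(set)
    for subject, material in rows:
        by_subject[subject].add(material)
    paired, unpaired = [], []
    for subject, materials in sorted(by_subject.items()):
        if {"Primary Tumor", "Solid Tissue Normal"} <= materials:
            paired.append(subject)
        else:
            unpaired.append(subject)
    return paired, unpaired


paired, unpaired = summarize_plex(plex_rows)
print("Tumor-normal pairs in this plex:", paired)
print("No partner in this plex:", unpaired)
```

Running the same summary over every plex in a study would be one way to test the guess that ICBI-000010's normal partner sits in a different plex.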

### Conclusions
The observations here give rise to some CDA feature and CRDC-H model considerations. These have been raised as [cda-python/issues/105](https://github.com/CancerDataAggregator/cda-python/issues/105).

While it’s mechanically true that a file derives from more than one sample, the scientific relevance of the file in these examples lies in what it says about the case. The mechanistic part is certainly relevant when we think about reproducibility, but at a clinical level, or any other level where we want to make biological inferences, the focus is on seeing the forest rather than the trees. CDA was set up more to address that latter, inference-oriented focus, and that has been a driver for the model. The examples here may help move things in the right direction.

How you set up your plexes determines the ‘meaning’, and meaning is semantics. Those semantics are created by the study investigators. The designers of the CPTAC studies knew these designs/models from the get-go. We are uncovering them retrospectively.
 
The potential value of capturing the study-specific models/designs created by the investigators has been highlighted before. There should indeed be a common model, but it should be one that allows the study-specific models to be captured and communicated. These proteomics studies and their availability in CDA allow us to explore this with real data.



#### A retrospective observation:
With the benefit of hindsight we can see that a clue to the file being composed of a specific number of samples was hiding in plain sight: the experiment type attribute of the file has it coded in.
The number in the experiment type is the number of aliquots in the plex: TMT11 has 11, iTRAQ8 has 8, iTRAQ4 has 4 and TMT10 has 10.
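
As a rough sketch of how that clue could be used, the function below reads the plex size from the trailing digits of the experiment type. It assumes the label always ends with the channel count, which holds for the four types seen here but may not hold for every experiment type in the PDC; `plex_size` is a hypothetical helper, not a PDC or CDA function.

```python
import re


def plex_size(experiment_type):
    """Return the trailing integer of an experiment type label, or None if absent."""
    match = re.search(r"(\d+)$", experiment_type)
    return int(match.group(1)) if match else None


for expt in ["TMT11", "iTRAQ8", "iTRAQ4", "TMT10"]:
    print("{}: expect {} linked aliquots per file".format(expt, plex_size(expt)))
```

Comparing this expected count with the number of linked aliquots the PDC API reports for a file would give a simple consistency check on the file-to-specimen relationships.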