# Introduction
This notebook walks through the basics of using the Data Hub API to work on, validate, and submit your data.  These APIs are designed to allow users to perform all the actions that can be done via the [Data Submission Portal](https://hub.datacommons.cancer.gov/) from a notebook or script.  The intent is to allow submitters to operate directly from their own environments if they so choose, rather than work through the graphical submission interface.

There are a few prerequisites that you have to meet before you can use this API:

# Prerequisites

## GraphQL
The Data Hub API uses [GraphQL](https://graphql.org/), and a good understanding of how to use GraphQL is required.  Since GraphQL can be complex, a tutorial is beyond the scope of this document; however, the [GraphQL Documentation](https://graphql.org/learn/) can be very useful.

## Login.gov account
Use of Data Hub in general requires that a user have an account registered with [Login.gov](https://www.login.gov/) (NIH users can use their NIH account and PIV card).  Note that a Login.gov account is distinct from an eRA Commons identity that is frequently used at NIH.  They are not the same thing.

## Approved Submission
You must receive approval to submit data to CRDC prior to using the Data Hub APIs.  If you need approval, please read and follow the [Submissions Request Instructions](https://datacommons.cancer.gov/submit).  Instructions for using the graphical data submission process are on the same page.

## An API Token
If you are an approved submitter with a Login.gov or NIH account, you can generate an API token from the graphical interface.  Log into the system, then click on your user name and select the **API Token** menu option.  This will bring up a dialog box that allows you to create an API token and copy it to your clipboard.  There are two things to note about API tokens:
- The token is tied to your user identity and can be used on any submission that you're approved to work on.
- You can have only one token at a time.  Generating a new token will revoke the previous token.


In [1]:
import requests
import os

The imports below are used only for display purposes in this notebook; they're not required to interact with the Data Hub API.

In [2]:
import pandas as pd
from IPython.display import display, Markdown, Latex

API Endpoints

In [3]:
prod = 'https://hub.datacommons.cancer.gov/api/graphql'
#Note that use of Dev2 requires a VPN connection through the NIH firewall
dev2 = 'https://hub-dev2.datacommons.cancer.gov/api/graphql'

# Security Note
It is ***highly*** recommended that you keep your API token secure and not include it in any code.  While there are many ways to do this, for the purposes of this notebook it's been set in an environment variable named "DEV2API".

In [4]:
dev2APIKey = os.environ['DEV2API']
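If the environment variable is missing, the lookup above fails with a bare `KeyError`, which can be confusing the first time the notebook is run. The sketch below wraps the lookup in a small helper that fails with a clearer message. The helper name `get_api_token` is hypothetical; only the `DEV2API` variable name comes from this notebook.

```python
import os


def get_api_token(env_var="DEV2API"):
    """Return the API token stored in the named environment variable.

    get_api_token is a hypothetical helper, not part of the Data Hub API.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var} is not set; "
            "generate a token in the Data Hub interface and export it first."
        )
    return token
```

Calling `get_api_token()` at the top of a script surfaces a missing token before any request is sent.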

In [5]:
def apiQuery(url, query, variables, headers):
    # Send a GraphQL query or mutation to the Data Hub API and return the parsed JSON.
    # The token is read from the environment so it never appears in the notebook.
    token = os.environ['DEV2API']
    if headers is None:
        headers = {"Authorization": f"Bearer {token}"}
    else:
        headers["Authorization"] = f"Bearer {token}"
    # Only include "variables" in the request body when some were supplied
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    try:
        result = requests.post(url=url, headers=headers, json=payload)
        if result.status_code == 200:
            return result.json()
        else:
            print(f"Error: {result.status_code}")
            return result.content
    except requests.exceptions.RequestException as e:
        # Covers connection failures and timeouts as well as HTTP errors
        return f"Request Error: {e}"
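GraphQL servers commonly report query problems in an `errors` list inside a response that still has HTTP status 200, so a successful status code alone doesn't guarantee usable data. The sketch below assumes the Data Hub API follows that convention. `unwrap_graphql` is a hypothetical helper that takes whatever `apiQuery` returned.

```python
def unwrap_graphql(response, field):
    """Return response['data'][field], or raise if the call did not succeed.

    Hypothetical helper; assumes the standard GraphQL response layout
    with top-level 'data' and optional 'errors' keys.
    """
    if not isinstance(response, dict):
        # apiQuery returns bytes or a string when the HTTP request itself failed
        raise RuntimeError(f"Request failed: {response!r}")
    if response.get("errors"):
        messages = [
            err.get("message", str(err)) if isinstance(err, dict) else str(err)
            for err in response["errors"]
        ]
        raise RuntimeError("GraphQL errors: " + "; ".join(messages))
    data = response.get("data") or {}
    if field not in data:
        raise KeyError(f"'{field}' not found in response data")
    return data[field]
```

For example, `unwrap_graphql(result, "listSubmissions")` would return the contents of the `listSubmissions` field, or stop with a readable message instead of a `TypeError` further down the notebook.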

# Step 1: Understanding the landscape

Let's assume that this is our first submission using the API.  The first thing we need to do is list the studies that our organization is approved for so that we can submit to the correct study.  That's done with the *listApprovedStudiesOfMyOrganization* query.

In [6]:
org_query = """
{
  listApprovedStudiesOfMyOrganization{
    originalOrg
    dbGaPID
    studyAbbreviation
    studyName
    _id
  }
}
"""

Note that the actual results returned by this query will vary for each organization.  These are examples only and shouldn't be used.

In [7]:
org_res = apiQuery(dev2, org_query,None, None)

In [8]:
org_df = pd.DataFrame(org_res['data']['listApprovedStudiesOfMyOrganization'])
display(Markdown(org_df.to_markdown()))

|    | originalOrg                                    | dbGaPID   | studyAbbreviation   | studyName                                                                                       | _id                                  |
|---:|:-----------------------------------------------|:----------|:--------------------|:------------------------------------------------------------------------------------------------|:-------------------------------------|
|  0 | Purdue Center for Cancer Research              |           | UBC01               | Antitumor Activity and Molecular Effects of Vemurafenib in Dogs with BRAF-mutant Bladder Cancer | b9e9ab79-d90b-4ec1-83b7-f83a5a75f5b5 |
|  1 | Comparative Molecular Characterization Program |           | OSA01               | A Multi-Platform Sequencing Analysis of Canine Appendicular Osteosarcoma                        | e3feefe9-cc70-4ae0-be06-9df7f29d84e8 |
|  2 | Comparative Molecular Characterization Program |           | TCL01               | Whole exome sequencing analysis of canine cancer cell lines                                     | 6c7fa436-efa3-42c6-af4c-7f5b70a1d35d |
|  3 | NCI BBRB                                       |           | CMB                 | Cancer Moonshot Biobank                                                                         | 4c2b6522-20b8-4841-8c7a-318b325c99b4 |
|  4 | CCDI                                           | phs003432 | TALLsc              | T-cell Acute Lymphoblastic Leukemia Single Cell RNA Sequencing and ATAC Sequencing              | 49a69fef-71f8-44e6-ad3b-f7a62d91e348 |

# Step 2:  Creating a new submission or using an existing submission

The next step in the process is to either create a new submission or to use one of your existing submissions.  It is not necessary to create a new submission every time.  If you have an existing submission that you need to continue working on, simply start using that submission.

## Step 2, Alternate 1: Creating a new submission

For the purposes of this demonstration, we'll use the CCDI TALLsc study as the example.  In order to submit data, your first step is to create a new submission within the study.  **Do not do this if you're continuing with an existing submission**.

From the data we obtained in the first query, we'll have to parse out the information that's relevant to the CCDI TALLsc study.  We'll need these values to construct the query that creates the new submission.

In [9]:
for study in org_res['data']['listApprovedStudiesOfMyOrganization']:
    if study['originalOrg'] == 'CCDI':
        org = study['originalOrg']
        dbgap = study['dbGaPID']
        abbrev = study['studyAbbreviation']
        name = study['studyName']
        studyid = study['_id']

dc = "CDS"
name = "Jupyter Demo 3"
intention = "New/Update"
datatype = "Metadata and Data Files"
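The loop above matches on `originalOrg`, so it would keep the last match if an organization had more than one approved study. A slightly more defensive sketch selects the study by its abbreviation instead and fails loudly when the match is missing or ambiguous. `find_study` is a hypothetical helper, and the sample records are copied from the example results shown earlier in this notebook.

```python
def find_study(studies, abbreviation):
    """Return the single study record whose studyAbbreviation matches."""
    matches = [s for s in studies if s.get("studyAbbreviation") == abbreviation]
    if len(matches) != 1:
        raise ValueError(
            f"Expected one study for {abbreviation!r}, found {len(matches)}"
        )
    return matches[0]


# Sample records copied from the example results shown earlier
sample_studies = [
    {"originalOrg": "NCI BBRB", "dbGaPID": "", "studyAbbreviation": "CMB",
     "studyName": "Cancer Moonshot Biobank",
     "_id": "4c2b6522-20b8-4841-8c7a-318b325c99b4"},
    {"originalOrg": "CCDI", "dbGaPID": "phs003432", "studyAbbreviation": "TALLsc",
     "studyName": "T-cell Acute Lymphoblastic Leukemia Single Cell RNA Sequencing and ATAC Sequencing",
     "_id": "49a69fef-71f8-44e6-ad3b-f7a62d91e348"},
]

tallsc = find_study(sample_studies, "TALLsc")
print(tallsc["_id"], tallsc["dbGaPID"])
```

With real data, the first argument would be `org_res['data']['listApprovedStudiesOfMyOrganization']`.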


### createSubmission mutation

Creating submissions requires the use of a mutation that calls createSubmission.  There are multiple required variables that have to be provided in a GraphQL compatible way:
- studyID: This is the Study ID that can be obtained from the graphical interface, or from the _id field returned by the *listApprovedStudiesOfMyOrganization* query
- dbGaPID: Obtained when registering the study at dbGaP.  This is required for all controlled access studies
- dataCommons: This is the CRDC Data Commons the submissions will be deposited in
- name: This can be anything that allows you to identify this specific submission
- intention: Can be “New/Update” if you are adding information to the submission or  “Delete” if you are removing information from the submission
- dataType: Can be either "Metadata and Data Files" or “Metadata Only”.  Which one is selected depends on whether or not data files will be included in the submission

This query will return the _id field, which will be the newly created submission ID.  It will also return a number of other fields that can be checked to make sure the submission was created properly.
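Because *intention* and *dataType* only accept the values listed above, it can help to check them locally before sending the mutation. The following is a minimal sketch of that idea. `build_submission_variables` and the two constant names are hypothetical; the allowed values are the ones given in the list above.

```python
# Allowed values taken from the variable descriptions above
ALLOWED_INTENTIONS = {"New/Update", "Delete"}
ALLOWED_DATA_TYPES = {"Metadata and Data Files", "Metadata Only"}


def build_submission_variables(study_id, dbgap_id, data_commons, name,
                               intention, data_type):
    """Return the variables dictionary for the createSubmission mutation."""
    if intention not in ALLOWED_INTENTIONS:
        raise ValueError(f"intention must be one of {sorted(ALLOWED_INTENTIONS)}")
    if data_type not in ALLOWED_DATA_TYPES:
        raise ValueError(f"dataType must be one of {sorted(ALLOWED_DATA_TYPES)}")
    return {
        "studyID": study_id,
        "dbGaPID": dbgap_id,
        "dataCommons": data_commons,
        "name": name,
        "intention": intention,
        "dataType": data_type,
    }
```

A typo such as "Metadata only" is then caught in the notebook rather than by the server.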

In [10]:
create_submission_query = """
mutation CreateNewSubmission(
  $studyID: String!,
  $dbGaPID: String!,
  $dataCommons: String!,
  $name: String!,
  $intention:String!,
  $dataType: String!,
){
  createSubmission(
    studyID: $studyID,
    dbGaPID: $dbGaPID,
    dataCommons: $dataCommons,
    name: $name,
    intention: $intention,
    dataType: $dataType
  ){
    _id
    studyID
    dbGaPID
    dataCommons
    name
    intention
    dataType
    status
  }
}"""

In [27]:
variables = {"studyID":studyid, "dbGaPID":dbgap, "dataCommons":dc, "name":name, "intention":intention,"dataType":datatype}

In [28]:
create_res = apiQuery(dev2,create_submission_query, variables, None)

In [29]:
print(create_res)

{'data': {'createSubmission': {'_id': '162fb91f-75a8-4994-86e7-8df189ebc476', 'studyID': '49a69fef-71f8-44e6-ad3b-f7a62d91e348', 'dbGaPID': 'phs003432', 'dataCommons': 'CDS', 'name': 'Jupyter Demo 3', 'intention': 'New/Update', 'dataType': 'Metadata and Data Files', 'status': 'New'}}}


Parse out the submission ID since we'll need it later

In [30]:
submissionid = create_res['data']['createSubmission']["_id"]
subname = create_res['data']['createSubmission']['name']

#### Side Trip

At this point, if you go to the graphical interface, you should see that a new submission has been created using the name provided in the query.

## Step 2, Alternate 2: Working with existing submissions

If you already have submissions in Data Hub that you've been working with, you can continue to work with them instead of creating a new submission.  To continue work on a submission, you will first have to identify the submissions using the *listSubmissions* query.

The listSubmissions query requires that **status** be provided as a parameter.  The status can be one of:
- "All"
- "New"
- "In Progress"
- "Submitted"
- "Released"
- "Completed"
- "Archived"
- "Canceled"
- "Rejected"
- "Withdrawn"
- "Deleted"

This allows users to scan for submissions that are in a specific state, but for the purposes of the demonstration, we'll use "All" to bring back everything.  We'll also return some additional information about each submission so we can identify the ones we want to work with.

For long lists, the *listSubmissions* query also allows the list to be sorted in ascending or descending order with the **sortDirection** field, and to request a sorting order by field with the **orderBy** field.  Additional fields for this query can be found in the documentation. 

In [11]:
list_sub_query = """
    query ListSubmissions($status:String!){
          listSubmissions(status: $status){
            submissions{
              _id
              name
              status
              studyAbbreviation
              studyID
            }
          }
    }
"""

In [12]:
statusvariables = {"status":"All"}

In [13]:
list_sub_res = apiQuery(dev2, list_sub_query, statusvariables, None)

In [14]:
submissions_df = pd.DataFrame(list_sub_res['data']['listSubmissions']['submissions'])
display(Markdown(submissions_df.to_markdown()))

|    | _id                                  | name                           | status      | studyAbbreviation   | studyID                              |
|---:|:-------------------------------------|:-------------------------------|:------------|:--------------------|:-------------------------------------|
|  0 | 162fb91f-75a8-4994-86e7-8df189ebc476 | Jupyter Demo 3                 | In Progress | TALLsc              | 49a69fef-71f8-44e6-ad3b-f7a62d91e348 |
|  1 | f3eb4e0d-872c-4cbe-a758-0a2df9a1200d | Jupyter Demo 2                 | In Progress | TALLsc              | 49a69fef-71f8-44e6-ad3b-f7a62d91e348 |
|  2 | 02862615-84b7-4815-becf-97a8593bf629 | Jupyter Demo 1                 | In Progress | TALLsc              | 49a69fef-71f8-44e6-ad3b-f7a62d91e348 |
|  3 | eda77bf5-37cd-4f3b-822e-cbceb31fb05c | Demo create submission Jupyter | Canceled    | TALLsc              | 49a69fef-71f8-44e6-ad3b-f7a62d91e348 |
|  4 | 04bd7dad-0859-49aa-8df1-5e6560e5482a | Demo create submission 1       | New         | TALLsc              | 49a69fef-71f8-44e6-ad3b-f7a62d91e348 |
|  5 | d77df872-384f-493f-b18f-449ed6fa7fdb | Demo create submission Jupyter | In Progress | TALLsc              | 49a69fef-71f8-44e6-ad3b-f7a62d91e348 |
|  6 | f41aea9c-bb76-4b48-8b53-27028317b434 | Demo create submission Jupyter | In Progress | TALLsc              | 49a69fef-71f8-44e6-ad3b-f7a62d91e348 |
|  7 | 107ba083-f107-4a2f-a848-824bb8746a01 | Demo create submission 1       | New         | TALLsc              | 49a69fef-71f8-44e6-ad3b-f7a62d91e348 |
|  8 | 181432cd-e915-46ff-b62e-1f167abb7e2f | API Demonstration              | New         | CMB                 | 4c2b6522-20b8-4841-8c7a-318b325c99b4 |

Since we're working with the TALLsc study, we need to work on one of the submissions related to that study.

In [15]:
for submission in list_sub_res['data']['listSubmissions']['submissions']:
    if submission['name'] == 'Demo create submission Jupyter':
        submissionid = submission['_id']
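Submission names are not unique: in the example listing above, three submissions are called "Demo create submission Jupyter", and the loop simply keeps the last one it sees. A sketch that makes the choice explicit by filtering on both name and status is shown here. `select_submissions` is a hypothetical helper, and the sample records are abbreviated copies of rows from the table above.

```python
def select_submissions(submissions, name, status=None):
    """Return every submission matching name (and status, when given)."""
    return [
        s for s in submissions
        if s["name"] == name and (status is None or s["status"] == status)
    ]


# Abbreviated copies of rows from the example listing
sample_submissions = [
    {"_id": "eda77bf5-37cd-4f3b-822e-cbceb31fb05c",
     "name": "Demo create submission Jupyter", "status": "Canceled"},
    {"_id": "d77df872-384f-493f-b18f-449ed6fa7fdb",
     "name": "Demo create submission Jupyter", "status": "In Progress"},
    {"_id": "f41aea9c-bb76-4b48-8b53-27028317b434",
     "name": "Demo create submission Jupyter", "status": "In Progress"},
    {"_id": "181432cd-e915-46ff-b62e-1f167abb7e2f",
     "name": "API Demonstration", "status": "New"},
]

candidates = select_submissions(
    sample_submissions, "Demo create submission Jupyter", "In Progress"
)
for candidate in candidates:
    print(candidate["_id"], candidate["status"])
```

When more than one candidate remains, picking the `_id` by hand is safer than relying on list order.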

# Step 3: Uploading Submission templates

Once the submission is created, the next step is to start uploading metadata submission templates.  There are two ways of accomplishing this upload:
1) Using the Upload CLI Tool: This is generally the easiest method and can be used to upload both the metadata templates and the data files.  The use of the Upload CLI Tool [is documented elsewhere](https://github.com/CBIIT/crdc-datahub-cli-uploader/tree/master).
2) Using the API: If you wish to provide metadata only via a program, the API can be used as will be demonstrated in this notebook.

**Note that while the API can be used to upload metadata, the actual data files MUST be uploaded with the Upload CLI Tool**

## Collecting information about the metadata files to upload
Let's set up the list of metadata files we want to upload.  This will be a list of **FileInput** objects.  A FileInput object consists of a dictionary with *fileName* and *size* as the keys.

- fileName: The full path file name.  Note that this will vary depending on the operating system being used.
- size: The size of the file in bytes

The last field required for the query is the *type* field, which is either "metadata" or "data file".  Since "data file" isn't allowed outside of the Upload CLI Tool, we'll set it to "metadata".
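Hard-coding file sizes is error-prone, because the value has to be updated whenever a template changes. A minimal sketch that reads the sizes from disk instead follows. `build_file_inputs` is a hypothetical helper; it records the bare file name, as the list used in this notebook does, and joins it with the directory only to measure the file.

```python
import os


def build_file_inputs(directory, file_names):
    """Build FileInput dictionaries (fileName plus size in bytes)."""
    file_inputs = []
    for file_name in file_names:
        full_path = os.path.join(directory, file_name)
        # os.path.getsize raises OSError if the file does not exist
        file_inputs.append(
            {"fileName": file_name, "size": os.path.getsize(full_path)}
        )
    return file_inputs
```

For example, `build_file_inputs("/home/pihl/testdata/", ["PDXNet_participant.tsv"])` would produce a one-element list in the same shape as the hand-written one.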

In [16]:
#metadatafiles = [{"fileName":"/home/pihl/testdata/PDXNet_participant.tsv", "size": 2106 }, {"fileName":"/home/pihl/testdata/PDXNet_sample.tsv", "size":12416}]
#metadatafiles = [{"fileName":"PDXNet_sample.tsv", "size":12416}]
metadatafiles = [{"fileName":"PDXNet_participant.tsv", "size": 2106 }, {"fileName":"PDXNet_sample.tsv", "size":12416},{"fileName":"PDXNet_diagnosis.tsv", "size":6439},{"fileName":"PDXNet_file.tsv", "size":76940},{"fileName":"PDXNet_genomic_info.tsv", "size":283886},{"fileName":"PDXNet_image.tsv", "size":3671},
                      {"fileName":"PDXNet_program.tsv", "size":307},{"fileName":"PDXNet_study.tsv", "size":2171},{"fileName":"PDXNet_treatment.tsv", "size":112}]
type = "metadata"
uploadpath = "/home/pihl/testdata/"
print(submissionid)

f41aea9c-bb76-4b48-8b53-27028317b434


## The createBatch mutation
Now that we've got credentials and the list of files, we create a "batch", which is the term for one or more files uploaded at the same time.  We do this by using the createBatch mutation as shown below.

One of the critical pieces of information returned is the signed URL that is used to actually transfer the files to Data Hub.

In [17]:
create_batch_query = """
mutation CreateBatch(
    $submissionID: ID!, 
    $type: String!, 
    $file: [FileInput]) {
  createBatch(submissionID: $submissionID, type: $type, files: $file) {
    _id
    files {
      fileName
      signedURL
    }
  }
}
"""

In [18]:
create_batch_variables = {"submissionID":submissionid, "type":type, "file":metadatafiles}

In [19]:
create_batch_res = apiQuery(dev2, create_batch_query, create_batch_variables,None)


The results from this mutation will have the signed URLs (again, for security reasons it's a good idea to not print them out).  We'll use these to upload the files.  Make sure that you're using the correct signed URL for each file.
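One way to keep files and signed URLs paired correctly is to index the response by file name and confirm that every requested file received a URL before uploading anything. The sketch below assumes the response shape requested by the createBatch mutation in this notebook. Both helper names are hypothetical, and the URLs in the sample are placeholders rather than real signed URLs.

```python
def signed_urls_by_name(create_batch_response):
    """Map each fileName in a createBatch response to its signedURL."""
    files = create_batch_response["data"]["createBatch"]["files"]
    return {f["fileName"]: f["signedURL"] for f in files}


def missing_signed_urls(file_inputs, url_map):
    """Return the names of requested files that did not receive a signed URL."""
    return [f["fileName"] for f in file_inputs if f["fileName"] not in url_map]


# Placeholder response for illustration only
sample_response = {"data": {"createBatch": {"_id": "example-batch", "files": [
    {"fileName": "PDXNet_participant.tsv", "signedURL": "https://example.invalid/a"},
    {"fileName": "PDXNet_sample.tsv", "signedURL": "https://example.invalid/b"},
]}}}
requested = [{"fileName": "PDXNet_participant.tsv", "size": 2106},
             {"fileName": "PDXNet_study.tsv", "size": 2171}]

url_map = signed_urls_by_name(sample_response)
print(missing_signed_urls(requested, url_map))
```

Printing only the names of missing files keeps the signed URLs themselves out of the notebook output.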

In [20]:
batchid = create_batch_res['data']['createBatch']['_id']
print(batchid)

e391e412-e4ad-4a91-ba5e-fc24cd106652


In [21]:
#def awsFileUpload(file, signedurl, size):
#    #https://docs.aws.amazon.com/AmazonS3/latest/userguide/example_s3_Scenario_PresignedUrl_section.html
#    #headers = {'Content-Type': 'text/tab-separated-values', 'Connection':'keep-alive', 'Accept':'*/*', 'Accept-Encoding':'gzip,deflate,br', 'Content-Length':str(size)}
#    headers = {'Content-Type': 'text/tab-separated-values')}
#    try:
#        with open(file, 'rb') as f:
#            filetext = f.read()
#        res = requests.put(signedurl, data=filetext, headers=headers)
#        if res.status_code == 200:
#            return res
#        else:
#            print(f"Error: {res.status_code}")
#            return res.content
#    except requests.exceptions.HTTPError as e:
#        return(f"HTTP error: {e}")

In [22]:
def awsFileUpload(file, signedurl, datadir):
    # Upload one file to its pre-signed S3 URL with an HTTP PUT
    #https://docs.aws.amazon.com/AmazonS3/latest/userguide/example_s3_Scenario_PresignedUrl_section.html
    headers = {'Content-Type': 'text/tab-separated-values'}
    # os.path.join works whether or not datadir ends with a path separator
    fullFileName = os.path.join(datadir, file)
    with open(fullFileName, 'rb') as f:
        filetext = f.read()
    res = requests.put(signedurl, data=filetext, headers=headers)
    if res.status_code != 200:
        print(f"Error: {res.status_code}")
    # Always return the Response object so the caller can check res.status_code
    return res

As each file is uploaded, an *UploadResult* object has to be constructed.  This will get used in the batch update step.

In [23]:
file_upload_result = []
for entry in metadatafiles:
    for metadatafile in create_batch_res['data']['createBatch']['files']:
        if entry['fileName'] == metadatafile['fileName']:
            metares = awsFileUpload(metadatafile['fileName'], metadatafile['signedURL'], uploadpath)
            if metares.status_code == 200:
                succeeded = True
            else:
                succeeded = False
            file_upload_result.append({'fileName':entry['fileName'], 'succeeded': succeeded, 'errors':[], 'skipped':False})

After the files have been uploaded, the next step is to update the batch by calling the updateBatch mutation.

In [24]:
update_batch_query = """
    mutation UpdateBatch(
        $batchID: ID!
        $files: [UploadResult]
        ){
        updateBatch(batchID:$batchID, files:$files){
            _id
            displayID
        }
        }
"""

In [25]:
print(file_upload_result)
#file_upload_result = [{"fileName":"/home/pihl/testdata/PDXNet_participant.tsv","succeeded":True, "errors":[],"skipped":False},{"fileName":"/home/pihl/testdata/PDXNet_sample.tsv","succeeded":True, "errors":[],"skipped":False}]

[{'fileName': 'PDXNet_participant.tsv', 'succeeded': True, 'errors': [], 'skipped': False}, {'fileName': 'PDXNet_sample.tsv', 'succeeded': True, 'errors': [], 'skipped': False}, {'fileName': 'PDXNet_diagnosis.tsv', 'succeeded': True, 'errors': [], 'skipped': False}, {'fileName': 'PDXNet_file.tsv', 'succeeded': True, 'errors': [], 'skipped': False}, {'fileName': 'PDXNet_genomic_info.tsv', 'succeeded': True, 'errors': [], 'skipped': False}, {'fileName': 'PDXNet_image.tsv', 'succeeded': True, 'errors': [], 'skipped': False}, {'fileName': 'PDXNet_program.tsv', 'succeeded': True, 'errors': [], 'skipped': False}, {'fileName': 'PDXNet_study.tsv', 'succeeded': True, 'errors': [], 'skipped': False}, {'fileName': 'PDXNet_treatment.tsv', 'succeeded': True, 'errors': [], 'skipped': False}]


In [26]:
update_variables = {'batchID':batchid, 'files':file_upload_result}

In [27]:
update_res = apiQuery(dev2, update_batch_query, update_variables, None)
print(batchid)
print(update_res)

e391e412-e4ad-4a91-ba5e-fc24cd106652
{'data': {'updateBatch': {'_id': 'e391e412-e4ad-4a91-ba5e-fc24cd106652', 'displayID': 8}}}


#### Side Trip
If you log into the Data Hub interface, at this point you should see the files that have been uploaded along with any errors that were detected.

### Checking the upload
Before going any further, it's a good idea to make sure that the upload went as expected.  The best way to check for upload errors is with the *listBatches* query.  Since this returns all of the batches in a submission, you'll have to do a little parsing to see if there are any issues with the batch you just sent.

In [28]:
list_batches_query = """
query ListBatches($submissionID: ID!) {
  listBatches(submissionID: $submissionID) {
    batches {
      _id
      submissionID
      displayID
      type
      fileCount
      files {
        fileName
      }
      status
      errors
    }
  }
}
"""

In [29]:
batches_variables = {'submissionID':submissionid}

In [30]:
batch_error_res = apiQuery(dev2, list_batches_query, batches_variables, None)
print(batch_error_res)

{'data': {'listBatches': {'batches': [{'_id': 'e391e412-e4ad-4a91-ba5e-fc24cd106652', 'submissionID': 'f41aea9c-bb76-4b48-8b53-27028317b434', 'displayID': 8, 'type': 'metadata', 'fileCount': 9, 'files': [{'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_file.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_image.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}], 'status': 'Failed', 'errors': ['“PDXNet_sample.tsv: 74”: conflict data detected: “sample_type”: "DNA".', '“PDXNet_sample.tsv: 38”: conflict data detected: “sample_type”: "RNA".', '“PDXNet_image.tsv:2”:  Key property “study_link_id” value is required.', '“PDXNet_treatment.tsv:2”:  Key property “treatment_id” value is required.']}, {'_id': 'ac699a96-08e1-486a-9280-7912d08d64d7', 'submissionID': 'f41aea9c-bb76-4b48-8b53-27028317b434', 'displayID': 7, 'type': 'metad

In [31]:
batch_df = pd.DataFrame(batch_error_res['data']['listBatches']['batches'])
display(Markdown(batch_df.to_markdown()))

|    | _id                                  | submissionID                         |   displayID | type     |   fileCount | files                                                                                                                                                                                                                                                                                                                                     | status    | errors                                                                                                                                                                                                                                                                                                      |
|---:|:-------------------------------------|:-------------------------------------|------------:|:---------|------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|  0 | e391e412-e4ad-4a91-ba5e-fc24cd106652 | f41aea9c-bb76-4b48-8b53-27028317b434 |           8 | metadata |           9 | [{'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_file.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_image.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}] | Failed    | ['“PDXNet_sample.tsv: 74”: conflict data detected: “sample_type”: "DNA".', '“PDXNet_sample.tsv: 38”: conflict data detected: “sample_type”: "RNA".', '“PDXNet_image.tsv:2”:  Key property “study_link_id” value is required.', '“PDXNet_treatment.tsv:2”:  Key property “treatment_id” value is required.'] |
|  1 | ac699a96-08e1-486a-9280-7912d08d64d7 | f41aea9c-bb76-4b48-8b53-27028317b434 |           7 | metadata |           1 | [{'fileName': 'PDXNet_sample.tsv'}]                                                                                                                                                                                                                                                                                                       | Failed    | ['“PDXNet_sample.tsv: 74”: conflict data detected: “sample_type”: "DNA".', '“PDXNet_sample.tsv: 38”: conflict data detected: “sample_type”: "RNA".']                                                                                                                                                        |
|  2 | 5cb96a32-f015-470f-bfdd-c7d3f28e1ce2 | f41aea9c-bb76-4b48-8b53-27028317b434 |           6 | metadata |           2 | [{'fileName': '/home/pihl/testdata/PDXNet_participant.tsv'}, {'fileName': '/home/pihl/testdata/PDXNet_sample.tsv'}]                                                                                                                                                                                                                       | Uploading |                                                                                                                                                                                                                                                                                                             |
|  3 | da8bfe78-b227-42e9-9291-8934391151e2 | f41aea9c-bb76-4b48-8b53-27028317b434 |           5 | metadata |           2 | [{'fileName': '/home/pihl/testdata/PDXNet_participant.tsv'}, {'fileName': '/home/pihl/testdata/PDXNet_sample.tsv'}]                                                                                                                                                                                                                       | Uploading |                                                                                                                                                                                                                                                                                                             |
|  4 | e1937bd6-f659-4275-a5b7-9ac387f4fde8 | f41aea9c-bb76-4b48-8b53-27028317b434 |           4 | metadata |           2 | [{'fileName': '/media/vmshare/PDXNet_participant.tsv'}, {'fileName': '/media/vmshare/PDXNet_sample.tsv'}]                                                                                                                                                                                                                                 | Uploading |                                                                                                                                                                                                                                                                                                             |
|  5 | 6fbf7ba1-27c2-4040-86c9-1858e15eb4be | f41aea9c-bb76-4b48-8b53-27028317b434 |           3 | metadata |           2 | [{'fileName': '/media/vmshare/PDXNet_participant.tsv'}, {'fileName': '/media/vmshare/PDXNet_sample.tsv'}]                                                                                                                                                                                                                                 | Uploading |                                                                                                                                                                                                                                                                                                             |
|  6 | 3b08b7a7-a746-4724-b7da-99fb5e889f4c | f41aea9c-bb76-4b48-8b53-27028317b434 |           2 | metadata |           2 | [{'fileName': '/media/vmshare/PDXNet_participant.tsv'}, {'fileName': '/media/vnshare/PDXNet_sample.tsv'}]                                                                                                                                                                                                                                 | Uploading |                                                                                                                                                                                                                                                                                                             |
|  7 | b3fa1b09-42d5-40ee-8dc2-88e8a2b4156f | f41aea9c-bb76-4b48-8b53-27028317b434 |           1 | metadata |           2 | [{'fileName': '/media/vmshare/PDXNet_participant.tsv'}, {'fileName': '/media/vnshare/PDXNet_sample.tsv'}]                                                                                                                                                                                                                                 | Uploading |                                                                                                                                                                                                                                                                                                             |
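The parsing step can be sketched as follows: pick out the batch that was just sent by its ID, then group its error messages by file. Both helpers are hypothetical, and the regular expression is inferred only from the sample messages shown above, so anything it can't parse is collected under "unparsed".

```python
import re
from collections import defaultdict

# Pattern inferred from the sample messages, e.g. “PDXNet_image.tsv:2”: ...
ERROR_PREFIX = re.compile(r"“([^:”]+):\s*(\d+)”")


def find_batch(list_batches_response, batch_id):
    """Return the batch with the given _id, or None if it is not listed."""
    for batch in list_batches_response["data"]["listBatches"]["batches"]:
        if batch["_id"] == batch_id:
            return batch
    return None


def errors_by_file(batch):
    """Group a batch's error messages by the file name they start with."""
    grouped = defaultdict(list)
    for message in batch.get("errors") or []:
        match = ERROR_PREFIX.match(message)
        grouped[match.group(1) if match else "unparsed"].append(message)
    return dict(grouped)


# Abbreviated copy of the response printed above
sample_batches = {"data": {"listBatches": {"batches": [
    {"_id": "e391e412-e4ad-4a91-ba5e-fc24cd106652", "status": "Failed", "errors": [
        '“PDXNet_sample.tsv: 74”: conflict data detected: “sample_type”: "DNA".',
        '“PDXNet_sample.tsv: 38”: conflict data detected: “sample_type”: "RNA".',
        '“PDXNet_image.tsv:2”:  Key property “study_link_id” value is required.',
    ]},
    {"_id": "5cb96a32-f015-470f-bfdd-c7d3f28e1ce2", "status": "Uploading", "errors": None},
]}}}

batch = find_batch(sample_batches, "e391e412-e4ad-4a91-ba5e-fc24cd106652")
for file_name, messages in errors_by_file(batch).items():
    print(file_name, len(messages))
```

Grouping by file makes it easier to see which templates need to be corrected and uploaded again.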

Clearly there were some issues associated with the sample file that have to be corrected.  From the error message, it looks like there is a conflict in that the same sample has different sample_types.  While this almost certainly reflects a 