<a id='top'></a>
# Quick Links
[Introduction](#introduction)\
[Prerequisites](#prerequisites)\
[Security Note](#securityNote)\
[Pagination Note](#paginationNote)\
[Step 1: Understanding the Landscape](#step1)\
[Step 2: Creating a new submission or using an existing submission](#step2)
- [Creating a new submission](#newSubmission)
- [Working with existing submissions](#existingSubmission)

[Step 3: Uploading Submission templates](#step3)\
[Step 4: Running the Validations](#step4)\
[Step 5: Submitting, Canceling, or Withdrawing](#step5)


# Introduction
This notebook walks through the basics of using the Data Submission Portal API to work on, validate, and submit your data.  These APIs are designed to allow users to perform all the actions that can be done via the [Data Submission Portal](https://hub.datacommons.cancer.gov/) from a notebook or script.  The intent is to allow submitters to operate directly from their own environments, if they so choose, rather than work through the graphical submission interface.

There are a few prerequisites that you have to meet before you can use this API:

<a id='prerequisites'></a>
# Prerequisites
## GraphQL
The Data Submission Portal API uses [GraphQL](https://graphql.org/), and a good understanding of how to use GraphQL is required.  Since GraphQL can be complex, a tutorial is beyond the scope of this document; however, the [GraphQL Documentation](https://graphql.org/learn/) can be very useful.

## Login.gov account
Use of the Data Submission Portal in general requires that a user have an account registered with [Login.gov](https://www.login.gov/) (NIH users can use their NIH account and PIV card).  Note that a Login.gov account is distinct from an eRA Commons identity that is frequently used at NIH.  They are not the same thing.

## Approved Submission
You must receive approval to submit data to CRDC prior to using the Data Submission Portal APIs.  If you need approval, please read and follow the [Submissions Request Instructions](https://datacommons.cancer.gov/submit).  Instructions for using the graphical data submission process are on the same page.

## An API Token
If you are an approved submitter with a Login.gov or NIH account, you can generate an API token from the graphical interface.  Log into the system, then click on your user name and select the **API Token** menu option.  This will bring up a dialog box that allows you to create an API token and copy it to your clipboard.  There are two things to note about API tokens:
- The token is tied to your user identity and can be used on any submission that you're approved to work on.
- You can have only one token at a time.  Generating a new token will revoke the previous token.


In [1]:
import requests
import os
from sys import platform

The imports below are used only for display purposes in this notebook; they're not required to interact with the Data Submission Portal API.

In [2]:
import pandas as pd
from IPython.display import display, Markdown, Latex

<a id='securityNote'></a>
# Security Note
It is ***highly*** recommended that you keep your API token secure and not include it in any code.  While there are many ways to do this, for the purposes of this notebook it's been set in an environment variable named "STAGEAPI".
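One way to fail early with a readable message when the variable is missing is sketched below.  The helper name `get_api_token` and its `env` parameter are hypothetical and exist only for illustration; `STAGEAPI` and `PRODAPI` are the variable names used in this notebook.

```python
import os

# Environment variable names used in this notebook for each tier
TOKEN_ENV_VARS = {"stage": "STAGEAPI", "prod": "PRODAPI"}

def get_api_token(tier, env=None):
    """Return the API token for a tier, read from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    if tier not in TOKEN_ENV_VARS:
        raise ValueError('tier must be "stage" or "prod"')
    var_name = TOKEN_ENV_VARS[tier]
    token = env.get(var_name, "").strip()
    if not token:
        # Report only the variable name, never the token itself
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return token
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.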

In [3]:
def apiQuery(tier, query, variables):
    # Select the endpoint and the environment variable that holds the API token
    if tier == 'prod':
        url = 'https://hub.datacommons.cancer.gov/api/graphql'
        token = os.environ['PRODAPI']
    elif tier == 'stage':
        #Note that use of Stage is for example purposes only, actual submissions should use the production URL.  If you wish to run tests on Stage, please contact the helpdesk.
        url = 'https://hub-stage.datacommons.cancer.gov/api/graphql'
        token = os.environ['STAGEAPI']
    else:
        # Fail loudly instead of returning a string that callers would try to index
        raise ValueError('Please provide either "stage" or "prod" as tier values')
    headers = {"Authorization": f"Bearer {token}"}
    # Only include "variables" in the request body when they are provided
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    try:
        result = requests.post(url = url, headers = headers, json = payload)
        if result.status_code == 200:
            return result.json()
        else:
            print(f"Error: {result.status_code}")
            return result.content
    except requests.exceptions.RequestException as e:
        # requests.post raises RequestException subclasses (connection errors, timeouts);
        # HTTPError is only raised when raise_for_status() is called
        return(f"Request error: {e}")
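GraphQL servers commonly report query problems in an `errors` list inside a response that still has HTTP status 200, so checking the status code alone may not be enough.  The sketch below is a hypothetical helper (`unwrap_graphql`) that could be applied to the value returned by `apiQuery`; whether and how this particular API populates `errors` is an assumption based on general GraphQL behavior.

```python
def unwrap_graphql(response_json):
    """Return the "data" part of a GraphQL response, or raise on errors (hypothetical helper)."""
    if not isinstance(response_json, dict):
        # apiQuery returns bytes or a string when the request itself failed
        raise RuntimeError(f"Unexpected response: {response_json!r}")
    errors = response_json.get("errors")
    if errors:
        # Each error is usually a dictionary with a "message" key
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise RuntimeError(f"GraphQL errors: {messages}")
    return response_json.get("data")
```

With this in place, a call such as `unwrap_graphql(apiQuery('stage', query, None))` would either return the `data` dictionary or stop with a message that explains what went wrong.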

<a id='paginationNote'></a>
# Pagination Note
Most queries that return results are paginated and need to be checked to make sure all results are retrieved.  The number of available results from a query is found in the **total** field, which is returned if requested, and pagination is done using the **first** and **offset** fields in queries.  We won't be highlighting pagination in this notebook, but it can be a critical tool for fully understanding your submissions.  If the **first** field is not included in a query, the system defaults to returning the first 10 results.

- **first**: The number of records to be returned.  If first is set to -1, the API will return all results.
- **offset**: The number of records to be skipped when returning results.
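The **first**, **offset**, and **total** fields described above can be combined into a simple paging loop.  The sketch below is illustrative only: `fetch_all` and the `fetch_page` callback are hypothetical names, and the callback stands in for whatever query you are paging through.

```python
def fetch_all(fetch_page, page_size=10):
    """Collect every record from a paginated query (hypothetical helper).

    fetch_page(first, offset) must return a (total, records) tuple.
    """
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    records = []
    offset = 0
    while True:
        total, page = fetch_page(page_size, offset)
        records.extend(page)
        offset += len(page)
        # Stop when everything is collected or the server returns an empty page
        if offset >= total or not page:
            return records

# Stand-in for a real query: serves slices of an in-memory list
fake_rows = [{"_id": str(i)} for i in range(23)]

def fake_page(first, offset):
    return len(fake_rows), fake_rows[offset:offset + first]

all_rows = fetch_all(fake_page, page_size=10)
print(len(all_rows))
```

With the real API, `fetch_page` would call `apiQuery` with `first` and `offset` in the variables and return the **total** value together with the list of records from the response.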

<a id='step1'></a>
# Step 1: Understanding the landscape
Let's assume that this is our first submission using the API, so the first thing we need to do is list the studies that we're approved to submit to.  That's done with the *getMyUser* query.  This query can return more information about an account (such as the status); however, for this situation we'll focus on the studies that are available.

In [4]:
study_query = """
{
  getMyUser {
    userStatus
    studies {
      _id
      controlledAccess
      createdAt
      dbGaPID
      studyName
      studyAbbreviation
    }
  }
}
"""

Note that the actual results returned by this query will vary for each organization.  These are example results only and shouldn't be used.

In [5]:
study_res = apiQuery('stage', study_query, None)

In [6]:
study_df = pd.DataFrame(study_res['data']['getMyUser']['studies'])
display(Markdown(study_df.to_markdown()))

|    | _id                                  | controlledAccess   | createdAt                | dbGaPID         | studyName                         | studyAbbreviation   |
|---:|:-------------------------------------|:-------------------|:-------------------------|:----------------|:----------------------------------|:--------------------|
|  0 | 0f92fd6d-3a0e-4f0f-b057-182bdc04cc6f | True               | 2025-03-05T16:05:00.556Z | phs001234       | UAT Studies                       | UATS                |
|  1 | 3fb57ccd-1744-4601-a2ae-d5cef27513cc | False              | 2025-05-01T18:58:43.520Z |                 | ARPA-H Sage Biosciences CCDI Test | Sage CCDI           |
|  2 | 466dfc05-605c-49fc-abfc-0a2ce3c9a6b8 | True               | 2025-04-17T18:52:36.344Z | phs0000GC       | General Commons Test              | GCTest              |
|  3 | 4c2b6522-20b8-4841-8c7a-318b325c99b4 | False              | 2024-07-08T12:00:00.000Z |                 | Cancer Moonshot Biobank           | CMB                 |
|  4 | 4f1a7385-bda6-4c07-abd0-49e21ec3c1ce | False              | 2025-04-17T17:57:00.838Z |                 | COTC021                           | COTC021             |
|  5 | 5d0d0213-7358-4ef1-8355-abba01b2cc3a | True               | 2025-03-05T17:40:37.755Z | phs0002         | API Example Study                 | AES                 |
|  6 | 6477a667-91aa-4838-8f95-fc2ac450bd7b | True               | 2025-04-17T18:51:49.422Z | phs00CTDC       | CTDC Test Study                   | CTDCTest            |
|  7 | 6baf0e3b-d541-4e3b-9fee-13e8360bb8df | False              | 2025-04-17T18:40:27.953Z |                 | ICDC Test Study                   | ICDCTest            |
|  8 | 6f7337aa-7723-4946-8e16-27550670a57b | False              | 2025-05-01T18:58:13.309Z |                 | ARPA-H Netrias CCDI Transform     | Netrias CCDI        |
|  9 | 9c3d4b26-ebbd-4cc1-9e41-4ed1075024b6 | True               | 2025-01-14T15:01:41.358Z | phs002192.v5.p2 | The Cancer Moonshot Biobank       | CMB                 |
| 10 | cba1415c-ec06-4d83-844e-655ac3a99300 | False              | 2025-05-01T18:57:40.893Z |                 | ARPA-H CCDI Insilicom             | Insilicom CCDI      |

And just as a check, let's have a quick look at the user status.  If the status is **Active**, the submission can proceed.  If the status is **Inactive**, you will have to contact the Help Desk (or your submission contact) to get the status set to **Active**.

In [7]:
print(f"User status is: {study_res['data']['getMyUser']['userStatus']}")

User status is: Active


<a id='step2'></a>
# Step 2:  Creating a new submission or using an existing submission
The next step in the process is to either create a new submission or to use one of your existing submissions.  It is not necessary to create a new submission every time.  If you have an existing submission that you need to continue working on, simply start using that submission.

<a id='newSubmission'></a>
## Step 2, Alternate 1: Creating a new submission
For the purposes of this demonstration, we'll use the **API Example Study** as the example.  

To submit data, your first step is to create a new submission within the study.  **Do not do this if you're continuing with an existing submission.**
 
From the data we obtained in the first query, we'll have to parse out the information that's relevant to the **API Example Study**.  We'll need these values to construct the query that creates the new submission.

In [9]:
abbrev = 'AES'
for study in study_res['data']['getMyUser']['studies']:
    if study['studyAbbreviation'] == abbrev:
        dbgap = study['dbGaPID']
        # Keep the study name separate so it is not overwritten by the submission name below
        studyname = study['studyName']
        studyid = study['_id']
dc = "CDS"
# Name of the new submission
name = "Documentation Test 1"
intention = "New/Update"
datatype = "Metadata and Data Files"
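Study abbreviations are not guaranteed to be unique; in the example results above, **CMB** appears twice, and a simple loop would silently keep the last match.  The sketch below shows one way to make the lookup fail visibly instead.  The helper name `find_study` is hypothetical, and the sample records are trimmed copies of rows from the example table.

```python
def find_study(studies, abbreviation):
    """Return the single study with the given abbreviation (hypothetical helper)."""
    matches = [s for s in studies if s.get("studyAbbreviation") == abbreviation]
    if not matches:
        raise LookupError(f"No study with abbreviation {abbreviation!r}")
    if len(matches) > 1:
        names = ", ".join(str(s.get("studyName")) for s in matches)
        raise LookupError(f"Abbreviation {abbreviation!r} is ambiguous: {names}")
    return matches[0]

# Trimmed sample of the example results shown earlier
sample_studies = [
    {"_id": "4c2b6522-20b8-4841-8c7a-318b325c99b4", "studyName": "Cancer Moonshot Biobank", "studyAbbreviation": "CMB"},
    {"_id": "5d0d0213-7358-4ef1-8355-abba01b2cc3a", "studyName": "API Example Study", "studyAbbreviation": "AES"},
    {"_id": "9c3d4b26-ebbd-4cc1-9e41-4ed1075024b6", "studyName": "The Cancer Moonshot Biobank", "studyAbbreviation": "CMB"},
]

aes = find_study(sample_studies, "AES")
print(aes["_id"])
```

When an abbreviation is ambiguous, selecting by the **_id** value directly is the safer choice.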


### createSubmission mutation

Creating submissions requires the use of a mutation that calls *createSubmission*.  There are multiple required variables that have to be provided in a GraphQL-compatible way:
- **studyID**:  This is the assigned Study ID that can be obtained from the **_id** field in the *getMyUser* query
- **dbGaPID**: Obtained when registering the study at dbGaP.  This is required for all controlled access studies
- **dataCommons**: This is the CRDC Data Commons the submissions will be deposited into
- **name**: This can be anything that allows you to identify this specific submission
- **intention**: Can be *New/Update* if you are adding information to the submission or *Delete* if you are removing information from an earlier, completed submission
- **dataType**: Can be either *Metadata and Data Files* or *Metadata Only*.  Which one is selected depends on whether or not data files will be included in the submission

This query will return the **_id** field, which will be the newly created submission ID.  It will also return a number of other fields that can be checked to make sure the submission was created properly.
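Because **intention** and **dataType** only accept the values listed above, it can help to check them before sending the mutation.  The following is a minimal sketch; `build_submission_variables` is a hypothetical helper, and the allowed values are taken from the list above rather than from the API schema.

```python
# Allowed values as described in this notebook
ALLOWED_INTENTIONS = {"New/Update", "Delete"}
ALLOWED_DATA_TYPES = {"Metadata and Data Files", "Metadata Only"}

def build_submission_variables(study_id, data_commons, name, intention, data_type):
    """Build the variables dictionary for the mutation (hypothetical helper)."""
    if intention not in ALLOWED_INTENTIONS:
        raise ValueError(f"intention must be one of {sorted(ALLOWED_INTENTIONS)}")
    if data_type not in ALLOWED_DATA_TYPES:
        raise ValueError(f"dataType must be one of {sorted(ALLOWED_DATA_TYPES)}")
    if not name or not name.strip():
        raise ValueError("name must not be empty")
    return {
        "studyID": study_id,
        "dataCommons": data_commons,
        "name": name.strip(),
        "intention": intention,
        "dataType": data_type,
    }

example_variables = build_submission_variables(
    "5d0d0213-7358-4ef1-8355-abba01b2cc3a", "CDS", "Documentation Test 1",
    "New/Update", "Metadata and Data Files",
)
print(example_variables)
```

A typo such as `"New"` for the intention is then caught locally with a clear message instead of surfacing as an API error.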

In [10]:
create_submission_query = """
mutation CreateNewSubmission(
  $studyID: String!,
  $dataCommons: String!,
  $name: String!,
  $intention:String!,
  $dataType: String!,
){
  createSubmission(
    studyID: $studyID,
    dataCommons: $dataCommons,
    name: $name,
    intention: $intention,
    dataType: $dataType
  ){
    _id
    studyID
    dbGaPID
    dataCommons
    name
    intention
    dataType
    status
  }
}"""

In [11]:
variables = {"studyID":studyid, "dataCommons":dc, "name":name, "intention":intention,"dataType":datatype}

In [12]:
create_res = apiQuery('stage',create_submission_query, variables)

Parse out the submission ID, since we'll need it later.

In [13]:
submissionid = create_res['data']['createSubmission']["_id"]
subname = create_res['data']['createSubmission']['name']

#### Side trip

At this point, if you go to the graphical interface, you should see that a new submission has been created using the name provided in the query.

<a id='existingSubmission'></a>
## Step 2, Alternate 2: Working with existing submissions
If you already have submissions in the Data Submission Portal that you've been working with, you can continue to work with them instead of creating a new submission.  To continue work on a submission, you will first have to identify the submissions using the *listSubmissions* query.

The listSubmissions query requires that **status** be provided as a parameter.  The status can be any combination of:
- All
- New
- In Progress
- Submitted
- Released
- Completed
- Archived
- Canceled
- Rejected
- Withdrawn
- Deleted

**All** returns all submission statuses.

Details about what each of these states means can be found in the Submission Documentation.  For most submitters, the important states are **New**, **In Progress**, and **Submitted** as those will be the states that allow work to be done on the submission.

This allows queries to bring back information about specific states.  For the purposes of the demonstration, the variables below request the **New**, **Deleted**, and **In Progress** states; use "All" instead to bring back everything.  We'll also return some additional information about each submission so we can identify the ones we want to work with.

For long lists, the *listSubmissions* query also allows the list to be paginated using the **first** and **offset** fields, ordered by a chosen field with the **orderBy** field, and sorted in ascending or descending order with the **sortDirection** field.  Please see the API documentation for additional information.
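A misspelled status is easy to miss, so the variables for *listSubmissions* can be checked against the list of statuses above before the query is sent.  This is a sketch under assumptions: `build_list_variables` is a hypothetical helper, the default values are chosen for illustration, and the sort direction is passed through unchecked because its allowed values are not listed in this notebook.

```python
# Status values as listed in this notebook
VALID_STATUSES = {
    "All", "New", "In Progress", "Submitted", "Released", "Completed",
    "Archived", "Canceled", "Rejected", "Withdrawn", "Deleted",
}

def build_list_variables(statuses, first=10, offset=0,
                         order_by="updatedAt", sort_direction="desc"):
    """Build listSubmissions variables after validating statuses (hypothetical helper)."""
    if isinstance(statuses, str):
        # Accept a single status given as a plain string
        statuses = [statuses]
    unknown = sorted(set(statuses) - VALID_STATUSES)
    if unknown:
        raise ValueError(f"Unknown status value(s): {unknown}")
    return {
        "status": list(statuses),
        "first": first,
        "offset": offset,
        "orderBy": order_by,
        "sortDirection": sort_direction,
    }

print(build_list_variables(["New", "In Progress"], first=-1))
```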

In [70]:
list_sub_query = """
    query ListSubmissions(
    $status:[String],
    $first: Int,
    $offset: Int,
    $orderBy: String,
    $sortDirection: String){
          listSubmissions(
              status: $status,
              first: $first,
              offset: $offset,
              orderBy: $orderBy,
              sortDirection: $sortDirection){
            total
            submissions{
              _id
              name
              submitterName
              dataCommons
              studyAbbreviation
              dbGaPID
              modelVersion
              status
              conciergeName
              createdAt
              updatedAt
              intention
            }
          }
    }
"""

In [71]:
statusvariables = {"status":["New", "Deleted", "In Progress"], "first": -1, "offset": 0, "orderBy": "updatedAt", "sortDirection": "desc"}

In [72]:
list_sub_res = apiQuery('stage', list_sub_query, statusvariables)

In [73]:
submissions_df = pd.DataFrame(list_sub_res['data']['listSubmissions']['submissions'])
display(Markdown(submissions_df.to_markdown()))

|    | _id                                  | name                      | submitterName   | dataCommons   | studyAbbreviation       | dbGaPID         | modelVersion   | status      | conciergeName   | createdAt                | updatedAt                | intention   |
|---:|:-------------------------------------|:--------------------------|:----------------|:--------------|:------------------------|:----------------|:---------------|:------------|:----------------|:-------------------------|:-------------------------|:------------|
|  0 | 0623921e-926f-4e2a-b0f7-6f1d442d855d | Stage API Sub Test 3      | Todd Pihl       | CDS           | AES                     | phs0002         | 6.0.2          | New         |                 | 2025-03-07T19:33:28.186Z | 2025-03-07T19:33:28.186Z | New/Update  |
|  1 | a4e4f3f6-6d47-4174-a687-b71ba925a558 | Stage API Sub Test 2      | Todd Pihl       | CDS           | AES                     | phs0002         | 6.0.2          | In Progress |                 | 2025-03-06T15:27:02.009Z | 2025-03-06T18:33:56.857Z | New/Update  |
|  2 | c477eeb1-53b9-45f3-873b-9fea9e242267 | Stage API Submission Test | Todd Pihl       | CDS           | AES                     | phs0002         | 6.0.2          | New         |                 | 2025-03-05T18:28:19.846Z | 2025-03-05T19:01:54.803Z | New/Update  |
|  3 | 4ce43a10-4669-40ce-949f-3fbbfc9f2513 | Stage API Submission Test | Todd Pihl       | CDS           | G_controlledStudy_Stage | 34424           | 4.0.4          | Deleted     |                 | 2024-10-18T20:02:36.282Z | 2025-03-05T13:30:02.351Z | New/Update  |
|  4 | 451744e8-3c12-4782-ba6b-0d97ef585929 | Key 3                     | Todd Pihl       | CDS           | HTAN image              | phs002371_image | 5.0.4          | In Progress |                 | 2025-02-05T18:59:16.955Z | 2025-02-07T21:28:15.395Z | New/Update  |

Since we're working with the **AES** study, we need to work on one of the submissions related to that study.

In [74]:
for submission in list_sub_res['data']['listSubmissions']['submissions']:
    if submission['name'] == 'Stage API Sub Test 3':
        submissionid = submission['_id']
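Submission names are not necessarily unique (the example results above contain two submissions named *Stage API Submission Test*), so matching on the name alone can pick up the wrong record.  The sketch below narrows the list by study abbreviation and status first.  `select_submissions` is a hypothetical helper, and the sample records are trimmed copies of rows from the example table.

```python
def select_submissions(submissions, study_abbreviation, statuses):
    """Return submissions for one study that are in the given statuses (hypothetical helper)."""
    wanted = set(statuses)
    return [
        s for s in submissions
        if s.get("studyAbbreviation") == study_abbreviation and s.get("status") in wanted
    ]

# Trimmed sample of the example results shown above
sample_submissions = [
    {"_id": "0623921e-926f-4e2a-b0f7-6f1d442d855d", "name": "Stage API Sub Test 3",
     "studyAbbreviation": "AES", "status": "New"},
    {"_id": "a4e4f3f6-6d47-4174-a687-b71ba925a558", "name": "Stage API Sub Test 2",
     "studyAbbreviation": "AES", "status": "In Progress"},
    {"_id": "4ce43a10-4669-40ce-949f-3fbbfc9f2513", "name": "Stage API Submission Test",
     "studyAbbreviation": "G_controlledStudy_Stage", "status": "Deleted"},
]

candidates = select_submissions(sample_submissions, "AES", ["New", "In Progress"])
for s in candidates:
    print(s["_id"], s["name"], s["status"])
```

Once the candidates are listed, the **_id** of the intended submission can be copied directly, which avoids any ambiguity.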

<a id='step3'></a>
# Step 3: Uploading Submission templates
Once the submission is created, the next step is to start uploading metadata submission templates and data files.  There are two ways of accomplishing this upload:
1) Using the Upload CLI Tool : This is generally the easiest method and can be used to upload both the metadata templates and the data files.  The use of the Uploader CLI Tool [is documented elsewhere](https://github.com/CBIIT/crdc-datahub-cli-uploader/tree/master)
2) Using the API : If you wish to provide metadata only via a program, the API can be used as will be demonstrated in this notebook.  While it is possible to upload data files using the API, it is **strongly** recommended that the Upload CLI Tool is used instead.

Uploading data files using the API will be covered in a separate notebook.

## Collecting information about the metadata files to upload
Let's set up the list of metadata files we want to upload.  This will be a list of **FileInput** objects.  A FileInput object consists of a dictionary with *fileName* and *size* as the keys.

- **fileName**: Just the name of the file, not including the path.
- **size**: The size of the file in bytes

The last field required for the query is the *type* field, which is either "metadata" or "data file".  Since "data file" isn't allowed outside of the Upload CLI Tool, we'll set it to "metadata".
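As an illustration of the FileInput structure described above, the sketch below collects *fileName* and *size* for every `.tsv` file in a directory.  `build_file_inputs` and its `extension` parameter are hypothetical.  Note also that the *createBatch* mutation used later in this notebook declares **files** as a list of strings, so whether your API version expects these dictionaries or plain file names is something to confirm against the schema.

```python
import os

def build_file_inputs(directory, extension=".tsv"):
    """List regular files in a directory as fileName/size dictionaries (hypothetical helper)."""
    file_inputs = []
    for entry in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, entry)
        # Skip sub-directories and files with other extensions
        if os.path.isfile(full_path) and entry.endswith(extension):
            file_inputs.append({"fileName": entry, "size": os.path.getsize(full_path)})
    return file_inputs
```

If only the names are needed, they can be recovered with `[f["fileName"] for f in build_file_inputs(datadir)]`.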

In [107]:
if platform == 'linux' or platform == 'linux2':
    datadir = '/testdata/'
elif platform == "win32":
    datadir = r"C:\Users\pihltd\Documents\datadir"
elif platform == "darwin":
    datadir = "/testdata/"
filelist = os.listdir(datadir)
metadatafiles = []
for file in filelist:
    metadatafiles.append(file)
print(metadatafiles)

['PDXNet_diagnosis.tsv', 'PDXNet_participant.tsv', 'PDXNet_genomic_info.tsv', 'PDXNet_sample.tsv', 'PDXNet_study.tsv', 'PDXNet_program.tsv', 'PDXNet_file.tsv']


In [108]:
submissiontype = "metadata"

## The createBatch mutation
Now that we've got credentials and the list of files, we create a "batch", which is the term for one or more files uploaded at the same time.  We do this by using the *createBatch* mutation as shown below.

One of the critical pieces of information returned is the signed URL that is used to actually transfer the files to the Data Submission Portal.

In [109]:
create_batch_query = """
mutation CreateBatch(
    $submissionID: ID!, 
    $type: String, 
    $files: [String!]!) {
  createBatch(submissionID: $submissionID, type: $type, files: $files) {
    _id
    submissionID
    bucketName
    filePrefix
    type
    status
    createdAt
    updatedAt
    files {
      fileName
      signedURL
    }
  }
}
"""

In [110]:
create_batch_variables = {"submissionID":submissionid, "type":submissiontype, "files":metadatafiles}

In [111]:
create_batch_res = apiQuery('stage', create_batch_query, create_batch_variables)

The results from this mutation will have the signed URLs (again, for security reasons it's a good idea to not print them out).  We'll use these to upload the files.  Make sure that you're using the correct signed URL for each file.  We'll also need the batch ID, so that should be parsed out.
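To make sure each file is paired with its own signed URL, the *createBatch* result can be turned into a lookup keyed by file name.  This is a sketch that assumes the response shape requested by the mutation above; `signed_url_map` and `missing_urls` are hypothetical helpers, and the URLs in the sample are placeholders.

```python
def signed_url_map(create_batch_result):
    """Map each fileName in a createBatch result to its signedURL (hypothetical helper)."""
    url_by_name = {}
    for entry in create_batch_result["data"]["createBatch"]["files"]:
        name = entry["fileName"]
        if name in url_by_name:
            raise ValueError(f"Duplicate file name in batch: {name}")
        url_by_name[name] = entry["signedURL"]
    return url_by_name

def missing_urls(file_names, url_by_name):
    """Return the local file names that did not receive a signed URL."""
    return [n for n in file_names if n not in url_by_name]

# Placeholder response in the shape requested by the mutation
sample_result = {"data": {"createBatch": {"_id": "batch-1", "files": [
    {"fileName": "PDXNet_study.tsv", "signedURL": "https://example.invalid/study"},
    {"fileName": "PDXNet_sample.tsv", "signedURL": "https://example.invalid/sample"},
]}}}

urls = signed_url_map(sample_result)
# Print only the file names, not the URLs themselves
print(sorted(urls))
print(missing_urls(["PDXNet_study.tsv", "PDXNet_file.tsv"], urls))
```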

In [112]:
batchid = create_batch_res['data']['createBatch']['_id']

In [113]:
def awsFileUpload(file, signedurl, datadir):
    #https://docs.aws.amazon.com/AmazonS3/latest/userguide/example_s3_Scenario_PresignedUrl_section.html
    headers = {'Content-Type': 'text/tab-separated-values'}
    # os.path.join works whether or not datadir ends with a path separator
    fullFileName = os.path.join(datadir, file)
    try:
        with open(fullFileName, 'rb') as f:
            filetext = f.read()
        res = requests.put(signedurl, data=filetext, headers=headers)
        if res.status_code != 200:
            print(f"Error: {res.status_code}")
        # Always return the response so the caller can inspect status_code
        return res
    except (OSError, requests.exceptions.RequestException) as e:
        # The file could not be read or the request could not be sent
        print(f"Upload error for {file}: {e}")
        return None

In [114]:
def processFilesForUpload(metadatafiles, datadir,batch_creation_results):
    file_upload_result = []
    for entry in metadatafiles:
        for metadatafile in batch_creation_results['data']['createBatch']['files']:
            if entry == metadatafile['fileName']:
                metares = awsFileUpload(metadatafile['fileName'], metadatafile['signedURL'], datadir)
                # A missing response (None) or a non-200 status counts as a failed upload
                succeeded = metares is not None and metares.status_code == 200
                file_upload_result.append({'fileName':entry, 'succeeded': succeeded, 'errors':[], 'skipped':False})
    return file_upload_result

As each file is uploaded, an *UploadResult* object has to be constructed.  This will get used in the batch update step.
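The dictionary keys used for an *UploadResult* in this notebook are `fileName`, `succeeded`, `errors`, and `skipped`.  The sketch below builds one such dictionary and records a short message when the upload did not succeed.  `make_upload_result` is a hypothetical helper, and treating `errors` as a list of plain strings is an assumption.

```python
def make_upload_result(file_name, status_code=None, error=None):
    """Build one UploadResult-style dictionary (hypothetical helper)."""
    succeeded = status_code == 200 and error is None
    errors = []
    if error is not None:
        # A local problem, such as an unreadable file or a connection failure
        errors.append(str(error))
    elif status_code != 200:
        errors.append(f"Upload returned HTTP status {status_code}")
    return {
        "fileName": file_name,
        "succeeded": succeeded,
        "errors": errors,
        "skipped": False,
    }

print(make_upload_result("PDXNet_study.tsv", status_code=200))
print(make_upload_result("PDXNet_file.tsv", status_code=403))
```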

In [115]:
datadir = "/testdata/"
file_upload_result = processFilesForUpload(metadatafiles, datadir, create_batch_res)

After files have been uploaded, the next step is to update the batch by calling the *updateBatch* mutation.  This mutation uses the *UploadResult* objects that we created in the previous step.

In [116]:
update_batch_query = """
    mutation UpdateBatch(
        $batchID: ID!
        $files: [UploadResult]
        ){
        updateBatch(batchID:$batchID, files:$files){
            _id
            displayID
        }
        }
"""

In [117]:
update_variables = {'batchID':batchid, 'files':file_upload_result}

In [118]:
update_res = apiQuery('stage', update_batch_query, update_variables)

#### Side Trip
If you log into the Data Submission Portal interface, at this point you should see the files that have been uploaded along with any errors that were detected.

### Checking the upload
Before going any further, it's a good idea to make sure that the upload went as expected.  The best way to check for upload errors is with the *listBatches* query.  Since this returns all of the batches in a submission, you'll have to do a little parsing to see if there are any issues with the batch you just sent.

In [119]:
list_batches_query = """
query ListBatches($submissionID: ID!) {
  listBatches(submissionID: $submissionID) {
    total
    batches {
      _id
      submissionID
      displayID
      type
      fileCount
      files {
        fileName
      }
      status
      errors
    }
  }
}
"""

In [120]:
batches_variables = {'submissionID':submissionid}

In [121]:
batch_error_res = apiQuery('stage', list_batches_query, batches_variables)

In [122]:
batch_df = pd.DataFrame(batch_error_res['data']['listBatches']['batches'])
display(Markdown(batch_df.to_markdown()))

|    | _id                                  | submissionID                         |   displayID | type     |   fileCount | files                                                                                                                                                                                                                                                                                                                                     | status    | errors                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
|---:|:-------------------------------------|:-------------------------------------|------------:|:---------|------------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|  0 | 564c0a73-9b08-4b3d-b4d1-084fba856480 | d7824107-91f2-4825-9b94-c0993e07b4cc |          17 | metadata |           7 | [{'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_file.tsv'}]                                                                         | Uploading |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|  1 | ba6ac3e0-282a-44fc-8a67-b8f9e221d625 | d7824107-91f2-4825-9b94-c0993e07b4cc |          16 | metadata |           1 | [{'fileName': 'PDXNet_treatment.tsv'}]                                                                                                                                                                                                                                                                                                    | Failed    | ['Batch validation failed - internal error. Please try again and contact the helpdesk if this error persists.']                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|  2 | 7cc349b8-9b83-4776-845d-3b673b55bfd3 | d7824107-91f2-4825-9b94-c0993e07b4cc |          15 | metadata |           1 | [{'fileName': 'PDXNet_genomic_info.tsv'}]                                                                                                                                                                                                                                                                                                 | Uploaded  | []                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|  3 | 4782cdb3-8bbe-4d3e-a241-9d036117389a | d7824107-91f2-4825-9b94-c0993e07b4cc |          14 | metadata |           1 | [{'fileName': 'PDXNet_diagnosis.tsv'}]                                                                                                                                                                                                                                                                                                    | Uploaded  | []                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|  4 | 0f4905a6-5100-440b-915e-efbb3f85cb87 | d7824107-91f2-4825-9b94-c0993e07b4cc |          13 | metadata |           1 | [{'fileName': 'PDXNet_sample.tsv'}]                                                                                                                                                                                                                                                                                                       | Uploaded  | []                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|  5 | 9aec0222-40ae-4cc1-9bb1-80eeda27258c | d7824107-91f2-4825-9b94-c0993e07b4cc |          12 | metadata |           1 | [{'fileName': 'PDXNet_participant.tsv'}]                                                                                                                                                                                                                                                                                                  | Uploaded  | []                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|  6 | 242f107f-b0b2-46cc-aeef-866ab0873bc2 | d7824107-91f2-4825-9b94-c0993e07b4cc |          11 | metadata |           1 | [{'fileName': 'PDXNet_program.tsv'}]                                                                                                                                                                                                                                                                                                      | Uploaded  | []                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|  7 | 319061af-8b24-4c5e-8fe6-963c9237f636 | d7824107-91f2-4825-9b94-c0993e07b4cc |          10 | metadata |           1 | [{'fileName': 'PDXNet_study.tsv'}]                                                                                                                                                                                                                                                                                                        | Uploaded  | []                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
|  8 | fa9b6d60-7235-4864-bcf7-e95dcd093f51 | d7824107-91f2-4825-9b94-c0993e07b4cc |           9 | metadata |           8 | [{'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_file.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}]                                   | Failed    | ['Batch validation failed - internal error. Please try again and contact the helpdesk if this error persists.']                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|  9 | d7d2e05e-db55-4c40-bc41-ee2a1096b43f | d7824107-91f2-4825-9b94-c0993e07b4cc |           8 | metadata |           8 | [{'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_file.tsv'}]                                   | Failed    | ['Batch validation failed - internal error. Please try again and contact the helpdesk if this error persists.']                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| 10 | aa4953aa-34c8-4fab-b198-ab2809d13bdd | d7824107-91f2-4825-9b94-c0993e07b4cc |           7 | metadata |           8 | [{'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_file.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}]                                   | Failed    | ['“PDXNet_sample.tsv: 74”: conflict data detected: “sample_type”: "DNA".', '“PDXNet_sample.tsv: 38”: conflict data detected: “sample_type”: "RNA".', '“PDXNet_study.tsv”: Property "phs_accession" is required.', 'Batch validation failed - internal error. Please try again and contact the helpdesk if this error persists.']                                                                                                                                                                                                                                                   |
| 11 | c560dae2-b9cb-4ae1-a2cd-c93f93111fd8 | d7824107-91f2-4825-9b94-c0993e07b4cc |           6 | metadata |           8 | [{'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_file.tsv'}]                                   | Failed    | ['Batch validation failed - internal error. Please try again and contact the helpdesk if this error persists.']                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| 12 | 2b98d959-b38a-4fd3-bea3-da15ceefa31c | d7824107-91f2-4825-9b94-c0993e07b4cc |           5 | metadata |           9 | [{'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_file.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_image.tsv'}, {'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}] | Failed    | ['“PDXNet_file.tsv”: "phs_accession" is not Key property of "study", please use "study_id" instead.', '“PDXNet_image.tsv:2”:  Key property “study_link_id” value is required.', '“PDXNet_program.tsv”: Property "program_short_name" is required.', '“PDXNet_sample.tsv: 74”: conflict data detected: “sample_type”: "DNA".', '“PDXNet_sample.tsv: 38”: conflict data detected: “sample_type”: "RNA".', '“PDXNet_study.tsv”: Key property “study_id” is required.', 'Batch validation failed - internal error. Please try again and contact the helpdesk if this error persists.'] |
| 13 | a357184c-e660-4e76-8d70-ffbe17c4d8db | d7824107-91f2-4825-9b94-c0993e07b4cc |           4 | metadata |           9 | [{'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}, {'fileName': 'PDXNet_image.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_file.tsv'}] | Failed    | ['Batch validation failed - internal error. Please try again and contact the helpdesk if this error persists.']                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| 14 | a8f34250-93c8-43c0-922e-f4d2c22077a3 | d7824107-91f2-4825-9b94-c0993e07b4cc |           3 | metadata |           9 | [{'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}, {'fileName': 'PDXNet_image.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_file.tsv'}] | Failed    | ['Batch validation failed - internal error. Please try again and contact the helpdesk if this error persists.']                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| 15 | ef696623-46b6-4930-ba91-2f0c9ffab106 | d7824107-91f2-4825-9b94-c0993e07b4cc |           2 | metadata |           9 | [{'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}, {'fileName': 'PDXNet_image.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_file.tsv'}] | Failed    | ['“PDXNet_participant.tsv”: "study)id" is not Key property of "study", please use "study_id" instead.', 'Batch validation failed - internal error. Please try again and contact the helpdesk if this error persists.']                                                                                                                                                                                                                                                                                                                                                             |
| 16 | 46c44e38-4434-4862-8347-8e2657925f96 | d7824107-91f2-4825-9b94-c0993e07b4cc |           1 | metadata |           9 | [{'fileName': 'PDXNet_diagnosis.tsv'}, {'fileName': 'PDXNet_participant.tsv'}, {'fileName': 'PDXNet_genomic_info.tsv'}, {'fileName': 'PDXNet_treatment.tsv'}, {'fileName': 'PDXNet_image.tsv'}, {'fileName': 'PDXNet_sample.tsv'}, {'fileName': 'PDXNet_study.tsv'}, {'fileName': 'PDXNet_program.tsv'}, {'fileName': 'PDXNet_file.tsv'}] | Failed    | ['“PDXNet_participant.tsv”: Property "sex" is required.', '“PDXNet_participant.tsv”: "phs_accession" is not Key property of "study", please use "study_id" instead.', 'Batch validation failed - internal error. Please try again and contact the helpdesk if this error persists.']                                                                                                                                                                                                                                                                                               |

Clearly there were some issues that will have to be corrected before the submission can proceed.

### A Note on metadata uploads
When a metadata upload fails, every file in the upload is marked as failed, regardless of which files contain errors.  In this demonstration all of the metadata files are uploaded as a group (and therefore all have to be re-uploaded), but the Submission Portal also allows metadata files to be submitted, and evaluated, individually.  When files are submitted individually, only those that fail the initial upload validation need to be corrected; any files that have already passed remain in the system.

For this demo, there is a second set of files that have the errors fixed and are in a different directory.

In [140]:
if platform == 'linux' or platform == 'linux2':
    datadir = '/fixedtestdata/'
elif platform == "win32":
    datadir = r"C:\Users\pihltd\Documents\datadir"
elif platform == "darwin":
    datadir = "/fixedtestdata/"
filelist = os.listdir(datadir)
new_metadatafiles = []
for file in filelist:
    new_metadatafiles.append(file)
print(new_metadatafiles)

FileNotFoundError: [Errno 2] No such file or directory: '/fixedtestdata/'
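
The `FileNotFoundError` shown above is what `os.listdir` raises when the directory is missing on the machine running the notebook. As a minimal sketch, a small helper can check for the directory first and keep only `.tsv` templates. The name `list_metadata_files` and the `.tsv` filter are assumptions made for illustration; they are not part of the notebook's helper code.

```python
from pathlib import Path


def list_metadata_files(datadir, suffix=".tsv"):
    """Return the sorted names of files in datadir that end with suffix.

    Hypothetical helper: raises a descriptive error instead of the bare
    FileNotFoundError that os.listdir produces for a missing directory.
    """
    directory = Path(datadir)
    if not directory.is_dir():
        raise FileNotFoundError(
            f"Metadata directory {datadir!r} does not exist; check the datadir setting."
        )
    # Keep regular files only, so sub-directories and unrelated files are skipped.
    return sorted(
        p.name for p in directory.iterdir() if p.is_file() and p.suffix == suffix
    )
```

If this helper were used, `new_metadatafiles = list_metadata_files(datadir)` would take the place of the `os.listdir` loop.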

With that in place, we'll go through the same steps to add the files:

1. Create a new batch and grab the batch ID

In [97]:
create_batch_variables = {"submissionID":submissionid, "type":submissiontype, "files":new_metadatafiles}
create_batch_res = apiQuery('stage', create_batch_query, create_batch_variables)
batchid = create_batch_res['data']['createBatch']['_id']

2. Upload the files using the pre-signed URLs

In [98]:
file_upload_result = processFilesForUpload(new_metadatafiles, datadir, create_batch_res)

3. Update the batch

In [99]:
update_variables = {'batchID':batchid, 'files':file_upload_result}
update_res = apiQuery('stage', update_batch_query, update_variables)

And lastly, check the batch for errors

In [101]:
batch_error_res = apiQuery('stage', list_batches_query, batches_variables)
batch_df = pd.DataFrame(batch_error_res['data']['listBatches']['batches'])
display(Markdown(batch_df.to_markdown()))

|    | _id                                  | submissionID                         |   displayID | type     |   fileCount | files                                                                                                                                                                                                                                                                                                                                                         | status   | errors                                                                                                                                                                                                                                                                                                                                                     |
|---:|:-------------------------------------|:-------------------------------------|------------:|:---------|------------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|  0 | 242a9da4-8a26-420a-88bd-732e85aac752 | 0623921e-926f-4e2a-b0f7-6f1d442d855d |           2 | metadata |           7 | [{'fileName': 'Demo_file.tsv'}, {'fileName': 'Demo_participant.tsv'}, {'fileName': 'Demo_program.tsv'}, {'fileName': 'Demo_genomic_info.tsv'}, {'fileName': 'Demo_sample.tsv'}, {'fileName': 'Demo_study.tsv'}, {'fileName': 'Demo_diagnosis.tsv'}]                                                                                                           | Uploaded | []                                                                                                                                                                                                                                                                                                                                                         |
|  1 | 3da22432-1711-4ae3-ae62-75d5258478f6 | 0623921e-926f-4e2a-b0f7-6f1d442d855d |           1 | metadata |          10 | [{'fileName': 'Demo_file.tsv'}, {'fileName': 'Demo_treatment.tsv'}, {'fileName': 'Demo_sampleFIXED.tsv'}, {'fileName': 'Demo_participant.tsv'}, {'fileName': 'Demo_program.tsv'}, {'fileName': 'Demo_genomic_info.tsv'}, {'fileName': 'Demo_sample.tsv'}, {'fileName': 'Demo_study.tsv'}, {'fileName': 'Demo_diagnosis.tsv'}, {'fileName': 'Demo_image.tsv'}] | Failed   | ['“Demo_treatment.tsv:2”:  Key property “treatment_id” value is required.', '“Demo_participant.tsv”: Property "sex" is required.', '“Demo_sample.tsv: 38”: conflict data detected: “sample_type”: "RNA".', '“Demo_sample.tsv: 74”: conflict data detected: “sample_type”: "DNA".', '“Demo_image.tsv:2”:  Key property “study_link_id” value is required.'] |

The status is now **Uploaded** and no errors are reported, so all seven files are now successfully added to the submission.
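
Rather than reading the table by eye, the same check can be sketched in code. The example below assumes the batch records have the shape shown in the table above (`displayID`, `status`, `errors`); the function name `latest_batch_ok` is hypothetical.

```python
def latest_batch_ok(batches):
    """Return (ok, errors) for the batch with the highest displayID.

    Assumes each batch is a dict with 'displayID', 'status' and 'errors',
    as in the listBatches output shown above.
    """
    if not batches:
        raise ValueError("No batches found for this submission")
    latest = max(batches, key=lambda b: b["displayID"])
    ok = latest["status"] == "Uploaded" and not latest["errors"]
    return ok, latest["errors"]


# Abbreviated records modelled on the table above.
example_batches = [
    {"displayID": 2, "status": "Uploaded", "errors": []},
    {"displayID": 1, "status": "Failed",
     "errors": ['“Demo_participant.tsv”: Property "sex" is required.']},
]
ok, errors = latest_batch_ok(example_batches)
print(ok, errors)
```

In the notebook, the list to pass in would be `batch_error_res['data']['listBatches']['batches']`.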

#### Side Trip

If you log into the Submission Portal, you should see that all files have uploaded and passed.

<a id='step3troubleshooting'></a>
# Step 3 Troubleshooting
If submitting multiple files in a batch results in cryptic error messages (such as *system error*), it can be useful to submit the files individually.  The process is the same, repeated once per file.  This can highlight errors in a single file while letting correct files be submitted successfully.

We'll start with the same list of files, but loop through the list rather than submit the entire list at once.

In [142]:
datadir = "/testdata/"
for file in metadatafiles:
    # Each batch holds one file, but the file name is still passed as a list,
    # matching the multi-file calls earlier in this step
    create_batch_variables = {"submissionID":submissionid, "type":submissiontype, "files":[file]}
    create_batch_res = apiQuery('stage', create_batch_query, create_batch_variables)
    batchid = create_batch_res['data']['createBatch']['_id']
    file_upload_result = processFilesForUpload([file], datadir, create_batch_res)
    update_variables = {'batchID':batchid, 'files':file_upload_result}
    update_res = apiQuery('stage', update_batch_query, update_variables)
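
Note that the earlier multi-file call passed `files` as a list of names, so each single file is presumably sent as a one-element list too. A minimal sketch of building the per-file variables under that assumption (the helper name `per_file_batch_variables` is hypothetical):

```python
def per_file_batch_variables(submission_id, submission_type, filenames):
    """Yield one createBatch variables dict per file.

    Assumption: the 'files' variable takes a list of file names, as in the
    multi-file call earlier, so each name is wrapped in its own list.
    """
    for name in filenames:
        yield {"submissionID": submission_id, "type": submission_type, "files": [name]}


for variables in per_file_batch_variables(
    "example-submission-id", "metadata", ["Demo_study.tsv", "Demo_sample.tsv"]
):
    print(variables)
```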

<a id='step4'></a>
# Step 4: Running the Validations
Once you have either metadata templates or data files successfully uploaded to the Submission Portal, you can start running validations.  Validations can be run at any time; you don't have to complete all uploads first.  However, if you do run validations on an incomplete submission, you will see errors relating to the missing information.

It's important to remember that validations are run against everything in the submission, not just against a specific file, or subset of files.

Validations are triggered by running the *validateSubmission* mutation, which requires the submission ID, the types of validation to run, and the scope of the validation.
#### Types
- **metadata** - run the validations for the uploaded metadata files
- **data file** - run the validations for the uploaded data files
- Note that both values can be used in a single validation run

#### Scope
- **New** - Run validations only against newly uploaded files.  Any files that have previously been validated will be ignored.
- **All** - Run validations against all the files, both new and previously uploaded.
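
As a sketch of how these options combine, the helper below assembles the variables for *validateSubmission* and rejects values that are not listed above. The function name `build_validation_variables` is hypothetical, and the allowed values are taken only from the lists in this section; the API may accept others.

```python
ALLOWED_TYPES = {"metadata", "data file"}  # from the Types list above
ALLOWED_SCOPES = {"New", "All"}            # from the Scope list above


def build_validation_variables(submission_id, types, scope):
    """Build the variables dict for the validateSubmission mutation."""
    # Accept a single type given as a string, but always send a list.
    if isinstance(types, str):
        types = [types]
    unknown = set(types) - ALLOWED_TYPES
    if unknown:
        raise ValueError(f"Unknown validation type(s): {sorted(unknown)}")
    if scope not in ALLOWED_SCOPES:
        raise ValueError(f"Unknown scope: {scope!r}")
    return {"id": submission_id, "types": list(types), "scope": scope}


print(build_validation_variables("example-submission-id", ["metadata", "data file"], "All"))
```

Checking the values locally gives a clearer message than waiting for the API to reject a misspelled type or scope.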

In [123]:
run_validation_query = """
mutation ValidateSubmission(
  $id: ID!
  $types: [String]
  $scope: String
){
  validateSubmission(_id: $id, types: $types, scope: $scope){
    success
    message
  }
}
"""

In [124]:
validation_variables = {"id":submissionid, "types":"metadata", "scope":"All"}

In [125]:
validation_res = apiQuery('stage', run_validation_query, validation_variables)
print(validation_res['data']['validateSubmission']['success'])

True


The **success** value simply indicates that the validation process launched successfully; it *does not* indicate that the data passed validation.

To check the validation results, there are two queries that can be run:

- **aggregatedSubmissionQCResults**: This query returns a summary of the errors that have been found.  Running this first is good practice as systemic issues can produce hundreds or thousands of lines of errors, and this report summarizes those into a more easily understood format.
- **submissionQCResults**: This query returns detailed results on each of the errors found during validation.  Note that the results from this query can be numerous and are a good use case for pagination.  In this example we'll return only the first 10 results, but checking the returned **total** field will be necessary to understand if all results have been returned.

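For the purposes of this example, we'll use just two of the available filters: the submission ID (**submissionID** in the aggregated query, **_id** in the detailed query), which pulls back results for the entire submission, and the severity (**severity** in the aggregated query, **severities** in the detailed query), which can be set to one of the following: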

- **All** - Return all errors regardless of severity
- **Error** - Return only Error level errors.  These will block submission of the study.
- **Warnings** - Return only Warning level errors.  Warnings will not block submission, however they should be corrected if possible.
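
The pagination mentioned for **submissionQCResults** can be sketched as a loop that keeps requesting pages until the reported **total** has been collected. This is a minimal sketch: `fetch_page` is a hypothetical callable standing in for an API call that takes `first` and `offset` and returns a dict with `total` and `results`. The stand-in below serves fake data so the example runs on its own.

```python
def fetch_all_results(fetch_page, page_size=10):
    """Collect every result by paging with first/offset until total is reached."""
    results = []
    offset = 0
    while True:
        page = fetch_page(first=page_size, offset=offset)
        results.extend(page["results"])
        offset += page_size
        # Stop when everything is collected or the server returns an empty page.
        if len(results) >= page["total"] or not page["results"]:
            return results


# Stand-in for the real query: 25 fake records served one page at a time.
FAKE_RECORDS = [{"displayID": i} for i in range(25)]


def fake_fetch_page(first, offset):
    return {"total": len(FAKE_RECORDS), "results": FAKE_RECORDS[offset:offset + first]}


all_results = fetch_all_results(fake_fetch_page, page_size=10)
print(len(all_results))  # 25
```

The empty-page check is a safety net so the loop still ends if the reported total changes between requests.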

In [130]:
summaryQuery = """
    query SummaryQueryQCResults(
        $submissionID: ID!,
        $severity: String,
        $first: Int,
        $offset: Int,
        $orderBy: String,
        $sortDirection: String
    ){
        aggregatedSubmissionQCResults(
            submissionID: $submissionID,
            severity: $severity,
            first: $first,
            offset: $offset,
            orderBy: $orderBy
            sortDirection: $sortDirection
        ){
            total
            results{
                title
                severity
                count
                code
            }
        }
    }

"""

In [131]:
summary_variables = {"submissionID":submissionid, "severity":"All", "first":-1, "offset":0, "sortDirection": "desc", "orderBy": "displayID"}

In [132]:
summary_res = apiQuery('stage', summaryQuery, summary_variables)

In [133]:
summary_df = pd.DataFrame(summary_res['data']['aggregatedSubmissionQCResults']['results'])
display(Markdown(summary_df.to_markdown()))

|    | title                             | severity   |   count | code   |
|---:|:----------------------------------|:-----------|--------:|:-------|
|  0 | Relationship not specified        | Error      |     152 | M013   |
|  1 | Program name mismatch             | Error      |       1 | M028   |
|  2 | Invalid Property                  | Warning    |     167 | M017   |
|  3 | Missing required property         | Error      |       4 | M003   |
|  4 | Updating existing data            | Warning    |     280 | M018   |
|  5 | Study name mismatch               | Error      |       1 | M029   |
|  6 | Many-to-one relationship conflict | Error      |       3 | M025   |
|  7 | Value not permitted               | Error      |     447 | M010   |
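
Because only **Error** rows block submission, it can help to pull those out and rank them by count. A minimal sketch using plain Python on records shaped like the summary above (the function name `blocking_issues` is hypothetical):

```python
def blocking_issues(results):
    """Return the Error-severity summary rows, largest count first."""
    errors = [row for row in results if row["severity"] == "Error"]
    return sorted(errors, key=lambda row: row["count"], reverse=True)


# Rows copied from the summary table above.
summary_results = [
    {"title": "Relationship not specified", "severity": "Error", "count": 152, "code": "M013"},
    {"title": "Program name mismatch", "severity": "Error", "count": 1, "code": "M028"},
    {"title": "Invalid Property", "severity": "Warning", "count": 167, "code": "M017"},
    {"title": "Missing required property", "severity": "Error", "count": 4, "code": "M003"},
    {"title": "Updating existing data", "severity": "Warning", "count": 280, "code": "M018"},
    {"title": "Study name mismatch", "severity": "Error", "count": 1, "code": "M029"},
    {"title": "Many-to-one relationship conflict", "severity": "Error", "count": 3, "code": "M025"},
    {"title": "Value not permitted", "severity": "Error", "count": 447, "code": "M010"},
]

for row in blocking_issues(summary_results):
    print(f'{row["code"]}  {row["count"]:>4}  {row["title"]}')
```

In the notebook, the input would be `summary_res['data']['aggregatedSubmissionQCResults']['results']`.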

#### Side Trip
As with other results, these can also be viewed in the Data Submission Portal graphical interface. The Submission Portal also allows download of a .csv file if that is preferred.

To get a detailed breakdown of each entry, use the **submissionQCResults** query.  This query has a number of options for fine-tuning the results that are returned, which allows large numbers of errors to be handled in a more digestible manner.  Please refer to the documentation for the full list of options.

In the example below, we'll use the *M003* code to limit the errors returned to just those classified as *Missing required property*.

In [134]:
detailedQCQuery = """
    query DetailedQueryQCResults(
        $id: ID!,
        $severities: String,
        $first: Int,
        $offset: Int,
        $orderBy: String,
        $sortDirection: String
        $issueCode: String
    ){
        submissionQCResults(
            _id:$id,
            severities: $severities,
            first: $first,
            offset: $offset,
            orderBy: $orderBy,
            sortDirection: $sortDirection,
            issueCode: $issueCode
        ){
            total
            results{
                submissionID
                type
                validationType
                batchID
                displayID
                submittedID
                severity
                uploadedDate
                validatedDate
                errors{
                    title
                    description
                }
                warnings{
                    title
                    description
                }
            }
        }
    }
"""

In [135]:
detail_variables = {"id": submissionid, "severities":"All", "first": -1, "offset": 0, "orderBy":"displayID", "sortDirection":"desc", "issueCode":"M003"}

In [136]:
detail_res = apiQuery('stage', detailedQCQuery, detail_variables)

In [137]:
detail_df = pd.DataFrame(detail_res['data']['submissionQCResults']['results'])
display(Markdown(detail_df.to_markdown()))

|    | submissionID                         | type   | validationType   | batchID                              |   displayID | submittedID   | severity   | uploadedDate             | validatedDate            | errors | warnings |
|---:|:-------------------------------------|:-------|:-----------------|:-------------------------------------|------------:|:--------------|:-----------|:-------------------------|:-------------------------|:-------|:---------|
|  0 | d7824107-91f2-4825-9b94-c0993e07b4cc | study  | metadata         | 564c0a73-9b08-4b3d-b4d1-084fba856480 |          17 | phs0002       | Error      | 2025-10-01T18:35:54.471Z | 2025-10-01T18:37:50.333Z | [{'title': 'Study name mismatch', 'description': "[PDXNet_study.tsv: line 2] Study name mismatch: Study name doesn't match the Data Submission's associated study - 'API Example Study'."}, {'title': 'Missing required property', 'description': '[PDXNet_study.tsv: line 2] Required property "short_description" is empty.'}, {'title': 'Missing required property', 'description': '[PDXNet_study.tsv: line 2] Required property "study_external_url" is empty.'}, {'title': 'Missing required property', 'description': '[PDXNet_study.tsv: line 2] Required property "file_types_and_format" is empty.'}, {'title': 'Missing required property', 'description': '[PDXNet_study.tsv: line 2] Required property "study_version" is empty.'}, {'title': 'Value not permitted', 'description': '[PDXNet_study.tsv: line 2] "Genomic" is not a permissible value for property "study_data_types".'}] | [{'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "primary_investigator_name" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "primary_investigator_email" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "co_investigator_name" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "co_investigator_email" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "index_date" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "cds_requestor" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "clinical_trial_system" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "clinical_trial_identifier" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "clinical_trial_arm" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "data_types" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "file_types" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "cds_primary_bucket" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "cds_secondary_bucket" is not defined in the model.'}, {'title': 'Invalid Property', 'description': '[PDXNet_study.tsv: line 2] Property "cds_tertiary_bucket" is not defined in the model.'}] |

Since the actual errors and warnings are buried in lists, we'll parse the errors out into their own table to make them easier to read.

In [138]:
# Collect every error from every validation result into one table
columns = ['title', 'description']
error_df = pd.DataFrame(columns=columns)
for result in detail_res['data']['submissionQCResults']['results']:
    for error in result['errors']:
        # Append each error dict (title, description) as a new row
        error_df.loc[len(error_df)] = error
display(Markdown(error_df.to_markdown()))

|    | title                     | description                                                                                                                            |
|---:|:--------------------------|:---------------------------------------------------------------------------------------------------------------------------------------|
|  0 | Study name mismatch       | [PDXNet_study.tsv: line 2] Study name mismatch: Study name doesn't match the Data Submission's associated study - 'API Example Study'. |
|  1 | Missing required property | [PDXNet_study.tsv: line 2] Required property "short_description" is empty.                                                             |
|  2 | Missing required property | [PDXNet_study.tsv: line 2] Required property "study_external_url" is empty.                                                            |
|  3 | Missing required property | [PDXNet_study.tsv: line 2] Required property "file_types_and_format" is empty.                                                         |
|  4 | Missing required property | [PDXNet_study.tsv: line 2] Required property "study_version" is empty.                                                                 |
|  5 | Value not permitted       | [PDXNet_study.tsv: line 2] "Genomic" is not a permissible value for property "study_data_types".                                       |
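
The loop above collects only the `errors` list. A minimal sketch of pulling warnings into the same table follows. It assumes each result also carries a `warnings` list whose entries have the same `title`/`description` shape, as the raw output suggests. The function name `flatten_qc_issues`, the `severity` column, and the sample data are hypothetical and introduced only for illustration.

```python
def flatten_qc_issues(results):
    """Flatten per-record error and warning lists into one list of rows."""
    rows = []
    for result in results:
        # Assumption: 'errors' and 'warnings' are lists of
        # {'title': ..., 'description': ...} dicts (or missing/None).
        for severity, key in (("Error", "errors"), ("Warning", "warnings")):
            for issue in result.get(key) or []:
                rows.append({
                    "severity": severity,
                    "title": issue.get("title"),
                    "description": issue.get("description"),
                })
    return rows


# Hypothetical sample shaped like the validation results shown above.
sample_results = [
    {
        "errors": [
            {"title": "Missing required property",
             "description": 'Required property "study_version" is empty.'},
        ],
        "warnings": [
            {"title": "Invalid Property",
             "description": 'Property "index_date" is not defined in the model.'},
        ],
    },
    {"errors": [], "warnings": None},
]

issue_rows = flatten_qc_issues(sample_results)
for row in issue_rows:
    print(row["severity"], "|", row["title"], "|", row["description"])
```

In the notebook, passing `detail_res['data']['submissionQCResults']['results']` to this function and wrapping the result in `pd.DataFrame(...)` should give a single table that can be displayed the same way as `error_df`.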

<a id='step5'></a>
# Step 5:  Submitting, Canceling, or Withdrawing
Technically, the last step of this process is the submission to CRDC. However, the same mutation is also used to cancel or withdraw a submission.  Let's quickly go over what each of those means:

- **Submit** : Once all of the validation errors have been corrected and the validation results are either completely clean or contain only warnings, the submission is ready to be submitted.  Sending a submit request hands control of the files and data over to the CRDC Data Team for final checks.  Note that once a submission has been submitted, no further edits are allowed.
  
- **Cancel** : If you want to abandon a submission *that has not been submitted to CRDC yet*, sending a cancellation request will lock the submission and withdraw it from the system.  **Further work is not allowed on canceled submissions, so be sure that you want to cancel before you issue this request.**
  
- **Withdraw** : Withdraw is similar to cancel, except that it is used on submissions that have already been submitted to CRDC.  If you find that a study was submitted before everything was complete, or if other errors are found that necessitate stopping the submission process, sending a **Withdraw** request will prevent the release of the submitted data to the data commons and return the submission to its previous, unsubmitted state.
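
The three actions differ only in the `action` value sent alongside the submission `id` and optional `comment`. As a sketch, a small helper can build the variables and reject typos before anything is sent. Only `"Submit"` appears in this notebook, so the `"Cancel"` and `"Withdraw"` strings here are assumptions that should be confirmed against the API documentation. `build_action_variables` and the placeholder submission ID are hypothetical.

```python
# Assumption: only "Submit" is confirmed by this notebook; "Cancel" and
# "Withdraw" are assumed spellings of the other two action values.
ALLOWED_ACTIONS = ("Submit", "Cancel", "Withdraw")


def build_action_variables(submission_id, action, comment=None):
    """Return the variables dict for the submission action mutation."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(
            f"Unknown action {action!r}; expected one of {ALLOWED_ACTIONS}"
        )
    variables = {"id": submission_id, "action": action}
    if comment is not None:
        # The comment is optional, so it is only included when provided.
        variables["comment"] = comment
    return variables


# Placeholder submission ID, used only for illustration.
example_variables = build_action_variables(
    "00000000-0000-0000-0000-000000000000",
    "Withdraw",
    "Submitted before upload finished",
)
print(example_variables)
```

Validating the action locally is a design choice: because cancellation cannot be undone, failing early on a misspelled or unintended action is cheaper than discovering the mistake after the request has been sent.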



In [110]:
submission_query = """
mutation Submit(
    $id: ID!
    $action: String!
    $comment: String
){
    submissionAction(submissionID: $id, action: $action, comment: $comment){
        name
        submitterID
        submitterName
        dataCommons
        modelVersion
        studyAbbreviation
        dbGaPID
        status
    }
}
"""

In [111]:
submission_variables = {"id": submissionid, "action": "Submit", "comment": "Example submission"}

In [None]:
submission_res = apiQuery('stage', submission_query, submission_variables)
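
Before relying on the result, it can help to check whether the mutation succeeded. The sketch below assumes `apiQuery` returns the decoded GraphQL JSON as a dictionary, as the earlier `detail_res['data'][...]` access suggests. Per the GraphQL specification, failures are reported in a top-level `errors` list. `read_action_status` is a hypothetical helper, and the sample responses (including the `"Submitted"` status value) are made up for illustration.

```python
def read_action_status(response):
    """Return the status from a submissionAction response, or raise on failure."""
    errors = response.get("errors")
    if errors:
        # GraphQL reports failures as a list of objects with a 'message' field.
        messages = "; ".join(err.get("message", "unknown error") for err in errors)
        raise RuntimeError(f"submissionAction failed: {messages}")
    action_result = (response.get("data") or {}).get("submissionAction")
    if action_result is None:
        raise RuntimeError("Response contained no submissionAction data")
    return action_result.get("status")


# Made-up responses for illustration; real status values come from the API.
ok_response = {
    "data": {"submissionAction": {"name": "API Example", "status": "Submitted"}}
}
failed_response = {"errors": [{"message": "Not permitted"}], "data": None}

print(read_action_status(ok_response))
try:
    read_action_status(failed_response)
except RuntimeError as exc:
    print(exc)
```

In the notebook, this would be called as `read_action_status(submission_res)` once the cell above has run.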

<a id='conclusions'></a>
# Conclusions
At this point, we've walked through the basics of creating a submission and then uploading, validating, and submitting (or not) data using the API system.  Additional queries and mutations are available that provide more information and capabilities for integrating with your systems, and we suggest reading the API documentation for further details.  While this example is in Python, any language that can send GraphQL queries is suitable for interacting with this API.

If you have any questions about using this API, please contact the [CRDC Helpdesk](mailto:NCICRDC%40mail.nih.gov?subject=Data%20Submission%20API%20Question).