In [1]:
library(TCGAbiolinks)
library(stringr)
library(tidyverse)
library(readr)

── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
✔ ggplot2 3.1.0     ✔ readr   1.1.1
✔ tibble  1.4.2     ✔ purrr   0.2.5
✔ tidyr   0.8.2     ✔ dplyr   0.7.7
✔ ggplot2 3.1.0     ✔ forcats 0.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘cowplot’

The following object is masked from ‘package:ggplot2’:

    ggsave



This notebook uses the [TCGAbiolinks](http://bioconductor.org/packages/release/bioc/html/TCGAbiolinks.html) R package to download TCGA data from the Genomic Data Commons (GDC). We then combine the MAF files with copy number (CNV) data and add the cellularity of each sample so that we can calculate the mutation copy number (MCN) of each variant.
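As a rough illustration of why all three inputs are needed, here is a minimal sketch of one commonly used definition of mutation copy number. The function name `mutation_copy_number` and its arguments are hypothetical, and the formula is an assumption; the calculation used later in this analysis may differ in detail.

```r
# Hypothetical helper, not part of TCGAbiolinks.
# Assumed definition:
#   MCN = VAF * (purity * CN_tumour + (1 - purity) * CN_normal) / purity
# vaf       : variant allele frequency from the MAF (alt reads / total reads)
# purity    : cellularity of the sample, in (0, 1]
# cn_tumour : total copy number of the tumour at the variant locus (from CNV data)
# cn_normal : copy number of the normal cells, assumed to be 2
mutation_copy_number <- function(vaf, purity, cn_tumour, cn_normal = 2) {
  stopifnot(all(purity > 0 & purity <= 1))
  vaf * (purity * cn_tumour + (1 - purity) * cn_normal) / purity
}

# Heterozygous clonal mutation, diploid locus, pure sample
mutation_copy_number(vaf = 0.5, purity = 1, cn_tumour = 2)

# Same mutation in a sample with 50% cellularity: the VAF is halved
mutation_copy_number(vaf = 0.25, purity = 0.5, cn_tumour = 2)
```

Under this definition both calls give an MCN of 1, that is, the mutation sits on one copy in every tumour cell, even though the observed VAF differs between the two samples.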

In [57]:
# Get all TCGA cancer type codes, e.g. "TCGA-SARC" -> "SARC"
# (getGDCprojects() is exported, so `::` is sufficient)
tcgacodes <- TCGAbiolinks::getGDCprojects()$project_id
tcgacodes <- unlist(lapply(tcgacodes[str_detect(tcgacodes, "TCGA")],
                    function(x){strsplit(x, "-")[[1]][2]}))
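To see what this filtering step produces without querying GDC, the sketch below runs the same idea on a made-up vector of project IDs. The vector `project_ids` is an assumption standing in for `getGDCprojects()$project_id`, and base R functions are used so the snippet runs on its own.

```r
# Made-up project IDs standing in for the GDC response
project_ids <- c("TCGA-SARC", "TARGET-AML", "TCGA-ACC")

# Keep only TCGA projects, then drop the "TCGA-" prefix
is_tcga <- grepl("^TCGA-", project_ids)
codes <- sub("^TCGA-", "", project_ids[is_tcga])
codes
```

Only the two TCGA entries survive, reduced to `"SARC"` and `"ACC"`, which is the form `GDCquery_Maf()` is called with in the next cell.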

In [58]:
# Download the MuTect2 MAF file for every TCGA project, tag each row with its
# cancer type, and save the combined table to a CSV file

dfhg38 <- data.frame()
for (t in tcgacodes){
  maf <- GDCquery_Maf(t, pipelines = "mutect") %>%
    mutate(cancertype = t)
  dfhg38 <- rbind(dfhg38, maf)
}
# Keep the first 16 characters of the barcode so the ID is consistent with the TCGA CNV sample ID
dfhg38 <- dfhg38 %>%
    mutate(sampleid = str_sub(Tumor_Sample_Barcode, 1, 16))
write_delim(dfhg38, "data/TCGA-maf-all-hg38.csv", delim = ",")

# To reload the saved table instead of downloading again:
#dfhg38 <- read_csv("data/TCGA-maf-all-hg38.csv")
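The `str_sub(Tumor_Sample_Barcode, 1, 16)` step trims an aliquot-level TCGA barcode down to the sample level. A small sketch, using an illustrative barcode in the standard TCGA layout and base R `substr()` in place of `str_sub()` so that it runs without stringr:

```r
# Illustrative aliquot barcode:
# project-TSS-participant-sample/vial-portion/analyte-plate-center
barcode <- "TCGA-02-0001-01C-01D-0182-01"

# The first 16 characters cover project, tissue source site, participant,
# sample type and vial
sampleid <- substr(barcode, 1, 16)
sampleid
```

This gives `"TCGA-02-0001-01C"`. The portion, analyte, plate and center fields are dropped, so MAF rows can be joined to CNV records that are keyed at the sample level.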

 For more information about MAF data please read the following GDC manual and web pages:
 GDC manual: https://gdc-docs.nci.nih.gov/Data/PDF/Data_UG.pdf
 https://gdc-docs.nci.nih.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/
 https://gdc.cancer.gov/about-gdc/variant-calling-gdc
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2080&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-SARC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-SARC
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-SARC
Of the 1 files for download 1 already exist.
All samples have been already downloaded
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------

[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=744&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-ACC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-ACC
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-ACC
GDCdownload will download: 2.666716 MB
Downloading as: TCGA.ACC.mutect.81ac2c46-37db-4dcd-923a-061a7ae626a3.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=672&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-MESO%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-MESO
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-MESO
GDCdownload will download: 1.024822 MB
Downloading as: TCGA.MESO.mutect.88b38a05-e46a-49e1-9c4d-e098709256b1.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1318&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-READ%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-READ
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-READ
GDCdownload will download: 15.185535 MB
Downloading as: TCGA.READ.mutect.faa5f62a-2731-4867-a264-0e85b7074e87.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=4248&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LGG%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-LGG
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-LGG
GDCdownload will download: 8.937968 MB
Downloading as: TCGA.LGG.mutect.1e0694ca-fcde-41d3-9ae3-47cfaf527f25.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=3536&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-STAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-STAD
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-STAD
Of the 1 files for download 1 already exist.
All samples have been already downloaded




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=4040&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-THCA%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-THCA
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-THCA
GDCdownload will download: 2.89136 MB
Downloading as: TCGA.THCA.mutect.13999735-2e70-439f-a6d9-45d831ba1a1a.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=3992&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-GBM%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-GBM
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-GBM
GDCdownload will download: 20.417648 MB
Downloading as: TCGA.GBM.mutect.da904cd3-79d7-4ae3-b6c0-e7127998b3e6.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=3784&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-SKCM%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-SKCM
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-SKCM
GDCdownload will download: 89.383976 MB
Downloading as: TCGA.SKCM.mutect.4b7a5729-b83e-4837-9b61-a6002dce1c0a.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=416&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-CHOL%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-CHOL
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-CHOL
GDCdownload will download: 1.351191 MB
Downloading as: TCGA.CHOL.mutect.c116f412-e251-4192-9bc5-3ce3cfaaa774.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=3016&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-KIRC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-KIRC
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-KIRC
GDCdownload will download: 7.119265 MB
Downloading as: TCGA.KIRC.mutect.2a8f2c83-8b5e-4987-8dbf-01f7ee24dc26.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=8648&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-BRCA%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-BRCA
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-BRCA
GDCdownload will download: 31.058116 MB
Downloading as: TCGA.BRCA.mutect.995c0111-d90b-4140-bee7-3845436c3b42.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=4880&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-OV%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-OV
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-OV
GDCdownload will download: 19.728966 MB
Downloading as: TCGA.OV.mutect.b22b85eb-2ca8-4c9f-a1cd-b77caab999bd.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1256&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-TGCT%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-TGCT
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-TGCT
GDCdownload will download: 876.679 KB
Downloading as: TCGA.TGCT.mutect.6f6a4290-b6be-49f5-be45-97d742957a9e.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=536&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-KICH%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-KICH
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-KICH
GDCdownload will download: 784.872 KB
Downloading as: TCGA.KICH.mutect.ddb523ba-29ac-4056-82ca-4147d2e98ddf.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=992&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-THYM%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-THYM
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-THYM
GDCdownload will download: 1.23841 MB
Downloading as: TCGA.THYM.mutect.91ddbf37-6429-4338-89df-2d246a8e2d00.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=4494&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LUSC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-LUSC
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-LUSC
GDCdownload will download: 44.346082 MB
Downloading as: TCGA.LUSC.mutect.95258183-63ea-4c97-ae29-1bae9ed06334.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=648&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-UVM%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-UVM
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-UVM
GDCdownload will download: 505.413 KB
Downloading as: TCGA.UVM.mutect.6c7b01bc-b068-4e01-8b4d-0362f5959f65.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=304&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-DLBC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-DLBC
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-DLBC
GDCdownload will download: 1.640535 MB
Downloading as: TCGA.DLBC.mutect.c3df46a9-85d1-45d4-954a-825313d4a26d.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=4496&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-UCEC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-UCEC
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-UCEC
GDCdownload will download: 194.842948 MB
Downloading as: TCGA.UCEC.mutect.d3fa70be-520a-420e-bb6d-651aeee5cb50.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=4032&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-PRAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-PRAD
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-PRAD
GDCdownload will download: 7.615526 MB
Downloading as: TCGA.PRAD.mutect.deca36be-bf05-441a-b2e4-394228f23fbe.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1464&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LAML%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-LAML
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-LAML
GDCdownload will download: 2.549936 MB
Downloading as: TCGA.LAML.mutect.27f42413-6d8f-401f-9d07-d019def8939e.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=3032&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LIHC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-LIHC
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-LIHC
GDCdownload will download: 13.824544 MB
Downloading as: TCGA.LIHC.mutect.a630f0a0-39b3-4aab-8181-89c1dde8d3e2.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=4104&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-HNSC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-HNSC
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-HNSC
GDCdownload will download: 25.886548 MB
Downloading as: TCGA.HNSC.mutect.1aa33f25-3893-4f37-a6a4-361c9785d07e.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=3952&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-COAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-COAD
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-COAD
Of the 1 files for download 1 already exist.
All samples have been already downloaded




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1486&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-ESCA%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-ESCA
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-ESCA
GDCdownload will download: 11.16495 MB
Downloading as: TCGA.ESCA.mutect.7f8e1e7c-621c-4dfd-8fad-af07c739dbfc.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2464&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-CESC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-CESC
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-CESC
GDCdownload will download: 24.84317 MB
Downloading as: TCGA.CESC.mutect.5ffa70b1-61b4-43d1-b10a-eda412187c17.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2352&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-KIRP%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-KIRP
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-KIRP
GDCdownload will download: 6.341025 MB
Downloading as: TCGA.KIRP.mutect.1ab98b62-5863-4440-84f9-3c15d476d523.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=5368&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LUAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-LUAD
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-LUAD
GDCdownload will download: 50.299539 MB
Downloading as: TCGA.LUAD.mutect.0458c57f-316c-4a7c-9294-ccd11c97c2f9.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=3408&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-BLCA%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-BLCA
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-BLCA
GDCdownload will download: 34.215484 MB
Downloading as: TCGA.BLCA.mutect.0e239d8f-47b0-4e47-9716-e9ecc87605b9.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=464&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-UCS%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-UCS
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-UCS
GDCdownload will download: 2.636756 MB
Downloading as: TCGA.UCS.mutect.02747363-f04a-4ba6-a079-fe4f87853788.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1480&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-PCPG%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-PCPG
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-PCPG
GDCdownload will download: 667.558 KB
Downloading as: TCGA.PCPG.mutect.64e23e2f-ec04-4f6b-82b3-375e2d49804b.DR-10.0.somatic.maf.gz




--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1480&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-PAAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Simple%20Nucleotide%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Masked%20Somatic%20Mutation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22MuTect2%20Variant%20Aggregation%20and%20Masking%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.access%22,%22value%22:[%22open%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-PAAD
--------------------
oo Filtering results
--------------------
ooo By access
ooo By data.type
ooo By workflow.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-PAAD
GDCdownload will download: 6.991687 MB
Downloading as: TCGA.PAAD.mutect.fea333b5-78e0-43c8-bf76-4c78dd3fac92.DR-10.0.somatic.maf.gz




In [2]:
#load the saved MAF table, compute the variant allele frequency (VAF) and keep the columns needed downstream
dfhg38 <- read_csv("data/TCGA-maf-all-hg38.csv") %>%
    dplyr::mutate(VAF = t_alt_count/t_depth) %>%
    dplyr::rename(chr = Chromosome, start = Start_Position, end = End_Position) %>%
    dplyr::select(sampleid, chr, start, end, Reference_Allele, Tumor_Seq_Allele2, VAF,
                  t_depth, t_ref_count, t_alt_count, n_depth,
                  n_ref_count, n_alt_count, cancertype) %>%
    #label variants by comparing allele string lengths
    #note: an allele written as "-" has length 1, so single-base insertions/deletions are labelled "SNV" here
    dplyr::mutate(nref = str_length(Reference_Allele), nalt = str_length(Tumor_Seq_Allele2)) %>%
    dplyr::mutate(mutation_type = ifelse((nref - nalt) == 0, "SNV", "INS/DEL")) %>%
    dplyr::select(-nref, -nalt)

Parsed with column specification:
cols(
  .default = col_character(),
  Entrez_Gene_Id = col_integer(),
  Start_Position = col_integer(),
  End_Position = col_integer(),
  t_depth = col_integer(),
  t_ref_count = col_integer(),
  t_alt_count = col_integer(),
  n_depth = col_integer(),
  ALLELE_NUM = col_integer(),
  DISTANCE = col_integer(),
  TRANSCRIPT_STRAND = col_integer(),
  GMAF = col_double(),
  AFR_MAF = col_double(),
  AMR_MAF = col_double(),
  EAS_MAF = col_double(),
  EUR_MAF = col_double(),
  SAS_MAF = col_double(),
  AA_MAF = col_double(),
  EA_MAF = col_double(),
  PICK = col_integer(),
  TSL = col_integer()
  # ... with 11 more columns
)
See spec(...) for full column specifications.
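
The classification above compares allele string lengths, so a one-base deletion written as `A` to `-` has equal lengths and ends up labelled `SNV`. A minimal sketch of a stricter rule on made-up rows, assuming the MAF convention that `-` marks an absent allele. `classify_mutation` is a hypothetical helper, not part of the notebook, and it uses base R only so that it runs without the tidyverse:

```r
# Hypothetical helper: label a variant from its reference and alternate alleles.
# Assumption: "-" denotes an absent allele (insertion or deletion), as in MAF files.
classify_mutation <- function(ref, alt) {
  same_length <- nchar(ref) == nchar(alt)
  has_gap <- ref == "-" | alt == "-"
  ifelse(same_length & !has_gap, "SNV", "INS/DEL")
}

# Toy rows (illustrative values only)
toy <- data.frame(
  Reference_Allele  = c("C", "A", "GTAT"),
  Tumor_Seq_Allele2 = c("A", "-", "-"),
  t_alt_count       = c(16, 112, 40),
  t_depth           = c(36, 344, 201),
  stringsAsFactors  = FALSE
)

# VAF is the fraction of tumour reads supporting the alternate allele
toy$VAF <- toy$t_alt_count / toy$t_depth
toy$mutation_type <- classify_mutation(toy$Reference_Allele, toy$Tumor_Seq_Allele2)
print(toy)
```

With this rule only the first toy row is an SNV. Equal-length substitutions of more than one base are still grouped with SNVs, as in the original length comparison.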


In [4]:
dfhg38 %>%
    distinct(sampleid, cancertype) %>%
    group_by(cancertype) %>%
    summarise(n = n())

cancertype,n
ACC,92
BLCA,412
BRCA,986
CESC,289
CHOL,51
COAD,399
DLBC,37
ESCA,184
GBM,393
HNSC,508


### Download copy number data

In [65]:
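#download copy number segments for all TCGA projects
#this takes much longer than the MAF download, probably because each project is fetched as hundreds of per-sample segment files rather than a single MAF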
tcgacodes <- TCGAbiolinks:::getGDCprojects()$project_id
query <- GDCquery(project = tcgacodes[str_detect(tcgacodes, "TCGA")],
                data.category = "Copy Number Variation",
                data.type = "Copy Number Segment")

cnvsamples <- getResults(query)
GDCdownload(query)
dfhg38cnv <- GDCprepare(query)
write_delim(dfhg38cnv, "data/TCGA-cnv-all-hg38.csv", delim = ",")

#dfhg38cnv <- read.csv("data/TCGA-cnv-all-hg38.csv")

--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1044&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-SARC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-SARC


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=360&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-ACC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-ACC


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=346&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-MESO%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-MESO


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=640&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-READ%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-READ


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2042&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LGG%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-LGG


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1812&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-STAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-STAD


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2050&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-THCA%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-THCA


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2320&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-GBM%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-GBM


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1878&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-SKCM%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-SKCM


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=170&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-CHOL%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-CHOL


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2244&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-KIRC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-KIRC


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=4458&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-BRCA%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-BRCA


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2400&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-OV%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-OV


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=606&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-TGCT%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-TGCT


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=264&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-KICH%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-KICH


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=498&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-THYM%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-THYM


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2118&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LUSC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-LUSC


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=320&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-UVM%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-UVM


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=196&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-DLBC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-DLBC


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2210&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-UCEC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-UCEC


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2076&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-PRAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-PRAD


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=794&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LAML%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-LAML


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1536&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LIHC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-LIHC


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2184&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-HNSC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-HNSC


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1952&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-COAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-COAD


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=746&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-ESCA%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-ESCA


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1172&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-CESC%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-CESC


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1216&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-KIRP%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-KIRP


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=2294&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-LUAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-LUAD


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=1628&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-BLCA%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-BLCA


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=220&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-UCS%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-UCS


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=726&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-PCPG%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-PCPG


[1] "https://api.gdc.cancer.gov/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=736&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-PAAD%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22Copy%20Number%20Variation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_type%22,%22value%22:[%22Copy%20Number%20Segment%22]%7D%7D]%7D&format=JSON"


ooo Project: TCGA-PAAD
--------------------
oo Filtering results
--------------------
ooo By data.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
Downloading data for project TCGA-SARC
GDCdownload will download 522 files. A total of 28.168272 MB
Downloading as: Mon_Dec__3_18_40_23_2018.tar.gz


Downloading: 6.4 MB     

Downloading data for project TCGA-ACC
GDCdownload will download 180 files. A total of 6.3478 MB
Downloading as: Mon_Dec__3_18_42_21_2018.tar.gz


Downloading: 1.5 MB     

Downloading data for project TCGA-MESO
GDCdownload will download 173 files. A total of 6.897985 MB
Downloading as: Mon_Dec__3_18_42_57_2018.tar.gz


Downloading: 1.6 MB     

Downloading data for project TCGA-READ
GDCdownload will download 320 files. A total of 13.27483 MB
Downloading as: Mon_Dec__3_18_43_29_2018.tar.gz


Downloading: 3.1 MB     

Downloading data for project TCGA-LGG
GDCdownload will download 1021 files. A total of 31.482694 MB
Downloading as: Mon_Dec__3_18_44_26_2018.tar.gz


Downloading: 7.2 MB     

Downloading data for project TCGA-STAD
Of the 906 files for download 906 already exist.
All samples have been already downloaded
Downloading data for project TCGA-THCA
GDCdownload will download 1025 files. A total of 27.863185 MB
Downloading as: Mon_Dec__3_18_48_24_2018.tar.gz


Downloading: 6.3 MB     

Downloading data for project TCGA-GBM
GDCdownload will download 1160 files. A total of 48.79619 MB
Downloading as: Mon_Dec__3_18_52_20_2018.tar.gz


Downloading: 11 MB     

Downloading data for project TCGA-SKCM
GDCdownload will download 939 files. A total of 36.204626 MB
Downloading as: Mon_Dec__3_18_56_30_2018.tar.gz


Downloading: 8.4 MB     

Downloading data for project TCGA-CHOL
GDCdownload will download 85 files. A total of 2.965378 MB
Downloading as: Mon_Dec__3_19_00_13_2018.tar.gz


Downloading: 680 kB     

Downloading data for project TCGA-KIRC
GDCdownload will download 1122 files. A total of 37.771659 MB
Downloading as: Mon_Dec__3_19_00_34_2018.tar.gz


Downloading: 8.7 MB     

Downloading data for project TCGA-BRCA
GDCdownload will download 2229 files. A total of 87.401959 MB
Downloading as: Mon_Dec__3_19_05_16_2018.tar.gz


Downloading: 20 MB     

Downloading data for project TCGA-OV
GDCdownload will download 1200 files. A total of 83.068421 MB
Downloading as: Mon_Dec__3_19_13_46_2018.tar.gz


Downloading: 19 MB     

Downloading data for project TCGA-TGCT
GDCdownload will download 303 files. A total of 12.012498 MB
Downloading as: Mon_Dec__3_19_18_56_2018.tar.gz


Downloading: 2.8 MB     

Downloading data for project TCGA-KICH
GDCdownload will download 132 files. A total of 4.348909 MB
Downloading as: Mon_Dec__3_19_19_46_2018.tar.gz


Downloading: 1,000 kB     

Downloading data for project TCGA-THYM
GDCdownload will download 249 files. A total of 7.888214 MB
Downloading as: Mon_Dec__3_19_20_07_2018.tar.gz


Downloading: 1.8 MB     

Downloading data for project TCGA-LUSC
GDCdownload will download 1059 files. A total of 48.243441 MB
Downloading as: Mon_Dec__3_19_20_48_2018.tar.gz


Downloading: 11 MB     

Downloading data for project TCGA-UVM
GDCdownload will download 160 files. A total of 4.971229 MB
Downloading as: Mon_Dec__3_19_24_43_2018.tar.gz


Downloading: 1.1 MB     

Downloading data for project TCGA-DLBC
GDCdownload will download 98 files. A total of 4.00475 MB
Downloading as: Mon_Dec__3_19_25_09_2018.tar.gz


Downloading: 920 kB     

Downloading data for project TCGA-UCEC
GDCdownload will download 1105 files. A total of 48.12396 MB
Downloading as: Mon_Dec__3_19_25_25_2018.tar.gz


Downloading: 11 MB     

Downloading data for project TCGA-PRAD
GDCdownload will download 1038 files. A total of 45.67198 MB
Downloading as: Mon_Dec__3_19_29_34_2018.tar.gz


Downloading: 10 MB     

Downloading data for project TCGA-LAML
GDCdownload will download 397 files. A total of 63.554851 MB
Downloading as: Mon_Dec__3_19_33_31_2018.tar.gz


Downloading: 14 MB     

Downloading data for project TCGA-LIHC
GDCdownload will download 768 files. A total of 28.047594 MB
Downloading as: Mon_Dec__3_19_34_55_2018.tar.gz


Downloading: 6.5 MB     

Downloading data for project TCGA-HNSC
GDCdownload will download 1092 files. A total of 40.248999 MB
Downloading as: Mon_Dec__3_19_48_46_2018.tar.gz


Downloading: 9.3 MB     

Downloading data for project TCGA-COAD
Of the 976 files for download 976 already exist.
All samples have been already downloaded
Downloading data for project TCGA-ESCA
GDCdownload will download 373 files. A total of 16.69091 MB
Downloading as: Mon_Dec__3_19_52_57_2018.tar.gz


Downloading: 3.9 MB     

Downloading data for project TCGA-CESC
GDCdownload will download 586 files. A total of 21.998315 MB
Downloading as: Mon_Dec__3_19_54_05_2018.tar.gz


Downloading: 5.1 MB     

Downloading data for project TCGA-KIRP
GDCdownload will download 608 files. A total of 23.124967 MB
Downloading as: Mon_Dec__3_19_55_50_2018.tar.gz


Downloading: 5.3 MB     

Downloading data for project TCGA-LUAD
GDCdownload will download 1147 files. A total of 42.169074 MB
Downloading as: Mon_Dec__3_19_57_33_2018.tar.gz


Downloading: 9.7 MB     

Downloading data for project TCGA-BLCA
GDCdownload will download 814 files. A total of 40.539429 MB
Downloading as: Mon_Dec__3_20_01_24_2018.tar.gz


Downloading: 9.2 MB     

Downloading data for project TCGA-UCS
GDCdownload will download 110 files. A total of 4.210524 MB
Downloading as: Mon_Dec__3_20_04_30_2018.tar.gz


Downloading: 980 kB     

Downloading data for project TCGA-PCPG
GDCdownload will download 363 files. A total of 21.77809 MB
Downloading as: Mon_Dec__3_20_04_49_2018.tar.gz


Downloading: 4.8 MB     

Downloading data for project TCGA-PAAD
GDCdownload will download 368 files. A total of 16.205495 MB
Downloading as: Mon_Dec__3_20_05_53_2018.tar.gz


Downloading: 3.7 MB     

Reading copy number variation files




In [5]:
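#load the saved CNV segments, derive the 16-character sampleid and keep only samples that also have mutation calls
dfhg38cnv <- read_csv("data/TCGA-cnv-all-hg38.csv") %>%
    dplyr::mutate(sampleid = str_sub(Sample, 1, 16)) %>%
    dplyr::select(-Sample, -X1) %>% #X1 is the unnamed first column of the CSV
    filter(sampleid %in% dfhg38$sampleid)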

“Missing column names filled in: 'X1' [1]”
Parsed with column specification:
cols(
  X1 = col_integer(),
  Sample = col_character(),
  Chromosome = col_character(),
  Start = col_integer(),
  End = col_integer(),
  Num_Probes = col_integer(),
  Segment_Mean = col_double()
)
“2 parsing failures.
     row col   expected               actual file
 9595400 Start no trailing characters .4     'data/TCGA-cnv-all-hg38.csv'
13891890 End   no trailing characters .4     'data/TCGA-cnv-all-hg38.csv'
”
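
The two parsing failures come from `Start`/`End` values with a fractional part (`.4`) in columns that were guessed as integer. readr normally stores such values as `NA`, so those segments lose a boundary. One way around this is to read the coordinates as numeric and round afterwards. A sketch in base R on a made-up file (the coordinate values are illustrative, not taken from the data):

```r
# Write a tiny CSV with one non-integer coordinate (made-up values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Sample,Chromosome,Start,End",
             "S1,1,61735,629241",
             "S1,1,668210.4,12799264"), tmp)

# Read the coordinates as numeric so the fractional value is kept rather than dropped
cnv <- read.csv(tmp, colClasses = c("character", "character", "numeric", "numeric"))

# Round to whole base positions before any interval comparison
cnv$Start <- round(cnv$Start)
cnv$End   <- round(cnv$End)
print(cnv)
unlink(tmp)
```

With `read_csv`, the equivalent would be to pass explicit `col_types` for `Start` and `End` instead of relying on type guessing.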

In [6]:
head(dfhg38)

sampleid,chr,start,end,Reference_Allele,Tumor_Seq_Allele2,VAF,t_depth,t_ref_count,t_alt_count,n_depth,n_ref_count,n_alt_count,cancertype,mutation_type
TCGA-DX-A48O-01A,chr1,43171404,43171404,C,A,0.4444444,36,20,16,14,,,SARC,SNV
TCGA-DX-A48O-01A,chr1,74182619,74182619,T,A,0.3900709,141,86,55,63,,,SARC,SNV
TCGA-DX-A48O-01A,chr1,149488066,149488066,T,A,0.6666667,3,1,2,64,,,SARC,SNV
TCGA-DX-A48O-01A,chr1,156346990,156346990,A,-,0.3255814,344,232,112,114,,,SARC,SNV
TCGA-DX-A48O-01A,chr1,161671461,161671461,C,T,0.1058394,548,490,58,193,,,SARC,SNV
TCGA-DX-A48O-01A,chr1,161756830,161756849,GTATGGTGAACAGTGCTCTT,-,0.199005,201,161,40,69,,,SARC,INS/DEL


In [7]:
head(dfhg38cnv)

Chromosome,Start,End,Num_Probes,Segment_Mean,sampleid
1,61735,629241,32,0.1549,TCGA-KT-A74X-01A
1,668210,12799264,6488,-0.6211,TCGA-KT-A74X-01A
1,12802105,13406956,71,-0.1658,TCGA-KT-A74X-01A
1,13407061,15816465,1791,-0.6491,TCGA-KT-A74X-01A
1,15820537,15827744,19,0.1148,TCGA-KT-A74X-01A
1,15828471,16883108,411,-0.6292,TCGA-KT-A74X-01A


### Read in cellularity

In [8]:
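### Import cellularity (purity) and ploidy estimates from ASCAT

cellularity <- read.delim("data/ascat_acf_ploidy.tsv", header = TRUE) %>%
  #sample names use "." as separator here; convert to "-" and truncate to the 16-character sampleid used for the MAF and CNV tables
  mutate(sampleid = str_sub(gsub("[.]", "-", Sample), 1, 16)) %>%
  select(-Sample) %>%
  dplyr::rename(ploidy = Ploidy, cellularity = Aberrant_Cell_Fraction.Purity.)
head(cellularity)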

Cancer_Type_Code,cellularity,ploidy,sampleid
ACC,0.84,1.969985,TCGA-OR-A5J1-01A
ACC,0.63,3.720382,TCGA-OR-A5J2-01A
ACC,0.77,2.498135,TCGA-OR-A5J3-01A
ACC,0.76,2.66979,TCGA-OR-A5J4-01A
ACC,0.77,2.747113,TCGA-OR-A5J5-01A
ACC,0.52,4.631901,TCGA-OR-A5J6-01A
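
All three tables are joined on `sampleid`, the first 16 characters of the TCGA barcode (project, tissue source site, participant, sample type and vial). The sketch below shows how barcodes in the two separator styles reduce to the same key. `to_sampleid` is a hypothetical helper and the barcode suffixes are made up for illustration:

```r
# Hypothetical helper: replace "." separators with "-" and keep the first
# 16 characters (project-TSS-participant-sample+vial) of a TCGA barcode.
to_sampleid <- function(barcode) {
  substr(gsub(".", "-", barcode, fixed = TRUE), 1, 16)
}

# Made-up barcodes: one dash-separated, one dot-separated
dash_style <- "TCGA-OR-A5J1-01A-11D-XXXX-09"
dot_style  <- "TCGA.OR.A5J1.01A.11D.XXXX.01"

to_sampleid(dash_style)
to_sampleid(dot_style)
```

Both calls return the same 16-character identifier, which is what lets the MAF, CNV and cellularity tables be matched per sample.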


### Combine all data types

In [8]:
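#filter for useful columns and add new ones to snv df
#note: the same transformation was already applied when dfhg38 was loaded above, so re-running this cell on that
#object fails (Chromosome, Start_Position and End_Position no longer exist); the next cell uses dfhg38 directly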

dfsnv <- dfhg38 %>%
    dplyr::mutate(VAF = t_alt_count/t_depth) %>%
    dplyr::rename(chr = Chromosome, start = Start_Position, end = End_Position) %>%
    dplyr::select(sampleid, chr, start, end, Reference_Allele, Tumor_Seq_Allele2, VAF, 
                  t_depth, t_ref_count, t_alt_count, n_depth,
                  n_ref_count, n_alt_count, cancertype) %>%
    dplyr::mutate(nref = str_length(Reference_Allele), nalt = str_length(Tumor_Seq_Allele2)) %>%
    dplyr::mutate(mutation_type = ifelse((nref - nalt) == 0, "SNV", "INS/DEL")) %>%
    dplyr::select(-nref, -nalt)
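
The `mutation_type` label is derived purely from the string lengths of the reference and alternate alleles. The sketch below runs the same rule on a few made-up alleles to show how it behaves when an allele is written as `-`. The `stricter` variant is a hypothetical alternative, not something the notebook uses.

```r
#toy alleles (illustrative values only)
ref <- c("T", "A", "GTAT", "-")
alt <- c("A", "-", "-", "CG")

#rule used in the cell above: equal string lengths means SNV
by_length <- ifelse(nchar(ref) == nchar(alt), "SNV", "INS/DEL")

#hypothetical stricter rule: additionally treat any "-" allele as an indel
stricter <- ifelse(by_length == "SNV" & ref != "-" & alt != "-", "SNV", "INS/DEL")

print(data.frame(ref, alt, by_length, stricter))
```

The second toy row (`A` to `-`) is a one-base deletion, yet the length rule calls it an SNV because `"-"` is one character long.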

In [13]:
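#combine snv with cellularity
#NOTE: this reassignment replaces the dfsnv built in the previous cell; it assumes dfhg38 already holds the processed columns (VAF, mutation_type)
dfsnv <- dfhg38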
df1temp <- left_join(dfsnv, cellularity, by = c("sampleid")) %>%
    filter(cellularity > 0.2) #remove cellularity < 20%

In [18]:
#format CNV for easy joining
dfhg38cnvt <- dfhg38cnv %>%
    dplyr::rename(chr = Chromosome) %>% #keep Start and End as-is for the position filter in the join
    dplyr::mutate(chr = paste0("chr", chr)) %>%
    dplyr::filter(sampleid %in% unique(df1temp$sampleid))

In [19]:
length(unique(dfhg38cnvt$sampleid))

In [20]:
#join snv, cnv and cellularity
df2temp <- inner_join(df1temp, dfhg38cnvt, by = c("sampleid", "chr")) %>%
    filter(start >= Start & end <= End) %>%
    select(-Start, -End)
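
The join above attaches a copy-number segment to each variant in two steps: an `inner_join` on sample and chromosome, then a filter that keeps only rows where the variant lies inside the segment. Below is a small sketch of the same pattern on toy data. All sample names, positions and segment means are invented for illustration.

```r
library(dplyr)

#toy variants (made-up values)
snv <- data.frame(sampleid = c("S1", "S1", "S2"),
                  chr = c("chr1", "chr2", "chr1"),
                  start = c(150, 100, 450),
                  end = c(150, 100, 450),
                  stringsAsFactors = FALSE)

#toy segments (made-up values); note there is no chr2 segment for S1
cnv <- data.frame(sampleid = c("S1", "S1", "S2"),
                  chr = c("chr1", "chr1", "chr1"),
                  Start = c(100, 300, 100),
                  End = c(200, 400, 200),
                  Segment_Mean = c(0.5, -0.3, 0.1),
                  stringsAsFactors = FALSE)

joined <- inner_join(snv, cnv, by = c("sampleid", "chr")) %>%
    filter(start >= Start & end <= End) %>% #variant must sit fully inside the segment
    select(-Start, -End)
print(joined)
```

Only the first variant survives: the `chr2` variant has no segment to join to, and the `S2` variant falls outside its sample's only segment. Variants are therefore dropped silently when no segment covers them, which is why the notebook counts unique samples before and after the join.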

In [23]:
length(unique(df2temp$sampleid))

In [25]:
#calculate copy number and mutation copy number MCN

dfcombinedhg38 <- df2temp %>%
    select(-n_ref_count, -n_alt_count, -Num_Probes)

dfout <- dfcombinedhg38 %>%
    mutate(cellularity = ifelse(is.na(cellularity), 1, as.numeric(cellularity))) %>%
    #calculate CN by correcting for cellularity
    mutate(CN = 2^Segment_Mean * 2, CNcorrected = (2^Segment_Mean + cellularity - 1) * (2 / cellularity), 
          absCN = round(CN), absCNcorrected = round(CNcorrected)) %>% 
    #don't allow CN == 0
    mutate(absCN = ifelse(absCN == 0, 1, absCN), absCNcorrected = ifelse(absCNcorrected == 0, 1, absCNcorrected)) %>%
    #(CNcorrected - 2) * 1 + 2 reduces to CNcorrected, so MCN = CNcorrected * VAF / cellularity
    mutate(MCN = ((CNcorrected - 2) * 1 + 2) * VAF/cellularity)
head(dfout)

sampleid,chr,start,end,Reference_Allele,Tumor_Seq_Allele2,VAF,t_depth,t_ref_count,t_alt_count,⋯,mutation_type,Cancer_Type_Code,cellularity,ploidy,Segment_Mean,CN,CNcorrected,absCN,absCNcorrected,MCN
TCGA-DX-A48O-01A,chr1,43171404,43171404,C,A,0.4444444,36,20,16,⋯,SNV,SARC,0.86,1.819866,0.2133,2.318674,2.370551,2,2,1.225091
TCGA-DX-A48O-01A,chr1,74182619,74182619,T,A,0.3900709,141,86,55,⋯,SNV,SARC,0.86,1.819866,0.1566,2.229314,2.266644,2,2,1.0280838
TCGA-DX-A48O-01A,chr1,149488066,149488066,T,A,0.6666667,3,1,2,⋯,SNV,SARC,0.86,1.819866,0.1915,2.283901,2.330117,2,2,1.8062924
TCGA-DX-A48O-01A,chr1,156346990,156346990,A,-,0.3255814,344,232,112,⋯,SNV,SARC,0.86,1.819866,0.7182,3.290256,3.500298,3,4,1.3251534
TCGA-DX-A48O-01A,chr1,161671461,161671461,C,T,0.1058394,548,490,58,⋯,SNV,SARC,0.86,1.819866,0.7266,3.30947,3.522639,3,4,0.433528
TCGA-DX-A48O-01A,chr1,161756830,161756849,GTATGGTGAACAGTGCTCTT,-,0.199005,201,161,40,⋯,INS/DEL,SARC,0.86,1.819866,0.7266,3.30947,3.522639,3,4,0.8151427
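
One way to read the `CNcorrected` expression in the cell above is as a two-population mixture. This is an interpretation rather than something the notebook states. It assumes that `Segment_Mean` is $s = \log_2(\text{copy number}/2)$ (consistent with `CN = 2^Segment_Mean * 2`), that a fraction $\rho$ of cells (the cellularity) are tumour cells with copy number $CN_{\text{corrected}}$, and that the remaining cells are diploid:

$$
2 \cdot 2^{s} = \rho \, CN_{\text{corrected}} + 2\,(1 - \rho)
\quad\Longrightarrow\quad
CN_{\text{corrected}} = \left(2^{s} + \rho - 1\right)\frac{2}{\rho}
$$

The mutation copy number as coded is then

$$
MCN = CN_{\text{corrected}} \cdot \frac{VAF}{\rho}
$$

When $\rho = 1$ the correction vanishes and $CN_{\text{corrected}} = 2 \cdot 2^{s}$, which is the uncorrected `CN`.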


In [26]:
#write hg38 file out
write_delim(dfout, "data/TCGA-combined-hg38-2.csv", delim = ",")
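
As a sanity check on the copy-number step, the calculation can be repeated by hand for a single variant. The sketch below wraps the same arithmetic in a helper, `mcn_from_segment`, which is a hypothetical name introduced here and not part of the notebook. It is applied to the first row of the `head(dfout)` output (Segment_Mean 0.2133, cellularity 0.86, 16 alternate reads out of 36).

```r
#hypothetical helper mirroring the mutate() calls used to build dfout
mcn_from_segment <- function(segment_mean, cellularity, vaf) {
    ratio <- 2^segment_mean                                        #copy ratio relative to diploid
    cn <- ratio * 2                                                #uncorrected copy number
    cn_corrected <- (ratio + cellularity - 1) * (2 / cellularity)  #corrected for cellularity
    abs_cn_corrected <- pmax(round(cn_corrected), 1)               #don't allow CN == 0
    mcn <- cn_corrected * vaf / cellularity                        #mutation copy number as coded
    data.frame(CN = cn, CNcorrected = cn_corrected,
               absCNcorrected = abs_cn_corrected, MCN = mcn)
}

#first row of head(dfout): VAF = t_alt_count / t_depth = 16 / 36
res <- mcn_from_segment(segment_mean = 0.2133, cellularity = 0.86, vaf = 16 / 36)
print(res)
```

The result should agree with the tabulated row (CN 2.318674, CNcorrected 2.370551, MCN 1.225091) up to rounding.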

### Download clinical info

In [None]:
# Optional: download clinical information for each TCGA project (not used in the analysis above)
tcgacodes <- TCGAbiolinks:::getGDCprojects()$project_id
tcgacodes <- tcgacodes[str_detect(tcgacodes, "TCGA")]

dfclinical <- data.frame()
for (p in head(tcgacodes, -1)){ #loop over all projects except the last
    print(p)
    query <- GDCquery(project = p, 
                      data.category = "Clinical")
    GDCdownload(query)
    clinical <- GDCprepare_clinic(query, clinical.info = "patient", directory = "GDCdata/")
    names(clinical)
    if("stage_event_system_version" %in% names(clinical)){
        dfclinical <- bind_rows(dfclinical, select(clinical, -stage_event_system_version))
    } else if("patient_id" %in% names(clinical)){
        dfclinical <- bind_rows(dfclinical, select(clinical, -patient_id))
    } else {
        dfclinical <- bind_rows(dfclinical, clinical)
    }
}
    
dfout <- select(dfclinical, bcr_patient_barcode, tumor_tissue_site, histological_type,
vital_status, days_to_birth, days_to_last_known_alive, days_to_death, days_to_last_followup,
stage_event_clinical_stage, stage_event_pathologic_stage, stage_event_tnm_categories,
stage_event_gleason_grading)

write.csv(dfout, "data/TCGA-clinical.csv", row.names = F)