# Diagnosis prediction using bladder cancer genomics data

### Installation

In [1]:
if (!"BiocManager" %in% rownames(installed.packages()))
  install.packages("BiocManager")
BiocManager::install("TCGAWorkflow")

Bioconductor version 3.10 (BiocManager 1.30.10), R 3.6.2 (2019-12-12)

Installing package(s) 'TCGAWorkflow'

installing the source package ‘TCGAWorkflow’




### Loading packages

TCGAWorkflowData: this package contains the data needed to run each analysis step. It is a subset of the full download, which keeps the examples fast. For a real analysis, use all available data.

DT: used to display the results as interactive tables.

TCGAbiolinks: used to query, download, and prepare TCGA data from the Genomic Data Commons (GDC).

In [1]:
library(TCGAWorkflowData)
library(DT)
library(TCGAbiolinks)

“replacing previous import ‘vctrs::data_frame’ by ‘tibble::data_frame’ when loading ‘dplyr’”


In [None]:
library(Seurat)

# Abstract

This demo walks through a series of analyses of different types of molecular data. We describe how to download, process, and prepare TCGA data, and how to extract biologically meaningful genomic and epigenomic information using several key Bioconductor packages.

## Introduction

Recent technological developments have made it possible to generate large amounts of genomic data, such as gene expression profiles, and to deposit them in freely available public repositories maintained by international consortia such as The Cancer Genome Atlas (TCGA).

TCGA is an initiative of the National Institutes of Health (NIH) that makes publicly available molecular and clinical information on more than 30 human cancer types, including exomes (mutation analysis), single nucleotide polymorphisms (SNPs), DNA methylation, transcriptomes (mRNAs), microRNAs (miRNAs), and proteomes. The sample types available in TCGA are: primary solid tumor, recurrent solid tumor, blood-derived normal and tumor, metastatic, and solid tissue normal (Weinstein et al., 2013).
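The sample type of a TCGA aliquot is encoded in characters 14 and 15 of its barcode, which is useful when separating tumor from normal samples. The following is a minimal base R sketch of that decoding; `decode_sample_type` and `sample_type_codes` are hypothetical names introduced here for illustration, not part of TCGAbiolinks, and only the sample types listed above are covered.

```r
# Sample-type codes found at characters 14-15 of a TCGA barcode.
# Only the types named in the text are listed; other codes decode to NA.
sample_type_codes <- c(
  "01" = "Primary Solid Tumor",
  "02" = "Recurrent Solid Tumor",
  "03" = "Primary Blood Derived Cancer - Peripheral Blood",
  "06" = "Metastatic",
  "10" = "Blood Derived Normal",
  "11" = "Solid Tissue Normal"
)

# Hypothetical helper: return the sample type for each barcode in a character vector.
decode_sample_type <- function(barcodes) {
  codes <- substr(barcodes, 14, 15)
  unname(sample_type_codes[codes])
}

# Two barcodes from the TCGA-BLCA expression data used in this demo.
barcodes <- c("TCGA-BL-A13J-01B-04R-A277-07",
              "TCGA-KQ-A41N-01A-11R-A33J-07")
decode_sample_type(barcodes)
```

Both example barcodes carry the code `01`, so they decode to primary solid tumors.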

# Methods

## Access to the data

In [5]:
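# Query the GDC legacy archive (hg19) for TCGA-BLCA gene expression quantification files.
query.exp <- GDCquery(project = "TCGA-BLCA",
                      data.category = "Gene expression",
                      data.type = "Gene expression quantification",
                      platform = "Illumina HiSeq",
                      file.type = "results",
                      # Optional filter, left disabled so that all sample types are returned:
                      # sample.type = "Primary solid Tumor",
                      legacy = TRUE)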

--------------------------------------

o GDCquery: Searching in GDC database

--------------------------------------

Genome of reference: hg19

--------------------------------------------

oo Accessing GDC. This might take a while...

--------------------------------------------

ooo Project: TCGA-BLCA

--------------------

oo Filtering results

--------------------

ooo By platform

ooo By data.type

ooo By file.type

----------------

oo Checking data

----------------

ooo Check if there are duplicated cases

ooo Check if there results for the query

-------------------

o Preparing output

-------------------



In [6]:
GDCdownload(query.exp)

Downloading data for project TCGA-BLCA

GDCdownload will download 433 files. A total of 653.38212 MB

Downloading as: Sun_Sep_20_22_39_54_2020.tar.gz



Downloading: 260 MB       

In [7]:
exp <- GDCprepare(query.exp, save = FALSE)

--------------------

oo Reading 433 files

--------------------





--------------------

oo Merging 433 files

--------------------

Accessing grch37.ensembl.org to get gene information

Downloading genome information (try:0) Using: Human genes (GRCh37.p13)

“`select_()` is deprecated as of dplyr 0.7.0.
Please use `select()` instead.
“`filter_()` is deprecated as of dplyr 0.7.0.
Please use `filter()` instead.
See vignette('programming') for more help
Cache found

Starting to add information to samples

 => Add clinical information to samples

 => Adding TCGA molecular information from marker papers

 => Information will have prefix 'paper_' 

blca subtype information from:doi:10.1016/j.cell.2017.09.007



In [8]:
exp

class: RangedSummarizedExperiment 
dim: 19947 433 
metadata(1): data_release
assays(2): raw_count scaled_estimate
rownames(19947): A1BG|1 A2M|2 ... TMED7-TICAM2|100302736
  LOC100303728|100303728
rowData names(4): gene_id entrezgene ensembl_gene_id
  transcript_id.transcript_id_TCGA-BL-A13J-01B-04R-A277-07
colnames(433): TCGA-BL-A13J-01B-04R-A277-07
  TCGA-KQ-A41N-01A-11R-A33J-07 ... TCGA-4Z-AA7Q-01A-11R-A39I-07
  TCGA-4Z-AA89-01A-11R-A39I-07
colData names(239): barcode patient ... paper_Fusion in TNFRSF21
  paper_Fusion in ASIP
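The object printed above stores the expression matrices in `assays`, per-sample annotation in `colData`, and per-gene annotation in `rowData`. As a rough sketch of how these parts are accessed, the example below builds a tiny stand-in object with the same assay names and two of the real barcodes; the counts are made up for illustration, and `toy`, `expr`, and `samples` are hypothetical names. The same accessors should apply to the real `exp` object.

```r
library(SummarizedExperiment)

# Toy stand-in for `exp`: 3 genes x 2 samples. Values are invented for illustration.
counts <- matrix(c(10, 0, 5, 20, 3, 8), nrow = 3,
                 dimnames = list(c("A1BG|1", "A2M|2", "LOC100303728|100303728"),
                                 c("TCGA-BL-A13J-01B-04R-A277-07",
                                   "TCGA-KQ-A41N-01A-11R-A33J-07")))

toy <- SummarizedExperiment(
  assays = list(
    raw_count = counts,
    # Placeholder only: per-sample fractions, not real RSEM scaled estimates.
    scaled_estimate = sweep(counts, 2, colSums(counts), "/")
  ),
  colData = DataFrame(barcode = colnames(counts),
                      patient = substr(colnames(counts), 1, 12),
                      row.names = colnames(counts))
)

expr <- assay(toy, "raw_count")  # genes x samples matrix
samples <- colData(toy)          # one row of annotation per sample

dim(expr)
samples$patient
```

Note that naming the object `exp`, as the notebook does, masks the base R function `exp()` for plain variable lookups in the session; a name such as `blca.exp` would avoid the ambiguity.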