# Differential Expression of RNA-seq Data in GenePattern Notebook

Compute differentially expressed genes or transcripts and visualize the results

## Before you begin

* Sign in to GenePattern by entering your username and password into the form below. 
* Gene expression data must be in a GCT file.
* Learn more by reading about [file formats](http://www.broadinstitute.org/cancer/software/genepattern/file-formats-guide#GCT).
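
For orientation, here is a minimal sketch of the GCT layout in plain Python: a `#1.2` version line, a line with the row and column counts, a header line beginning with `Name` and `Description`, and then one tab-delimited row per gene. The helper name `read_gct` and the sample values are illustrative assumptions and are not part of GenePattern.

```python
import io

def read_gct(handle):
    """Parse GCT v1.2 text into (row_names, sample_names, matrix)."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version line: {version!r}")
    # Line 2 holds the number of data rows and the number of samples.
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    header = handle.readline().rstrip("\n").split("\t")
    sample_names = header[2:]  # the first two columns are Name and Description
    row_names, matrix = [], []
    for line in handle:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        row_names.append(fields[0])
        matrix.append([float(v) for v in fields[2:]])
    if len(row_names) != n_rows or len(sample_names) != n_cols:
        raise ValueError("dimensions on line 2 do not match the data")
    return row_names, sample_names, matrix

# Illustrative two-gene, three-sample file (made-up counts).
example = io.StringIO(
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "ENSG00000000003\tna\t10\t20\t30\n"
    "ENSG00000000005\tna\t0\t5\t1\n"
)
rows, samples, values = read_gct(example)
```

The dimension check catches truncated files early, before they are uploaded to a module.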


In [6]:
# Requires GenePattern Notebook: pip install genepattern-notebook
import gp
import genepattern

# Username and password removed for security reasons.
genepattern.GPAuthWidget(genepattern.register_session("https://gp-beta-ami.genepattern.org/gp", "", ""))

In the previous notebook, you ran MergeHTSeqCounts to create a single file with 40 samples of read count data. Because this file has already been created, you do not need to run MergeHTSeqCounts here, but it is included as part of a complete end-to-end analysis workflow.

<div class="alert alert-info">
**To continue, go to [DESeq2](#deseq2)**

## Merge read counts from RNA-seq quantitation methods
- RNA-seq quantitation methods create tab-delimited files consisting of transcript identifiers, e.g., ENSG00000000003, and their read counts.
- The MergeHTSeqCounts module combines these files into a single [GCT](http://software.broadinstitute.org/cancer/software/genepattern/file-formats-guide#GCT) format for later analysis in GenePattern.
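
To make the merge step concrete, the sketch below combines per-sample count files into one matrix in plain Python. It is not the MergeHTSeqCounts implementation: the function names are hypothetical, the counts are made up, and filling a missing identifier with 0 is an assumption. It does rely on one documented property of `htseq-count` output, namely that summary rows such as `__no_feature` start with two underscores.

```python
import io

def read_htseq_counts(handle):
    """Return {feature_id: count}, skipping summary rows such as __no_feature."""
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("__"):
            continue
        feature_id, count = line.split("\t")
        counts[feature_id] = int(count)
    return counts

def merge_counts(samples):
    """samples: {sample_name: {feature_id: count}} -> (feature_ids, sample_names, matrix)."""
    sample_names = list(samples)
    feature_ids = sorted(set().union(*(c.keys() for c in samples.values())))
    # Assumption: an identifier absent from a sample is treated as a zero count.
    matrix = [[samples[s].get(f, 0) for s in sample_names] for f in feature_ids]
    return feature_ids, sample_names, matrix

# Two made-up count files standing in for real quantitation output.
sample_a = io.StringIO("ENSG00000000003\t12\nENSG00000000005\t0\n__no_feature\t100\n")
sample_b = io.StringIO("ENSG00000000003\t7\nENSG00000000419\t3\n__ambiguous\t4\n")

feature_ids, sample_names, matrix = merge_counts({
    "sample_a": read_htseq_counts(sample_a),
    "sample_b": read_htseq_counts(sample_b),
})
```

Each row of `matrix` corresponds to one identifier and each column to one sample, which is the shape a GCT file stores.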

In [7]:
mergehtseqcounts_task = gp.GPTask(genepattern.get_session(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00354')
mergehtseqcounts_job_spec = mergehtseqcounts_task.make_job_spec()
mergehtseqcounts_job_spec.set_parameter("input.files", "")
mergehtseqcounts_job_spec.set_parameter("output.prefix", "<input.files_basename>")
mergehtseqcounts_job_spec.set_parameter("sampleinfo.file", "")
mergehtseqcounts_job_spec.set_parameter("filenames.column", "0")
mergehtseqcounts_job_spec.set_parameter("class.division.column", "1")
mergehtseqcounts_job_spec.set_parameter("sample.name.column", "")
genepattern.GPTaskWidget(mergehtseqcounts_task)

 <a href id="deseq2"></a>
## Compute differentially expressed transcripts

This module uses the [DESeq2](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4302049/) method to find significantly differentially expressed transcripts.

<div class="alert alert-info">
- Drag [BRCA_unversioned_ensembl_ids.collapsed.gct](https://datasets.genepattern.org/data/TCGA_BRCA/DP_3_1_BRCA_unversioned_ensembl_ids.collapsed.filtered.gct) to the **input file** field below.
- Drag [BRCA_labels.cls](https://datasets.genepattern.org/data/TCGA_BRCA/WP_1_workshop_BRCA_labels.cls) to the **cls file** field.
- Click **Run**.
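
The CLS file assigns each sample in the GCT file to a class. As a rough sketch of the categorical CLS layout, the parser below reads the three lines: sample and class counts followed by `1`, a `#` line naming the classes, and one label per sample. The helper name `read_cls`, the class names, and the labels are illustrative assumptions, not the contents of the workshop file.

```python
def read_cls(text):
    """Parse a categorical CLS file into (class_names, labels)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Line 1: <number of samples> <number of classes> 1
    n_samples, n_classes, _ = (int(x) for x in lines[0].split())
    # Line 2: '#' followed by the class names, in label order.
    class_names = lines[1].lstrip("#").split()
    # Line 3: one label per sample, in the same order as the GCT columns.
    labels = lines[2].split()
    if len(labels) != n_samples or len(class_names) != n_classes:
        raise ValueError("counts on line 1 do not match the names or labels")
    return class_names, labels

# Made-up example: three samples of one class, then three of the other.
class_names, labels = read_cls("6 2 1\n# Tumor Normal\n0 0 0 1 1 1\n")
```

The label order must match the sample column order in the GCT file, so it is worth checking both files together before running the module.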

In [8]:
deseq2_task = gp.GPTask(genepattern.get_session(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00362')
deseq2_job_spec = deseq2_task.make_job_spec()
deseq2_job_spec.set_parameter("input.file", "")
deseq2_job_spec.set_parameter("cls.file", "")
deseq2_job_spec.set_parameter("confounding.variable.cls.file", "")
deseq2_job_spec.set_parameter("output.file.base", "<input.file_basename>")
deseq2_job_spec.set_parameter("qc.plot.format", "skip")
deseq2_job_spec.set_parameter("fdr.threshold", "0.1")
deseq2_job_spec.set_parameter("top.N.count", "20")
deseq2_job_spec.set_parameter("random.seed", "779948241")
genepattern.GPTaskWidget(deseq2_task)
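
DESeq2 reports Benjamini-Hochberg adjusted p-values (the `padj` column) by default, and the `fdr.threshold` value set above is presumably compared against them; exactly how the module applies the threshold is an assumption here. The sketch below shows the adjustment itself in plain Python on made-up p-values. It is not the module's implementation and leaves out DESeq2's independent filtering.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for four genes.
p_values = [0.001, 0.04, 0.03, 0.5]
padj = benjamini_hochberg(p_values)

# Keep genes whose adjusted p-value falls below the 0.1 threshold.
significant = [i for i, q in enumerate(padj) if q < 0.1]
```

A threshold of 0.1 means that roughly 10% of the genes called significant are expected to be false positives.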

The output of DESeq2 is in a text-based, tab-delimited format. Before visualizing it with the GenePattern ComparativeMarkerSelectionViewer, we must convert it to the ODF file format that the viewer accepts.

<div class="alert alert-info">
- Click the DESeq2_results_report file (`<filename>.DESeq2_results_report.txt`).
- Choose **Send to Existing GenePattern Cell**.
- Select **txt2odf**.
- Set the *prune gct* parameter to `True`.
- Drag [BRCA_with_versioned_ensembl_ids_normalized.gct](https://datasets.genepattern.org/data/TCGA_BRCA/DP_3_BRCA_unversioned_ensembl_ids.collapsed.gct) to the **gct** field below.
- Drag [BRCA_labels.cls](https://datasets.genepattern.org/data/TCGA_BRCA/WP_1_workshop_BRCA_labels.cls) to the **cls file** field.
- Click **Run**.

In [9]:
txt2odf_task = gp.GPTask(genepattern.get_session(0), 'urn:lsid:8080.gpserver.ip-172-31-26-71.ip-172-31-26-71.ec2.internal:genepatternmodules:23')
txt2odf_job_spec = txt2odf_task.make_job_spec()
txt2odf_job_spec.set_parameter("txt_file", "")
txt2odf_job_spec.set_parameter("prune_gct", "False")
txt2odf_job_spec.set_parameter("gct", "")
txt2odf_job_spec.set_parameter("cls", "")
genepattern.GPTaskWidget(txt2odf_task)

## Visualize differential expression results

The ComparativeMarkerSelectionViewer allows you to view the results of a differential expression analysis as a heatmap, profile of differentially expressed genes, histogram, or list. It also includes features that allow you to filter results, zoom in and out of a section of the gene list, and export results in a number of formats.

Run the ComparativeMarkerSelectionViewer module to view the results. The viewer displays the test statistic score, its p-value, and additional statistics as computed by the differential expression method.

* Learn more by reading about the [ComparativeMarkerSelectionViewer](https://gp-beta-ami.genepattern.org/gp/getTaskDoc.jsp?name=ComparativeMarkerSelectionViewer) module.

<div class="alert alert-info">
- In the **comparative marker selection filename** parameter, click the triangle in the file input box.
- Select the txt2odf result file as the input.
- In the **dataset filename** parameter, click the triangle in the file input box.
- Select the txt2odf result file as the input.
- Click **Run**.

Once the job downloads the necessary data, it displays a visualization of the differential expression results.

In [10]:
comparativemarkerselectionviewer_task = gp.GPTask(genepattern.get_session(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.visualizer:00045')
comparativemarkerselectionviewer_job_spec = comparativemarkerselectionviewer_task.make_job_spec()
comparativemarkerselectionviewer_job_spec.set_parameter("comparative.marker.selection.filename", "")
comparativemarkerselectionviewer_job_spec.set_parameter("dataset.filename", "")
genepattern.GPTaskWidget(comparativemarkerselectionviewer_task)