# Single-sample GSEA projection (ssGSEA)

## Background

Traditional gene set enrichment analysis assesses the differential coordinate up- or down-regulation of a biological process or pathway between groups of samples belonging to two phenotypes. The ability to assess that enrichment in individual samples, especially independently of pre-assigned phenotype labels, provides the opportunity to analyze transcription data at a higher level, by using gene sets/pathways instead of genes, resulting in a much more biologically interpretable set of features. Single-sample Gene Set Enrichment Analysis (ssGSEA) Projection accomplishes this.

**ssGSEA projects a single sample’s gene expression profile from the space of single genes onto the space of gene sets.** It does this via the ssGSEA enrichment score, which represents the degree to which the genes in a particular gene set are coordinately up- or down-regulated within a sample.
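
As a rough illustration of how such a score can be computed, the sketch below follows the general approach described by Barbie et al. (2009): rank the genes within one sample, walk down the ranked list, and sum the difference between a rank-weighted running fraction of gene-set members and the running fraction of non-members. The function names `ssgsea_score` and `project_samples`, the toy data, and the tie-breaking behavior are assumptions made for this example. The GenePattern module may normalize and handle edge cases differently, so treat this as a teaching aid rather than a reimplementation. The default exponent of 0.75 mirrors the `weighting.exponent` value used later in this notebook.

```python
import numpy as np

def ssgsea_score(expression, gene_names, gene_set, alpha=0.75):
    """Illustrative single-sample enrichment score for one gene set."""
    expression = np.asarray(expression, dtype=float)
    # Rank genes within the sample: 1 = lowest expression, N = highest.
    # Ties are broken by position (an assumption of this sketch).
    ranks = expression.argsort().argsort() + 1
    order = np.argsort(-ranks)              # indices from highest to lowest rank
    sorted_ranks = ranks[order].astype(float)
    in_set = np.isin(np.asarray(gene_names)[order], list(gene_set))
    n_in = in_set.sum()
    n_out = in_set.size - n_in
    if n_in == 0 or n_out == 0:
        return float("nan")                 # score is undefined in these cases
    # Running fraction of gene-set members, weighted by rank ** alpha.
    weights = np.where(in_set, sorted_ranks ** alpha, 0.0)
    p_in = np.cumsum(weights) / weights.sum()
    # Running fraction of genes outside the set, unweighted.
    p_out = np.cumsum(~in_set) / n_out
    # Sum the difference between the two running fractions.
    return float(np.sum(p_in - p_out))

def project_samples(matrix, gene_names, gene_sets, alpha=0.75):
    """Project a genes x samples matrix onto a gene sets x samples matrix."""
    matrix = np.asarray(matrix, dtype=float)
    return np.array([
        [ssgsea_score(matrix[:, j], gene_names, members, alpha)
         for j in range(matrix.shape[1])]
        for members in gene_sets.values()
    ])

# Toy data (made up for illustration): 4 genes, 2 samples, 2 gene sets.
genes = ["A", "B", "C", "D"]
toy_matrix = [[4.0, 1.0],
              [3.0, 2.0],
              [2.0, 3.0],
              [1.0, 4.0]]
toy_sets = {"SET_HIGH_IN_S1": {"A", "B"}, "SET_LOW_IN_S1": {"C", "D"}}
projected = project_samples(toy_matrix, genes, toy_sets)
```

In this toy case the gene set made of the two most highly expressed genes in the first sample receives a positive score for that sample, while the set made of the two least expressed genes receives a negative one. The projected matrix has one row per gene set instead of one row per gene.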

Any supervised or unsupervised machine learning technique or other statistical analysis can then be applied to the resulting projected dataset. The benefit is that the **ssGSEA projection transforms the data to a higher-level (pathways instead of genes) space representing a more biologically interpretable set of features on which analytic methods can be applied.**

Another benefit of ssGSEA projection is **dimensionality reduction**. Typically the number of gene sets employed in the enrichment analysis is substantially smaller than the number of genes targeted by a gene expression assay. Gene set scores also tend to be more robust and less noisy than individual gene measurements, which benefits downstream analysis.

## Before you begin

You must log in to a GenePattern server. In this notebook we will use **GenePattern Cloud**.

Note: if you are not familiar with GenePattern Notebook features, you can review them here: <a href="https://notebook.genepattern.org/services/sharing/notebooks/361/preview/">GenePattern Notebook Tutorial</a>.

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
Sign in to GenePattern by entering your username and password into the form below.
</div>

In [6]:
# Requires GenePattern Notebook: pip install genepattern-notebook
import gp
import genepattern

# Username and password removed for security reasons.
genepattern.display(genepattern.session.register("https://cloud.genepattern.org/gp", "", ""))

GPAuthWidget()

## Project gene expression dataset into the space of oncogenic gene sets

We will use the GenePattern ssGSEA analysis to transform a gene expression dataset into a dataset where each row corresponds to a pathway from the [MSigDB oncogenic gene sets collection](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C6), and each column is a sample. Each value in the new dataset will therefore represent the degree of coordinated up- or down-regulation of a pathway (row) within a sample (column).

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>

Provide the required parameters for the ssGSEA module below.

- For the **input gct file*** parameter, provide a file in the [GCT format](https://genepattern.org/file-formats-guide#GCT).
     - For example: <a href="https://datasets.genepattern.org/data/ccmi_tutorial/2017-12-15/BRCA_HUGO_symbols.preprocessed.gct" target="_blank">BRCA_HUGO_symbols.preprocessed.gct</a>
- For a detailed description of the parameters you can read the <a href='https://gsea-msigdb.github.io/ssGSEA-gpmodule/v10/index.html'>parameter documentation</a>.
- For a description of the <strong><em>gene sets database files*</em></strong> parameter options, visit <a href="https://www.gsea-msigdb.org/gsea/msigdb/index.jsp">the MSigDB webpage</a>.
- Click <strong><em>Run</em></strong> on the analysis below.
</div>
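
For readers who want to inspect a GCT file outside the notebook widgets, the following is a minimal sketch of a reader for the tab-delimited GCT 1.2 layout: a `#1.2` version line, a line with the row and column counts, a header line starting with `Name` and `Description`, then one row per feature. The helper name `read_gct` and the toy content (gene set names, sample names, and values) are made up for this example; only the standard library is used.

```python
import csv
import io

def read_gct(handle):
    """Parse a GCT 1.2 stream into (row_names, descriptions, sample_names, values)."""
    reader = csv.reader(handle, delimiter="\t")
    version = next(reader)[0].strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version: {version}")
    # Second line: number of data rows, then number of sample columns.
    n_rows, n_cols = (int(x) for x in next(reader)[:2])
    # Third line: 'Name', 'Description', then one column per sample.
    header = next(reader)
    sample_names = header[2:2 + n_cols]
    row_names, descriptions, values = [], [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        row_names.append(row[0])
        descriptions.append(row[1])
        values.append([float(x) for x in row[2:2 + n_cols]])
    if len(row_names) != n_rows:
        raise ValueError(f"expected {n_rows} rows, found {len(row_names)}")
    return row_names, descriptions, sample_names, values

# Toy projected dataset (hypothetical names and values): 2 gene sets x 3 samples.
TOY_GCT = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "GENE_SET_A\tna\t0.5\t-1.2\t2.0\n"
    "GENE_SET_B\tna\t-0.3\t0.8\t1.1\n"
)
names, descriptions, samples, scores = read_gct(io.StringIO(TOY_GCT))
```

The same function can be applied to a downloaded file by passing an open file handle, for example `open(path, newline="")`, in place of the `io.StringIO` object.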

In [7]:
ssgsea_task = gp.GPTask(genepattern.session.get(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00270')
ssgsea_job_spec = ssgsea_task.make_job_spec()
ssgsea_job_spec.set_parameter("input.gct.file", "")
ssgsea_job_spec.set_parameter("output.file.prefix", "")
ssgsea_job_spec.set_parameter("gene.sets.database.files", "")
ssgsea_job_spec.set_parameter("gene.symbol.column", "Name")
ssgsea_job_spec.set_parameter("gene.set.selection", "ALL")
ssgsea_job_spec.set_parameter("sample.normalization.method", "none")
ssgsea_job_spec.set_parameter("weighting.exponent", "0.75")
ssgsea_job_spec.set_parameter("min.gene.set.size", "10")
ssgsea_job_spec.set_parameter("combine.mode", "combine.add")
ssgsea_job_spec.set_parameter("job.memory", "2 Gb")
ssgsea_job_spec.set_parameter("job.queue", "gp-cloud-default")
ssgsea_job_spec.set_parameter("job.cpuCount", "1")
ssgsea_job_spec.set_parameter("job.walltime", "02:00:00")
genepattern.display(ssgsea_task)


GPTaskWidget(lsid='urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00270')

## Visualize projected pathways as a heat map

We will use the GenePattern heat map viewer to visualize the resulting projection of genes into pathways.

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
    
- In the **dataset** parameter below, click on the dropdown and select the output of the ssGSEA module (it should end with `.PROJ.gct`).
- Click **Run**.
</div>

In [8]:
heatmapviewer_task = gp.GPTask(genepattern.session.get(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.visualizer:00010')
heatmapviewer_job_spec = heatmapviewer_task.make_job_spec()
heatmapviewer_job_spec.set_parameter("dataset", "")
genepattern.display(heatmapviewer_task)

GPTaskWidget(lsid='urn:lsid:broad.mit.edu:cancer.software.genepattern.module.visualizer:00010')

## References

- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS. 2005;102(43):15545-15550. http://www.pnas.org/content/102/43/15545.abstract
- Barbie DA, Tamayo P, et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature. 2009;462:108-112. https://pubmed.ncbi.nlm.nih.gov/19847166/
- MSigDB website (https://www.gsea-msigdb.org/gsea/msigdb/index.jsp)
- GSEA website (https://www.gsea-msigdb.org/gsea/index.jsp)