# Single-sample GSEA projection (ssGSEA)

## Background

Traditional gene set enrichment analysis assesses the differential coordinate up- or down-regulation of a biological process or pathway between groups of samples belonging to two phenotypes. The ability to assess that enrichment in individual samples, especially independently of pre-assigned phenotype labels, provides the opportunity to analyze transcription data at a higher level, by using gene sets/pathways instead of genes, resulting in a much more biologically interpretable set of features. Single-sample Gene Set Enrichment Analysis (ssGSEA) Projection accomplishes this.

**ssGSEA projects a single sample’s gene expression profile from the space of single genes onto the space of gene sets**. It does this via the ssGSEA enrichment score, which represents the degree to which the genes in a particular gene set are coordinately up- or down-regulated within a sample.
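
As a rough illustration of how such a score can be computed, the following minimal Python sketch ranks the genes of one sample by expression and accumulates the difference between a rank-weighted running sum over genes in the set and an unweighted running sum over genes outside it. It is a simplified sketch in the spirit of ssGSEA, not the ssGSEAProjection module's implementation: the function name `ssgsea_score`, the toy gene names, and the weighting exponent `alpha=0.25` are assumptions made for illustration, and any normalization the module applies is omitted.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score (illustrative only).

    expression: dict mapping gene name -> expression value for ONE sample.
    gene_set:   set of gene names.
    alpha:      rank-weighting exponent (an assumed value, not the module default).
    """
    # Order genes from most to least expressed in this sample.
    ordered = sorted(expression, key=expression.get, reverse=True)
    n = len(ordered)
    hits = [gene in gene_set for gene in ordered]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must contain some, but not all, measured genes")

    # The top gene gets rank n, the bottom gene rank 1; only hits carry weight.
    weights = [(n - i) ** alpha if hit else 0.0 for i, hit in enumerate(hits)]
    total_weight = sum(weights)
    n_miss = n - n_hits

    score = hit_cdf = miss_cdf = 0.0
    for weight, hit in zip(weights, hits):
        if hit:
            hit_cdf += weight / total_weight
        else:
            miss_cdf += 1.0 / n_miss
        # Sum the gap between the two running distributions over all ranks.
        score += hit_cdf - miss_cdf
    return score


# Toy sample with hypothetical gene names.
sample = {"GENE_A": 9.0, "GENE_B": 7.0, "GENE_C": 3.0, "GENE_D": 1.0}
print(ssgsea_score(sample, {"GENE_A", "GENE_B"}))  # set at the top of the ranking
print(ssgsea_score(sample, {"GENE_C", "GENE_D"}))  # set at the bottom of the ranking
```

In this toy sample, a gene set concentrated among the most highly expressed genes receives a positive score, and one concentrated among the least expressed genes receives a negative score.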

Any supervised or unsupervised machine learning technique or other statistical analysis can then be applied to the resulting projected dataset. The benefit is that the **ssGSEA projection transforms the data to a higher-level (pathways instead of genes) space representing a more biologically interpretable set of features on which analytic methods can be applied.**
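
As one small example of such a downstream analysis, the sketch below treats each gene set as a feature and ranks gene sets by how much their scores vary across samples, a simple unsupervised filter. The function name `most_variable_gene_sets`, the gene set names, and the scores are all hypothetical; a real projected dataset would be loaded from the module's output rather than typed in.

```python
from statistics import pvariance


def most_variable_gene_sets(projected, top_n=2):
    """Return the names of the top_n gene sets with the highest variance.

    projected: dict mapping gene set name -> list of scores, one per sample.
    """
    ranked = sorted(projected, key=lambda name: pvariance(projected[name]), reverse=True)
    return ranked[:top_n]


# Hypothetical projected scores for three gene sets across three samples.
toy_projection = {
    "SET_A": [0.1, 0.1, 0.1],
    "SET_B": [-0.5, 0.0, 0.5],
    "SET_C": [0.2, 0.3, 0.2],
}
print(most_variable_gene_sets(toy_projection, top_n=2))
```

Because the features are gene sets rather than individual genes, the output of such a filter can be read directly in terms of pathways.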

Another benefit of ssGSEA projection is **dimensionality reduction**. The number of gene sets used in the enrichment analysis is typically much smaller than the number of genes measured by a gene expression assay, and gene set scores tend to be more robust and less noisy than single-gene measurements, which benefits downstream analysis.

## Before you begin

You must log in to a GenePattern server. In this notebook we will use **GenePattern Cloud**.

<div class="alert alert-info">
<ul><li>Sign in to GenePattern by entering your username and password into the form below. </li></ul>
</div>

In [4]:
# Requires GenePattern Notebook: pip install genepattern-notebook
import gp
import genepattern

# Username and password removed for security reasons.
genepattern.display(genepattern.session.register("https://cloud.genepattern.org/gp", "", ""))


## Project gene expression dataset into the space of oncogenic gene sets

We will use the GenePattern ssGSEAProjection analysis to transform the set of TCGA breast cancer samples into a dataset where each row corresponds to a pathway from the [MSigDB oncogenic gene sets collection](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C6), and each column is a sample. Each value in the new dataset will therefore represent the up- or downregulation of a pathway (row) within a sample (column).

<div class="alert alert-info">
<h3>Instructions</h3>

<ol>
<li>Insert a *GenePattern Analysis Cell* to run the ssGSEAProjection module.
<ol>
<li>Make sure this cell is selected by clicking once on it.</li>
<li>In the menu above, select `Insert`, then `Insert Cell Below`.</li>
<li>Turn that cell into a *GenePattern Analysis cell* (click on `Cell`, then select `Cell Type`, and select `GenePattern`).</li>
<li>In the search menu that pops up type `ssgsea` and select `ssGSEAProjection`.</li><br>
</ol>
</li>
<li>For the <strong><em>input gct file</em></strong> parameter, click and drag <a href="https://datasets.genepattern.org/data/ccmi_tutorial/2017-12-15/BRCA_HUGO_symbols.preprocessed.gct" target="_blank">BRCA_HUGO_symbols.preprocessed.gct</a> into the <em>&quot;Enter Path or URL&quot;</em> text box.</li>
<li>For the <strong><em>gene sets database files</em></strong> parameter, select <em>c6.all.v6.2.symbols.gmt [Oncogenic Signatures]</em>.</li>
<li>Click <strong><em>Run</em></strong> on the analysis below.</li>
</ol>

</div>
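
The projected result is a GCT file (`BRCA_HUGO_symbols.preprocessed.PROJ.gct`) with one row per gene set and one column per sample. To inspect such a file outside the notebook widgets, a minimal reader for the tab-delimited GCT 1.2 layout could look like the sketch below. The function name `read_gct` and the toy gene set names, sample names, and values are assumptions for illustration; the sketch uses only the Python standard library and is not part of the GenePattern API.

```python
import csv
import io


def read_gct(handle):
    """Parse GCT 1.2 text into (sample_names, {row_name: [values]})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError("expected a GCT 1.2 file, got header %r" % version)
    # Second line holds the number of data rows and the number of samples.
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])

    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)          # Name, Description, then sample names
    samples = header[2:]
    rows = {}
    for record in reader:
        if not record:
            continue               # skip blank trailing lines
        rows[record[0]] = [float(value) for value in record[2:]]

    if len(rows) != n_rows or len(samples) != n_cols:
        raise ValueError("dimensions do not match the GCT header")
    return samples, rows


# Toy projected dataset: two hypothetical gene sets by two hypothetical samples.
toy_gct = "\n".join([
    "#1.2",
    "2\t2",
    "Name\tDescription\tSAMPLE_1\tSAMPLE_2",
    "GENE_SET_X\tna\t0.25\t-0.5",
    "GENE_SET_Y\tna\t-0.125\t0.75",
]) + "\n"

samples, scores = read_gct(io.StringIO(toy_gct))
print(samples)
print(scores["GENE_SET_X"])
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.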

## Visualize projected pathways as a heat map

We will use the GenePattern heat map viewer to visualize the resulting projection of genes into pathways.

<div class="alert alert-info">
<h3>Instructions</h3>

<ol>
<li>Insert a *GenePattern Analysis Cell* to run the HeatMapViewer module.
<ol>
<li>Make sure this cell is selected by clicking once on it.</li>
<li>In the menu above, select `Insert`, then `Insert Cell Below`.</li>
<li>Turn that cell into a *GenePattern Analysis cell* (click on `Cell`, then select `Cell Type`, and select `GenePattern`).</li>
<li>In the search menu that pops up type `heatmap` and select `HeatMapViewer`.</li><br>
</ol>
</li>
<li>In the <strong><em>dataset</em></strong> parameter below, click on the dropdown and select `BRCA_HUGO_symbols.preprocessed.PROJ.gct`.</li>
<li>Click <strong><em>Run</em></strong>.</li>
</ol>

</div>

In [6]:
heatmapviewer_task = gp.GPTask(genepattern.session.get(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.visualizer:00010')
heatmapviewer_job_spec = heatmapviewer_task.make_job_spec()
heatmapviewer_job_spec.set_parameter("dataset", "")
genepattern.display(heatmapviewer_task)


## Project data onto hallmark pathways

[MSigDB Hallmark gene sets](http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=H) summarize and represent specific well-defined biological states or processes and display coherent expression. In this exercise you will project the expression dataset onto the hallmark gene set collection.

<div class="alert alert-info">
<h3>Instructions</h3>

1. Create a new ssGSEAProjection cell
2. Populate its **input gct file** parameter with the expression dataset used above (`BRCA_HUGO_symbols.preprocessed.gct`)
3. Select the **h.all.v6.2.symbols.gmt [Hallmarks]** gene sets database file
4. Run the cell
5. Create a new HeatMapViewer cell and visualize the analysis results in it

**Hint**: if you need to re-run an analysis with some parameters changed, you can click on the gear icon in the job result panel (the panel with the title **Job ######**) and select **Duplicate analysis**.
