# Hierarchical Clustering in GenePattern Notebook

Cluster genes and/or samples based on how similar their expression profiles are to one another. The result is a tree structure, referred to as a dendrogram.
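To make the idea concrete, here is a minimal pure-Python sketch of bottom-up (agglomerative) clustering that produces a dendrogram as nested tuples. It is an illustration only, not the GenePattern implementation: the sample names and values are made up, and the choice of Pearson distance with average linkage is an assumption made for the example.

```python
from itertools import combinations
from statistics import mean


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / (var_x * var_y) ** 0.5


def average_linkage(profiles):
    """Cluster named profiles bottom-up; return the dendrogram as nested tuples."""
    # Each cluster is (tree, member_names); every item starts as its own cluster.
    clusters = [(name, [name]) for name in profiles]

    def cluster_distance(a, b):
        # Average linkage: mean distance over all cross-cluster pairs.
        return mean(
            pearson_distance(profiles[i], profiles[j])
            for i in a[1]
            for j in b[1]
        )

    while len(clusters) > 1:
        # Find the two closest clusters and merge them into one node.
        a, b = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(((a[0], b[0]), a[1] + b[1]))
    return clusters[0][0]


# Hypothetical expression profiles for four samples (values are made up).
samples = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [1.1, 2.1, 2.9, 4.2],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [4.2, 2.8, 2.1, 0.9],
}
tree = average_linkage(samples)
print(tree)
```

Each nested pair in the printed tuple corresponds to one branch point of the dendrogram. This naive version recomputes every distance on each pass, so it is only suitable for small examples.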

## Before you begin

* Sign in to GenePattern by entering your username and password into the form below.
* Gene expression data must be in a [GCT or RES file](https://genepattern.broadinstitute.org/gp/pages/protocols/GctResFiles.html). Example files in the correct format are provided.
    * Example file: [all_aml_test.gct](https://software.broadinstitute.org/cancer/software/genepattern/data/all_aml/all_aml_test.gct).
* Learn more by reading about [file formats](http://www.broadinstitute.org/cancer/software/genepattern/file-formats-guide#GCT).
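For orientation, the sketch below reads a small GCT file with the standard library. It assumes the GCT 1.2 layout described in the file formats guide: a `#1.2` version line, a line with the row and column counts, a header line starting with `Name` and `Description`, then one tab-separated row per gene. The function name `read_gct` and the sample data are hypothetical.

```python
import io


def read_gct(handle):
    """Parse a GCT v1.2 stream into (sample_names, {row_name: values})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version line: {version!r}")
    # Second line: number of data rows, then number of data columns.
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    # Third line: Name, Description, then one column per sample.
    header = handle.readline().rstrip("\r\n").split("\t")
    sample_names = header[2:]
    rows = {}
    for line in handle:
        if not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        # Column 0 is the row name, column 1 the description; the rest are values.
        rows[fields[0]] = [float(v) for v in fields[2:]]
    if len(rows) != n_rows or len(sample_names) != n_cols:
        raise ValueError("row or column count does not match the header")
    return sample_names, rows


# Made-up two-gene, three-sample file used only to exercise the parser.
gct_text = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tA\tB\tC\n"
    "gene1\tdesc1\t1.0\t2.0\t3.0\n"
    "gene2\tdesc2\t4.0\t5.0\t6.0\n"
)
sample_names, rows = read_gct(io.StringIO(gct_text))
print(sample_names, rows["gene2"])
```

The same function accepts an open file, for example `read_gct(open("all_aml_test.gct"))` after downloading the example file. The sketch does not handle missing values.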


In [7]:
# Requires GenePattern Notebook: pip install genepattern-notebook
import gp
import genepattern

# Username and password removed for security reasons.
genepattern.GPAuthWidget(genepattern.register_session("https://genepattern.broadinstitute.org/gp", "", ""))

## Step 1: HierarchicalClustering

Run hierarchical clustering on genes and/or samples to create dendrograms for the clustered genes (*.gtr) and/or clustered samples (*.atr), as well as a file (*.cdt) that contains the original gene expression data ordered to reflect the clustering.

### Considerations
* Best practice is to normalize (row/column normalize parameters) and center (row/column center parameters) the data being clustered.
* The CDT output file must be converted to a GCT file before it can be used as an input file for another GenePattern module (other than HierarchicalClusteringViewer). For instructions on converting a CDT file to a GCT file, see [Creating Input Files](http://www.broadinstitute.org/cancer/software/genepattern/file-formats-guide#creating-input-files).
* Learn more by reading about the [HierarchicalClustering](https://genepattern.broadinstitute.org/gp/getTaskDoc.jsp?name=HierarchicalClustering) module.
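The centering and normalization mentioned in the first consideration can be sketched for a single row as follows. This is an illustration under stated assumptions: centering subtracts the row mean, and normalizing is assumed here to mean scaling the row to unit Euclidean length. The module's exact definition may differ, so check its documentation before relying on this. The helper names are hypothetical.

```python
from statistics import mean


def center_row(values):
    """Subtract the row mean so the centered values average to zero."""
    m = mean(values)
    return [v - m for v in values]


def normalize_row(values):
    """Scale the row to unit Euclidean length (assumed meaning of 'normalize')."""
    norm = sum(v * v for v in values) ** 0.5
    if norm == 0:
        # An all-zero row cannot be scaled; return it unchanged.
        return list(values)
    return [v / norm for v in values]


# Made-up expression values for one gene across three samples.
row = [2.0, 4.0, 6.0]
centered = center_row(row)
normalized = normalize_row(centered)
print(centered)
print(normalized)
```

Applying the same two functions to each column instead of each row gives the column-wise equivalent.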

<div class="alert alert-info">
<h3>Instructions</h3>

<ol>
<li>For the <strong><em>input.filename</em></strong> parameter, click and drag <a href="https://datasets.genepattern.org/data/ccmi_tutorial/2017-12-15/BRCA_HUGO_symbols.preprocessed.gct" target="_blank">BRCA_HUGO_symbols.preprocessed.gct</a> into the <em>&quot;Enter Path or URL&quot;</em> text box.</li>
<li>Click <strong><em>Run</em></strong>.</li>
</ol>

</div>


In [10]:
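# Look up the HierarchicalClustering module by its LSID and build a job specification.
hierarchicalclustering_task = gp.GPTask(genepattern.get_session(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00009')
hierarchicalclustering_job_spec = hierarchicalclustering_task.make_job_spec()
# Left empty here; supply the GCT file through the widget as described in the instructions above.
hierarchicalclustering_job_spec.set_parameter("input.filename", "")
# Per the module's option list: "2" selects Pearson correlation; "0" turns clustering off for that axis.
hierarchicalclustering_job_spec.set_parameter("column.distance.measure", "2")
hierarchicalclustering_job_spec.set_parameter("row.distance.measure", "0")
# "a" selects pairwise average-linkage.
hierarchicalclustering_job_spec.set_parameter("clustering.method", "a")
hierarchicalclustering_job_spec.set_parameter("log.transform", "")
# Center rows and columns on their means; the normalize parameters are left unset.
hierarchicalclustering_job_spec.set_parameter("row.center", "mean.row")
hierarchicalclustering_job_spec.set_parameter("row.normalize", "")
hierarchicalclustering_job_spec.set_parameter("column.center", "mean.column")
hierarchicalclustering_job_spec.set_parameter("column.normalize", "")
# Name the output files after the input file.
hierarchicalclustering_job_spec.set_parameter("output.base.name", "<input.filename_basename>")
# Render the module as an interactive form in the notebook.
genepattern.GPTaskWidget(hierarchicalclustering_task)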

## Step 2: HierarchicalClusteringViewer

Display a heat map of the clustered gene expression data, with dendrograms showing how the genes and/or samples were clustered.
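To show how a dendrogram file determines the order of samples in the heat map, the sketch below reads tree lines and walks them depth-first. It assumes the Cluster/TreeView-style layout for ATR and GTR files, in which each line holds a node id, its two children, and a similarity score, and in which the last line is the root. The node and array ids are made up, and the function names are hypothetical.

```python
def read_tree(lines):
    """Parse ATR/GTR-style lines into {node_id: (left, right, similarity)}."""
    nodes = {}
    for line in lines:
        if not line.strip():
            continue
        node_id, left, right, similarity = line.split()[:4]
        nodes[node_id] = (left, right, float(similarity))
    return nodes


def leaf_order(nodes, root):
    """Return leaf ids from left to right by walking the tree depth-first."""
    if root not in nodes:
        # Anything that is not an internal node is a leaf (a sample or a gene).
        return [root]
    left, right, _ = nodes[root]
    return leaf_order(nodes, left) + leaf_order(nodes, right)


# Made-up tree over four samples; ids follow the assumed ARRY<n>X / NODE<n>X pattern.
atr_lines = [
    "NODE1X\tARRY0X\tARRY2X\t0.95",
    "NODE2X\tARRY1X\tARRY3X\t0.90",
    "NODE3X\tNODE1X\tNODE2X\t0.10",
]
nodes = read_tree(atr_lines)
root = list(nodes)[-1]  # assumption: the final merge in the file is the root
order = leaf_order(nodes, root)
print(order)
```

The recursive walk is fine for small trees; a very deep tree would need an iterative traversal to stay within Python's recursion limit.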

### Considerations

* Select File > Save Image to save the heat map and dendrograms to an image file. Supported formats include bmp, eps, jpeg, png, and tiff. 
* Learn more by reading about the [HierarchicalClusteringViewer](https://genepattern.broadinstitute.org/gp/getTaskDoc.jsp?name=HierarchicalClusteringViewer) module.

<div class="alert alert-info">
<h3>Instructions</h3>

<ol>
<li>For the <strong>cdt file</strong> parameter, click the down arrow in the file input box and choose the .cdt file produced by the HierarchicalClustering job.</li>
<li>For the <strong>atr file</strong> parameter, click the down arrow in the file input box and choose the .atr file produced by the HierarchicalClustering job.</li>
<li>Click <strong>Run</strong>.</li>
</ol>

</div>

In [12]:
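# Look up the HierarchicalClusteringViewer module by its LSID and build a job specification.
hierarchicalclusteringviewer_task = gp.GPTask(genepattern.get_session(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.visualizer:00031')
hierarchicalclusteringviewer_job_spec = hierarchicalclusteringviewer_task.make_job_spec()
# File parameters are left empty here; choose the Step 1 outputs in the widget.
hierarchicalclusteringviewer_job_spec.set_parameter("cdt.file", "")
# Step 1 sets row.distance.measure to "0" (no row clustering), so no .gtr file is expected.
hierarchicalclusteringviewer_job_spec.set_parameter("gtr.file", "")
hierarchicalclusteringviewer_job_spec.set_parameter("atr.file", "")
# Render the module as an interactive form in the notebook.
genepattern.GPTaskWidget(hierarchicalclusteringviewer_task)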