# GenePattern Notebook Tutorial


## Introduction

### GenePattern and Jupyter Notebook

<a href="http://genepattern.org"><img src="https://notebook.genepattern.org/static/images/genepattern.png" width=50px style="float: left; margin: 5px;vertical-align:middle"></a><p style="vertical-align:middle">[**GenePattern**](https://genepattern.org) provides hundreds of analytical tools for the analysis of gene expression (RNA-seq and microarray), sequence variation and copy number, proteomic, flow cytometry, and network analysis. These tools are all available through a Web interface with no programming experience required.</p>
<br>
<a href="https://jupyter.org"><img src="https://notebook.genepattern.org/static/images/jupyter.png" width=50px style="float: left; margin: 5px;"></a><p>[**Jupyter Notebook**](https://jupyter.org) is a web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.</p>

### What is GenePattern Notebook?

The [**GenePattern Notebook**](http://genepattern-notebook.org) environment integrates GenePattern's analysis platform with the Jupyter Notebook system, allowing researchers to create documents that interleave formatted text, graphics and other multimedia, executable code, and GenePattern analyses, creating a single "research narrative" that puts scientific discussion and analyses in the same place. This tutorial will familiarize you with some of its most important features.

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
All instructions for you to follow will appear in a blue panel like this one.
</div>

### GenePattern Notebook Introduction Video

<p>Below is a brief video introduction to the GenePattern Notebook Environment. This video introduces many of the basic concepts and features provided by the tool. If you would prefer a more &quot;hands on&quot; introduction, scroll down and follow the subsequent interactive tutorial.</p>



<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
1. Press the Run button to load the video<br>
2. Press the Play button in the middle of the video to start watching
</div>

In [3]:
from IPython.display import IFrame, display
import genepattern

def youtube():
    display(IFrame('https://www.youtube.com/embed/8npzyGLpUHU', width="100%", height="480"))

genepattern.GPUIBuilder(
    youtube,
    name='GenePattern Notebook Introduction Video',
    description='Press Run to load the video',
    parameters={
        'output_var': {
            'hide': True
        }
    })


## Basic Features

These are the most commonly used features in the GenePattern Notebook environment. They also form the building blocks for most advanced use cases.

### Notebook Cells

All notebooks consist of some number of cells. These cells may contain text, images, tables, code or interactive widgets. Try clicking different sections of this notebook and you will notice the different cells as they are selected.

#### Inserting cells

New cells can be inserted by clicking the Insert Cell button ( <i class="fa-plus fa"></i> ) on the toolbar or by using the Insert menu at the top of the screen.

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
1. Select this cell by clicking to the left of the text.<br>
2. Click the Insert Cell button ( <i class="fa-plus fa"></i> ) to add a new cell below this one.<br>
</div>

#### Executing cells

Cells can be executed by clicking the Run Cell button ( <i class="fa-step-forward fa"></i> ) on the toolbar or using the menu at the top of the screen to select *Cell > Run Cells*. Depending on the type of cell, when a cell executes it will run any code contained in the cell or render any HTML/markdown as formatted text.

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
1. Select this cell by clicking to the left of the text.<br>
2. Create a new cell below this one and type or paste <code> 3 + 7 </code> into the input box. <br>
3. Click the Run Cell button or type Shift+Enter to execute the cell.
</div>

#### Removing cells

Cells are removed by clicking the Cut Cell button ( <i class="fa-scissors fa"></i> ) on the toolbar or using the menu at the top of the screen to select *Edit > Cut Cells*.

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
1. Select the cell below containing "THIS IS THE CELL TO BE REMOVED"<br>
2. Click on the Cut button ( <i class="fa-scissors fa"></i> ) to remove the cell
</div>

<div class="alert alert-danger">

**THIS IS THE CELL TO BE REMOVED**

</div>

#### Changing cell type

Every cell has a type. Cell types include code cells, markdown cells and GenePattern cells. Code cells contain code that can be executed. Markdown cells contain either markdown or HTML that is rendered when the cell is executed. GenePattern cells contain interactive widgets that allow you to access GenePattern's many analyses.

To change cell type, select a cell and then use the dropdown menu on the toolbar above. Alternatively, you can use the menu at the top of the page to select *Cell > Cell Type*.
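
To make the idea of a cell type concrete, it can help to know that a notebook file is a JSON document in which every cell carries a `cell_type` field. The sketch below builds two minimal cells with plain dictionaries and converts one from Markdown to Code. The helper names `make_cell` and `change_cell_type` are hypothetical and exist only for this illustration; the field names follow the Jupyter notebook format (version 4), and how GenePattern cells are stored is not shown here.

```python
import json


def make_cell(cell_type, source):
    """Build a minimal notebook cell dictionary of the given type."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally record an execution count and their outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


def change_cell_type(cell, new_type):
    """Return a new cell of new_type that keeps the original source text."""
    return make_cell(new_type, cell["source"])


markdown_cell = make_cell("markdown", 'print("This cell is meant to be of code type")')
code_cell = change_cell_type(markdown_cell, "code")

# A notebook is a list of cells plus some top-level metadata.
notebook = {
    "cells": [markdown_cell, code_cell],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}
print(json.dumps([cell["cell_type"] for cell in notebook["cells"]]))
```

Changing the cell type in the toolbar does the equivalent of `change_cell_type`: the text stays the same, and only the way the notebook interprets it changes.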

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>  
    
1. Select the cell below.
2. Change the cell type from Markdown to Code by selecting `Cell > Cell Type > Code` on the menu bar.
</div>

### Change the type of this cell
print("This cell is meant to be of code type")

### GenePattern Cells

The GenePattern Notebook environment provides a number of graphical widgets that make performing analyses easy, even for non-programming users. These widgets take the form of GenePattern Cells that allow a user to prepare analyses, launch jobs and visualize results.

To insert a GenePattern Cell, insert a new cell, select it, and then change the cell type to GenePattern, either through the *Cell > Cell Type > GenePattern* menu or by selecting GenePattern from the dropdown menu in the notebook toolbar.

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
1. Insert a new cell.<br/>
2. Change the cell type to GenePattern. <br/>
3. You should now have a new cell that looks exactly like the one shown below.
</div>

<b>Below is an example GenePattern authentication cell. The cell you have just created above should look identical.</b>

In [1]:
# Requires GenePattern Notebook: pip install genepattern-notebook
import gp
import genepattern

# Username and password removed for security reasons.
genepattern.display(genepattern.session.register("https://cloud.genepattern.org/gp", "", ""))


### Authentication Cells

The first GenePattern cell that you have encountered is an Authentication Cell. This cell allows a user to sign into a GenePattern server. Doing this allows GenePattern to keep a user's results private, and to remember a user's settings.

Authentication cells look like a login form with the additional option of selecting which GenePattern server to sign into. If the user has already authenticated, such as when using the GenePattern Notebook Repository, the user will instead be prompted to either sign in as the current user or to cancel and sign in as a different user.

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>  
    
- Sign into the GenePattern server by clicking `Login as [username]` in the authentication cell above.<br/>
- Alternatively: fill in your credentials on the authentication cell above and click `Log into GenePattern`.
</div>

### Analysis Cells

After signing into an Authentication Cell, users can access GenePattern Analysis Cells. Clicking the <em><i class="fa-th fa"></i> Tools</em> button in the toolbar will display the list of available GenePattern analyses. Search for a desired analysis and click it to add it to your current notebook.

<div>
<img src="attachment:gpnb-tutorial-tools.png" width='500px'>
</div>

Every Analysis Cell contains fields for the required parameters of its analysis. Pressing Run uploads any input files and submits the analysis as a job on the GenePattern server. The status of the job in GenePattern’s queue will be displayed below. Upon completion, the cell will show a list of outputs, which can be displayed in the browser, downloaded, or sent as input to another GenePattern analysis. Outputs are indicated by the <i class="fa fa-info-circle" style="color: rgb(128, 128, 128);"></i> icon. If the analysis includes a visualization, the visualization will load and appear inside the cell as well.

For more information about GenePattern analysis modules, see the [**GenePattern Documentation**](https://genepattern.org/concepts#_Analysis_and_Visualization).

<b>Below is an example of using a GenePattern analysis cell for preprocessing:</b>


<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
    
1. Highlight this cell and then click the <i><i class="fa-th fa"></i> Tools</i> button in the toolbar.
2. Scroll through the list or use the search box in the upper right to find the <i>PreprocessDataset</i> module. This module is used to preprocess data ahead of further analysis.
    1. If you cannot find the module or see an error message stating "You must be authenticated...," you likely are not yet logged into the GenePattern server. Scroll up and sign into the GenePattern authentication cell or select the "GenePattern Login" module in the <i><i class="fa-th fa"></i> Tools</i> menu 
3. Click the listing for <i>PreprocessDataset</i>. This should insert a new analysis cell below, representing the preprocessing analysis.
4. Drag and Drop or Copy and Paste the expression file given below into the <code>input filename</code> parameter 
    1. <a style="display: block;" href="https://datasets.genepattern.org/data/all_aml/all_aml_test.gct">https://datasets.genepattern.org/data/all_aml/all_aml_test.gct</a> (This file contains gene expression data comparing ALL samples with AML samples)
    2. *Note:* Files can also be uploaded locally through the "*Upload File...*" button
5. The default parameters are sufficient for this file. You can find more information about the analysis module by going to the Gear menu ( <span class="fa fa-cog"></span> ) and selecting *Documentation*.
6. Click the Run button to begin the analysis. The job status and output will display below.
</div>

In [2]:
preprocessdataset_task = gp.GPTask(genepattern.session.get(0), 'urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00020')
preprocessdataset_job_spec = preprocessdataset_task.make_job_spec()
preprocessdataset_job_spec.set_parameter("input.filename", "https://datasets.genepattern.org/data/all_aml/all_aml_test.gct")
preprocessdataset_job_spec.set_parameter("threshold.and.filter", "1")
preprocessdataset_job_spec.set_parameter("floor", "20")
preprocessdataset_job_spec.set_parameter("ceiling", "20000")
preprocessdataset_job_spec.set_parameter("min.fold.change", "3")
preprocessdataset_job_spec.set_parameter("min.delta", "100")
preprocessdataset_job_spec.set_parameter("num.outliers.to.exclude", "0")
preprocessdataset_job_spec.set_parameter("row.normalization", "0")
preprocessdataset_job_spec.set_parameter("row.sampling.rate", "1")
preprocessdataset_job_spec.set_parameter("threshold.for.removing.rows", "")
preprocessdataset_job_spec.set_parameter("number.of.columns.above.threshold", "")
preprocessdataset_job_spec.set_parameter("log2.transform", "0")
preprocessdataset_job_spec.set_parameter("output.file.format", "3")
preprocessdataset_job_spec.set_parameter("output.file", "<input.filename_basename>.preprocessed")
preprocessdataset_job_spec.set_parameter("job.memory", "2 Gb")
preprocessdataset_job_spec.set_parameter("job.queue", "gpbeta-default")
preprocessdataset_job_spec.set_parameter("job.cpuCount", "1")
preprocessdataset_job_spec.set_parameter("job.walltime", "02:00:00")
genepattern.display(preprocessdataset_task)


job137230 = gp.GPJob(genepattern.session.get(0), 137230)
genepattern.display(job137230)

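
For readers curious what the `floor`, `ceiling`, `min.fold.change` and `min.delta` values in the cell above control, here is a minimal pure-Python sketch of thresholding followed by a variation filter. It is an illustration under stated assumptions, not the module's implementation: it assumes that values are first clipped to the floor and ceiling, and that a row is then dropped when its max/min ratio falls below the minimum fold change or its max minus min falls below the minimum delta. The function names and the toy data are hypothetical; consult the PreprocessDataset documentation for the exact behavior, including boundary cases.

```python
def threshold(values, floor=20.0, ceiling=20000.0):
    """Clip every value into the range [floor, ceiling]."""
    return [min(max(value, floor), ceiling) for value in values]


def passes_variation_filter(values, min_fold_change=3.0, min_delta=100.0):
    """Keep a row only if it varies enough across samples.

    Assumes a positive floor was applied first, so min(values) > 0.
    """
    lowest, highest = min(values), max(values)
    return (highest / lowest) >= min_fold_change and (highest - lowest) >= min_delta


def preprocess(rows, floor=20.0, ceiling=20000.0, min_fold_change=3.0, min_delta=100.0):
    """Threshold each row, then drop rows that fail the variation filter."""
    kept = {}
    for name, values in rows.items():
        clipped = threshold(values, floor, ceiling)
        if passes_variation_filter(clipped, min_fold_change, min_delta):
            kept[name] = clipped
    return kept


# Toy expression values (hypothetical, not taken from all_aml_test.gct).
rows = {
    "GENE_A": [5.0, 150.0, 900.0],       # varies strongly after clipping
    "GENE_B": [100.0, 120.0, 140.0],     # fold change too small
    "GENE_C": [10.0, 15.0, 70.0],        # delta too small after clipping
    "GENE_D": [30000.0, 5000.0, 400.0],  # top value is clipped to the ceiling
}
kept = preprocess(rows)
print(sorted(kept))
```

The defaults in the sketch mirror the parameter values shown in the analysis cell (floor 20, ceiling 20000, minimum fold change 3, minimum delta 100).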

### Markdown Cells

Markdown cells are another cell type that allows authors to take notes, document methods or embed images. Markdown cells format text using either **<a href="https://developer.mozilla.org/en-US/docs/Web/HTML" target="_blank">HTML</a>** or **<a href="https://daringfireball.net/projects/markdown/syntax" target="_blank">Markdown</a>** syntax. 

To insert a markdown cell, first insert a new cell, either through the *Insert > Insert Cell Below* menu or by clicking the ( <i class="fa-plus fa"></i> ) button in the notebook toolbar. Once a new cell has been inserted, you can select the cell and then change the cell type to Markdown either by using the *Cell > Cell Type > Markdown* menu or by going to the dropdown menu in the notebook toolbar and selecting Markdown from the list of options.

Markdown cells have **two modes**: editing mode and display mode. Double-click a rendered cell to enter editing mode, and run the edited cell to return to display mode.

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
    
1. Insert a new cell below by clicking the ( <i class="fa-plus fa"></i> ) button in the notebook toolbar
2. Use the dropdown menu in the toolbar to change the cell type to *Markdown*
3. Type some text and flip between Editing mode and Display mode 
</div>

### Rich Text Editor

Additionally, we provide a "What You See Is What You Get" Rich Text Editor that lets users format notes and documentation in much the same way as in Microsoft Word or LibreOffice.

To use the Rich Text Editor, first insert a markdown cell (see the instructions above). Press the <i class="fa fa-file-text-o"></i> button to open the Rich Text Editor and the <i class="fa-step-forward fa"></i> button to run the cell and render the text.

<img src="https://notebook.genepattern.org/static/images/wysiwyg.jpg" width="70%">

<div class="alert alert-info">
<p class="lead"> Instructions <i class="fa fa-info-circle"></i></p>
<ol>
<li>Highlight the markdown cell you created above and click the <i class="fa fa-file-text-o"></i> button on the left side of the cell to activate the Rich Text Editor. If you don't see the button, make sure the cell is in editing mode by double-clicking the cell.</li>
<li>Edit the text of the cell and then click the ( <i class="fa-step-forward fa"></i> ) button to display the rendered text.</li>
</ol>
</div>

For additional information on the unique formatting features of GenePattern Notebook, see the [**Formatting Tutorial Notebook**](https://notebook.genepattern.org/services/sharing/notebooks/363/preview/).

## Programmatic Features

In addition to the basic and publishing features intended for use by both non-programming and programming users alike, the GenePattern Notebook environment also provides a variety of features intended primarily for use by coders.

#### UI Builder
- The UI Builder is a way to display any Python function or method call as an interactive widget. This will render the parameters of the function as a web form. [**Learn More**](https://gpnotebook-website-docs.readthedocs.io/en/latest/programmatic/#ui-builder)
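
The general idea of turning a function's parameters into form fields can be sketched with Python's standard `inspect` module. This is only an illustration of the concept, not how the UI Builder is implemented; `describe_parameters` and `scale_expression` are hypothetical names introduced for this example.

```python
import inspect


def describe_parameters(function):
    """List one form-field description per parameter of the function."""
    fields = []
    for name, parameter in inspect.signature(function).parameters.items():
        has_default = parameter.default is not inspect.Parameter.empty
        fields.append({
            "name": name,
            # Parameters without a default must be filled in by the user.
            "required": not has_default,
            "default": parameter.default if has_default else None,
        })
    return fields


def scale_expression(values, factor=2.0, label="scaled"):
    """Hypothetical analysis function used only for this illustration."""
    return {label: [value * factor for value in values]}


form = describe_parameters(scale_expression)
for field in form:
    print(field)
```

A widget layer could then draw one input box per entry in `form`, mark the required ones, and call the function with the submitted values.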

#### Python Variable Input
- As part of the seamless integration between Python and GenePattern, Python variables may be directly used as input in GenePattern Analysis Cells. [**Learn More**](https://gpnotebook-website-docs.readthedocs.io/en/latest/programmatic/#5-python-variable-input)

#### Send to Dataframe
- The GenePattern Python Library also provides functionality for common GenePattern file formats, allowing them to integrate seamlessly with **<a href="http://pandas.pydata.org/">Pandas</a>**, a popular Python data analysis library. [**Learn More**](https://gpnotebook-website-docs.readthedocs.io/en/latest/programmatic/#4-send-to-dataframe)
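
To show what such a file format looks like in tabular form, the sketch below parses a small GCT document (the format of the `all_aml_test.gct` file used earlier) with only the standard library. It is a simplified stand-in for illustration and does not use the GenePattern Python Library's own loader or Pandas; `parse_gct` and the two-gene sample text are hypothetical. It assumes the usual GCT layout: a `#1.2` version line, a line with the row and column counts, a header line beginning with `Name` and `Description`, and then one tab-separated row per gene.

```python
import csv
import io


def parse_gct(text):
    """Parse GCT text into (sample names, {row name: {sample: value}})."""
    lines = text.splitlines()
    if len(lines) < 3 or lines[0].strip() != "#1.2":
        raise ValueError("expected a GCT file starting with '#1.2'")
    n_rows, n_cols = (int(token) for token in lines[1].split()[:2])

    reader = csv.reader(
        io.StringIO("\n".join(lines[2:])), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    header = next(reader)
    samples = header[2:]  # skip the Name and Description columns

    table = {}
    for record in reader:
        if not record:
            continue  # ignore blank lines
        values = (float(cell) for cell in record[2:])
        table[record[0]] = dict(zip(samples, values))

    # Cross-check the declared dimensions against what was actually read.
    if len(table) != n_rows or len(samples) != n_cols:
        raise ValueError("declared dimensions do not match the data")
    return samples, table


example_gct = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "GENE_A\tfirst gene\t1.5\t2.0\t3.5\n"
    "GENE_B\tsecond gene\t10\t20\t30\n"
)
samples, table = parse_gct(example_gct)
print(samples)
print(table["GENE_B"]["S2"])
```

The resulting rows-by-samples mapping is the same shape of data that a dataframe would hold, with gene names as the row index and sample names as the columns.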

#### Send to Code
- The GenePattern Python Library seamlessly integrates with GenePattern cells. Code examples of how to reference GenePattern jobs or GenePattern result files are available in GenePattern Job Cells by clicking a job result and selecting “Send to Code” in the menu. [**Learn More**](https://gpnotebook-website-docs.readthedocs.io/en/latest/programmatic/#3-send-to-code)

For more information, see the [**Programmatic Features Tutorial**](https://notebook.genepattern.org/services/sharing/notebooks/362/preview/) or the [**Programmatic Features Documentation**](https://gpnotebook-website-docs.readthedocs.io/en/latest/programmatic/).

## Additional Resources

### Other Tutorial Notebooks
- [**Programmatic Features**](https://notebook.genepattern.org/services/sharing/notebooks/362/preview/)
- [**GenePattern Python Tutorial**](https://notebook.genepattern.org/services/sharing/notebooks/19/preview/)
- [**GenePattern and Pandas**](https://notebook.genepattern.org/services/sharing/notebooks/67/preview/)
- [**UI Builder Tutorial**](https://notebook.genepattern.org/services/sharing/notebooks/294/preview/) 
- [**Introduction to Text and Document Formatting**](https://notebook.genepattern.org/services/sharing/notebooks/363/preview/)

### Ecosystem
- [**GenePattern Notebook Website**](https://notebook.genepattern.org)
- [**GenePattern Notebook Documentation**](https://gpnotebook-website-docs.readthedocs.io/en/latest/)
- [**GenePattern Website**](http://genepattern.org)
- [**Jupyter Project**](http://jupyter.org/)
- [**GenePattern Notebook GitHub Repository**](https://github.com/genepattern/genepattern-notebook)
- [**GenePattern Module Archive (GParc)**](http://www.gparc.org/)