<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image1.png" alt="drawing" width="800"/>


Maybe you have heard of Jupyter notebooks and you're interested in a crash course in what they are and how to start using them to do interactive analysis. If so, we're glad you're here! This notebook will help you understand:
* Why notebooks are used in biomedical research
* The relationship between the notebook and the workspace
* Jupyter Notebook basics: how to use a notebook, install packages, and import modules
* Common libraries in data analysis and popular tutorial notebooks



### Notebooks - The Future of Biomedical Research

- **Notebooks make it easy to record and reproduce data analysis steps**

Insights in biomedical research require data analysis, but complex analysis is hard to document, share and reproduce. Notebooks enable researchers to quickly develop a rich scientific document that conducts an analysis, shows the results, and explains scientific context. Each code cell of a notebook executes commands to manipulate and explore your data. Code cells can be written in Python, R, or other languages already familiar to the researcher. It is straightforward to expand the functionality of the source code by installing pre-existing libraries, packages or modules of code in a variety of languages. Markdown cells contain formatted explanatory text, links, and images to complement code cells. Better than "notes", Jupyter notebooks mean you will never have questions you can't answer because you forgot your exact analysis steps from eight years ago.
 
The notebook's linear structure records each step you take in order. When shared, someone else can see how you manipulated the data and can execute the cells in order to reproduce your analysis.

- **They enable interactive analysis**

When you “run” a code cell, output displays right away in a new cell directly underneath the original cell. Working in a notebook, it is possible to run an analysis, observe the result, then change the parameters and re-run the analysis step by step, in real time. 
 
- **Notebooks extend the information content of published articles**

Today, researchers can lend detail about how they derived their results and make it easy for others to reproduce or replicate their analysis by publishing notebooks as an addendum to a traditional publication. Traditional scientific journals can only capture so much detail. Most of the critical data analytics process is under-the-hood, and missing from a summary section. Seeing and executing the actual code tells so much more. 

Further, when others query your notebook, they can poke around, and even build on your findings. They can easily access your methods and apply them to other populations.  
 
- **They make collaborating and sharing seamless**

Because notebooks are self-contained and easy to share, collaborating and sharing work in progress and results is a simple matter of sharing a workspace.


### How do notebooks fit into a Terra workspace?

The workspace is the home base of your project, where you will find everything needed to do and share your analysis - data, Tools (aka “methods” or workflows), and notebooks. 

Within your workspace, you can use the cohort builder to create a cohort and interactively query data in the cloud using a notebook. Or you can go right to the notebook and create a cohort there. You can access the notebook later by saving it. If you share your workspace, others can access your notebook, too.

### What you need to know about the Terra Notebook Environment 

Understanding what notebooks **are** and what happens behind the scenes when Notebooks are created, opened and saved in a Terra workspace can save a lot of heartache, as well as help enhance your ability to do interactive analyses. These Terra Knowledge-Base articles are a great place to start.

* [Notebooks 101 - How not to lose data output files or collaborator edits](https://broadinstitute.zendesk.com/hc/en-us/articles/360027300571-Notebooks-101-How-not-to-lose-data-output-files-or-collaborator-edits)   
* [Key components (i.e. Billing Projects) in Terra's notebooks environment](https://broadinstitute.zendesk.com/hc/en-us/articles/360027237871-Terra-s-Jupyter-Notebooks-Environment-Part-I-Key-Components)   
* [Key operations and how they impact your work](https://broadinstitute.zendesk.com/hc/en-us/articles/360027083172-Terra-s-Jupyter-Notebooks-Environment-Part-II-Key-Operations)   

### Vocabulary

- **Cell** - the fundamental unit of a notebook, these rectangular boxes are where you execute code or markdown.
- **Dependencies** - all the additional modules, packages, and libraries required to execute the code blocks in your notebook.
- **Kernel** - the program that runs and introspects your code in the background. The type of kernel will indicate what [language](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels) you can write in the notebook's code cells. AoU supports Python and R kernels. 
- **Library** - a collection of functions and methods that allows you to perform several operations without writing your own code. A library could contain several modules.
- **Markdown** - a lightweight markup language with plain text formatting syntax used to write text in a cell.
- **Modules** - code that provides a piece of functionality. 

### Resources
- **How to use notebooks to do research** | [Interactive notebooks: Sharing the code](https://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261) | by Helen Shen, 5 Nov 2014
- **A history of Jupyter notebooks -- and how they improve scientific research today** | [The Scientific Paper Is Obsolete](https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/) | by James Somers, 5 Apr 2018

### Take the preinstalled User Interface Tour (< 3 minutes)
1. Go to Help -> User Interface Tour

<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image2.png" alt="drawing" width="600"/>

### Exercise One - Executing a code cell

**Try these steps in the cell below:**
1. Click on the cell. 
   The cell will default to "Code" mode. To use "Markdown" mode, change the cell type in the dropdown menu that currently reads "Code" (**Note:** Heading and Raw NBConvert are advanced cell type options).
2. Type in some text and see how the kernel interacts in each mode. Click the box icon (stop) to stop the operation.
3. Type a piece of code, such as the example below.

<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image6.png" alt="drawing" width="600"/>

4. When you are ready to execute the cell, click the play button in the menu-bar or use the short-cut (Shift-Enter). There are several options for executing cells underneath **Cell** in the menubar.
5. The cell is done executing when **In [*]**  turns to **In [a number]**.

**Note about standard output:** When you execute a code cell, the kernel will often send a message with important information, including conflicts and warnings, as well as the actual command outputs. These are useful, but do not always demand action. If there is a problem executing the code, this will be clear when you read the output. With time, you'll be able to skim these messages. Don't be alarmed if you get output in a pale red box, for example!
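
As a rough illustration of the note above, the sketch below produces both ordinary output and messages on standard error. The message texts are made up for this example. Many notebook front ends show the standard error lines on a tinted background, and the cell still finishes normally.

```python
import sys
import warnings

# Ordinary output goes to standard output and appears as plain text under the cell.
result = 2 + 3
print("2 + 3 =", result)

# Text written to standard error is informational here; nothing has failed.
print("This is a note written to standard error", file=sys.stderr)

# Warnings are also reported on standard error by default, and the cell keeps running.
warnings.warn("Example warning: nothing is actually wrong")
```

If the last line of a cell raised an exception instead, the output would end with a traceback naming the error, which is the case that does need attention.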


### Exercise Two - Creating a "markdown" type cell


Try to answer the question "why are notebooks used in biomedical research" in the cell below by: 
1. Clicking on the cell
2. Changing the dropdown box in the menubar above from "Code" to "Markdown" (see screenshot below) 
3. Writing your answer in the cell

When you are done, click the "Run" button in the menubar or use the short-cut (Shift-Enter).

<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image3.png" alt="drawing" width="600"/>


In [None]:
Type your answer here...

### What happens when you leave three hashmarks in front of your answer?

**What happens when you have two asterisks on either side?**

**Hint:** 
If you get an error that looks like this:
    
<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image4.png" alt="drawing" width="600"/>

Make sure that you changed the cell type to **Markdown** before running it.


**To learn more about markdown code**, you can use a shortcut under the help tab.

<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image6b.png" alt="drawing" width="600"/>



## Notebook Basics

### How to add or remove cells

To add a cell (three options):
* Click the + icon in the menubar
* Insert -> Choose Cell Above or Below your current cell
* press ESC A


To remove a cell (three options):
* Click the scissors icon in the menubar
* Edit -> Delete or Cut cells
* Press ESC X (to cut) or ESC d d (to delete)

**Try removing the cell below. Then try adding it back**

In [None]:
#Try removing and adding this cell.

### Keyboard shortcuts

 - Evaluate a cell: `SHIFT + ENTER` or `CTRL + ENTER` 
 - Return to navigation mode: `ESC` 
 - Turn a markdown cell into code: `y`
 - Turn a code cell into markdown: `m`
 - Add a new cell **above** the currently selected cell: `a`
 - Add a new cell **below** the currently selected cell: `b`
 - Delete the currently selected cell: `d, d` (repeated)
 - Activate code completion: `TAB`
 
To try this out, create a new cell below this one using `b`, and print `my_variable` by starting with `print(my` and pressing `TAB`!
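
Tab completion can only suggest names that already exist in the kernel, so `my_variable` needs to be defined before you try the shortcut. A minimal sketch follows. The value is arbitrary and chosen only for illustration.

```python
# Hypothetical variable, defined only so that tab completion has a name to find
my_variable = "Hello from my first notebook"

# In a new cell, typing print(my and pressing TAB should complete to this line
print(my_variable)
```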

### How to change the kernel (Python 2, Python 3, or R)

Click **Kernel** -> **Change Kernel** -> Choose the language/version you want

<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image10.png" alt="drawing" width="600"/>

###  How to install a package 
`pip` is a package management system used to install and manage software packages written in Python. We use it here to install the AoU Python client library.
To install this package, try the steps below.

1. Copy and paste in the cell: 

```
!pip install --user --upgrade 'https://github.com/all-of-us/pyclient/archive/pyclient-v1-11.zip#egg=aou_workbench_client&subdirectory=py'
```

2. Execute the cell

An asterisk to the left of the code cell indicates that the command is executing.

<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image7.png" alt="drawing" width="300"/>

When the program runs the first time, you will see the cell turn pink. When complete, you will see a large code block of output text. This is the standard output, the same text you would see if you ran this command directly from the terminal. It often contains important information such as warnings, so it can be useful to skim through.

<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image11.png" alt="drawing" width="800"/>

**TIP** If you don’t need to print the output for review, you can “capture” it by typing this command before the code:

```
%%capture

```

Try adding this line before the command in the block above, and rerunning the cell. 

**Note**: One cell can contain multiple commands.
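
After an install finishes, one way to confirm what the kernel can see is to ask Python for the installed version of a package. The sketch below uses the standard-library `importlib.metadata` module (Python 3.8 or later). The helper `installed_version` and the package names are illustrative assumptions, not part of any Terra or AoU tooling.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is not found."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Example names only: one commonly installed package and one that should not exist
for name in ["numpy", "a-package-that-does-not-exist"]:
    print(name, "->", installed_version(name))
```

If a package you just installed still shows `None`, restarting the kernel and running the check again is a reasonable first step.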

### Exercise Three: Practice importing packages.

To reduce the time it takes to program commands from scratch, you can access discipline-specific classes, objects, or functions by importing modules. 

Some popular libraries for data analysis are listed below. For more information, clicking “Help” in the menubar will direct you to online resources.


**Note on syntax** `as` defines an alias, which you can use to call the library instead of typing out the full name in every cell.

Running ```!pip list``` will give information on pre-installed packages. The `!` in front of `pip` indicates that the command is running outside Python, directly on the command line. In the Python kernel, code is expected to match the kernel language, so you need to add an extra character to indicate that a line should be run as a command on the command line, or even in another kernel (such as R).


In [None]:
!pip list
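
The `!` prefix is a notebook convenience rather than Python syntax. For comparison, here is a sketch of how plain Python can launch an external command with the standard-library `subprocess` module. It asks the interpreter behind the current kernel for its version, and the command was chosen only because it is available wherever Python is.

```python
import subprocess
import sys

# Launch the interpreter that runs this kernel as a separate command-line process
completed = subprocess.run(
    [sys.executable, "--version"],
    capture_output=True,  # collect the command's output instead of printing it directly
    text=True,            # decode the output as text rather than bytes
)

# The version string is normally on standard output; fall back to standard error
version_text = (completed.stdout or completed.stderr).strip()
print(version_text)
```

In day-to-day notebook work the `!` form is shorter. The `subprocess` form is mainly useful when the same code also has to run outside a notebook.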

When running the code as a direct python command, no additional `!` is needed in front of the command.

If there is no output to the command, it will just run and put a number in the "[ ]" to the left.

In [None]:
# Scientific computing in python
import numpy as np

In [None]:
# Data visualizations
import matplotlib as mpl

In [None]:
# Data analysis tools and data structures like the DataFrame
import pandas as pd

In [None]:
# Statistical data visualization, site: https://seaborn.pydata.org/
import seaborn as sb

In [None]:
# Exploring and analyzing genomic data, site: https://hail.is/index.html
import hail as hl

A quick way to check whether a module has been imported in Python is to run a command that lists all of its attributes. Try `dir(np)` or `dir(pd)` in the cell block below.

**Question:  What happens if you try `dir(pandas)` instead?**

Solution: You should see an error, because the module was imported under the alias `pd` and not under the full name of the package, `pandas`.

<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image12.png" alt="drawing" width="600"/>
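
The same behavior can be reproduced with any module. The sketch below uses the standard-library `statistics` module and the alias `st`, both chosen only for illustration, and catches the error so that the cell runs to completion.

```python
import statistics as st  # only the alias `st` is bound in this namespace

# The alias works: dir() lists the module's attributes, including `mean`
print("mean" in dir(st))

# The full name was never bound, so using it raises a NameError
try:
    dir(statistics)
except NameError as err:
    message = str(err)
    print("NameError:", message)
```

Running `import statistics` without an alias would make the full name available as well.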
    


## Interactive plotting in a Jupyter Notebook

Below is an example of a histogram you can generate in Python - if you have installed the package with definitions for these commands on your cluster. If you try to run the code cell below before running the cells that contain the "import" command, you will see an error message. 

If you wish to see the histogram output, make sure to first run the cell above that imports `numpy` as `np`. The cell below imports `matplotlib.pyplot` itself, so run it (or rerun it) afterward to generate the histogram.


This plot uses a module inside matplotlib called pyplot. [More information can be found here](https://matplotlib.org/api/pyplot_api.html).

In [None]:
#A hash sign in front of a line of code tells the program that this is a "comment"


# Import a specific module inside the matplotlib packages
import matplotlib.pyplot as plt

#Set a random number as a starting point or "seed"

np.random.seed(19680801)

#Set the mean, mu, to 100 and the standard deviation, sigma, to 15, then draw 10000 random points

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

#Generate a histogram using the function "hist" inside matplotlib.pyplot

# the histogram of the data
n, bins, patches = plt.hist(x, 50, density=True, facecolor='g', alpha=0.75)


#The information below adds labels to your graph
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()
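
To see what `density=True` does without drawing anything, the sketch below repeats the binning with `np.histogram`, using the same seed and parameters as the cell above. With density normalization the bar heights are scaled so that the total area of the bars is 1, which is why the y-axis values are small. The variable names here are illustrative.

```python
import numpy as np

# Same seed and distribution parameters as the histogram cell
np.random.seed(19680801)
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# 50 bins give 50 bar heights and 51 bin edges
heights, edges = np.histogram(x, bins=50, density=True)
widths = np.diff(edges)

# Area of each bar is height * width; with density=True the areas sum to 1
total_area = float(np.sum(heights * widths))
print(len(heights), len(edges))
print(round(total_area, 6))
```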

Another example of a plot using matplotlib: a sine curve

In [None]:
# Imports repeated here so this cell also runs on its own
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-10 , 10, 100)
y = np.sin(x) 
plt.plot(x, y, marker="x")
plt.show()

## Learn a New Topic with Notebooks

Before you check out some of the great additional resources below, let's try one last trick within the notebook environment: to learn about a command, you can use a question mark (?). 

To try it out, type ```?print``` in the cell below and run it. 

In [None]:
?print

An explanation should pop up at the bottom of the screen.

Note that this functionality is native to both Python and to R. A notebook will execute code based on the language chosen for the kernel, so you should only expect helpful information when the command you are enquiring about exists in the language you have set your kernel to.

For example, ```?print``` will work with either a Python kernel or an R kernel because the "print" command is native to both Python and R.

For a counter example, try running a cell with the following:

```
?log
```

This produces an error, because we are currently using a kernel set to the Python language and this is the R command to calculate logarithms. If you switch your kernel to R and re-run the cell, you should get the information in question.
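
The `?` shortcut is provided by the notebook environment. As a rough Python-only counterpart, the built-in `help()` function and the `__doc__` attribute give similar information, and they also work in a plain Python session. In the sketch below, `math.log` stands in for a logarithm function that does exist in Python.

```python
import math

# Every documented Python object carries its help text in __doc__
doc = print.__doc__
print(doc.splitlines()[0])

# help() prints the full documentation for an object
help(math.log)
```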

**Reminder:** use the following to change the kernel:

<img src="https://storage.googleapis.com/terra-featured-workspaces/hail-tutorials/Intro_to_Jupyter_Notebooks/Intro_image10.png" alt="drawing" width="400"/>

## Other Tutorial Notebook Resources

Now that you know Jupyter Notebooks basics, check out the notebooks in the [Terra Notebooks Playground](https://app.terra.bio/#workspaces/help-gatk/Terra%20Notebooks%20Playground), which contains a set of Jupyter Notebooks that allow users to play with this functionality. These include both R and Python Setup notebooks and template notebooks for accessing and analysing data.

In addition to the wealth of useful notebooks in Terra Playground, there are lots of online resources for honing your notebook skills.  

### Learning Python
* [Python for Coders](http://watpy.ca/learn/introduction/Python%20for%20Coders.md) Part of an introduction to Python course from the Waterloo Python users group. You can download a Jupyter notebook that covers the material. 

* Codecademy's free [Python course](https://www.codecademy.com/learn/learn-python) (Not a notebook, but good!)

### Learning R
* [edX course](https://www.edx.org/learn/r-programming)

* [Interactive R practice](https://swirlstats.com/)

* [RStudio online training resources](https://www.rstudio.com/online-learning/)

### Running Python and R in the same notebook

[Interfacing R from a Python 3 Jupyter Notebook](https://www.linkedin.com/pulse/interfacing-r-from-python-3-jupyter-notebook-jared-stufft/)

### Learning anything
Check out this gallery of interesting Notebooks: https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks

### Reproducible Academic Publications
Here is a [list of academic publications](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks#reproducible-academic-publications) with links to their notebooks.

