# Getting Started with Jupyter Notebooks
## Reproducible Research Workshop
## [dataservices.library.jhu.edu](https://dataservices.library.jhu.edu/)
### JHU Data Services: Marley Kalt
### Date: February 15, 2022

## If you have not already installed Anaconda, please do so now!    

Get it here: [https://www.anaconda.com/products/individual](https://www.anaconda.com/products/individual)  
Click "Download" or scroll down to "Anaconda Installers"

## This workshop will not be recorded  

You will receive all workshop materials by tomorrow afternoon.

<center><img src='./Images/DataServicesAbout.png'></center>

***

## Today, you will learn:

- What interactive notebooks are, including Jupyter Notebooks

- How interactive notebooks are useful for reproducible research

- The basics of running code and writing markdown-formatted text in a Jupyter Notebook

## You will not learn:

- How to write Python code or use specific Python libraries

## Imagine you want to know how your colleague has cleaned their data.

## Which will be easier to understand?   

<table><tr>
<td>A<img src="./Images/CodeExample1.png" alt="Code sample in a text editor" style="width: 460px;"/> </td>
<td>B<img src="./Images/NotebookExample1.png" alt="Code sample in a Jupyter notebook" style="width: 500px;"/> </td>
</tr></table>   

Source: [Data Cleaning using Python with Pandas Library](https://nbviewer.jupyter.org/github/Tanu-N-Prabhu/Python/blob/master/Data_Cleaning/Data_Cleaning_using_Python_with_Pandas_Library.ipynb) by [Tanu Nanda Prabhu](https://github.com/Tanu-N-Prabhu/Python)   
Retrieved from: [A Gallery of Interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)

***

# Software

## Interactive Notebooks

- A graphical user interface (GUI) for writing code, text, visualizations, and more

- Designed to make it easier to read and write code

- Typically made of cells, or chunks of code/text, that can be run individually to explore step-by-step results

- Can be general purpose, or specific to a programming language or discipline

- [Overview of different notebooks](https://morphocode.com/interactive-notebooks-data-analysis-visualization/) from Morphocode

### Why use interactive notebooks?

- Developing and debugging code
    - Enables users to test small chunks of code and examine output in real time

- Sharing code
    - Keeps code and output in a single document
    - Easier for a non-technical audience to read, run, and understand results
    - Can export code, visualizations, and analysis in multiple formats, including PDF and HTML

- Explaining code
    - Notebooks give space for a written explanation of your code and analysis
    - Jupyter Notebooks use markdown for text
        - Markdown: syntax for formatting plain text
        - Learn more at [https://www.markdownguide.org/](https://www.markdownguide.org/)

![Logo for Project Jupyter](./Images/JupyterLogo.png) [jupyter.org](https://jupyter.org/)

__Project Jupyter__
- Jupyter is an open source project for creating interactive and reproducible code

- Creator of Jupyter Notebooks, one of the most popular interactive notebooks for data analysis and visualization

![Logo for Project Jupyter](./Images/JupyterLogo.png)

__Jupyter Notebooks__   
- Interactive notebooks for data analysis and visualization

- Editable user interface that runs in a web browser

- Kernels: software that executes the code
    - Kernels are available for several programming languages (Python, R, Julia, and more)

__Screenshot of a Jupyter Notebook:__   

![Screenshot](./Images/NotebookExample1.png)


Source: [Data Cleaning using Python with Pandas Library](https://nbviewer.jupyter.org/github/Tanu-N-Prabhu/Python/blob/master/Data_Cleaning/Data_Cleaning_using_Python_with_Pandas_Library.ipynb) by [Tanu Nanda Prabhu](https://github.com/Tanu-N-Prabhu/Python)   
Retrieved from: [A Gallery of Interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)

![Logo for IPython Project](./Images/IPythonLogo.png)   
[ipython.org](https://ipython.org/)

- A command line interface to use Python interactively

- A Jupyter kernel to write Python code within a Jupyter notebook

- .ipynb file extension
    - File extension for Jupyter Notebooks (short for "IPython Notebook", the project's original name)
    - Stores notebooks in JSON format
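
Because a notebook is stored as JSON, it can be read with any JSON parser. The sketch below is a minimal illustration, not an official Jupyter tool: it builds a tiny hand-written notebook in the version 4 layout (the cell contents are made up for this example), converts it to the text a text editor would show, and then lists each cell's type and first line. The function name `summarize_cells` is hypothetical.

```python
import json

# A tiny, hand-written notebook (illustrative content only).
# Real notebooks usually carry more metadata and saved outputs.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Data cleaning\n", "Drop rows with missing values."],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [],
            "source": ["total = 1 + 1\n", "print(total)"],
        },
    ],
}

# This string is what a text editor would display for the .ipynb file.
notebook_text = json.dumps(notebook, indent=1)


def summarize_cells(text):
    """Return (cell_type, first_line) for each cell in a notebook's JSON text."""
    parsed = json.loads(text)
    summary = []
    for cell in parsed["cells"]:
        # "source" may be a list of lines or a single string; join handles both.
        source = "".join(cell["source"])
        first_line = source.splitlines()[0] if source else ""
        summary.append((cell["cell_type"], first_line))
    return summary


cell_summary = summarize_cells(notebook_text)
for cell_type, first_line in cell_summary:
    print(f"{cell_type}: {first_line}")
```

To try this on a real notebook, read the file's text with `open(...)` and pass it to `summarize_cells` in place of `notebook_text`.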

__Screenshot of a Jupyter Notebook, in a text editor:__   

![Screenshot of a Jupyter Notebook in a text editor](./Images/NotebookExampleTextEditor.png)

Source: [Data Cleaning using Python with Pandas Library](https://nbviewer.jupyter.org/github/Tanu-N-Prabhu/Python/blob/master/Data_Cleaning/Data_Cleaning_using_Python_with_Pandas_Library.ipynb) by [Tanu Nanda Prabhu](https://github.com/Tanu-N-Prabhu/Python)   
Retrieved from: [A Gallery of Interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)

![Anaconda logo](./Images/AnacondaLogo.png) [anaconda.com](https://www.anaconda.com/)

- An open source distribution for scientific computing
- Includes Python, Jupyter Notebooks, hundreds of additional libraries, a package and environment manager, and a graphical user interface (GUI)
- Popular in the data science community

***

# Jupyter Notebooks + Reproducibility

Jupyter Notebooks are a tool to help create reproducible research.

However, code written in a notebook is not automatically reproducible.

Reproducibility includes:
- Availability of data

- Availability of software and libraries/packages

- Documentation of procedures, environments, and versions

- Persistent location
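
One way to document environments and versions is to record them in a cell of the notebook itself. The following is a minimal sketch that uses only the Python standard library. The function name `describe_environment` and the package list are assumptions chosen for illustration; substitute the libraries your own analysis imports.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect the Python version, operating system, and package versions."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing, so the report is always complete.
            info["packages"][name] = "not installed"
    return info


# Example package names only; list the libraries your notebook actually uses.
environment = describe_environment(["pandas", "numpy", "matplotlib"])
print("Python:", environment["python"])
print("Platform:", environment["platform"])
for name, version in environment["packages"].items():
    print(f"{name}: {version}")
```

Running a cell like this near the top of a notebook keeps the version information next to the results it produced.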

__Discussion: Is this notebook reproducible?__ 

![Screenshot of Jupyter Notebook, not very reproducible](./Images/ReproducibleExample2.png)

Source: [Analysis and visualization of a public OKCupid profile dataset using python and pandas](https://nbviewer.jupyter.org/github/lalelale/profiles_analysis/blob/master/profiles.ipynb) by Alessandro Giusti   
Retrieved from: [A Gallery of Interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)

__Tips for creating reproducible notebooks:__

- Make use of markdown (section headings, narrative text)

- One step = one code cell

- Pay attention to dependencies: notebooks should run top to bottom

- Use descriptive variable names and document them

- Make data accessible and use relative file paths

- Clean up your notebook!
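
The "run top to bottom" tip can be checked from a saved notebook, because each code cell records the order in which it was last run. The sketch below is an illustration under stated assumptions, not a built-in Jupyter feature: the function names and the two small example notebooks are hypothetical. It treats a notebook as clean when its non-empty code cells are numbered 1, 2, 3, and so on in document order, which is what you would expect after restarting the kernel and running all cells.

```python
def execution_counts(notebook):
    """List the execution counts of non-empty code cells, in document order."""
    counts = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        if not "".join(cell["source"]).strip():
            continue  # skip empty code cells
        counts.append(cell.get("execution_count"))
    return counts


def ran_top_to_bottom(notebook):
    """Return True if code cells were run once each, in order, from a fresh start."""
    counts = execution_counts(notebook)
    return counts == list(range(1, len(counts) + 1))


# Two made-up notebooks, reduced to the fields this check needs.
clean_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Load data"]},
        {"cell_type": "code", "execution_count": 1, "source": ["x = 1"]},
        {"cell_type": "code", "execution_count": 2, "source": ["print(x)"]},
    ]
}
messy_notebook = {
    "cells": [
        {"cell_type": "code", "execution_count": 3, "source": ["print(x)"]},
        {"cell_type": "code", "execution_count": 1, "source": ["x = 1"]},
    ]
}

clean_result = ran_top_to_bottom(clean_notebook)
messy_result = ran_top_to_bottom(messy_notebook)
print("clean notebook in order:", clean_result)
print("messy notebook in order:", messy_result)
```

In the second example, the first cell uses `x` before the cell that defines it, so a colleague running the notebook from the top would hit an error. A saved `.ipynb` file can be loaded with `json.load` and passed to `ran_top_to_bottom` in the same way.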

__Workflow for reproducible Jupyter Notebooks:__   

<img src="./Images/ten_rules_workflow.png" alt="Workflow for reproducible Jupyter Notebooks" style="width: 800px;"/>

<br><br>
Source: Rule A, Birmingham A, Zuniga C, Altintas I, Huang S-C, Knight R, et al. (2019) Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks (Figure 1). PLoS Comput Biol 15(7): [e1007007](https://pubmed.ncbi.nlm.nih.gov/31344036/). 
https://doi.org/10.1371/journal.pcbi.1007007

__Discussion: Is this notebook reproducible?__ 

![Screenshot of Jupyter Notebook, more reproducible](./Images/ReproducibleExample1-1.png)  

Source: [Analysis and visualization of a public OKCupid profile dataset using python and pandas](https://nbviewer.jupyter.org/github/lalelale/profiles_analysis/blob/master/profiles.ipynb) by Alessandro Giusti   
Retrieved from: [A Gallery of Interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks)

***

# Jupyter Notebook Tutorial

### In this tutorial, we will:

1. Learn how to create a new notebook and open an existing notebook
2. Learn the basics of Jupyter Notebooks, including adding and removing cells, running cells, and switching between code and markdown

For screenshots, written instructions, and additional functions of Jupyter Notebooks, see our additional __JupyterNotebookTutorial.ipynb__ file.
<br>
[Download the tutorial from GitHub here](https://github.com/jhu-data-services/python-installation-instructions) (downloading will let you open the notebook in Anaconda and run cells interactively). 
<br>
You can also [view the tutorial online here](https://github.com/jhu-data-services/python-installation-instructions/blob/main/jupyter-notebook-tutorial/JupyterNotebookTutorial.ipynb) (no download required, but will not run interactively).

***

# Resources

__General Resources__
<br><br>
[Project Jupyter](https://jupyter.org/) - organization behind Jupyter Notebooks   
[Anaconda](https://www.anaconda.com/) - environment manager and GUI for launching Jupyter Notebooks  
[RISE slideshow extension for Jupyter Notebooks](https://rise.readthedocs.io/en/stable/)   
[Guide to interactive notebooks](https://morphocode.com/interactive-notebooks-data-analysis-visualization/)   
[Basic Markdown syntax](https://www.markdownguide.org/basic-syntax) for formatting text elements   

__Tutorials and Examples__
<br><br>
[Real Python introduction to Jupyter Notebooks](https://realpython.com/jupyter-notebook-introduction/)   
[Jupyter Notebooks documentation and tutorials](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/examples_index.html)   
[Programming Historian: Introduction to Jupyter Notebooks](https://programminghistorian.org/en/lessons/jupyter-notebooks)   
[Jupyter Notebooks gallery on GitHub](https://github.com/jupyter/jupyter/wiki)  
[Towards Data Science](https://towardsdatascience.com/tagged/jupyter-notebook) - online publication with dozens of articles on Jupyter Notebooks and other data science topics, from beginner to advanced levels 

__Reproducibility__
<br><br>
[Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks](https://doi.org/10.1371/journal.pcbi.1007007)   
[Reproducibility guide for Jupyter Notebooks](https://github.com/jupyter-guide/jupyter-guide)  
[Jupyter Notebooks and reproducible data science](https://markwoodbridge.com/2017/03/05/jupyter-reproducible-science.html)

__Conferences__
<br><br>
[JupyterCon](https://jupytercon.com/) - past talks [available on YouTube](https://www.youtube.com/playlist?list=PL055Epbe6d5b572IRmYAHkUgcq3y6K3Ae)    

# Take our survey to help us improve this workshop:   
# https://www.surveymonkey.com/r/ReproducibleResearch    


# Workshop series schedule:   
Tuesday 3/01, 11:00am-12:30pm: Getting Started with R Markdown    
Wednesday 3/30, 1:00-2:00pm: Introduction to Reproducible Research     
Thursday 3/31, 11:00am-12:30pm: Version Control: Using Git and GitHub  

# Questions?   

## Contact us at dataservices@jhu.edu

### About this Presentation  
This presentation was created using Jupyter Notebooks version 6.0.1 and the RISE notebook extension version 5.6.1.    

### Terms of Use 
This material is licensed under a [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/), attributable to [Data Services](https://dataservices.library.jhu.edu/), Johns Hopkins University.  

The images, external resources, and other referenced materials may have other licensing and terms of use.   

Please cite this material as:

> Johns Hopkins University Data Services. (2022, February 15). Reproducible research: Getting started with Jupyter Notebooks [workshop presentation].