# Reproducible Data Visualization with Python (Matplotlib) in Jupyter Notebooks - Part I 

Credits: These workshop materials are adapted from the [Reproducible Science Curriculum](https://github.com/Reproducible-Science-Curriculum) and [Lectures on scientific computing with Python](https://github.com/jrjohansson/scientific-python-lectures).

# 1. Background on reproducible research
(See slides for details)
* A motivating example: the Neutrino study 
* The reproducibility crisis 
* Factors contributing to poor reproducibility and solutions (documentation, organization, automation, dissemination) 
* __Computational reproducibility__
* There are tools that can help with each of these steps! 



# 2. Data & Project Organization
(See slides for details)
* How to structure project folders and name files 
* File naming conventions 
* Slides to adapt from: https://reproducible-science-curriculum.github.io/organization-RR-Jupyter/slides/02_slideshow_organization.slides.html#/

# 3. Intro to Jupyter Notebook 
* An open source literate programming tool that can interlink narratives and code
* Can document the coding process, much like keeping a diary
* Can run the code in-line, show the intermediate results and visualizations, and keep the output together with the code in the same file
* Easy to publish and share with collaborators, reviewers, and the public
* A well-known example: [LIGO](http://www.ligo.org/) (Laser Interferometer Gravitational-Wave Observatory)
>* All [data](https://losc.ligo.org/data/) are publicly available free of charge.
>* Jupyter Notebooks running Python are produced for each [publication](https://www.ligo.caltech.edu/page/detection-companion-papers). 
>* These notebooks allow full reproducibility: all analyses and figures can be recreated.
>* LIGO also produces in-depth [Tutorials](https://losc.ligo.org/tutorials/) using Jupyter Notebooks and Python.
>* [Signal processing tutorial](https://losc.ligo.org/s/events/GW150914/GW150914_tutorial.html)


# 4. Hands-on: Familiarizing with Jupyter Notebook

## Set up
* First, a show of hands: who already
    * has Python installed
    * has Jupyter Notebook installed
    * knows basic Python
    * knows basic Markdown

## Launching Jupyter Notebook

* If you have previously installed Jupyter Notebook on your computer, launch it locally from your command line by typing `jupyter notebook`.
* If you haven't installed Jupyter Notebook on your computer, you can start a temporary notebook on the Jupyter server (Try Jupyter: https://jupyter.org/try) to avoid installation. Once there, select "Try Classic Notebook". An instance will open in your browser.
* In the long run, you will want to install a Python environment and the relevant packages on your computer. If you would like help with local installation, stay for half an hour longer after the workshop, or visit [dataCoLAB office hours](https://cmu-lib.github.io/data-colab/) Wednesday afternoons.  **Important: Do NOT install in the Root directory. If you are installing with Anaconda, be sure to change the default path to the Home directory (choose "Install for me only").**
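
If you are not sure whether a local installation is already in place, a quick check from any Python prompt can help. The following is a minimal sketch: the helper name `check_setup` and its default package list are illustrative assumptions, not part of the workshop materials.

```python
import importlib.util
import sys


def check_setup(packages=("notebook", "matplotlib")):
    """Report the Python version and whether each package can be found.

    The default package names are an assumption for this workshop.
    """
    report = {"python": sys.version.split()[0]}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, status in check_setup().items():
    print(f"{item}: {status}")
```

A `False` next to a package name suggests that the package still needs to be installed in the Python environment you ran the check from.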


## Launching the repository directly using Binder
For this workshop, we will launch pre-built notebooks using Binder from the GitHub repository. Go to this link: https://github.com/huajinw/reproducibility_plotting and click on the "launch binder" logo at the bottom. It might take a while to build. 

In this case, you do not need to follow the setup in the next section, but we will go over it anyway. 

## Organizing your files and downloading data


### Organizing your file structure
Create a working directory using the file organizing principles and naming conventions learned above. This can be done in the Jupyter Dashboard. 
1. Create `../reproducibility_workshop/` as your __working directory__ to store all files related to this workshop
2. Go to the __working directory__ and create three subdirectories: 
   >`data` -- this is where downloaded data will be stored
   >
   >`notebooks` -- this is where the notebook files will be generated and saved
   >
   >`plots` -- this is where plots will be exported and saved
3. Enter the `notebooks` folder and create a new Notebook in Python 3. Rename the Notebook as you wish. 
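
The same folder layout can also be created from Python, which keeps the setup step itself reproducible. This is a minimal sketch using only the standard library; the function name `make_project_dirs` is hypothetical, and the folder names are the ones listed above.

```python
from pathlib import Path


def make_project_dirs(root, subdirs=("data", "notebooks", "plots")):
    """Create the working directory and its subdirectories if missing."""
    root = Path(root)
    for name in subdirs:
        # parents=True also creates the working directory itself;
        # exist_ok=True makes the call safe to repeat
        (root / name).mkdir(parents=True, exist_ok=True)
    # Return the names of the folders that now exist under root
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

For example, calling `make_project_dirs("reproducibility_workshop")` creates the working directory relative to wherever Python is currently running, so adjust the path to match where you want the workshop files to live.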

### Downloading data

In this workshop, we will use the Gapminder dataset. Please download the raw data from GitHub [here](https://raw.githubusercontent.com/Reproducible-Science-Curriculum/data-exploration-RR-Jupyter/gh-pages/data/gapminderDataFiveYear_superDirty.txt) and [here](https://raw.githubusercontent.com/Reproducible-Science-Curriculum/data-exploration-RR-Jupyter/gh-pages/data/PRB_data.txt), and the processed and cleaned data [here](https://github.com/huajinw/reproducibility_plotting/blob/master/data/gapminder_cleaned.txt). 
* If you are running Jupyter Notebook locally, save the files in the `data` folder that you created earlier.
* If you are running Jupyter Notebook on the Jupyter server, download the files to your computer, then upload them to the `data` folder that you created on the Jupyter server. 
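
As an alternative to saving the files by hand, the two raw files can be fetched from inside a notebook. The following is a sketch under stated assumptions: the names `fetch`, `BASE`, and `RAW_FILES` are hypothetical, and the default `../data` target assumes the notebook lives in the `notebooks` folder. The cleaned file is linked through GitHub's page view rather than a raw URL, so it is left out here.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Raw-data location and file names as given in the links above
BASE = ("https://raw.githubusercontent.com/Reproducible-Science-Curriculum/"
        "data-exploration-RR-Jupyter/gh-pages/data/")
RAW_FILES = ("gapminderDataFiveYear_superDirty.txt", "PRB_data.txt")


def fetch(url, data_dir="../data"):
    """Download url into data_dir unless the file is already there."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    # Use the last part of the URL as the local file name
    target = data_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        urlretrieve(url, str(target))
    return target
```

Running `fetch(BASE + name)` for each `name` in `RAW_FILES` would place both files in the `data` folder. Because existing files are skipped, rerunning the cell does not download them again.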

## Basics of Navigating the Notebook

Learning objectives: Become familiar with the notebook environment and build up a simple notebook from scratch, demonstrating the following operations:

* Insert & delete cells
* Change cell type (& know different cell types)
* Run a single cell from the toolbar or with the keyboard shortcut (Shift + Enter)
* Run multiple cells, all cells
* Re-order cells
* Split & merge cells
* Stop a cell

Text can be added to Jupyter Notebooks using Markdown cells. Markdown is a popular markup language that is a superset of HTML. To learn more, see [Jupyter's Markdown guide](http://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/Working%20With%20Markdown%20Cells.html) or revisit the [Reproducible Research lesson on Markdown](https://github.com/Reproducible-Science-Curriculum/introduction-RR-Jupyter/blob/master/notebooks/Navigating%20the%20notebook%20-%20instructor%20script.ipynb). 


#### Now let's make a code cell below and use the `print()` function to print out 'hello world'
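
A minimal version of that code cell might look like the sketch below. The variable name `message` is only illustrative. Note that in Python 3 `print` is a function, so the parentheses are required.

```python
# Type this into a code cell and run it with Shift + Enter
message = 'hello world'
print(message)
```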

#### What if we make the same thing a Markdown cell? 

**_Question: What is the difference when you run the same lines as code or as Markdown?_**

* To learn more about navigating Jupyter Notebook, see: https://github.com/Reproducible-Science-Curriculum/introduction-RR-Jupyter/blob/gh-pages/notebooks/Navigating%20the%20notebook%20-%20instructor%20script.ipynb
* A Markdown cheatsheet can be found here: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet

## Saving, Checkpointing and Reverting the Notebook
* The notebook will autosave every few minutes.
* You can also create a checkpoint using the floppy/save icon on the toolbar or File -> Save and Checkpoint.
* You can revert the notebook to a saved checkpoint using File -> Revert to Saved Checkpoint.

## Checking Reproducibility
* One of the aims of using notebooks is to produce an executable document that can be rerun to reproduce the results.
* To run cells from scratch (i.e. from a fresh kernel), select Kernel -> Restart and Clear Output and then run the cells you want.
* To run all the cells in the notebook from scratch, select Kernel -> Restart and Run All.

__<span style="color:red">Note for reproducibility</span>:__ Jupyter Notebook allows running cells out of order, but we highly recommend always running code cells in order. If you do not know what has been run, restart the Kernel and run the cells again.
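
To see why execution order matters, the toy simulation below treats three strings as hypothetical notebook cells that share one variable and runs them in two different orders. It is only a sketch: the helper `run_cells` is not part of Jupyter, and a real kernel keeps its state in much the same way as the `namespace` dictionary used here.

```python
def run_cells(cells, order):
    """Run code strings in the given order, starting from an empty namespace."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace


# Three hypothetical notebook cells that share one variable
cells = [
    "total = 10",
    "total = total * 2",
    "total = total + 1",
]

in_order = run_cells(cells, [0, 1, 2])["total"]      # (10 * 2) + 1
out_of_order = run_cells(cells, [0, 2, 1])["total"]  # (10 + 1) * 2
print(in_order, out_of_order)  # prints: 21 22
```

The same three cells give different results depending on the order in which they were run, which is why restarting the kernel and running everything from the top is the safest check.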