## What are notebooks?

<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.14.0/css/regular.min.css" integrity="sha512-qgtcTTDJDk6Fs9eNeN1tnuQwYjhnrJ8wdTVxJpUTkeafKKP6vprqRx5Sj/rB7Q57hoYDbZdtHR4krNZ/11zONg==" crossorigin="anonymous" />

<div style="font-size: 20px">

<i class="fas fa-align-justify ic"></i> text + <i class="fas fa-code"></i> code + <i class="fas fa-photo-video"></i> outputs = <img src="media/Jupyter_logo.svg" width=20 style="position: relative; top: 5px; display: inline"> notebook<br><br>

<i class="fas fa-align-justify"></i> text + <i class="fas fa-code"></i> code = <img src="media/R_logo.svg" width=25 style="display: inline"> Markdown

</div>

Showcase:
- [Lung Cancer Post-Translational Modification and Gene Expression Regulation](https://nbviewer.jupyter.org/github/MaayanLab/CST_Lung_Cancer_Viz/blob/master/notebooks/CST_Data_Viz.ipynb?flush_cache=true) - heatmaps
- [An open RNA-Seq data analysis pipeline tutorial with an example of reprocessing data from a recent Zika virus study](https://nbviewer.jupyter.org/github/maayanlab/Zika-RNAseq-Pipeline/blob/master/Zika.ipynb) - plotly
- [Population Genetics in an RNA World](https://nbviewer.jupyter.org/github/gocarli/RNA-Popgen-Notebook/blob/master/Population_Genetics.ipynb) - equations

For more examples from across scientific disciplines, see the [gallery of interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks).

## Different kinds of notebooks

### R Markdown/R Notebook (RStudio)

- the two terms are almost identical when used **within RStudio**

- Markdown document with code

- outputs not saved with the notebook

- notebook together with outputs can be exported ("knitted") to a number of different file types:
 - PDF
 - HTML
 - Word document 

- predominantly R, other languages somewhat supported (e.g. Python with or without reticulate)

- tied to RStudio
  - but can be rendered in any Markdown viewer!
  - few other editors support R Markdown

- widgets: htmlwidgets (standalone) and shiny (reactive; dashboard server)

### Jupyter notebooks

- JSON file (needs a dedicated viewer)

- multiple language "kernels":
  - JuPyteR = Julia, Python, R
  - C++, Java, MATLAB, Perl, Scala, Mathematica & 135+ more  

- multi-lingual cells using cell "magics"

- can be exported to:
 - PDF
 - HTML
 - Markdown
 - slides 

- multiple editors (Jupyter Notebook, JupyterLab, VSCode, Spyder, PyCharm, ...)

- widgets: ipywidgets (both interactive and standalone), Voilà (dashboard server)
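
To illustrate what "JSON file" means in practice, here is a minimal sketch of a notebook in the nbformat 4 layout, built and read back with the standard library only. The cell contents, cell ids and the helper name `summarize` are made up for this example; real notebooks carry more metadata.

```python
import json

# A minimal notebook: top-level metadata plus a list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}},
    "cells": [
        {"cell_type": "markdown", "id": "intro", "metadata": {},
         "source": ["# Analysis\n", "Some text."]},
        {"cell_type": "code", "id": "sum", "metadata": {}, "execution_count": 1,
         "source": ["1 + 1"],
         # outputs are stored in the file, next to the code that produced them
         "outputs": [{"output_type": "execute_result", "execution_count": 1,
                      "metadata": {}, "data": {"text/plain": ["2"]}}]},
    ],
}

text = json.dumps(notebook, indent=1)  # roughly what an .ipynb file contains
loaded = json.loads(text)


def summarize(nb):
    """Return (cell type, source text) for every cell of a notebook dictionary."""
    return [(cell["cell_type"], "".join(cell["source"])) for cell in nb["cells"]]


for cell_type, source in summarize(loaded):
    print(cell_type, repr(source))
```

Because text, code and outputs all live in one JSON document, a plain text editor shows the structure but not the rendered result, hence the need for a dedicated viewer.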

### Other notebook solutions

- Netflix Polynote: inspired by Jupyter, multi-lingual by default

- Jupyter-based
  - Google Colab,
  - JetBrains Datalore,
  - Microsoft Azure Notebooks,
  - CoCalc  

- Mathematica (Wolfram) Notebooks: predecessor of Jupyter

## Short tour of notebook features

- For Jupyter notebook: [Examples_Jupyter_notebook.ipynb](https://mybinder.org/v2/gh/krassowski/computational-notebooks-for-biomedical-research/master?urlpath=lab/tree/Examples_Jupyter_notebook.ipynb)
- For R Markdown: [Examples_R_Markdown.Rmd](https://mybinder.org/v2/gh/krassowski/computational-notebooks-for-biomedical-research/master?urlpath=rstudio)

## Resources

- writing reproducible papers using R Markdown: [resulumit.com/blog/rmd-workshop](https://resulumit.com/blog/rmd-workshop/)
- reproducible research with Jupyter: [reproducible-science-curriculum.github.io/workshop-RR-Jupyter](https://reproducible-science-curriculum.github.io/workshop-RR-Jupyter/)

## Searching for a balance

How to make analysis:

- easy to write
- easy to read

What do you find more challenging:

- writing your own code
- reading code written by someone else
- reading your own code
- reading your own code from three years ago

## Balancing challenges in scientific research

How to keep the project:

- easy to maintain?

- easy to understand?

- easy to reproduce?

- on track to meet the deadlines?

## Applications of notebooks

Created for:

 - <i class="fas fa-search"></i> exploration 

 - <i class="fas fa-book"></i> storytelling

Also applicable for:

- writing reproducible manuscripts (some journals are evaluating accepting notebooks!)

- creating interactive dashboards:
  - [Shiny](https://shiny.rstudio.com/articles/interactive-docs.html) with [flexdashboard](https://rmarkdown.rstudio.com/flexdashboard/):
    - Shiny dashboards can also be created without notebooks
    - see gallery in the flexdashboard link
  - [Voilà](https://blog.jupyter.org/and-voil%C3%A0-f6a2c08a4a93):
    - [gallery](https://voila-gallery.org/) (see Visum-Clustergrammer2)

- batch-generating reports from a template

- assigning homework and grading coursework: [nbgrader](https://nbgrader.readthedocs.io/en/stable/)

## Notebook replicability and biomedical research

### FDA Title 21 CFR Part 11

- standards required from groups submitting applications to FDA

- includes: validation, audit trail (detailed history, timestamps), record retention
 - good audit trail = records every step of the analyst in a tamper-resistant way

- you are likely not bound by it

- commercial software DOES offer 21 CFR compliant audit trail support
 - e.g. SIMCA for metabolomics analyses

### Computational notebooks are not 21 CFR compliant *by default*

- You can execute code cells in any order

- The history of execution (and changes) is not kept with the file

- Knitting an R Markdown document re-runs the code from top to bottom, so the export fails if the code does not run

- Jupyter notebooks can be exported in any state

- Jupyter notebooks assume you are a responsible person; you need to act like one

- You still can use:
  - extensions to add timestamps
  - continuous integration to re-run your notebooks every time
  - a notebook platform built with replicability as a main goal
  - linters, tests and diffs to prevent mistakes in the first place
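
One check that continuous integration could run is whether a saved Jupyter notebook was executed from top to bottom in a single pass. A minimal sketch, assuming the notebook has already been loaded into a dictionary (for example with `json.load`) and follows the nbformat 4 layout; the function name `executed_in_order` and the two toy notebooks are made up for this example:

```python
def executed_in_order(nb):
    """True if the non-empty code cells carry execution counts 1, 2, 3, ... in order."""
    counts = [
        cell.get("execution_count")
        for cell in nb["cells"]
        # "source" may be a string or a list of strings; join handles both
        if cell["cell_type"] == "code" and "".join(cell["source"]).strip()
    ]
    return counts == list(range(1, len(counts) + 1))


def code_cell(source, count):
    """Build a bare-bones code cell for the demonstration below."""
    return {"cell_type": "code", "source": source, "execution_count": count, "outputs": []}


clean = {"cells": [code_cell("x = 1", 1), code_cell("print(x)", 2)]}
messy = {"cells": [code_cell("x = 1", 109), code_cell("print(x)", 31)]}

print(executed_in_order(clean))
print(executed_in_order(messy))
```

This only detects out-of-order or repeated execution after the fact; re-running the whole notebook in a fresh kernel remains the stronger guarantee.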

**While common notebook tools are not designed for reproducible research, it is easy to make your own notebooks replicable.**

For example, when using R Markdown, it is good practice to write out the entire session information at the very end of the document. You can do that by calling `sessionInfo()`.
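
Python has no built-in counterpart of `sessionInfo()`, but a rough equivalent can be assembled from the standard library. The helper name `session_info`, the fields it records and the package names passed to it are assumptions made for this sketch:

```python
import platform
from datetime import datetime, timezone
from importlib import metadata


def session_info(packages=()):
    """Collect a small, printable record of the environment (illustrative helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Typically placed in the last cell of the notebook
print(session_info(["pandas", "numpy"]))
```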

### General FDA notes
- currently only for-profit companies developing closed-source proprietary software invest in full FDA compliance (e.g. SAS)

- validation issues somewhat addressed in:
   - major progress for R core in May 2018, see: [Regulatory Compliance and Validation Issues: A Guidance Document for the Use of R in Regulated Clinical Trial Environments](https://www.r-project.org/doc/R-FDA.pdf)
     - only core + some chosen packages
     - CRAN 
     - tidyverse? maybe in the future
   - RStudio has a largely similar document: [RStudio: Regulatory Compliance and Validation Issues](https://rstudio.com/wp-content/uploads/2014/06/RStudio-Commercial-IDE-Validation.pdf)   

- audit trail:

> R is not intended to create, maintain, modify or delete Part 11 relevant records but to perform calculations and draw graphics.
>
> Where R’s use may be interpreted as creating records, however, R can support audit trail creation within the record
>
> R includes `date()`, `Sys.time()`, `Sys.Date()` and `Sys.timezone()` functions which enable users to include date and time stamps on report, graphical and other output, thus enabling the use of this information in the tracking of user sessions.

- Any Jupyter extension that automatically and reliably adds timestamps could do better than that!

## Advice on working with notebooks

- start a notebook by defining its scope with:
   - a description (what is this notebook doing?)
   - list of aims (what are deliverables? any questions to be answered?)
   - (optional) list of non-aims (if needed to distinguish similar notebooks with otherwise overlapping scope)

- keep data preparation, data exploration and the final analyses in separate notebooks

- DRY (don't repeat yourself): if you find yourself writing the same code at the beginning of each notebook:
   - create a file with the repetitive setup commands
   - execute it in the first cell of every notebook:
     - for Jupyter notebook: `%run notebook_setup.ipynb`
     - for R Markdown: `source('init.R')`
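
What goes into such a setup file depends on the project. As a sketch only, a Python version might collect shared paths and fix the random seed; every name below (`DATA_DIR`, `RESULTS_DIR`, `RANDOM_SEED`, `init_notebook`) and the directory layout are hypothetical:

```python
# Hypothetical contents of a shared setup file
import random
from pathlib import Path

DATA_DIR = Path("data")        # assumed project layout
RESULTS_DIR = Path("results")  # assumed project layout
RANDOM_SEED = 2020


def init_notebook(seed=RANDOM_SEED):
    """Seed Python's random number generator so repeated runs give the same draws."""
    random.seed(seed)
    return seed


init_notebook()
```

Keeping the setup in one place means a changed path or seed is edited once rather than in every notebook.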

## Three tools that may help you

1. Tests/assertions
2. Linters/spellcheckers
3. Diffs

### 1. Tests/assertions

 - check if the code performs as you would expect
 - different from validation, which requires both that the code works as expected and that the expectations are correct (a task for an entire team!)
 - at minimum, prevents breaking the properly functioning code when trying to *improve* it

↓

#### Tests

- keep in separate files (e.g. see [pytest](https://docs.pytest.org/en/stable/) for Python and [testthat](https://testthat.r-lib.org/) for R)
- test the code

```python
def divide(a, b):
    try:
        return a / b
    except ZeroDivisionError:
        return 'Warning: division by zero'
```

```python
def test_divide():
    assert divide(1, 2) == 0.5
    assert divide(10, 2) == 5


test_divide()
```

```python
assert divide(1, 0) == 'Warning: division by zero'
```

#### Assertions
- include in notebooks
- verify the data:
    - sanity checks, e.g. `assert all(patients.age > 0)`
      - protects you if the data change in an unexpected way
      - signals to the reader that the data frame `patients` has a column `age`
    - assumption checks (for statistical models, or to prevent duplicate index entries in pandas)
    - unique values check e.g. `assert set(patients.smoker) == {'Yes', 'No', 'Unknown'}`
      - a weak assumption which should also hold if new data points are added

```python
from pandas import Series

age = Series([1, 4, 2, 3])
```

```python
assert all(age > 0)
```
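
The other checks from the list can be written in the same spirit. The sketch below avoids pandas and uses a small made-up list of patient records; the records, the field names and the use of a subset test (rather than set equality) are assumptions for this example:

```python
# Made-up records standing in for a real patients table
patients = [
    {"id": "P1", "age": 54, "smoker": "Yes"},
    {"id": "P2", "age": 61, "smoker": "No"},
    {"id": "P3", "age": 47, "smoker": "Unknown"},
]

# Sanity check: ages must be positive
assert all(p["age"] > 0 for p in patients), "non-positive age found"

# Allowed-values check: a subset test keeps holding when new rows are added,
# as long as they do not introduce an unexpected category
allowed = {"Yes", "No", "Unknown"}
assert {p["smoker"] for p in patients} <= allowed, "unexpected smoker category"

# Duplicate check: every patient identifier should appear exactly once
ids = [p["id"] for p in patients]
assert len(ids) == len(set(ids)), "duplicate patient ids"
```

A message after the comma costs little and makes a failing assertion much easier to interpret months later.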

### 2. Linters and spellcheckers

- check the code as you write it
- can only catch some obvious mistakes and typos (syntax errors)
- often can teach you good *style* (i.e. the conventions adopted by other language users)

![lsp](https://raw.githubusercontent.com/krassowski/jupyterlab-lsp/master/examples/screenshots/panel.png)

### Python

General:
- fun fact: `import this`
- PEP 8 (style guide), flake8, pycodestyle, mypy, ...
- mypy: static type checking

In Jupyter notebooks:
- early days (extension required)
- [jupyterlab-lsp](https://github.com/krassowski/jupyterlab-lsp) with [pyls](https://github.com/palantir/python-language-server)
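
To give a flavour of what static type checking adds, here is a small sketch with type annotations. The function `mean_age` and its inputs are invented for this example; the commented-out call is the kind of mistake a checker such as mypy is expected to report without running the code.

```python
from typing import List


def mean_age(ages: List[float]) -> float:
    """Average age; the annotations document the expected input and output types."""
    return sum(ages) / len(ages)


print(mean_age([54, 61, 47]))

# Passing a string where a list of numbers is expected would only fail at run time,
# but a static type checker should flag it as soon as the code is written:
# mean_age("54, 61, 47")
```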

### R

In R Notebooks:
- by default in RStudio
- [styler](https://github.com/r-lib/styler)

In Jupyter notebooks:
- early days...
- [jupyterlab-lsp](https://github.com/krassowski/jupyterlab-lsp) with R [languageserver](https://github.com/REditorSupport/languageserver)

### 3. Diffs

- git is more than a back-up solution
- comparing changes in code between versions is standard practice
- what about comparing notebooks? how do you diff them?
   - R Markdown: just git diff
   - Jupyter: [nbdime](https://nbdime.readthedocs.io/en/latest/), [jupyterlab-git](https://github.com/jupyterlab/jupyterlab-git)   

![test](https://github.com/jupyterlab/jupyterlab-git/raw/master/docs/figs/preview.gif)
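
Because a Jupyter notebook stores outputs and execution counts alongside the code, a plain text diff of the JSON tends to be noisy. The sketch below illustrates the idea of a content-aware diff by comparing only the sources of code cells; it uses the standard library and is not how nbdime is implemented. The function names and the toy notebooks are hypothetical.

```python
import difflib


def code_sources(nb):
    """Collect the lines of all code cells, ignoring outputs and execution counts."""
    lines = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            lines.extend("".join(cell["source"]).splitlines())
    return lines


def source_diff(old_nb, new_nb):
    """Unified diff of the code in two notebook dictionaries."""
    return list(difflib.unified_diff(
        code_sources(old_nb), code_sources(new_nb), "old", "new", lineterm=""
    ))


old = {"cells": [{"cell_type": "code", "source": "x = 1\nprint(x)",
                  "execution_count": 1, "outputs": [{"text": "1\n"}]}]}
new = {"cells": [{"cell_type": "code", "source": "x = 2\nprint(x)",
                  "execution_count": 7, "outputs": [{"text": "2\n"}]}]}

print("\n".join(source_diff(old, new)))
```

Re-running a notebook without editing it changes the file on disk but produces an empty diff here, which is the behaviour one usually wants when reviewing code changes.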

### Many more tools at your disposal

> These three alone will not make your code perfect, but they are a good step forward

Getting your code towards perfection:
- thoughtful design of interfaces,
- domain-specific knowledge of the actual science (your code may be working as expected, but what if the expectations were wrong?)
- in-depth and easy to understand (e.g. hierarchical) documentation

## Publishing notebooks

Interactive (runnable):
- Binder: [mybinder.org](https://mybinder.org/)
- ShinyApps: [shinyapps.io](https://www.shinyapps.io/)
- voila-gallery: [voila-gallery](https://github.com/voila-gallery/voila-gallery.github.io#contributing-new-examples)

Static (non-executable):
- nbviewer: [nbviewer.jupyter.org](https://nbviewer.jupyter.org/)
- GitHub
- RPubs: [rpubs.com](https://rpubs.com/)

## Embracing extensions/add-ons

- Jupyter Notebook
  - many [useful extensions](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions.html)
  - deprecated, use JupyterLab

- JupyterLab
   - see this [introduction](https://jupyterlab.readthedocs.io/en/stable/user/extensions.html)
   - version 2.0 often requires installation of both a Python extension package and the extension itself
      - mostly resolved in JupyterLab 3.0
   - browse extensions:
     - by [popularity on GitHub](https://github.com/topics/jupyterlab-extension) (noisy)
     - by [pictures](https://github.com/Yogayu/awesome-jupyterlab-extension) (curated list, outdated)
     - by [topic](https://github.com/mauhai/awesome-jupyterlab) (curated list)

- RStudio
  - not specific to notebooks, but to the IDE; see [introduction](https://rstudio.github.io/rstudioaddins/)
  - browse the [list of add-ins](https://cran.r-project.org/web/packages/addinslist/readme/README.html)