# Jupyter notebooks for reproducible research

Alexander Konovalov

_Centre for Interdisciplinary Research in Computational Algebra (CIRCA)_

## Quote from the classics ...

* _Practices in source code sharing in astrophysics_, L. Shamir et al., Astronomy and Computing, vol. 1, Feb. 2013, 54–58
* One of its references is almost 24 centuries old: “On Interpretation” by Aristotle
* _“One of the important advantages of releasing source code is that it allows replication of the results, which is a key concept in science (Aristotle, 350BC).”_

## Your mileage may vary

* Have you been frustrated by trying to use someone else’s code that is non-trivial to install?
* Have you tried to make the supplementary code for your paper easily accessible to the reader?
* This may require non-trivial effort


## Notebook interfaces

* SageMath, R, IPython, Jupyter, ...
* Combine code, results, text and graphics in the same document
* Lower some barriers to reproducibility
* But introduce challenges of their own

## What is a notebook?

In [1]:
a=21 # code cell without output

In [2]:
print(a) # code cell with output

21


**This** is a _markdown_ cell. You can use LaTeX too: $21 \times 2 = 42$



In [3]:
# could contain many lines
def double(x):
    '''double the argument'''
    return 2*x

In [4]:
double(a)

42
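
On disk, a notebook is a JSON document holding a list of such cells. The sketch below builds a tiny notebook by hand, roughly following the nbformat 4 layout, and lists its cell types. The cell contents are taken from the cells above; only some of the fields a real notebook can carry are shown, so treat it as an illustration rather than a complete specification.

```python
import json

# A hand-written, minimal notebook (an illustrative assumption, not a real file).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",        # a code cell without output
            "execution_count": 1,
            "metadata": {},
            "source": ["a = 21"],
            "outputs": [],
        },
        {
            "cell_type": "markdown",    # a text cell
            "metadata": {},
            "source": ["**This** is a _markdown_ cell."],
        },
    ],
}

# Serialise to JSON text (what an .ipynb file contains) and read it back.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

Because the format is plain JSON, notebooks can be inspected and processed with ordinary tools, not only through the notebook interface.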

## Hot off the press

* **Ten Simple Rules for Reproducible Research in Jupyter Notebooks** by Adam Rule, Amanda Birmingham, Cristal Zuniga, Ilkay Altintas, Shih-Cheng Huang, Rob Knight, Niema Moshiri, Mai H. Nguyen, Sara Brin Rosenthal, Fernando Pérez, Peter W. Rose 

* https://arxiv.org/abs/1810.08055

* The authors consider barriers, opportunities, challenges and tools for reproducible computational research

## Spaghetti notebooks?

* One study found that only a small fraction of the Jupyter notebooks mentioned in PubMed Central publications could be run without problems related to data access, dependency resolution or platform differences

* Another analysis of over a million Jupyter notebooks publicly available on GitHub found that about 25% of them do not contain any text

    - even those with text rarely contained a detailed description of the steps or an interpretation of the results
    
* Are we going from undocumented code to undocumented notebooks?
* We can and should do better!
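
A quick self-check along these lines is easy to sketch. The hypothetical helper `text_fraction` below reports what share of a notebook's cells are non-empty markdown cells. It is not the method used in the studies mentioned above, only a minimal illustration that assumes the notebook has already been loaded as a Python dictionary.

```python
def text_fraction(notebook):
    """Return the share of cells that are markdown cells with non-blank text."""
    cells = notebook.get("cells", [])
    if not cells:
        return 0.0

    def has_text(cell):
        source = cell.get("source", "")
        # The source may be stored as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        return cell.get("cell_type") == "markdown" and source.strip() != ""

    return sum(has_text(cell) for cell in cells) / len(cells)


# Hand-made examples (assumptions for illustration only).
documented = {"cells": [
    {"cell_type": "markdown", "source": ["# Load the data\n", "We read the CSV file."]},
    {"cell_type": "code", "source": ["a = 21"]},
]}
undocumented = {"cells": [
    {"cell_type": "code", "source": ["a = 21"]},
    {"cell_type": "markdown", "source": ["   "]},
]}

print(text_fraction(documented), text_fraction(undocumented))
```

A real `.ipynb` file could be loaded with `json.load` and passed to the same function. A fraction of zero is a strong hint that the notebook tells no story at all.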

## Aspects of notebook development
* Organise and document
* Work with code
* Share

## Organise and document

1. Tell a story, and match it to the audience
2. Document the process, not just the results
3. Get the right balance while splitting code and text between cells

## Work with code
4. Write modular code
5. Document dependencies
6. Use version control
7. Establish a pipeline
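
As one possible way to follow rules 4 and 5 together, dependency reporting can live in a small reusable function that a notebook calls in its first cell. The sketch below uses only the standard library (`platform` and `importlib.metadata`, available from Python 3.8). The function name `describe_environment` and the package names are assumptions chosen for illustration.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect the Python version, platform and versions of the given packages."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages instead of failing, so the report is complete.
            info[name] = "not installed"
    return info


# The second name is deliberately fictitious, to show how a missing package is reported.
env = describe_environment(["pip", "surely-not-a-real-package-xyz"])
for name, version in env.items():
    print(f"{name}: {version}")
```

Printing this report at the top of a notebook records the environment alongside the results. It complements, but does not replace, a pinned dependency file kept under version control.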

## Share
8. Share and explain your data, not only the results
9. Enable your notebooks to be read, run and explored
10. Contribute to reproducible and open research!

## Let us see some examples on Binder

* Python: https://github.com/alex-konovalov/repro-jupyter

* GAP: https://github.com/gap-system/try-gap-in-jupyter

* GAP: https://github.com/alex-konovalov/gap-teaching/

* GAP + Travis CI + Codecov: https://github.com/sukru-yalcinkaya/unipoly
