# Interactive and Reproducible Data Science
### _A COMPUTE Course at Lund University_
_Giulio Tesei, Catarina Doglioni, Mikael Lund_

![](http://jupyter.org/assets/jupyterpreview.png)

# Practical Details

1. This week: Monday (ML), Thursday (GT), Friday (GT). **10:15-15:00** (lunch 12-13)
2. Laptop with miniconda installed and internet access
3. All slides available online
4. Project work:
  - Deadline **December 27**.
  - Two referees per project; written feedback on **January 3rd**
  - Presentations on **January 5th**
  - Full details on course homepage
5. Time divided roughly equally between lectures and exercises.

# Overview of Day 1

- Why Jupyter Notebooks?
- Course overview and practical details
- Introduction to Jupyter
  - Navigation, cell types, markdown, magic commands
  - Exporting, sharing, citing
  - Mixing languages, online contents
  - Help to help yourself

# Course Focus

- scientific _reproducibility_
- _documenting_ the scientific process
- _sharing_ specialized knowledge
- _data exploration_ using interactive programming

# Scientific Workflow

<img src="https://camo.githubusercontent.com/4782751a54025e99f6f32adeed2ddf6e8f8db724/687474703a2f2f723464732e6861642e636f2e6e7a2f6469616772616d732f646174612d736369656e63652e706e67" width="800" />

_Jupyter Notebooks encapsulate all of the above in a single, shareable document_

# Examples

- [Interactive notebooks: Sharing the code](http://dx.doi.org/10.1038/515151a), _Nature, 2014_
- [Binary Black Hole Signals](http://nbviewer.jupyter.org/github/minrk/ligo-binder/blob/master/index.ipynb), _PRL_

Some notebooks from Theoretical Chemistry, acting as SI for published works:

- https://github.com/mlund/decaarginine
- https://github.com/mlund/CPPM
- https://github.com/mlund/SI-proteins_in_multivalent_electrolyte
- https://github.com/mlund/cosan

Some teaching material:

- https://github.com/mlund/particletracking (image recognition; MC simulation)
- https://github.com/mlund/chemistry-notebooks/blob/master/statistical-mechanics/statmek.ipynb (ipywidgets)

# A tribute to Hans Rosling (1948-2017)

## bqplot

![bqplot](https://github.com/bloomberg/bqplot/raw/master/bqplot-screencast.gif)

# Installation and Setup

1. A complete working environment can be set up using Continuum Analytics' Anaconda.
   To save space, use [miniconda](https://conda.io/miniconda.html)
2. On Linux and macOS, Anaconda installs in user space, so admin rights are not required
3. Several _environments_ can be installed, each containing a specific set of packages.

~~~ bash
conda env create -f environment.yml
source activate LUcompute
jupyter nbextension enable rubberband/main
jupyter nbextension enable exercise2/main
~~~   

In [None]:
!head environment.yml

# Task

Familiarise yourself with the `conda` command and learn how to

- install additional packages
- (de)activate environments
- save an environment to an `environment.yml` file. This is relevant for your project work
  where you need to tell others how to reproduce your environment.

# Opening and Basic Navigation

## Start a local notebook server
This will open a jupyter session and display it in your browser:
~~~ bash
source activate LUcompute
jupyter-notebook
~~~
## Topics

- Cell types (_code_, _markdown_)
- Run cell: `shift`+`enter`
- Run all cells; kernel; keyboard shortcuts
- Saving; exporting; viewing online

In [None]:
# code cells use Python by default (since we opened a Python notebook)
a=2
print('value =', a)

In [None]:
%%bash
# everything in this code cell is bash.
# will probably not work on Windows(?)
ls -l

## Tasks

1. Modify and run the two code blocks above (python and bash)
2. Find the keyboard shortcuts for
   - Toggling line numbers
   - Setting the cell to _code_
   - Setting the cell to _markdown_
3. Export this notebook as HTML and open it in a web-browser
4. Go back to the Home page (usually a browser tab) and check which notebooks are currently running

# Help to help yourself

- `shift`-`tab`-`tab`: access information about Python functions (place the cursor between the parentheses)
- `tab`: tab complete functions and objects
- `?command` or `command?`
- The help menu has links to detailed help on Python, Markdown, Matplotlib etc.

## Task

Use the different ways listed above to explore the arguments of the `print()` function we used earlier.
What does the `end` argument mean?

In [None]:
print(233, end='a')
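
As a hint for the task, the sketch below captures what `print` actually writes, so the effect of `end` becomes visible as a plain string. The `io.StringIO` buffer and the variable names are illustrative assumptions and not part of the original exercise.

```python
import io

# Collect the printed text in memory instead of sending it to the screen.
buffer = io.StringIO()

# `end` replaces the newline that print() normally appends.
print(233, end='a', file=buffer)
# The next call starts right after the 'a', since no newline was written.
print('next', file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```

Inspecting `repr(captured)` shows exactly where the default newline was replaced and where it was kept.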

# Output

The result from running a code cell is shown as output directly below it. In particular, the output from the _last_ command will be printed, unless explicitly suppressed by a trailing `;`

Previous output can be retrieved by:
- `_` last output
- `__` second-to-last output
- `_x` where `x` is the cell number.

In [None]:
a=3
a

In [None]:
b=7
b;

In [None]:
_

# Built-in _Magic commands_

- Line magic (`%`): operates on a single line and can be mixed with other languages
- Cell magic (`%%`): operates on the whole cell
- More info: http://ipython.readthedocs.io/en/stable/interactive/magics.html

In [None]:
%lsmagic # lists available magic commands

## Task: Measuring Speed

Use _line magic_ to measure the execution time of the Python function below.

In [None]:
from math import sqrt
def myfunction(x):
    for i in range(10):
        x=x+sqrt(x)
    return x

print('f =', myfunction(5))

In [None]:
%timeit myfunction(5)
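
The `%timeit` magic only exists inside IPython and Jupyter. Outside a notebook, a roughly comparable measurement can be sketched with the standard-library `timeit` module. The number of calls (`n_calls`) is an arbitrary assumption, and unlike `%timeit` this sketch does not repeat the measurement or pick the loop count automatically.

```python
import timeit
from math import sqrt

def myfunction(x):
    # Same function as in the cell above.
    for i in range(10):
        x = x + sqrt(x)
    return x

n_calls = 10000  # assumed value; increase it for a more stable estimate

# timeit.timeit returns the total time in seconds for n_calls executions.
total = timeit.timeit(lambda: myfunction(5), number=n_calls)
per_call_us = 1e6 * total / n_calls

print('{:.3f} us per call'.format(per_call_us))
```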

## Speeding up using Cython via _cell magic_

We have already seen one example of cell magic, namely `%%bash`. We will now use `%%cython` to run Cython, a language that mixes C and Python.

In [None]:
%load_ext Cython

In [None]:
%%cython
cdef extern from "math.h":
    double sqrt(double)

cpdef myfunction2(double x): # note the 'p' in 'cpdef'
    for i in range(10):
        x=x+sqrt(x)
    return x

In [None]:
print('f =', myfunction2(2))
%timeit myfunction2(2)

## Other Kernels

- So far Python, BASH, Cython
- R, Julia, Ruby, ROOT, Cling (C++), Matlab, Gnuplot, [Fortran](https://nbviewer.jupyter.org/github/sourceryinstitute/jupyter-CAF-kernel/blob/master/index.ipynb) etc.
- More at https://github.com/jupyter/jupyter/wiki/Jupyter-kernels

## Example: Cling
![](https://cdn-images-1.medium.com/max/1600/1*NnjISpzZtpy5TOurg0S89A.gif)

# Changing Directories

There are two ways to change directory from a notebook:

- `!cd` and corresponding `!pwd`
- `%cd` and corresponding `%pwd`

## Task

There are important differences between these two approaches. Find out what they are!
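
A plain-Python analogy may help with this task. A command prefixed with `!` runs in a separate shell process, much like `subprocess.run` below, while `%cd` changes the working directory of the running kernel, much like `os.chdir`. The sketch is an analogy under that assumption and does not use the magics themselves; the temporary directory is only there to have a safe place to change into.

```python
import os
import subprocess
import tempfile

start = os.getcwd()

# Changing directory in a child shell does not affect this process.
subprocess.run('cd ..', shell=True, check=True)
after_subprocess = os.getcwd()

# Changing directory in the process itself persists until changed back.
with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    after_chdir = os.getcwd()
    os.chdir(start)  # go back before the temporary directory is removed

print('start:           ', start)
print('after subprocess:', after_subprocess)
print('after os.chdir:  ', after_chdir)
```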

# Documentation using Markdown

Markdown is a _lightweight_ markup language that

- is intended to be as easy-to-read and easy-to-write as is feasible
- should be publishable as-is, as plain text
- supports equations ($f(x)=x$), [links](http://), ~~text formatting~~, tables, images etc.

For more information see [here](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).

## Task: Pythagoras

Use a Markdown cell to explain Pythagoras' theorem. Your answer should include

- headers and text formatting
- a link to an external web page
- LaTeX math

_Hint:_ images need not be local, but can be linked via a url.

### Answer

#### Pythagoras' Theorem

For a **right triangle** with legs $a$ and $b$, and hypotenuse $c$,

$$a^2 + b^2 = c^2$$

![](https://upload.wikimedia.org/wikipedia/commons/d/d2/Pythagorean.svg "Pythagoras")

Further reading [here.](http://mathworld.wolfram.com/PythagoreanTheorem.html)

## Task: Markdown Tables

1. Use a markdown cell to create a table with three columns labelled **Element**, **Symbol**, and **Atomic number**, and a single row containing **Hydrogen**, **H**, and **1**.
2. In a _code cell_, import the `Markdown` class from `IPython.display` using
   ```python
   from IPython.display import Markdown
   ```
   and explore its documentation.
3. Redo subquestion 1 using `Markdown` and a [formatted Python string](https://pyformat.info)

### Answer

#### This is a table with hard-coded contents:
Element  | Symbol   |   Number
-------- | -------- | ----------
Hydrogen | H        |  1

In [None]:
from IPython.display import Markdown
data = ['Hydrogen', 'H', '1'] # python list

Markdown('''
   #### This is a table filled with data from a python `list`
   Element  | Symbol   |   Number
   -------- | -------- | ----------
   {d[0]}   | {d[1]}   |  {d[2]}

   (triple quotes delimit a _string literal_ that can span several lines)
'''.format(d=data))
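
The same idea extends to several rows. The sketch below builds the table text from a list of tuples using only plain Python; the extra elements and the variable names are illustrative assumptions. In a notebook, the resulting string could be passed to `Markdown(table)` to render it.

```python
# Each tuple holds (element, symbol, atomic number); the data are illustrative.
elements = [
    ('Hydrogen', 'H', 1),
    ('Helium', 'He', 2),
    ('Lithium', 'Li', 3),
]

header = 'Element | Symbol | Number'
separator = '--- | --- | ---'

# One formatted Markdown table row per element.
rows = ['{} | {} | {}'.format(*element) for element in elements]

table = '\n'.join([header, separator] + rows)
print(table)
```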

## Task: Videos

The `IPython.display` module contains many more features for inserting LaTeX, images, geographical maps etc.
Use it to insert a __YouTube video__ of your choice.


In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('dxyti_wCWaE')

## Task: Embedded web content

Lund University Publications ([LUP](https://lup.lub.lu.se/search)) allows you to search for publications from specific LU departments or authors. It also lets you _embed_ the search result. Use `IPython.display.IFrame` to display a search of your choice.

In [None]:
# Here's an example showing a protein - replace with something from LUP!
from IPython.display import IFrame
IFrame(src="http://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html?pdbid=4lzt", width=800, height=400)

![](https://github.com/arose/nglview/raw/master/examples/images/membrane.gif)

## Remarks on Markdown

Markdown can be extended with:

- Table of contents
- Bibliography (bibtex)
- Customized layouts

### Suggested reading

- [Markdown Cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#emphasis)
- [Making publication ready python notebooks](http://blog.juliusschulz.de/blog/ultimate-ipython-notebook)
- [28 Jupyter Notebook tips, tricks and shortcuts](https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/)

# Sharing Notebooks

## Common export options directly from the notebook:

- `.ipynb` (default format)
- HTML (convenient and compact, single file)
- PDF, Markdown, LaTeX etc. (may require additional packages)
- (remember the `environment.yml` file!)
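
It can help to know that an `.ipynb` file is plain JSON, which is why it works well with version control and online viewers. The sketch below builds a minimal notebook structure by hand and reads it back. The cell contents are made up for illustration, and real notebooks carry more metadata than shown here.

```python
import json

# A minimal, hand-written notebook (format version 4) with two cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["a = 2\n", "print('value =', a)"],
        },
    ],
}

# Serialise to the text that would be stored in an .ipynb file ...
text = json.dumps(notebook, indent=1)

# ... and read it back to inspect the cells.
loaded = json.loads(text)
cell_types = [cell["cell_type"] for cell in loaded["cells"]]
code = "".join(loaded["cells"][1]["source"])

print(cell_types)
print(code)
```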

## Online options

- https://github.com (notice their student package!)
- https://nbviewer.jupyter.org (view notebooks)
- https://tmpnb.org (online, fixed environment)
- http://mybinder.org (online, custom environments)
- Microsoft Azure (online coding)

# Long term storage

- **Zenodo** (https://zenodo.org) can create a DOI for a GitHub repository.
- The repository state at every release is archived for long-term storage.
- The service is funded by the European Commission.