In [1]:
from IPython.core.display import HTML
css_file = "./presentation_notebook_style.css"
HTML(open(css_file, 'r').read())

# 5. Publishing Scientific Codes

## Publishing Scientific Codes

- At some point you may want to *publish* your code (or at least show it to another person)
- Want to make it as easy as possible for someone else to install and run your code
    * Use tools which enable easy installation and replication of your runtime environment

<center>![Automation](https://intellyx.com/wp-content/uploads/2015/11/automate-all-the-things.jpg)</center>

## Automating installation

- For simple Python codes, installation may be as easy as *downloading* the code and running `python filename.py`.
- For more complex codes that are spread over many files, require compilation and/or have multiple dependencies, installation can get a lot trickier
- Need some way of automating the installation and build process
- Can also save *you* a lot of time
    * Makes sure the project always gets built correctly using the correct file versions
- ***Makefiles*** are one way to do this

## Make

<center>![Automation](https://imgs.xkcd.com/comics/automation.png)</center>
##### Automation - [xkcd](https://xkcd.com/1319/)

### Make

- Projects often end up spread across many files and directories with *complex inter-dependencies*
    * A change made to one file can have a knock-on effect on several other files further up the chain of dependencies
    * Keeping track of this and making sure that all files are kept up to date can quickly get complicated and time-consuming

- Automate this using a tool such as `GNU Make`
- `Make` describes the *dependencies* between files
- When a project is rebuilt, `Make` checks which files have changed and only rebuilds the parts that depend on them
- Saves time for large projects where rebuilding everything from scratch can take a while
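
The rebuild decision described above can be sketched in a few lines of Python. This is only an illustration of the idea (compare file modification times), not how `GNU Make` is actually implemented; the function name `needs_rebuild` and the file names `solver.c` and `solver.o` are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)


# Demonstration with throwaway files (all names are hypothetical).
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "solver.c")
    target = os.path.join(workdir, "solver.o")

    open(source, "w").close()
    missing_target = needs_rebuild(target, [source])  # no target yet: True

    open(target, "w").close()
    os.utime(source, (1000, 1000))  # pretend the source was edited first...
    os.utime(target, (2000, 2000))  # ...and the target was built afterwards
    up_to_date = needs_rebuild(target, [source])      # nothing changed: False

    os.utime(source, (3000, 3000))  # source edited after the last build
    stale_target = needs_rebuild(target, [source])    # out of date: True
```

A real Makefile expresses the same relationship declaratively: each target lists the files it depends on, and `Make` applies this kind of check along the whole chain of dependencies.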

## Replicating runtime environments

- No two runtime environments are exactly the same. 
    * Every computational setup has its own *unique hardware* and selection of installed software
- However, we want our code to be as **reproducible** as possible
    * It should be possible to obtain the same results, regardless of which system the code is run on
- We also would like our code to be as **accessible** as possible
    * support a range of operating systems and computer hardware. 
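
As a first step towards reproducibility, it helps to record what your own runtime environment looks like. The sketch below uses only the Python standard library (Python 3.8 or later is assumed); the function name `describe_environment` and the package names passed to it are hypothetical choices for illustration.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter, OS and package versions into a dictionary."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# The second name is deliberately one that should not be installed.
report = describe_environment(["pip", "a-package-that-does-not-exist"])
```

Saving a report like this alongside your results tells other people which versions they would need to install to match your setup.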

### Conda
- `conda` is a package and environment management system which allows users to create multiple runtime environments (e.g. containing different versions of libraries) and easily switch between them
- Allows users of your code to install the exact versions of the libraries you use without disrupting their own setup

### Containers

- ***Containers*** are a way of producing a lightweight, standalone, executable package of a piece of software
- This package (or *container image*) includes everything required to run the software: the code itself, the runtime environment, libraries, settings etc
- Software run inside a container is *isolated* from the rest of the system
- Many containers can be run on the same machine at once, allowing multiple versions of a code or runtime environment to exist at once and run at the same time without conflict

### Containers vs virtual machines

- Containers share many similarities with virtual machines: both allow code to be run in isolated runtime environments
- Container images are much more **lightweight** (typically tens of MBs), as containers share the operating system (OS) kernel with the host and with other containers
- Virtual machines include a full copy of an operating system, so typically take up tens of GBs
- Container images contain exactly what is required by the software and no more
- Virtual machines contain a lot of extra material that is likely to already exist on the host machine (e.g. a full operating system)

<div style="float:left;width:45%">
     <img src="../images/Container@2x.png" alt="Containers">
     
     <center>Containers</center>
</div>
<div style="float:right;width:45%">
    <img src="../images/VM@2x.png" alt="Virtual machines">
    
    <center>Virtual machines</center>
</div>

##### Source: [Docker](https://www.docker.com/what-container)

## Distributing code

- Sharing code can be as simple as sharing a link to your GitHub repo via email / Twitter / your blog
- There are also various package indexes you can upload your code to, making it easier for others to find and download.
    * For Python, can upload to the [Python Package Index](https://pypi.python.org/pypi), PyPI.
    * For Python and general code projects, conda packages can be uploaded to [anaconda.org](https://anaconda.org)
    * Can upload a Docker image of your project to a registry such as Docker Hub.

### Licences

- Before you share your code, you should make sure that it has the correct software licence
- The licence states who *owns* the software, who is *allowed* to use it and what *rights* the users and owners have. 
- Typically a text file called `LICENSE` in the project's root directory
- [Choose a license](http://choosealicense.com/) is a good place to start for open source projects

### DOIs

- The internet is a pretty transient place 
    * Websites are often rearranged, making it hard to provide links that will still be valid in the future
- Good idea to give your code a ***digital object identifier (DOI)***
- A DOI is a unique alphanumeric string by which content can be identified and which will provide a persistent link to its location on the internet
- Free DOIs for GitHub repositories can be obtained using [Zenodo](https://zenodo.org/)
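
To see how a DOI gives a persistent link, note that any DOI can be turned into a URL by appending it to the `https://doi.org/` resolver, which redirects to wherever the content currently lives. A minimal sketch follows; the helper name `doi_to_url` and the example DOI are made up for illustration and do not point to a real record.

```python
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Turn a DOI string into a link through the doi.org resolver."""
    cleaned = doi.strip()
    # Accept the common "doi:" prefix as well as a bare identifier.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[len("doi:"):].strip()
    # Every DOI starts with the directory indicator "10."
    if not cleaned.startswith("10."):
        raise ValueError(f"{doi!r} does not look like a DOI")
    return DOI_RESOLVER + cleaned


# Hypothetical DOI, made up for illustration only.
example_url = doi_to_url("doi:10.5281/zenodo.0000000")
```

Citing the resolver link rather than the repository URL means the reference still works if the repository is later moved or renamed.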

## Presenting results


- One of the best ways to present results is through **figures**. Graphical representations of data allow the reader to see patterns more easily and are (hopefully) much easier to understand than a table of numbers.
- To maximise the impact of your figures, you should make sure that they are
    * **easy to understand** 
    * **informative**
    * **attractive & eye catching**

![Defaults vs tweaked](../images/journal.pcbi.1003833.g004.png)
##### Matplotlib defaults vs informative settings
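
A few small changes to the `matplotlib` defaults already go a long way. The sketch below is one possible set of tweaks (axis labels with units, a legend, distinct line styles, fewer borders), not the settings used to produce the figure above; the data is made up for illustration.

```python
import math

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up illustrative data: a decaying oscillation and its envelope.
times = [0.1 * i for i in range(100)]
signal = [math.exp(-0.3 * t) * math.cos(2.0 * t) for t in times]
envelope = [math.exp(-0.3 * t) for t in times]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(times, signal, color="tab:blue", linewidth=2, label="signal")
ax.plot(times, envelope, color="tab:orange", linestyle="--", label="envelope")

# Tell the reader what is plotted, and in which units.
ax.set_xlabel("Time (s)")
ax.set_ylabel("Amplitude")
ax.set_title("Damped oscillation")
ax.legend(frameon=False)

# Remove the top and right borders, which carry no information.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

fig.tight_layout()
```

Calling `fig.savefig("figure.png", dpi=150)` afterwards would write the result to disk; in a notebook the figure can instead be displayed inline.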

### Presenting data fairly

The way that data is presented has a **huge impact** on how it is interpreted by the reader

<center>![Spot the mistake...](http://68.media.tumblr.com/9690a36c2860b2cca759e81c056ed948/tumblr_onc2gy3o0v1sgh0voo1_1280.png)</center>

<center>![Daily Fail](http://i.dailymail.co.uk/i/pix/2017/02/22/11/3D851AC500000578-0-image-a-2_1487761593852.jpg)</center>
##### From the [Daily Mail...](http://www.dailymail.co.uk/news/article-4248690/Economy-grew-0-7-final-three-months-2016.html)

<center>![Try a scatter next time](http://68.media.tumblr.com/c1100353d116fc246f53515ad35ae969/tumblr_ojmy4cC94j1sgh0voo1_1280.jpg)</center>
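
One common way that presentation changes interpretation is a value axis that does not start at zero. The arithmetic below is a small sketch with made-up numbers showing how much a truncated axis can exaggerate a difference between two bars; the function name `drawn_height_ratio` is hypothetical.

```python
def drawn_height_ratio(small, large, axis_start=0.0):
    """Ratio of the drawn bar heights when the value axis starts at `axis_start`."""
    if axis_start >= small:
        raise ValueError("axis_start must be below the smaller value")
    return (large - axis_start) / (small - axis_start)


# Made-up values that differ by 3%.
before, after = 100.0, 103.0

# Axis starting at zero: the second bar is drawn 1.03 times as tall.
honest_ratio = drawn_height_ratio(before, after)

# Axis starting at 98: the second bar is drawn 2.5 times as tall.
truncated_ratio = drawn_height_ratio(before, after, axis_start=98.0)
```

The underlying data is identical in both cases, but the second chart makes a 3% change look like the value has more than doubled.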

### Beyond matplotlib

- `Matplotlib` is extremely powerful, but making good plots with it often takes a lot of work
- There are many packages that extend `matplotlib`'s capabilities, allowing you to make better looking plots with less work
- **Interactive plots** can be a much more powerful way to present data than traditional static plots
    * Libraries such as `bokeh` and `plotly` allow easy generation of such plots