# Reproducible Research using IPython Interactive Publications

Robin Scheibler and Amir Hesam Salavati
<br>
<span style="font-size:0.75em;">Laboratory of Audiovisual Communications (LCAV), EPFL</span>

<div style="float:right;">
3 November 2015
</div>

## Transforming this...

<div style="color:#5C5858;width:95%;border: 1px solid;padding: 15px 15px 15px 15px;border-radius: 4px;">
from scipy.io import loadmat, wavfile
<br>
<br>
from IPython.display import Audio, display
<br>
<br>
r,speech = wavfile.read('Data/german_speech_44100.wav')
<br>
<br>
_,rir = wavfile.read('Data/RIRs.wav')
<br>
<br>
print('Speech')
<br>
<br>
display(Audio(data=speech, rate=r))
<br>
<br>
print('Room Impulse Response')
<br>
<br>
display(Audio(data=rir[:,0], rate=r))
</div>

## Into this...

<iframe width="1800" height="600" src="https://www.youtube.com/embed/vMrj2TYGSiY?start=20&autoplay=1" frameborder="0" allowfullscreen></iframe>

## Outline

<table style="border: 0px solid white;width:100%;">
<tr style="border: 0px solid white;width:100%;">
<td style="width:75%;border: 0px solid white;"> <ul><li>Troubleshooting</li></ul>  </td>
<td style="width:25%;border: 0px solid white;"> 5 Min</td>
</tr>
</table>

<table style="border: 0px solid white;width:100%;">
<tr style="border: 0px solid white;width:100%;">
<td style="width:75%;border: 0px solid white;"> <ul><li>Reproducible Research: What, Why and How</li></ul>  </td>
<td style="width:25%;border: 0px solid white;"> 20 Min</td>
</tr>
</table>

<table style="border: 0px solid white;width:100%;">
<tr style="border: 0px solid white;width:100%;">
<td style="width:75%;border: 0px solid white;"> <ul><li>Introduction to IPython Notebooks Features</li></ul> </td>
<td style="width:25%;border: 0px solid white;"> 30 Min</td>
</tr>
</table>

<table style="border: 0px solid white;width:100%;">
<tr style="border: 0px solid white;width:100%;">
<td style="width:75%;border: 0px solid white;"> <ul><li>IPython Notebooks for Reproducible Research</li></ul> </td>
<td style="width:25%;border: 0px solid white;"> 30 Min</td>
</tr>
<tr style="border: 0px solid white;width:100%;">
<td style="width:75%;border: 0px solid white;"> 
<ul><ul>
<li>GitHub troubleshooting</li>
<li>IPython Notebook for our papers</li>
<li>Hosting notebooks on GitHub</li>
<li>GitHub Pages</li>
</ul></ul>
</td>
</tr>
</table>

<table style="border: 0px solid white;width:100%;">
<tr style="border: 0px solid white;width:100%;">
<td style="width:75%;border: 0px solid white;"> <ul><li>Final Q&A</li></ul>  </td>
<td style="width:25%;border: 0px solid white;"> 5 Min</td>
</tr>
</table>

<table style="border: 0px solid white;width:100%;">
<tr style="border: 0px solid white;width:100%;">
<td style="width:75%;border: 0px solid white;"> <ul><li> Concluding remarks </li></ul> </td>
<td style="width:25%;border: 0px solid white;"> 10 Min</td>
</tr>
</table>

<p style="font-size:1.25em;">
This is an <span style="color:red;">interactive</span> workshop.
Please do not hesitate to interrupt WHENEVER you want!
</p>

## Reproducible Research

### <ul><li>What?</li></ul>

### <ul><li>Why?</li></ul>

### <ul><li>How?</li></ul>

## What

<span style="color:#E5E4E2;">“The term *<span style="color:#5C5858;">reproducible research</span>* refers to the idea that the ultimate product of academic research is the paper along with the full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to <span style="color:#FBB917;">reproduce</span> the results and <span style="color:red;">create new work</span> based on the research.”</span>

[Wikipedia]


## Advancement of Science

<img src="./Figures/Verifyability.png" alt="image from economist" style="width: 700px;margin-top:-40px;margin-left:125px;"/>

## Advancement of Science

<img src="./Figures/advancementOFscience.jpg" alt="image from PhD Comics" style="width: 800px;"/>

## Different Requirements in Different Fields

## Mathematics
<div style="width: 50%;margin-right:20px;float:right;">
<img src="./Figures/MathReproducibility.jpg" alt="image from PhD Comics" style="width: 100%;"/>
</div>
<div style="width: 40%;margin-right:20px;float:left;margin-top:140px;">
<ul style="font-size:1.25em;">
<li> Clear theorems </li>
<li style="margin-top:50px;"> Complete proofs </li>
</ul>

</div>

## Computational Sciences
<div style="width: 50%;margin-right:20px;float:right;">
<img src="./Figures/ReproducibilityComputational.png" alt="" style="width: 100%;"/>
</div>
<div style="width: 40%;margin-right:20px;float:left;margin-top:140px;">
<ul style="font-size:1.25em;">
<li> Clear theorems </li>
<li style="margin-top:50px;"> Complete proofs </li>
<li style="margin-top:50px;"> Algorithm </li>
<li style="margin-top:50px;"> Simulation code </li>
</ul>

</div>



## Lab Sciences
<div style="width: 50%;margin-right:20px;float:right;">
<img src="./Figures/Bacteria.png" alt="" style="width: 100%;"/>
</div>
<div style="width: 40%;margin-right:20px;float:left;margin-top:140px;">
<ul style="font-size:1.25em;">
<li> Detailed method </li>
<li style="margin-top:50px;"> Lab Protocols </li>
<li style="margin-top:50px;"> Data </li>
<li style="margin-top:50px;"> Code </li>
</ul>

</div>


# What is the Problem Then?

Published research often seems to be "difficult" to reproduce!

### In a study published in *Nature* in 2012
<h3 style="font-size:2em;margin-top:75px;margin-bottom:100px;"> 
<span style="color:red;">47 out of 53 </span> preclinical cancer research papers could not be reproduced.
</h3>

<p style="font-size:0.5em; color:grey;">
\* Begley, C. G.; Ellis, L. M. (2012). "Drug development: Raise standards for preclinical cancer research". Nature 483 (7391): 531–533.
</p>



### In another study in 2009
<h3 style="font-size:2em;margin-top:75px;margin-bottom:100px;"> 
<span style="color:red;">10 (+6) out of 18 </span> bioinformatics papers could not be reproduced completely.

</h3>

<p style="font-size:0.5em; color:grey;">
\* Ioannidis JPA, Allison DB, Ball CA, et al. Repeatability of published microarray gene expression analyses. Nat Genet 2009;41(2):149–55
</p>



### In a study of the 134 papers published in "IEEE Transactions on Image Processing" in 2004, it was found that
<h3 style="font-size:2em;margin-top:75px;margin-bottom:100px;"> 
<span style="color:red;">Less than 9% </span> of the papers shared their code.
</h3>

<p style="font-size:0.5em; color:grey;">
\* Vandewalle, Patrick, Jelena Kovacevic, and Martin Vetterli. "Reproducible research in signal processing." Signal Processing Magazine, IEEE 26.3 (2009): 37-47.
</p>



## Possible Reasons

* Lack of space in papers

* It is difficult to combine text, code and figures

* <b style="color:#FBB917;">It takes time!</b>

# Why Should We Care Then?

## Why NOT?
<div style="width: 50%;margin-right:20px;float:right;">
<img src="./Figures/baby-scientist.jpg" alt="image from PhD Comics" style="width: 80%;"/>
</div>
<div style="width: 40%;margin-right:20px;float:left;margin-top:140px;">
<ul style="font-size:1.25em;">
<li> For the sake of science</li>
<li style="margin-top:50px;"> and our inner scientist </li>
</ul>

</div>

# Reproducibility Increases Visibility

### Multiple studies have shown that
<div style="clear:both;margin-top:50px;"> 

<ul style="font-size:1.15em;">
<li style="margin-top:20px;"> Papers with shared data were cited about <span style="color:red;">70%</span> more frequently [1].</li>
</ul>

<ul style="font-size:1.25em;">
<li style="margin-top:20px;"> Open access papers seem to be consistently cited more [2]. </li>

</ul>

<ul style="font-size:1.25em;">
<li style="margin-top:20px;"> Papers with available code are probably cited more than those without code [3]. </li>
</ul>

<p style="font-size:0.25em; color:grey;">
[1] Piwowar, Heather A., Roger S. Day, and Douglas B. Fridsma. "Sharing detailed research data is associated with increased citation rate." PLoS ONE 2.3 (2007): e308.
<br>
[2] Antelman, Kristin. "Do open-access articles have a greater research impact?." College & research libraries 65.5 (2004): 372-382.
<br>
[3] Vandewalle, Patrick, Jelena Kovacevic, and Martin Vetterli. "Reproducible research in signal processing." Signal Processing Magazine, IEEE 26.3 (2009): 37-47.

</p>
</div>


## Study 1: Effect of Data Sharing

### In a 2007 study of microarray DNA research publications,
<h3 style="font-size:2em;margin-top:75px;margin-bottom:30px;"> 
Papers with shared data were cited about <span style="color:red;">70%</span> more frequently.
</h3>

* 85 papers were studied
* 41 shared their data

### Those 41 took 85% of the aggregate citations

<p style="font-size:0.5em; color:grey;margin-top:75px;">
\* Piwowar, Heather A., Roger S. Day, and Douglas B. Fridsma. "Sharing detailed research data is associated with increased citation rate." PLoS ONE 2.3 (2007): e308.
</p>
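The split quoted on this slide can be checked with a few lines of arithmetic. This is only a sketch: it uses the three numbers given above (85 papers, 41 with shared data, 85% of the aggregate citations), and the variable names are illustrative. The raw per-paper ratio it prints is unadjusted, so it is not the same quantity as the 70% figure quoted from the study.

```python
# Numbers quoted on this slide (Piwowar et al., 2007)
n_papers = 85                 # papers studied
n_shared = 41                 # papers that shared their data
citation_share_shared = 0.85  # fraction of all citations going to those 41

n_not_shared = n_papers - n_shared
paper_share_shared = n_shared / n_papers

# Average slice of the citation pool per paper in each group
per_paper_shared = citation_share_shared / n_shared
per_paper_not_shared = (1 - citation_share_shared) / n_not_shared
raw_ratio = per_paper_shared / per_paper_not_shared

print("papers sharing data: {:.0%}".format(paper_share_shared))
print("raw per-paper citation ratio: {:.1f}".format(raw_ratio))
```

In words: a bit less than half of the papers collected most of the citations.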


## Study 2: Effect of Open Access

### In a 2004 study on the research impact of open-access articles, it was found that
<h3 style="font-size:2em;margin-top:25px;margin-bottom:40px;"> 
Open access papers seem to be consistently cited more.
</h3>

<img src="./Figures/OpenAccessCitation.png" alt="image from PhD Comics" style="width: 80%;"/>

<p style="font-size:0.5em; color:grey;">
\* Antelman, Kristin. "Do open-access articles have a greater research impact?." College & research libraries 65.5 (2004): 372-382.

</p>



## Study 3: Effect of Reproducibility

<h3 style="font-size:2em;margin-top:25px;margin-bottom:40px;"> 
Papers with available code are probably cited more than those without code.
</h3>

<img src="./Figures/CitationReproducibilityLCAV.png" alt="image from PhD Comics" style="width: 70%;"/>

<p style="font-size:0.5em; color:grey;">
\* Vandewalle, Patrick, Jelena Kovacevic, and Martin Vetterli. "Reproducible research in signal processing." Signal Processing Magazine, IEEE 26.3 (2009): 37-47.

</p>

# It Might Result in Fruitful Collaborations

### Collaboration in Very Hard Problems

<h3 style="margin-top:25px;margin-bottom:40px;"> 
A group of mathematicians collaboratively solved a very difficult problem in combinatorial geometry in a few weeks.
</h3>

<img src="./Figures/Polymath.jpg" alt="image from PhD Comics" style="width: 70%;"/>

<p style="font-size:0.5em; color:grey;">
\* Gowers, Timothy, and Michael Nielsen. "Massively collaborative mathematics." Nature 461.7266 (2009): 879-881.
</p>

### Collaboration in Very Large Problems

<div style="width:45%;float:left;">
<h4> CERN ATLAS Project </h4>
<img src="./Figures/CERNATLAS.jpeg" alt="image from PhD Comics" style="width: 100%;"/>
</div>

<div style="width:45%;float:right;">
<h4 style="margin-left:50px;"> Citizen Science </h4>
<img src="./Figures/SafeCAst.jpg" alt="image from PhD Comics" style="width: 60%;"/>
</div>

<p style="font-size:0.5em; color:grey;clear:both;">
\* http://atlas.web.cern.ch/Atlas/Collaboration/
<br>
\* http://safecast.org/
</p>

## Finally, you might soon have to!

### More and more journals require availability of data/code. 
<div style="margin-top:40px;"> 
<div style="width:30%;float:left;margin-right:10px;">
<img src="./Figures/NPGLogo.JPG" alt="" style="width: 80%;"/>
</div>
<div style="width:30%;float:left;margin-right:10px;">
<img src="./Figures/PlosOneLogo.png" alt="" style="width: 80%;"/>
</div>
<div style="width:30%;float:left;">
<img src="./Figures/PNASLogo.jpg" alt="" style="width: 80%;"/>
</div>
</div>

<div style="clear:both;margin-top:50px;"> 
A recent study shows an increase of 
<br>
<ul style="font-size:1.25em;">
<li style="margin-top:20px;"> 16% in the number of data policies</li>
<li style="margin-top:20px;"> 30% in the number of code policies </li>
</ul>


<p style="font-size:0.25em; color:grey;">
\* Vandewalle, Patrick, Jelena Kovacevic, and Martin Vetterli. "Reproducible research in signal processing." Signal Processing Magazine, IEEE 26.3 (2009): 37-47.

</p>
</div>


## How

### Again, depends on the field

<ul style="font-size:1.25em;">
<li style="margin-top:20px;"> All details in the paper</li>
<li style="margin-top:20px;"> Reproducible code </li>
<li style="margin-top:20px;"> Share data </li>
</ul>



## Reproducible Code

<ul style="font-size:1.25em;">
<li style="margin-top:20px;"> <span style="color:red;">Code</span>: Should reproduce the results in 1 click (ideally!)</li>
</ul>



<ul style="font-size:1.25em;">
<li style="margin-top:20px;"> With explanatory <span style="color:orange;">comments</span> </li>
</ul>



<ul style="font-size:1.25em;">
<li style="margin-top:20px;"> That can be <span style="color:green;">interacted with</span></li>
</ul>
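The three points above can be sketched in a few lines. This is a toy illustration, not code from any of the papers discussed: the function `run_experiment`, its default seed, and the simulated measurement are all hypothetical. It only shows one common way to make a numerical result repeatable, namely fixing the seed of a private random number generator.

```python
import random
import statistics


def run_experiment(seed=2015, n_trials=1000):
    """Simulate noisy measurements and return their mean.

    A private, seeded generator makes every run produce the same numbers.
    """
    rng = random.Random(seed)  # local RNG, independent of global state
    # Hypothetical measurement: true value 1.0 plus Gaussian noise
    samples = [rng.gauss(1.0, 0.5) for _ in range(n_trials)]
    return statistics.mean(samples)


if __name__ == "__main__":
    # The "one click": running this file regenerates the reported number.
    print("mean = %.4f" % run_experiment())
```

Exposing `seed` and `n_trials` as arguments is what lets a reader interact with the result: change a value, rerun, and see what moves.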

# The IPython Notebook

An <span style="color:green;">interactive</span> <span style="color:red;">computational environment</span> with <span style="color:orange;">rich text comments</span>.

## Computational Environment

<img src="./Figures/IpythonNotebookComputation.png" alt="" style="width: 85%;"/>

## Rich Text Comments

<img src="./Figures/IpythonNotebookRichText.png" alt="" style="width: 85%;"/>

## Interactive Computation

<img src="./Figures/IpythonNotebookInteraction.png" alt="" style="width: 75%;"/>

<h1 style="color:#FBB917">So without further delay...</h1>

<h1 style="color:#1657FA;margin-left:150px;">Introduction to IPython Notebook's features</h1>