# Pre-Class: Reproducible Computational Research

## Introduction

The idea behind **reproducible computational research** is that any computational results you generate, such as numbers, figures, and tables, can be re-generated with minimal effort by other people and by yourself. The purpose of this lecture is to show you why reproducibility is necessary in computational research and how to make your own research reproducible.

Two papers discuss this idea in the context of computational research. You can download them from Sagemathcloud.

1. Reproducible Research in Computational Science by Roger D. Peng
2. Ten Simple Rules for Reproducible Computational Research by Sandve et al.
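To make "re-generated with minimal effort" concrete, here is a minimal Python sketch. It assumes an analysis that involves randomness; the function name `simulate_mean` and the seed value 42 are made up for illustration. One of the rules in the Sandve et al. paper is to note the underlying random seeds, and the sketch shows why: with the seed recorded, a second run produces exactly the same number.

```python
import random
import statistics


def simulate_mean(seed, n=1000):
    """Return the mean of n pseudo-random draws from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global random state cannot interfere
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.mean(draws)


# Recording the seed (42 is an arbitrary choice) lets anyone regenerate the same number.
first_run = simulate_mean(seed=42)
second_run = simulate_mean(seed=42)
print(first_run == second_run)  # True: same seed, same draws, same mean
```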


The following two sections introduce a great tool for reproducible research: the markdown file. You will learn R markdown for R and IPython Notebook for Python. Since we have used IPython Notebook for previous in-class and homework assignments, you probably know it already. Therefore, we will spend most of the time learning R markdown.

## Part1: R markdown

One important idea in reproducible computational research is to use **literate programming** with a **markdown** file. Don't be intimidated by these terms if you have never heard them before. Simply put, literate programming means putting your code, your results, and your annotations together in a single file. In R, you can generate this type of file with the rmarkdown (or knitr) package. In Python, you can do it with IPython Notebook. Most of the documents for this course are written in IPython Notebook. In this lecture, we will show you how to generate this type of file in R. To do that, you need to install RStudio on your computer.

RStudio is an IDE (integrated development environment) for R programming. RStudio integrates many useful tools and is easy to use. We will use RStudio to generate markdown files.

**0. Please download and install RStudio from [here](https://www.rstudio.com/products/rstudio/download/).**

After you open RStudio, you will see four panels. The upper left panel is the "source" panel, where you write your R code. The lower left panel is the "console" panel, where the output of your R code is shown. You will spend most of your time in these two panels in the following sections. The upper right panel is the "environment" panel, which lists all the variables you create. The lower right panel is the "file" panel, where you can create, delete, or move your files. You can rearrange the panels under "Tools"->"Global Options"->"Pane Layout", and you can maximize or minimize each panel with the buttons at its top right.

Here is a video tutorial about [R Markdown with RStudio](https://youtu.be/DNS7i2m4sB0). You can watch this video to get familiar with RStudio and R markdown.

**1. Create an R markdown file**

Open RStudio and click "File"->"New File"->"R Markdown". A window will pop up where you can type the title and the author of the document. Click "OK" to continue. RStudio will create a template R markdown file and open it in the source panel.

**2. Generate an R markdown file**

Just use the template for now and don't change anything. Click the "Knit HTML" button. If RStudio asks you to save the file, rename it from "Untitled.Rmd" to anything you like, for example "example.Rmd", and save it. A new window will pop up showing the HTML file generated from the R markdown file. The HTML contains the code along with its output, such as numbers and figures, as well as the text annotations. After you generate this HTML file, you can distribute it to other people, and they will know exactly what code was used to generate the tables and figures.

**3. Understand an R markdown file**

Let's go back and take a look at this R markdown file (the template generated by RStudio). The first part of the file, the part between the two --- lines, is the header of the document. It holds the title and author you typed when you created the document, so usually there is no need to change it.

Next, let's focus on the code chunk that contains summary(cars).

An R markdown file uses a specific syntax to distinguish R code from ordinary text: the **code chunk**. A code chunk starts with \`\`\`{r} and ends with \`\`\`. Anything inside the code chunk is treated as R code and is executed when you knit the file. Therefore, summary(cars) inside the code chunk will be executed as R code by RStudio. You can click "Code"->"Insert Chunk" to create a new code chunk and put new R code in it. Anything outside a code chunk is treated as ordinary text, and RStudio will not execute it as R code. You can write anything you want outside the code chunks, for example, annotations about the code or explanations of the results.
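Since you already know Python from earlier assignments, the rule above can be sketched in a few lines of it. This is only an illustration of how chunk delimiters separate code from text, not how RStudio or knitr actually processes the file. The sample document and the function name `split_rmd` are made up for the example, and the three backticks are built with `"`" * 3` so they do not interfere with the code fence around the example.

```python
FENCE = "`" * 3  # three backticks, built this way so the example can sit inside a code fence

# A tiny R markdown document modeled on the summary(cars) chunk discussed above.
rmd_text = "\n".join([
    "Some ordinary text.",
    "",
    FENCE + "{r}",
    "summary(cars)",
    FENCE,
    "",
    "More ordinary text.",
])


def split_rmd(text):
    """Return (prose_lines, code_lines) for a simple R markdown document."""
    prose, code = [], []
    in_chunk = False
    for line in text.splitlines():
        if not in_chunk and line.startswith(FENCE + "{r"):
            in_chunk = True    # chunk opener: the lines after it are R code
        elif in_chunk and line.strip() == FENCE:
            in_chunk = False   # chunk closer: back to ordinary text
        elif in_chunk:
            code.append(line)
        else:
            prose.append(line)
    return prose, code


prose, code = split_rmd(rmd_text)
print(code)   # ['summary(cars)']
print(prose)  # ['Some ordinary text.', '', '', 'More ordinary text.']
```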

Next, take a look at the code chunk whose opening line contains echo=FALSE. It is different from the previous one.

Did you notice the echo=FALSE? Anything inside the {} after the r is a chunk parameter, which tells RStudio how to treat the code when it knits the R markdown file. Here echo=FALSE prevents the R code itself from being printed, so only its output appears. This will become clear when you look at the HTML file generated from the R markdown file.
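As a rough Python sketch of the idea, the chunk header can be read as a language name followed by name=value parameters. The function `parse_chunk_header` is hypothetical and handles only this simple form; real chunk headers can also carry a chunk label and more complex values, which the sketch ignores. It relies on one fact about knitr: echo defaults to TRUE, so code is shown unless you turn it off.

```python
def parse_chunk_header(header):
    """Split a chunk header such as '{r, echo=FALSE}' into (language, options)."""
    inner = header.strip().strip("{}")
    parts = [part.strip() for part in inner.split(",")]
    language = parts[0]
    options = {}
    for part in parts[1:]:
        name, _, value = part.partition("=")
        options[name.strip()] = value.strip()
    return language, options


language, options = parse_chunk_header("{r, echo=FALSE}")
print(language)  # r
print(options)   # {'echo': 'FALSE'}

# A renderer would show the source code only when echo is not FALSE.
show_source = options.get("echo", "TRUE") != "FALSE"
print(show_source)  # False
```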

Let's focus on the text outside the code chunk. For example, the following markdown text:

> When you click the \*\*Knit\*\* button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.

Notice the \*\*Knit\*\* in the markdown text above: the double asterisks tell RStudio to render the word in bold. Take a look at the HTML file you just generated. Is the word "Knit" bold? You can also generate headers and list items with simple syntax. If you are interested in how to generate them, click the question mark to the left of the "Knit HTML" button. You will see a help page with information about the markdown syntax.
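To see what "render in bold" means in the generated HTML, here is a small Python sketch. It assumes the common convention that bold markdown text becomes a `<strong>` element. The function `render_bold` is made up for illustration and handles only this one piece of syntax; it is not the converter RStudio uses.

```python
import re


def render_bold(markdown_text):
    """Replace **text** with <strong>text</strong>, as an HTML renderer would for bold."""
    return re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", markdown_text)


print(render_bold("When you click the **Knit** button"))
# When you click the <strong>Knit</strong> button
```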

## Part2: IPython Notebook

Similar to R markdown, we can also write markdown documents in Python with IPython Notebook. You can create an IPython Notebook on Sagemathcloud by clicking "New"->"Jupyter Notebook". It is called Jupyter notebook because the IPython developers extended the notebook to support programming languages other than Python and gave it a new name. For Python, IPython notebook and Jupyter notebook are essentially the same, and we will use the two terms interchangeably.

In IPython notebook, we write code or markdown text in spaces called "cells". You can create a cell by clicking the "+" button. You need to specify the cell format by selecting "Code" or "Markdown" from the "Cell Type" button. You write regular markdown text in a "Markdown" cell and Python code in a "Code" cell. You can run each cell by choosing "Cell"->"Run".
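Behind the scenes, a notebook is stored as a JSON file in which every cell records its type. The following Python sketch builds a minimal two-cell notebook by hand to show that structure. The helper `build_notebook` and the cell contents are made up for illustration, and the field names follow version 4 of the notebook format as I understand it, so treat this as a sketch rather than a reference.

```python
import json


def build_notebook(cells):
    """Wrap (cell_type, source) pairs in the notebook JSON layout (format version 4)."""
    notebook_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells also carry their outputs and an execution counter.
            cell["execution_count"] = None
            cell["outputs"] = []
        notebook_cells.append(cell)
    return {"cells": notebook_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = build_notebook([
    ("markdown", "## Analysis\nThis cell holds **annotations**."),
    ("code", "total = sum(range(10))\nprint(total)"),
])
notebook_json = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])  # ['markdown', 'code']
```

Writing `notebook_json` to a file with the `.ipynb` extension should give you a file that the notebook interface can open, with one "Markdown" cell and one "Code" cell.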

Here is a short video tutorial introducing [IPython Notebook](https://youtu.be/H6dLGQw9yFQ). You can find this video and some other materials on the [IPython Notebook website](http://ipython.org/notebook.html).

**Other resources**

There are several great courses on Coursera that teach reproducible research. If you are interested in this topic, you can start with [this course](https://www.coursera.org/course/repdata).