# Introduction to R & Jupyter Notebooks

Matthew D. Turner, PhD  
Georgia State University

Some rights reserved: [CC BY-NC-SA](https://creativecommons.org/licenses/by-nc-sa/4.0/). See the bottom of the document for details.

This workshop is intended as a brief introduction to:

+ The R Language
+ The Jupyter Scientific Notebook System

The emphasis is on the R language, but because the class is delivered in Jupyter notebooks, we will also cover the basics of that system. Both R and Jupyter are too big to be covered in one class, so this is really just enough to get you started with each of them. For more information on Jupyter see the [Jupyter Project website](http://jupyter.org/). For more on R, see the [R Project website](https://www.r-project.org/).

***

## Section 1: Jupyter

We will start with Jupyter so that you can work with it today.

You are currently looking at a **Jupyter Notebook for R**. This is an easy-to-use system for scientific computing. Many of you will prefer to use [RStudio](https://www.rstudio.com/products/rstudio/) on your own computers. However, when working in a networked environment, Jupyter is better for sharing. (There is a web-based version of RStudio; however, it is not an affordable option for GSU.) If you would like to use Jupyter on your own computer, it is _relatively_ easy to set up. This will not be discussed today; please contact the workshop leader for more information.

It is important to note that R is a computer language, _so while interacting with R notebooks in Jupyter will feel different from the workflow in RStudio, the **commands are identical**_. For more information on using Jupyter notebooks for research and teaching, see [HERE](https://jupyter.brynmawr.edu/services/public/dblank/Jupyter%20Notebook%20Users%20Manual.ipynb) (note that this page is a little out of date but still very useful).

### Cells (The essential feature of Jupyter)

Jupyter is organized into **cells**, which are little blocks of code or formatted text. The output of a code cell appears directly below it, usually as statistical results or graphics.

This text that you are reading is a **text** cell. Below this is a **code** cell that is empty. Let's try using it:

+ Double-click on the cell below this
+ Type `2 + 2` (or some other numbers of interest)
+ Then press shift-enter (shift and the enter or return key on your keyboard; **remember this, you use it constantly!**)

Note a couple of things:

+ The result is printed just below the cell
+ The cell gets a **number** showing the order in which it was run (in this case 1)

If you were to click on the cell again, you could change your code and then press shift-enter to run it again. The output will update to show the new result, and the cell's number will increase to the next number in the run order. Try it!

### Text and Code Together in Jupyter

Another great feature of Jupyter notebooks is that they allow the mixing of text, data, and computer code. Everything in this notebook, including **this cell's text**, is editable. If you would like to see the raw form of this typeset text, just _double_ click inside of this cell. The typesetting is done in [Markdown](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html), a **simple code for marking up text** with things like headings, web links, lists, computer code, and so on. It also supports typesetting math in LaTeX (if you are familiar with that).

For example: writing $C = 2 \pi r$ or $e^x=\sum_{i=0}^\infty \frac{1}{i!}x^i$ is fairly easy.

Cells are executed (code is run, text is formatted) when you press shift-enter while within the cell. Just click on the cell to make it the active cell (or double-click to make it active and editable if it is a text cell) and that will be the cell that gets executed when you press shift-enter.

### Graphics

Scientific graphics are included in Jupyter R notebooks by running the R code that makes the figures. They will simply appear as output for the code. Notebooks can also embed images or other figures using HTML, as you would in a normal webpage:

<img width=33% src="./lolcat_orly.jpg" />
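The first route, R code whose output is a figure, might look like the following minimal sketch. The variable names `radius` and `area` are made up for illustration, and the details of the R syntax are covered in the next section; the point here is only that the plot shows up directly below the cell when it is run.

```r
# Hypothetical example data: radii 1 to 10 and the matching circle areas
radius <- 1:10
area <- pi * radius^2

# Draw a scatterplot; in a notebook the figure appears as the cell's output
plot(radius, area, pch = 20,
     xlab = "Radius", ylab = "Area",
     main = "Area of a circle")
```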

## Section 2: Elementary R

Let's cover the most basic parts of R, starting with comments. Comments are for human consumption and are ways to remind yourself and others why you did something. You should use them liberally: six months from now, you will simply not remember what you were doing today. So make a note to tell yourself!

In [9]:
# Comments are marked with a # (hashtag/pound sign/octothorpe/whatever) and R
# ignores anything that comes after them.

# Here is the circumference of a circle with radius 3...

2*pi*3 # Note how this text does nothing other than delight you!

### Basic Operations

R is a language for doing math and statistics. Any simple operations you know from these fields are there to use. More advanced ones may need to be loaded from external libraries. We will discuss this later on.

For basic arithmetic, R uses the _usual_ way of writing math into a computer or calculator: `+`, `-`, `*`, `/`, and `^` for powers.

For now, let's look at some example operations. In the cell below we will do some arithmetic; Jupyter shows the result of each line automatically:

In [2]:
2+2
2*3
10^4

In [3]:
# Sometimes we want to force R to print the results. For this we can use the print() function.
# In Jupyter notebooks this is USUALLY optional (see the cell above), but not always. If output
# is missing, try a print() to fix it!

print(2+2)
print(2*3)
print(10^4)

[1] 4
[1] 6
[1] 10000


Note that the previous two code cells do the exact same thing and both show all of their output, just with slight differences in formatting. The `print` function makes Jupyter show you the _raw_ R output, formatted as it usually is when using R interactively from the command line.
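Since R follows the usual written conventions for arithmetic, it also follows the usual order of operations. The sketch below is an illustration with arbitrarily chosen numbers; it shows how parentheses change the result.

```r
# Multiplication binds tighter than addition
2 + 3 * 4      # 14
(2 + 3) * 4    # 20

# Powers bind tighter than multiplication
2 * 3^2        # 18
(2 * 3)^2      # 36

# Subtraction and division are evaluated left to right
10 - 4 - 3     # 3
100 / 10 / 2   # 5
```

When in doubt, add parentheses; they cost nothing and make the intent clear to the next reader.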

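### Lists and List Operations

In statistics we usually operate on large amounts of data. In R we will store these in **lists** and **data frames**. The former are pretty much self-explanatory, while the latter are essentially simple spreadsheets organized in the usual way for statistics and data analysis. Columns of data frames can be treated like lists. (A note on terminology: the lists of numbers made with `c()` in this workshop are what R formally calls **vectors**. R also has a separate, more general type named `list`, which we will not need today.)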
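As a small illustration of how a data frame relates to lists of numbers, here is a sketch that builds one from two columns and pulls a column back out. The names `heights`, `weights`, and `subjects`, and the values in them, are invented for this example.

```r
# Hypothetical measurements for three subjects
heights <- c(160, 172, 181)   # in centimeters
weights <- c(55, 70, 82)      # in kilograms

# Combine the two columns into a data frame (one row per subject)
subjects <- data.frame(height = heights, weight = weights)
subjects

# Pull out one column with $; the result behaves like a list made with c()
subjects$height
subjects$height / 100         # convert centimeters to meters
```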

In [4]:
x = c(1,2,3)
y = c(2,2,2)

Operations on lists of the same length are done **elementwise**, that is, the operation is done to each pair of numbers:

In [5]:
x+y
x-y
x*y
x/y

Make sure you understand what the previous cell did!


Operations between a list and a **single number** apply the operation to each element of the list and the single number. It is as though the single element is a list of the same number repeated. This is one of many R shorthand notations to save you work. Compare the results below:

In [6]:
x + 1          # Shorthand
y - 2

x + c(1,1,1)   # Written out as full lists
y - c(2,2,2)
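The same repetition happens whenever one list is shorter than the other, not only for single numbers. The sketch below checks the shorthand against the written-out form and then shows a two-element list being repeated; `long` and `short` are names made up for this example.

```r
x <- c(1, 2, 3)

# The shorthand and the written-out form give identical results
identical(x + 1, x + c(1, 1, 1))    # TRUE

# A shorter list is repeated to match the longer one
long  <- c(1, 2, 3, 4)
short <- c(10, 20)
long + short                        # 11 22 13 24, same as long + c(10, 20, 10, 20)
```

If the longer length is not a whole multiple of the shorter one, R still computes a result but issues a warning, which is usually a sign that the two lists were not meant to be combined.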

In [7]:
x  = c(1,2,3,4,5,6,7,8,9)  # Give a list of numbers a name
y <- c(2,1,3,4,6,5,8,8,7)  # Same thing! <- and = can both be used for assignment

### Mathematical Functions Made Out of Arithmetic

We can create mathematical functions using the usual translation of math into _typed_ math. For instance, the mean is defined as: $$\bar{X} = \frac{1}{N}\sum_{i = 1}^{N} X_i$$ which is just the sum of the data points divided by the number of points. It could also be written (in somewhat reduced notation) as $\frac{\sum X_i}{N}$, which is how some of you might have seen it elsewhere.

$$\bar{X} = \frac{\sum_{i = 1}^{N} X_i}{N} = \frac{\mathrm{sum}}{\mathrm{number\ of\ data\ points}}$$

For a list of numbers `X`, the sum is given by the `sum` function, and division is written with `/`, as in the cells above. To determine `N` you need to find the _length_ of the list, and there is a function called `length` that does just that. In the cell below, figure out the mean of `X`:

In [10]:
# Write the function for the mean in this cell
# Use X as the list of data (notice the CAPITAL letter here!)
# Store the results in the variable xbar
# Print the result

X = c(1,2,3,4,5,6,7,8,9)
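
If you get stuck, the sketch below works through the same idea on a different, made-up list, so the exercise with `X` is still yours to do. The names `scores` and `scores_mean` are hypothetical; the last line compares the hand-built result with R's built-in `mean` function.

```r
# Hypothetical data, deliberately not the X from the exercise
scores <- c(4, 8, 15, 16, 23, 42)

# Mean = sum of the values divided by how many values there are
scores_mean <- sum(scores) / length(scores)
print(scores_mean)                     # [1] 18

# Cross-check against R's built-in mean function
all.equal(scores_mean, mean(scores))   # TRUE
```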


In [11]:
x  = c(1,2,3,4,5,6,7,8,9)  # Give a list of numbers a name
y <- c(2,1,3,4,6,5,8,8,7)  # Same thing! <- and = can both be used for assignment

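plot(x, y, pch = 20)       # Scatterplot of y against x; pch = 20 draws small filled dots
cor(x, y)                  # Correlation between x and y (Pearson by default)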
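
In the spirit of building functions out of arithmetic, the correlation can also be assembled by hand. This is a sketch that assumes the standard Pearson formula, which is what `cor` computes by default: $$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$ The names `dx`, `dy`, and `r` are chosen here for illustration.

```r
x <- c(1,2,3,4,5,6,7,8,9)
y <- c(2,1,3,4,6,5,8,8,7)

# Deviations of each point from its mean
dx <- x - mean(x)
dy <- y - mean(y)

# Sum of products of deviations, divided by the square root of
# the product of the two sums of squared deviations
r <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r

# Compare with the built-in function
all.equal(r, cor(x, y))
```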

Version 1.0  
2018.06.06  

To contact the author, please email [mturner46@gsu.edu](mailto:mturner46@gsu.edu). Please contact me with recommendations for improvement or if you find any errors. This work may be adapted for any purpose within the bounds of the license.

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.