# Introduction

In this first tutorial in a series, we'll cover the very basics of R. If you've programmed before, you can skip much of this. But regardless of your background, we hope you'll find this and subsequent tutorials useful for learning R's many tools for graphing, statistical analysis, and data collection and management, which together we might call "data science."


## Installing R

First, download R *free* [here](https://cloud.r-project.org/)!

After you've downloaded R itself, you will probably also want to download a program called [RStudio](https://www.rstudio.com/) (installation instructions [here](https://www.rstudio.com/products/rstudio/download/); note that you need to already have R installed to use RStudio). RStudio is a "helper" program that makes it easier to write code for R; programs like this are called "integrated development environments" (IDEs). There are plenty of other IDEs, but RStudio is one of the easiest to get started with and one of the most popular.

## Interacting with R

R has a text-based interface, which means you don't ask it to do things by clicking buttons or using drop-down menus. Instead, it has a command prompt where you type commands to R. The place where you type is marked by a little right arrow symbol (`>`). Just type your command after that right arrow and hit return. In the screenshot below, for example, I'm about to ask R to print the phrase "Hello!" (though I haven't hit return, so it hasn't done anything yet):

<img src="./images/R_Prompt.png">

After you type a command to R, hit return and R will try to do what you've asked of it. If I hit return after `print("Hello!")`, for example, R will print out the phrase "Hello!":

<img src="./images/R_Prompt_executed.png">

You will notice that after it printed out "Hello!", a new right arrow appeared. That's R's way of saying it's done doing the last thing you asked it to do. 

### Code Examples On This Site

On this website, you'll find that code examples don't look quite like they do when you're typing in R yourself. Instead, you'll see code appear in grey blocks with a number on the left side. Below these blocks, you will see the output R has returned after running that code. For example, here's that same `print("Hello!")` command in the style used on this site:

In [5]:
print("Hello!")

[1] "Hello!"


## Basic Math in R

Now that we've learned how to pass commands to R, we can start asking R to do things for us. For example, R can do all the normal math operations you are familiar with:

Addition:

In [2]:
2 + 2

Multiplication:

In [4]:
2 * 3

Division:

In [6]:
4 / 2

And even exponentiation (e.g. $2^3$):

In [8]:
2 ^ 3
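
Two things the examples above don't show are subtraction and what happens when you combine several operations in one command. As a small sketch (the numbers are made up for illustration): R follows the usual order of operations, so exponentiation comes before multiplication and division, which come before addition and subtraction, and parentheses override that order.

```R
# Subtraction works like the other operators
10 - 4        # 6

# Multiplication happens before addition ...
2 + 3 * 4     # 14

# ... unless parentheses say otherwise
(2 + 3) * 4   # 20

# Exponentiation happens before multiplication
2 * 3 ^ 2     # 18
```

When in doubt about the order, adding parentheses never hurts and usually makes the command easier to read.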

## Variables

Congratulations! You now know how to do math in R!

If we want to do more than use R as a calculator, though, we need to be able to not only do math problems, but also store the results of our calculations so we can reuse them in the future, or combine the results of lots of different calculations. In the examples above, R did the math we asked it to do, and printed out the results, but it didn't keep a copy of those results anywhere.

In order to store the *value* of our calculations, we need to assign them to a *variable*. Once we've assigned a value to a variable, we can then recall that value any time by invoking the variable. For example, let's calculate the weight of a [velociraptor](http://images.dinosaurpictures.org/1038-Velociraptor_303a.jpg) in pounds. 

First, let's store the weight of a velociraptor in kilograms ([113 kg](http://www.dinodatabase.com/dinorcds.asp)) in a variable called `velociraptor_weight_in_kg`:


In [24]:
velociraptor_weight_in_kg = 113

Basically, all I've done is given a name (the variable name) to a value (in this case, 113). Now any time I use that variable name, R knows that that variable name is just a stand-in for 113. For example, if you just type a variable name in R, it will tell you the value associated with that variable:

In [14]:
velociraptor_weight_in_kg

And now we can do calculations with that variable. There are about 2.2 pounds in a kilogram, so to get a velociraptor's weight in pounds, we can just multiply our weight-in-kg variable by the conversion factor:

In [16]:
velociraptor_weight_in_kg * 2.2

We can also do math with multiple variables, because really anywhere you see a variable, you can just imagine that the value associated with the variable is there instead. 

For example, suppose I have two pet dinosaurs, and my partner has three dinosaurs. If we got married, how many dinosaurs would we have? Let's do this super-complicated math using variables.

In [30]:
nick_pet_dinosaurs = 2
adriane_pet_dinosaurs = 3

nick_pet_dinosaurs + adriane_pet_dinosaurs

And if we wanted to, we could also store that new value in a new variable called `family_pet_dinosaurs`.

In [31]:
family_pet_dinosaurs = nick_pet_dinosaurs + adriane_pet_dinosaurs
family_pet_dinosaurs

One important thing about variables is that you can change the value associated with a variable. Suppose that while walking to work, I stumbled upon a truly adorable [Nigersaurus](https://en.wikipedia.org/wiki/Nigersaurus#/media/File:Nigersaurus_model_aus.jpg) and couldn't resist adopting her. If that happened, we'd need to update the number of pets I have by 1!

In [32]:
nick_pet_dinosaurs = nick_pet_dinosaurs + 1

Now if we ask R for the value of `nick_pet_dinosaurs`, we'll see it has increased by 1:

In [33]:
nick_pet_dinosaurs

It's worth pausing to note something a little weird about the order in which things happen here: when we assign something to a variable by writing `variable_name = some_expression`, R evaluates the expression *first*, *then* assigns the result of that expression to the variable on the left-hand side. Given how we normally read left-to-right, this can be a little confusing. So what R did here was *first* calculate `nick_pet_dinosaurs + 1` (which is the same as `2 + 1`), *then* assign the value to the variable `nick_pet_dinosaurs`, replacing the old value of 2.
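
To make that order concrete, here is a short sketch that replays the steps above in one place. It also illustrates a consequence that is easy to miss: a variable stores the *value* an expression had when it was assigned, not the expression itself, so `family_pet_dinosaurs` does not change when `nick_pet_dinosaurs` does.

```R
nick_pet_dinosaurs = 2
adriane_pet_dinosaurs = 3
family_pet_dinosaurs = nick_pet_dinosaurs + adriane_pet_dinosaurs   # stores 5

# The right-hand side (2 + 1) is evaluated first, then stored under the same name
nick_pet_dinosaurs = nick_pet_dinosaurs + 1

nick_pet_dinosaurs     # 3
family_pet_dinosaurs   # still 5: it holds the value 5, not the formula
```

If you want `family_pet_dinosaurs` to reflect the new pet, you need to run the line that calculates it again.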

### Variable Exercises

OK, this is a great time to pause and try a few exercises for yourself. 

Let's suppose that you have a dinosaur zoo. In your zoo, you have two [T-Rexes](https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/Tyrannosaurus_Rex_Dinosaurierland_Ruegen_2009.jpg/240px-Tyrannosaurus_Rex_Dinosaurierland_Ruegen_2009.jpg), three [Unaysaurus](http://images.dinosaurpictures.org/unaysaurusJB_57a6.jpg), and five [Spinosaurus](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/Spinosaurus_-_Museu_Blau_-_2016_-_01.jpg/1024px-Spinosaurus_-_Museu_Blau_-_2016_-_01.jpg).

1. Create variables for the number of each dino you have called `my_trexes`, `my_unas`, and `my_spinos`. 
2. Now use those variables to calculate how many total dinosaurs you have.
3. Oh no! One of your T-Rexes got out and ate an Unaysaurus. Decrease the value of `my_unas` by one.
4. Double oh no! Your T-Rexes were male and female, and they just had a baby! Increase your number of T-Rexes by one!
5. Sadly, one of your Spinosauruses died of old age. :( Decrease your count of Spinosauruses by one.
6. How many dinos do you have now? You've probably lost count of all these changes, but thankfully they're all stored in variables, so you can just add them all up!

**NOTE:** There are actually <b>two</b> ways to assign a value to a variable name in R. For everything we do in this tutorial they work the same, so you can use whichever you want. The first is with a single equal sign (`=`), and the other is with the two symbols that make an arrow (`<-`). So the following two commands do exactly the same thing:

In [34]:
x = 72
x <- 72
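
If you'd like to check this for yourself, here is a minimal sketch. It assigns the same value once with each operator (the names `x` and `y` are just for illustration) and then uses R's built-in `identical()` function to compare the results.

```R
x = 72     # assignment with the equal sign
y <- 72    # assignment with the arrow

identical(x, y)   # TRUE: both variables hold the same value
```

You'll see both styles in other people's code. The arrow is the more common convention among R programmers, so it's worth being able to read it even if you prefer `=`.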

## Types of Data

Up till now, we've only been working with numbers, but R is actually equipped to work with a number of different *kinds* of data. In the course of this tutorial we'll introduce all of them, but there are really three main ones to be aware of:

- `numeric` and `integer`: The main data types for numbers. These two types are *slightly* different, but you can think of them as interchangeable for now.
- `character`: Text data, like a person's name, or a quote from a book. Written with `"` before and after (or single quotes (`'`) before and after if you'd prefer).
- `logical`: Data that only takes on the values of true and false. Written `TRUE` and `FALSE`.

If you're ever unsure of the type of a variable (or more precisely, of the type of the value associated with a variable), you can ask R with the `class()` function:


In [41]:
pi = 3.1416
class(pi)

In [42]:
mystery_novel = "'Twas a dark and stormy night"
class(mystery_novel)

In [43]:
my_boolean = TRUE
class(my_boolean)

Does the output make sense to you? (I'm not displaying it here so that
you can try to make sense of it yourself first.)
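
If you're curious about the *slight* difference between `numeric` and `integer` mentioned above, here is a small sketch. The variable names are made up for illustration; the `L` suffix is how R lets you write a whole number that should be stored as an integer.

```R
whole_number = 5L    # the L suffix asks R to store an integer
plain_number = 5     # without the suffix, R stores a numeric

class(whole_number)  # "integer"
class(plain_number)  # "numeric"

# In arithmetic the two mix freely
whole_number + plain_number   # 10
```

In day-to-day work you rarely need the `L` suffix, which is why it's fine to treat the two types as interchangeable for now.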

## Data Types

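So R keeps track of the type of every value, and some operations only work on certain types. For example, you can't add a number to a piece of text, even if that text looks like a number: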

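```R
a <- 5
b <- "7"   # the quotes make this a character value, not a number
a + b      # fails, because R can't add a number and a character value

# Error in a + b : non-numeric argument to binary operator
```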

Not knowing that `b` is a character object (which often is much
less obvious than here) would be frustrating. Other classes include
<span class="fw">factor</span>, <span class="fw">matrix</span>,
<span class="fw">data.frame</span> and many more that we'll get to. 
</div>
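
One way around the error above, sketched here under the assumption that the text stored in `b` really is a number, is to check the type with `class()` and then convert the value with `as.numeric()` before doing the math. The name `b_as_number` is just for illustration.

```R
a <- 5
b <- "7"

class(b)                       # "character"

b_as_number <- as.numeric(b)   # convert the text "7" to the number 7
class(b_as_number)             # "numeric"

a + b_as_number                # 12
```

If the text can't be read as a number (say, `"seven"`), `as.numeric()` returns `NA` with a warning, so it's worth looking at the result before relying on it.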

5. **Overwriting objects**. You can overwrite objects without R
complaining (e.g., by running `b <- 1` after the code above). This can be confusing if you don't
keep track of which objects you have overwritten, but in general this
ability is a good thing. R stores every object you create in
memory, and overwriting an object saves memory to the extent that R
doesn't need to hold an additional object, which may be consequential
when you work with large datasets.
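
As a quick illustration of overwriting, the sketch below assigns a character value to `b` and then replaces it with a number. R gives no warning, and the old value is gone.

```R
b <- "7"
class(b)   # "character"

b <- 1     # overwrites the old value; R does not complain
class(b)   # "numeric"
b          # 1
```

If you need both the old and the new value, give the new one a different name instead of reusing `b`.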


## Packages 

The true power of R is that it's open-source. As such, anyone can
extend its core functionality through packages. This often results in
remarkable improvements to how we can approach complex data tasks. In
subsequent tutorials, I will make particular use of a set of R packages 
developed by [Hadley Wickham](http://had.co.nz/). 

To use a package you must: 

1. **Install it.** You only need to do this once. 
2. **Load it.** You need to re-load packages every time you open R. 


To install packages `plyr`, `dplyr`, and `tidyr`, which we'll use a lot,
run

```R
install.packages(c("plyr", "dplyr", "tidyr"), dependencies = TRUE)
```

And to load these packages, use:

```R
library(plyr)   # load plyr before dplyr so dplyr's functions take precedence
library(dplyr)
library(tidyr)
```

Or you can load many packages more compactly like so:

```R
pkgs <- c("plyr", "dplyr", "tidyr")
sapply(pkgs, require, character.only = TRUE)
```
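
Since installing only needs to happen once, it can be handy to check which packages are already on your machine before calling `install.packages()`. The sketch below is one possible way to do that, not the only one. `find_missing_packages()` is a hypothetical helper written for this example; it relies on base R's `requireNamespace()`, which returns `TRUE` if a package can be found (and loads it in the background) and `FALSE` otherwise.

```R
# Hypothetical helper: return the names in `pkgs` that are not installed
find_missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

pkgs <- c("plyr", "dplyr", "tidyr")
missing_pkgs <- find_missing_packages(pkgs)
missing_pkgs   # names of any packages that still need installing

# Install only what is missing (uncomment to actually run the installation)
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs, dependencies = TRUE)
```

The installation line is commented out so that running the example never downloads anything by accident.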

<div style="margin-top: 15px"> </div>

## Style

Writing good code requires a lot of practice. Good code executes
quickly and is non-repetitive (e.g., uses functions for tasks that
need to be executed several times). But most importantly for you right
now, good code is easy to understand --- both for yourself if you were
to come back to it and for someone else reading it.

To that end, comments are extremely useful. Comments can delineate
different blocks of code and, more importantly, clarify what a command
or set of commands is doing. Use comments especially when you're doing
something complex; you can skip them for routine tasks that most R
programmers will understand.
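
As a small sketch of what this can look like, here is the velociraptor weight conversion from earlier written with comments. In R, everything after a `#` on a line is a comment and is ignored when the code runs. The names `kg_to_lb` and `velociraptor_weight_in_lb` are made up for this example.

```R
# ---- Convert a dinosaur's weight from kilograms to pounds ----

kg_to_lb <- 2.2                    # approximate pounds per kilogram
velociraptor_weight_in_kg <- 113

# Multiply by the conversion factor to get the weight in pounds
velociraptor_weight_in_lb <- velociraptor_weight_in_kg * kg_to_lb
velociraptor_weight_in_lb
```

The first comment marks off a block of code, the second explains a value that would otherwise be a mystery number, and the third says what the calculation is for.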

There are some other style guidelines you should attempt to follow.
Please take a look at
[Google's R Style Guide](https://google.github.io/styleguide/Rguide.xml).
In particular, try to observe the spacing and line length limits
outlined in that document. (Some other rules seem less important to
me: I personally like underscores for object and function names.)




## Troubleshooting
 

Many, many times when coding you'll have an idea of what you want to
do but won't know how to do it in R. This happens even for
experienced coders. With the right strategies, you'll be able to solve
a majority of issues you run into yourself. Not having to ask someone
else every time you run into a problem will save you a lot of time.

When you get stuck, Google is your friend. For example, if you want
to find the mean of a variable, try googling "how to find mean in R"
and there will be tons of explanations of how to do this.

R also has a help feature that can be called using the following
syntax: `?commandname`, where `commandname` is the name of the command
that you need help with. For example, `?mean` will bring up a help
dialog box with information about how to use R's `mean()` command.
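
Putting the help syntax above into practice, here is a minimal sketch. The numbers passed to `mean()` are made up for illustration.

```R
# Bring up the help page for mean()
?mean

# Once you've read how it works, try it on a few numbers
mean(c(2, 4, 9))   # 5
```

Writing `help("mean")` does the same thing as `?mean`, which is useful to know if you ever see that form in someone else's code.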