# Introduction to R

In this first of several tutorials, we'll cover the very basics of R. If you've programmed before you can skip much of this. But regardless of your background, we hope you'll find this and subsequent tutorials useful for learning R's many tools for graphing, statistical analysis, and data collection and management, or what we collectively might call "data science."


## Installing R

Everything we do in this course will be done using the *free!* open-source programming language R.

But the way we'll mostly *use* R is through a program called [RStudio](https://www.rstudio.com/). RStudio is a "helper" program that makes it easier to work with R (it is what is referred to as an "editor" or "integrated development environment" (IDE)). When you're working in RStudio, all the code you run is being run by R itself, but RStudio provides tools to make it easier to see what R is doing, to organize the code you write to run in R, to look at plots R generates, etc. In other words, you don't need RStudio to use R (some people use other editors, like VS Code), but you definitely need R to run RStudio, and any code you write in RStudio will work anywhere R is available.

For an installation tutorial and introduction to R functionality, please [**go watch this video**](https://www.youtube.com/watch?v=ulIv0NiVTs4).

## Code Examples On This Site

On this website, you'll find that code examples don't look quite like they do when you're typing in R yourself. Instead, you'll see code appear in grey blocks with a number on the left side. Below these blocks, you will see the output R has returned after running that code. For example, here's a simple `"Hello!"` line in the style used on this site:

In [1]:
"Hello!"

In addition, some code will include "comments". Comments are notes placed in someone's code to explain what's going on to other programmers. Comments always start with a `#` in R, which tells R that the text that follows is not something it should try to execute. Comments will always appear in italics and in a different color.

In [2]:
# This is a comment. In the next line, I'll add 2 and 3.
2 + 3
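
A comment doesn't have to sit on its own line. As a small sketch (the numbers are only for illustration), R ignores everything from the `#` to the end of the line, so a note can also follow code, and a line that starts with `#` is never run at all:

```r
2 + 3  # R runs `2 + 3` and ignores this note.
# 10 + 10 is never evaluated, because this whole line is a comment.
```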

## Basic Math in R

Now that we've learned how to pass commands to R, we can start asking R to do things for us. For example, R can do all the normal math operations you are familiar with:

In [3]:
# Addition
2 + 2

In [4]:
# Multiplication
2 * 3

In [5]:
# Division
4 / 2

In [6]:
# And even exponentiation (e.g. 2 raised to the third power)
2 ^ 3

In [7]:
# R can also do logical comparisons
5 < 7

In [8]:
3 >= 5
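
A few more operators follow the same pattern. This is a brief sketch that goes slightly beyond the examples above: subtraction, the remainder operator `%%`, integer division `%/%`, and the equality tests `==` and `!=`.

```r
# Subtraction
7 - 4       # 3

# Remainder after division (modulo)
7 %% 2      # 1

# Integer division (drops the remainder)
7 %/% 2     # 3

# Equality and inequality comparisons return TRUE or FALSE
2 + 2 == 4  # TRUE
2 + 2 != 4  # FALSE
```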

## Variables

Congratulations! You now know how to do math in R!

If we want to do more than use R as a calculator, though, we need to be able to not only do math problems, but also store the results of our calculations so we can reuse them in the future, or combine the results of lots of different calculations. In the examples above, R did the math we asked it to do, and printed out the results, but it didn't keep a copy of those results anywhere.

In order to store the *value* of our calculations, we need to *assign them to a variable*. OK, but what does that actually mean?

If you’ve done any programming before, you probably have an implicit notion of a variable and assignment, but in this course we’d like to provide you with an explicit metaphor we’ll keep coming back to: **a variable is a box that holds a value**. So when we assign the value of 6 to the variable `a`, we’re really saying "put `6` in the box `a`."

To illustrate, let's walk through a few lines of code and show what's going on in pictures on the right.

Here, in the first line, we're assigning the value of `6` to `a`, so as we see R is putting a `6` in the box labelled `a`. 


![reading_assignment_1](images/reading_assignment_1.png)

Then R does the same thing with `b`, putting the assigned value of `7` in a box named `b`. 

![reading_assignment_2](images/reading_assignment_2.png)

Now things get a little more interesting. Instead of a single value being assigned to a variable, we now have an expression. The way R handles this is by first evaluating the expression on the right side of the assignment operator (`<-`), *then* assigning it to the variable. 

Note there's something a little weird about this: even though most of us are used to reading from left to right, R does the opposite, evaluating the expression on the right *first*, *then* assigning it to the variable on the left.

![reading_assignment_3](images/reading_assignment_3.png)

Finally, we see an example of re-assignment: here a new value (`2`) is assigned to `a`, and that value *overwrites* the old value that was in the box `a`. The old value (`6`) has been lost by this re-assignment, its job done.

![reading_assignment_4](images/reading_assignment_4.png)

And that's how you can read code by thinking of variables as boxes that store data!
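
The four steps in the pictures can be retyped as a short script you can run yourself. This is only a sketch: the exact expression assigned to `c` lives in the image, so `a + b` here is an assumption chosen for illustration.

```r
a <- 6      # put 6 in the box `a`
b <- 7      # put 7 in the box `b`
c <- a + b  # assumed expression: evaluate a + b first (13), then store it in `c`
a <- 2      # overwrite the 6 in `a` with 2

a  # 2
b  # 7
c  # 13
```

Notice that `c` is still `13` after `a` is reassigned: the box `c` holds the value that was computed at the time of the assignment, not a live link to `a`.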

Now, if you've done any programming before, this idea of variables as boxes may seem obvious or like an unnecessary metaphor for something you always understood implicitly. But as we will see later in this course, when we start working with more complicated structures like functions, this metaphor is extremely powerful.

Indeed, this idea of variables as boxes is so fundamental to programming that it's even embodied in RStudio. If you look at the top right corner of RStudio, you'll find a tab called "Environment". That tab displays all the variables defined in the current session of R. And if you run the code we just worked through, you'll see that it displays the values of `a`, `b`, and `c` basically the same way we wrote them above: as values in boxes with the variable name next to them as a label! (Don't worry if you don't have that `p` variable in your RStudio -- that's just a variable my R session loads automatically on startup.):

![rstudio_variables](images/rstudio_variables.png)

<div class="alert alert-info">
    
<b>NOTE:</b> There are actually <b>two</b> ways to assign a value to a variable name in R.


The first, which we will use in these tutorials, is to use a less than sign and a dash to create an arrow (`<-`): 

`x <- 42`

and the second is to use a single equal sign (`=`):

`x = 42` 

When R was first created it only supported the `<-` operator, but most other languages use `=` as the assignment operator, so eventually R decided to support both. For ordinary assignments like the ones above, you can use them interchangeably. 

However, be aware that most style guides for R still suggest that the `<-` operator is the preferred choice (though I will admit that I often just use `=` in my own code... ;)).

</div>
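
As a quick sketch of the note above (the variable names and numbers are just for illustration), both operators store a value in the same way. The one place they differ is inside a function call, where `=` names an argument instead of creating a variable. Function calls are covered later in this reading, so treat the last line as a preview.

```r
x <- 42  # arrow assignment (the style used in these tutorials)
y = 42   # equals-sign assignment
x == y   # TRUE: both boxes hold 42

# Inside a function call, `=` names an argument rather than assigning a variable.
round(3.14159, digits = 2)  # 3.14
```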

## Checking Variable Values

One way to see the value of variables is to just look at the "Environment" tab in RStudio, as shown above. But while this is a nice feature of RStudio -- and sufficient in many cases -- it's good to know how to get the values of variables directly in the R console.

There are two ways to see the value of a variable in R:

1. Type in the variable name (or any other expression) without assigning it to a variable. 
2. Use the `print()` function.

The first is what we were actually doing at the top of this document -- if you run a line of code and don't include an assignment to a variable, then R will evaluate the given expression and print out the value in the console. This is true whether the expression is just the name of a variable (`a`, in which case R just prints the value that's been assigned to `a`), or something more complicated (`(a * 123) / 42`): if you don't assign it, the evaluated expression will just get printed out. e.g.:

In [9]:
z <- 42
z

The second option is to use the `print()` function in R. Basically, any expression you put between the two parentheses of the print function will get printed out. For simple scripts like what we're writing now, there's rarely a reason to use `print()` instead of typing out the expression you wanted evaluated and printed, but it'll become really useful later when we write some fancier code!

In [10]:
print(z)

[1] 42
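
Putting the two approaches side by side (a minimal sketch; the variable names `a` and `result` are only for illustration):

```r
a <- 6                    # assignment alone prints nothing
a                         # typing the name prints its value
(a * 123) / 42            # an unassigned expression is evaluated and printed
result <- (a * 123) / 42  # assigned, so nothing is printed this time
print(result)             # print() displays the stored value explicitly
```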


## Order of Evaluation

As noted above, R evaluates a line of code in two steps: (1) it evaluates the expression on the right-hand side of the assignment operator, *then* (2) it inserts that value into the box associated with the variable on the left-hand side of the assignment operator.

Given that, can you predict what the value of `a` will be after the following code is run?


In [11]:
a <- 4
a <- a + 1

If you said `5`, you'd be right! 
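
The same two-step reading works no matter how many times a variable is updated. Here is a short sketch that extends the example by one more line:

```r
a <- 4
a <- a + 1  # right side first: 4 + 1 is 5; then 5 replaces 4 in `a`
a           # 5
a <- a * 2  # right side first: 5 * 2 is 10; then 10 replaces 5 in `a`
a           # 10
```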

When the expression on the right-hand side is more complex, R will evaluate it using the same order of operations you learned in high school math ([PEMDAS](https://www.mathsisfun.com/operation-order-pemdas.html)!). So R will parse the following code: 

In [12]:
z <- 5 + 2 * (20 - 2)

By evaluating things in parentheses, then any multiplication/division operations, and then any addition/subtraction operations:
```R
  5 + 2 * (20 - 2)
= 5 + 2 * 18 # Parentheses evaluated
= 5 + 36     # Multiplication evaluated
= 41         # Addition evaluated. 
```

**NOT** left-to-right: 

```R
  5 + 2 * (20 - 2)
= 7 * (20 - 2)
= 7 * 18
= 126
```
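
You can check both results in R. In this sketch, the variable name `left_to_right` is just for illustration; adding parentheses around `5 + 2` is how you would force the addition to happen first if that were what you wanted.

```r
z <- 5 + 2 * (20 - 2)
z                                    # 41

left_to_right <- (5 + 2) * (20 - 2)  # extra parentheses force the addition first
left_to_right                        # 126

2 * 3 ^ 2                            # 18: exponentiation happens before multiplication
```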


## Functions

The last thing to talk about in this reading is perhaps the most important construct in programming: the function!

Up until now, we've mostly been doing little mathematical manipulations that you can imagine doing on a calculator. But often times we want to be able to do complicated manipulations of our data without writing out all the steps ourselves, and that's where functions come in.

The basic idea of a function is that it is an object that accepts data as an input, does something with that data, and then returns something to you. For example, `sqrt` is a function that calculates the square root of a number. It takes a number as an input (i.e. you put a number in between the two parentheses that follow the function name) and returns the square root of that number:

In [14]:
sqrt(9)

How did it just calculate the square root of 9? Here's the magic of functions: you don't have to know! 

My intro to computer science teacher once said that if we remembered only one thing from his class, we should remember **"a function is a toaster."** OK, stay with me: by that, he meant that just like a toaster, it's not your job to know how a function works; you're only responsible for knowing what you're allowed to put in, and for accepting what it returns. If one day my toaster suddenly changed from heating my bread with electricity to channeling cosmic rays, it wouldn't really change how I interact with the toaster: I put bread in, I get toast back.

Now this idea of a toaster taking something in and returning something to you (the "return value") is relevant to our discussion of variables and assignment. Just like any other expression, if you ask R to evaluate a line of code with a function in it, and you don't assign that value back to a variable, that return value gets printed out and is then forgotten. So if you want to keep the result of a function, remember you always have to assign it to a variable. e.g.

In [19]:
# Let's get the value of 7!
# (i.e. 7 x 6 x 5 x 4 x 3 x 2 x 1, also known as 7 factorial)

a <- 7
a_factorial <- factorial(a)

In [20]:
a_factorial
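
To see the difference between printing a return value and keeping it, compare these lines (a small sketch; the name `root` is only for illustration):

```r
sqrt(16)          # 4 is printed, but not stored anywhere
root <- sqrt(16)  # nothing is printed; 4 is stored in `root`
root + 1          # the stored value can be reused: 5
```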

It's also important to understand that when functions are nested, R just evaluates them from the inside out. For example, can you figure out what this code would return (`abs` returns the absolute value of the input)? 

```r
a <- -9
b <- sqrt(abs(a))
```

?

Let's run it and find out:

In [21]:
a <- -9
b <- sqrt(abs(a))
b

The result is `3`, because R evaluates the expression `sqrt(abs(a))` by starting at the innermost level and working its way out, effectively saying: "well, `a` is `-9`, so `abs(a)` is `abs(-9)`. And that's `9`. So then `sqrt(9)` is `3`."
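
One way to convince yourself of the inside-out order is to write the nested call as separate steps and compare. This is only a sketch; the names `inner`, `b_stepwise`, and `b_nested` are made up for illustration.

```r
a <- -9
inner <- abs(a)            # innermost call first: abs(-9) is 9
b_stepwise <- sqrt(inner)  # then the outer call: sqrt(9) is 3
b_nested <- sqrt(abs(a))   # the same computation written in one line
b_stepwise == b_nested     # TRUE
```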

Anyway, we'll talk a LOT more about functions later on in the course, including learning about all the amazing functionality that comes built into R in the form of functions it provides. And by the end of the course you'll even be able to write your own functions! 

But for now it's enough to know that they're just objects that take inputs, do something, and return an output to you to use (and you need to assign that return value to a variable if you want it remembered!).

And that's it! You now know all the basics of how assignment works in R. Next up, we'll talk about types of data in R.

## Where's the Social Science?

Don't worry, I promise we'll get to good, applied social science examples very soon! 

One goal of this course is to ensure that you have a solid understanding of programming principles, rather than just teach you how to chain together a handful of functions to do some basic data manipulations. No discipline is changing faster than data science and computational social science, so if you just learn a handful of specific commands, not only will you not be prepared to transfer your skills to new tools and applications, but you may also struggle if the commands you know change, as they surely will. By contrast, by learning these fundamental principles about *how* programming languages work, you'll develop a more robust, less "brittle" skill set that will provide you with a solid foundation for learning new tools over the course of your entire social science career.

## Exercises