Commit
merged pull requests
ndphillips committed Aug 3, 2017
2 parents 0caf1d3 + 6aad4b5 commit 66b0a22
Showing 17 changed files with 1,354 additions and 2,244 deletions.
8 changes: 4 additions & 4 deletions 04-basics.Rmd
@@ -11,7 +11,7 @@ library(yarrr)
```


-If you're like most people, you think of R as a statistics program. However, while R is definitely the coolest, most badass, pirate-y way to conduct statistics -- it's not really a program. Rather, it's a programming \textit{language} that was written by and for statisticians. To learn more about the history of R...just...you know...Google it.
+If you're like most people, you think of R as a statistics program. However, while R is definitely the coolest, most badass, pirate-y way to conduct statistics -- it's not really a program. Rather, it's a programming *language* that was written by and for statisticians. To learn more about the history of R...just...you know...Google it.


```{r, fig.cap= "Ross Ihaka and Robert Gentlemen. You have these two pirates to thank for creating R! You might not think much of them now, but by the end of this book there's a good chance you'll be dressing up as one of them on Halloween.", fig.margin = TRUE, echo = FALSE, out.width = "50%", fig.align='center'}
@@ -45,7 +45,7 @@ In R, the command-line interpreter starts with the `>` symbol. This is called th
1+1
```

-As you can see, R returned the (thankfully correct) value of 2. You'll notice that the console also returns the text [1]. This is just telling you you the index of the value next to it. Don't worry about this for now, it will make more sense later. As you can see, R can, thankfully, do basic calculations. In fact, at its heart, R is technically just a fancy calculator. But that's like saying Michael Jordan is *just* a fancy ball bouncer or Donald Trump is *just* a orange with a dead fox on his head. It (and they), are much more than that.
+As you can see, R returned the (thankfully correct) value of 2. You'll notice that the console also returns the text [1]. This is just telling you you the index of the value next to it. Don't worry about this for now, it will make more sense later. As you can see, R can, thankfully, do basic calculations. In fact, at its heart, R is technically just a fancy calculator. But that's like saying Michael Jordan is *just* a fancy ball bouncer or Donald Trump is *just* an orange with a dead fox on his head. It (and they), are much more than that.

## Writing R scripts in an editor

@@ -120,7 +120,7 @@ names(movies)
# What percent of movies are sequels?
mean(movies$sequel, na.rm = T)
-# How much did Pirate's of the Caribbean: On Strager Tides make?
+# How much did Pirate's of the Caribbean: On Stranger Tides make?
movies$revenue.all[movies$name == 'Pirates of the Caribbean: On Stranger Tides']
```

@@ -241,7 +241,7 @@ To create new objects in R, you need to do *object assignment*. Object assignmen

To do an assignment, we use the almighty `<-` operator called *assign* To assign something to a new object (or to change an existing object), use the notation `object <- ...`}, where `object` is the new (or updated) object, and `...` is whatever you want to store in `object`. Let's start by creating a very simple object called `a` and assigning the value of 100 to it:

-Good object names strike a balance between being easy to type (i.e.; short names) and interpret. If you have several datasets, it's probably not a good idea to name them `a`, `b`, `c` because you'll forget which is which. However, using long names like `March2015Group1OnlyFemales` will give you carpel tunnel syndrome.
+Good object names strike a balance between being easy to type (i.e.; short names) and interpret. If you have several datasets, it's probably not a good idea to name them `a`, `b`, `c` because you'll forget which is which. However, using long names like `March2015Group1OnlyFemales` will give you carpal tunnel syndrome.
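
The assignment notation described in this hunk can be sketched as follows. This is a minimal illustration, not the chapter's own chunk; the name `crew.size` is a made-up example of a short but descriptive object name.

```r
# Create an object called a and assign the value 100 to it
a <- 100

# Assigning to an existing name replaces the stored value
a <- a + 1

# A short but descriptive name (hypothetical example)
crew.size <- 25

a          # 101
crew.size  # 25
```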


```{r}
11 changes: 6 additions & 5 deletions 05-scalersvectors.Rmd
@@ -4,7 +4,8 @@ output:
html_document: default
word_document: default
---
-# Scalers and vectors {#scalersvectors}
+
+# Scalars and vectors {#scalersvectors}

```{r, echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE)
@@ -129,7 +130,7 @@ knitr::include_graphics(c("images/magrittepipe.jpg"))


```{r}
-char.vec <- c("Leci", "nest", "pas", "une", "pipe")
+char.vec <- c("Ceci", "nest", "pas", "une", "pipe")
char.vec
```

@@ -164,7 +165,7 @@ Here are some examples of the `a:b` function in action. As you'll see, you can g
| `length.out`| The desired length of the final sequence (only use if you don't specify `by`)|


-The `seq()` function is a more flexible version of `a:b`. Like `a:b`, `seq()` allows you to create a sequence from a starting number to an ending number. However, `seq()}`, has additional arguments that allow you to specify either the size of the steps between numbers, or the total length of the sequence:
+The `seq()` function is a more flexible version of `a:b`. Like `a:b`, `seq()` allows you to create a sequence from a starting number to an ending number. However, `seq()`, has additional arguments that allow you to specify either the size of the steps between numbers, or the total length of the sequence:


The `seq()` function has two new arguments `by` and `length.out`. If you use the `by` argument, the sequence will be in steps of the input to the `by` argument:
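
A minimal sketch of the two arguments described above; the object names `by.seq` and `len.seq` are illustrative and not taken from the chapter.

```r
# Fixed step size: go from 1 to 9 in steps of 2
by.seq <- seq(from = 1, to = 9, by = 2)
by.seq   # 1 3 5 7 9

# Fixed length: 5 evenly spaced values from 0 to 100
len.seq <- seq(from = 0, to = 100, length.out = 5)
len.seq  # 0 25 50 75 100
```

Use `by` when the spacing matters and `length.out` when the number of values matters; as the argument table notes, specify only one of the two.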
@@ -360,7 +361,7 @@ The output of the `sample()` function above is a vector of 10 strings indicating

In the next section, we'll cover how to generate random data from specified *probability distributions*. What is a probability distribution? Well, it's simply an equation -- also called a likelihood function -- that indicates how likely certain numerical values are to be drawn.

-We can use probability distributions to represent different types of data. For example, imagine you need to hire a new group of pirates for your crew. You have the option of hiring people form one of two different pirate training colleges that produce pirates of varying quality. One college "Pirate Training Unlimited" might tend to pirates that are generally ok - never great but never terrible. While another college "Unlimited Pirate Training" might produce pirates with a wide variety of quality, from very low to very high. In Figure \@ref(fig:piratecollege) I plotted 5 example pirates from each college, where each pirate is shown as a ball with a number written on it. As you can see, pirates from PTU all tend to be clustered between 40 and 60 (not terrible but not great), while pirates from UPT are all over the map, from 0 to 100. We can use probability distributions (in this case, the uniform distribution) to mathematically define how likely any possible value is to be drawn at random from a distribution. We could describe Pirate Training Unlimited with a uniform distribution with a small range, and Unlimited Pirate Training with a second uniform distribution with a wide range.
+We can use probability distributions to represent different types of data. For example, imagine you need to hire a new group of pirates for your crew. You have the option of hiring people from one of two different pirate training colleges that produce pirates of varying quality. One college "Pirate Training Unlimited" might tend to pirates that are generally ok - never great but never terrible. While another college "Unlimited Pirate Training" might produce pirates with a wide variety of quality, from very low to very high. In Figure \@ref(fig:piratecollege) I plotted 5 example pirates from each college, where each pirate is shown as a ball with a number written on it. As you can see, pirates from PTU all tend to be clustered between 40 and 60 (not terrible but not great), while pirates from UPT are all over the map, from 0 to 100. We can use probability distributions (in this case, the uniform distribution) to mathematically define how likely any possible value is to be drawn at random from a distribution. We could describe Pirate Training Unlimited with a uniform distribution with a small range, and Unlimited Pirate Training with a second uniform distribution with a wide range.
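
The two colleges can be sketched with `runif()`, using the ranges given in the paragraph above (40 to 60 and 0 to 100). This is an illustration only, not the code behind the figure; the object names `ptu` and `upt` and the seed are arbitrary choices.

```r
set.seed(1)  # arbitrary seed so the draws are reproducible

# Pirate Training Unlimited: uniform distribution with a small range
ptu <- runif(n = 5, min = 40, max = 60)

# Unlimited Pirate Training: uniform distribution with a wide range
upt <- runif(n = 5, min = 0, max = 100)

range(ptu)  # always falls inside 40..60
range(upt)  # can fall anywhere inside 0..100
```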


```{r piratecollege, fig.cap = "Sampling 5 potential pirates from two different pirate colleges. Pirate Training Unlimited (PTU) consistently produces average pirates (with scores between 40 and 60), while Unlimited Pirate Training (UPT), produces a wide range of pirates from 0 to 100.", echo = FALSE}
@@ -477,7 +478,7 @@ legend(x = -2, y = 1.2,



-Next, let's move on to the Uniform distribution. The Uniform distribution gives equal probability to all values between its minimum and maximum values. In other words, everything between its lower and upper bounds are equally likely to occur. To generate samples from a uniform distribution,use the function `runif()`, the function has 3 arguments:
+Next, let's move on to the Uniform distribution. The Uniform distribution gives equal probability to all values between its minimum and maximum values. In other words, everything between its lower and upper bounds are equally likely to occur. To generate samples from a uniform distribution, use the function `runif()`, the function has 3 arguments:


| Argument| Definition|
4 changes: 2 additions & 2 deletions 06-vectorfunctions.Rmd
@@ -251,7 +251,7 @@ Yep, sure enough our new sample y (containing 100,000 values) has a sample mean

## Counting statistics

-Next, we'll move on to common counting functions for vectors with discrete or non-numeric data. Discrete data are those like gender, occupation, and monkey farts, that only allow for a finite (or at least, plausibly finite) set of responses. Common functions for discrete vectors are in Table \@ref(tab:discretevectorfunctiontable). Each of these vectors takes a vector as an argument -- however, unlike the previous functions we looked at, the used as arguments to these functions can be either numeric or character.
+Next, we'll move on to common counting functions for vectors with discrete or non-numeric data. Discrete data are those like gender, occupation, and monkey farts, that only allow for a finite (or at least, plausibly finite) set of responses. Common functions for discrete vectors are in Table \@ref(tab:discretevectorfunctiontable). Each of these vectors takes a vector as an argument -- however, unlike the previous functions we looked at, the arguments to these functions can be either numeric or character.


| Function| Description | Example|Result |
@@ -435,5 +435,5 @@ How much treasure did Renata find on average when she was sober? What about when

3. Using Renata's data again, create a new vector called `difference` that shows how much more treasure Renata found when she was drunk and when she was not. What was the mean, median, and standard deviation of the difference?

-4. There's an old parable that goes something like this. A man does some work for a king and needs to be paid. Because the man loves rice (who doesn't?!), the man offers the king two different ways that he can be paid. *You can either pay me 100 kilograms of rice, or, you can pay me as follows: get a chessboard and put one grain of rice in the top left square. Then put 2 grains of rice on the next square, followed by 4 grains on the next, 8 grains on the next...and so on, where the amount of rice doubles on each square, until you get to the last square. When you are finished, give me all the grains of rice that would (in theory), fit on the chessboard.* The king, sensing that the man was an idiot for making such a stupid offer, immediately accepts the second option. He summons a chessboard, and begins counting out grains of rice one by one... Assuming that there are 64 squares on a chessboard, calculate how many grains of rice the main will receive. If one grain of rice weights 1/64000 kilograms, how many kilograms of rice did he get? *Hint: If you have trouble coming up with the answer, imagine how many grains are on the first, second, third and fourth squares, then try to create the vector that shows the number of grains on each square. Once you come up with that vector, you can easily calculate the final answer with the `sum()` function.*
+4. There's an old parable that goes something like this. A man does some work for a king and needs to be paid. Because the man loves rice (who doesn't?!), the man offers the king two different ways that he can be paid. *You can either pay me 100 kilograms of rice, or, you can pay me as follows: get a chessboard and put one grain of rice in the top left square. Then put 2 grains of rice on the next square, followed by 4 grains on the next, 8 grains on the next...and so on, where the amount of rice doubles on each square, until you get to the last square. When you are finished, give me all the grains of rice that would (in theory), fit on the chessboard.* The king, sensing that the man was an idiot for making such a stupid offer, immediately accepts the second option. He summons a chessboard, and begins counting out grains of rice one by one... Assuming that there are 64 squares on a chessboard, calculate how many grains of rice the main will receive. If one grain of rice weights 1/6400 kilograms, how many kilograms of rice did he get? *Hint: If you have trouble coming up with the answer, imagine how many grains are on the first, second, third and fourth squares, then try to create the vector that shows the number of grains on each square. Once you come up with that vector, you can easily calculate the final answer with the `sum()` function.*

6 changes: 3 additions & 3 deletions 07-indexingvectors.Rmd
@@ -92,7 +92,7 @@ boat.ages[c(1, 1, 1)]
```


-It it makes your code clearer, you can define an indexing object before doing your actual indexing. For example, let's define an object called `my.index` and use this object to index our data vector:
+If it makes your code clearer, you can define an indexing object before doing your actual indexing. For example, let's define an object called `my.index` and use this object to index our data vector:

```{r}
my.index <- 3:5
@@ -123,7 +123,7 @@ a[c(TRUE, FALSE, TRUE, FALSE, TRUE)]

As you can see, R returns all values of the vector `a` for which the logical vector is TRUE.
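
A minimal sketch of logical indexing. The values stored in `a` are not visible in this hunk, so the five numbers below are an assumption made for illustration, and `keep` is a hypothetical name.

```r
# Assumed example values; the chapter's own vector a may differ
a <- c(1, 2, 3, 4, 5)

# A logical vector of the same length selects the TRUE positions
keep <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
picked <- a[keep]
picked    # 1 3 5

# The logical vector can also come from a comparison on a itself
over.two <- a[a > 2]
over.two  # 3 4 5
```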

-```{r fig.cap = "Logical comparison operators in R", echo = FALSE}
+```{r comparison, fig.cap = "Logical comparison operators in R", echo = FALSE}
par(mar = rep(.1, 4))
plot(1, xlim = c(0, 1.1), ylim = c(0, 10),
xlab = "", ylab = "", xaxt = "n", yaxt = "n",
@@ -143,7 +143,7 @@ text(rep(.2, 9), 9:1,
```


-However, creating logical vectors using `c()` is tedious. Instead, it's better to create logical vectors from *existing vectors* using comparison operators like < (less than), == (equals to), and != (not equal to). A complete list of the most common comparison operators is in Figure~\ref{fig:comparison}. For example, let's create some logical vectors from our `boat.ages` vector:
+However, creating logical vectors using `c()` is tedious. Instead, it's better to create logical vectors from *existing vectors* using comparison operators like < (less than), == (equals to), and != (not equal to). A complete list of the most common comparison operators is in Figure \@ref(fig:comparison). For example, let's create some logical vectors from our `boat.ages` vector:

```{r}
# Which ages are > 100?
14 changes: 7 additions & 7 deletions 08-matricesdataframes.Rmd
@@ -124,7 +124,7 @@ Table: (\#tab:matrixfunctions) Functions to create matrices and dataframes.

`cbind()` and `rbind()` both create matrices by combining several vectors of the same length. `cbind()` combines vectors as columns, while `rbind()` combines them as rows.
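
A minimal sketch of the difference, assuming three illustrative vectors `x`, `y`, and `z` of 10 values each, which together hold the numbers 1 through 30 (the names are not taken from the chapter):

```r
x <- 1:10
y <- 11:20
z <- 21:30

# cbind(): each vector becomes a column
by.col <- cbind(x, y, z)
dim(by.col)  # 10 3

# rbind(): each vector becomes a row
by.row <- rbind(x, y, z)
dim(by.row)  # 3 10
```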

-Let's use these functions to create a matrix with the numbers 1 through 30. First, we'll create three vectors of length 10, then we'll combine them into one matrix. As you will see, the `cbind()` function will combine the vectors as columns in the final matrix, while the `rbind()` function will combine them as rows.
+Let's use these functions to create a matrix with the numbers 1 through 30. First, we'll create three vectors of length 5, then we'll combine them into one matrix. As you will see, the `cbind()` function will combine the vectors as columns in the final matrix, while the `rbind()` function will combine them as rows.


```{r}
@@ -195,9 +195,9 @@ survey

#### `stringsAsFactors = FALSE`

-There is one key argument to `data.frame()` and similar functions called `stringsAsFactors`. By default, the `data.frame()` function will automatically convert any string columns to a specific type of object called a **factor** in R. A factor is nominal variable that has a well-specified possible set of values that it can take on. For example, one can create a factor `sex` that can *only* take on the values `"male"` and `"female"`.
+There is one key argument to `data.frame()` and similar functions called `stringsAsFactors`. By default, the `data.frame()` function will automatically convert any string columns to a specific type of object called a **factor** in R. A factor is a nominal variable that has a well-specified possible set of values that it can take on. For example, one can create a factor `sex` that can *only* take on the values `"male"` and `"female"`.

-However, as I'm sure you're discover, having R automatically convert your string data to factors can lead to lots of strange results. For example: if you have a factor of sex data, but then you want to add a new value called `other`, R will yell at you and return an error. I *hate*, *hate*, *HATE* when this happens. While there are very, very rare cases when I find factors useful, I almost always don't want or need them. For this reason, I avoid them at all costs.
+However, as I'm sure you'll discover, having R automatically convert your string data to factors can lead to lots of strange results. For example: if you have a factor of sex data, but then you want to add a new value called `other`, R will yell at you and return an error. I *hate*, *hate*, *HATE* when this happens. While there are very, very rare cases when I find factors useful, I almost always don't want or need them. For this reason, I avoid them at all costs.

To tell R to *not* convert your string columns to factors, you need to include the argument `stringsAsFactors = FALSE` when using functions such as `data.frame()`
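
A minimal sketch of the argument in use, with a small made-up dataframe called `survey.demo` (not the chapter's `survey` object). Note that the default described in this hunk applies to R versions before 4.0.0; from R 4.0.0 on, `stringsAsFactors` already defaults to `FALSE`, so passing it explicitly mainly keeps the code's behavior the same across versions.

```r
# Made-up example data, for illustration only
survey.demo <- data.frame(
  name = c("Ann", "Bo", "Cy"),
  sex = c("female", "male", "female"),
  stringsAsFactors = FALSE
)

# The string column stays a character vector rather than a factor
class(survey.demo$sex)  # "character"

# so a previously unseen value can be stored without any complaint
survey.demo$sex[3] <- "other"
```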

@@ -300,7 +300,7 @@ To learn about the classes of columns in a dataframe, in addition to some other
str(ToothGrowth)
```

-Here, we can see that `ToothGrowth` is a dataframe with 60 observations (ie., rows) and 3 variables (ie., columns). We can also see that the column names are `len`, `supp`, and `dose`
+Here, we can see that `ToothGrowth` is a dataframe with 60 observations (ie., rows) and 5 variables (ie., columns). We can also see that the column names are `index`, `len`, `len.cm`, `supp`, and `dose`



@@ -459,7 +459,7 @@ ToothGrowth[1:6, 1]
```


-Because the first column is `len`, the primary dependent measure, this means that the tooth lengths in the first 5 observations are `r ToothGrowth[1:6, 1]`.
+Because the first column is `len`, the primary dependent measure, this means that the tooth lengths in the first 6 observations are `r ToothGrowth[1:6, 1]`.

Of course, you can index matrices and dataframes with longer vectors to get more data. Now, let's look at the first 3 rows of columns 1 and 3:
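
The indexing described in the sentence above can be sketched as follows, assuming the unmodified `ToothGrowth` dataset that ships with R, where columns 1 and 3 are `len` and `dose`. The object name `first.rows` is illustrative.

```r
# Rows 1 to 3 of columns 1 and 3, using [rows, columns] indexing
first.rows <- ToothGrowth[1:3, c(1, 3)]
first.rows

dim(first.rows)  # 3 2
```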

@@ -605,14 +605,14 @@ Now let's say we want to add a new column called `bmi` which represents a person

```{r}
# Calculate bmi
-health$weight / health$height ^ 2
+health$height / health$weight ^ 2
```

As you can see, we have to retype the name of the dataframe for each column. However, using the `with()` function, we can make it a bit easier by saying the name of the dataframe once.

```{r}
# Save typing by using with()
-with(health, weight / height ^ 2)
+with(health, height / weight ^ 2)
```
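
For a self-contained check of the same idea, the sketch below uses a small made-up dataframe `health.demo` (not the chapter's `health` data) and the conventional definition of BMI: weight in kilograms divided by the square of height in meters.

```r
# Hypothetical data: weight in kilograms, height in meters
health.demo <- data.frame(
  weight = c(70, 85, 60),
  height = c(1.75, 1.80, 1.65)
)

# Spelling out the dataframe name for every column
bmi.long <- health.demo$weight / health.demo$height ^ 2

# Naming the dataframe once with with()
bmi.with <- with(health.demo, weight / height ^ 2)

identical(bmi.long, bmi.with)  # TRUE
```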

As you can see, the results are identical. In this case, we didn't save so much typing. But if you are doing many calculations, then `with()` can save you a lot of typing. For example, contrast these two lines of code that perform identical calculations: