# Metadata 

```yaml
Course:  DS 5100
Module:  13 R Programming 2
Topic:   The 5 verbs of dplyr
Author:  R.C. Alvarado (adapted from Ben Stenhaug [1])
```

[1] [Source](https://teachingr.com/content/the-5-verbs-of-dplyr/the-5-verbs-of-dplyr-article.html)

# Getting started

As always, the first thing we will do is load the tidyverse.

Note: If you haven’t yet installed the tidyverse, you’ll first have to
run the following:

In [2]:
install.packages("tidyverse")




In [4]:
library(tidyverse)

The Tidyverse is a collection of packages developed by Hadley Wickham and many collaborators. 

One of the key packages in that collection is called `dplyr`. 

The magic of dplyr is that with just a handful of commands (the verbs of
dplyr), you can do nearly anything you’d want to do with your data.

This article will cover the five verbs of dplyr: select, filter,
arrange, mutate, and summarize.

Before we walk through each command, let’s make a data frame to play
with.

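Notice that the tidyverse builds data frames with `tibble()`, while base R uses `data.frame()`. Older tidyverse code uses `data_frame()`, which is now deprecated in favor of `tibble()`.

In [81]:
hamsters <- tibble(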
        name = c("Megan", "Amy", "Jen", "Karl", "Jeremy"),
        gender = c("female", "female", "female", "male", "male"),
        hamsters = c(5, 7, 6, 2, 1),
        hamster_cages = c(2, 1, 3, 3, 4)
    )

In [64]:
hamsters

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Megan,female,5,2
Amy,female,7,1
Jen,female,6,3
Karl,male,2,3
Jeremy,male,1,4


Don’t worry too much about the above code, but you might stop and
inspect it. 

Notice that we’re creating a data frame named hamsters. 

The first column is name and then we list the names. `c()` is the R function
for concatenating values; it’s how we stick multiple things together. 

All strings (aka words) need to have quotes around them. 

The second column is gender and so on.

Running just the name of the data frame, `hamsters`, shows it to
us and tells us a little bit about it. 

When the data frame is printed in the R console, the `5 x 4` in the header says it has 5 rows and 4 columns. 

Each column has a type: 
* name and gender are `<chr>`, which stands for character, 
* and the other columns are `<dbl>`, which stands for double and is another word for number.
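
If we want to check the column types ourselves, base R can report them. The following is a minimal sketch that rebuilds the same data with `data.frame()`; the names `hamsters_df` and `column_classes` are made up for this illustration.

```r
# Rebuild the example data with base R only (no packages needed).
hamsters_df <- data.frame(
  name = c("Megan", "Amy", "Jen", "Karl", "Jeremy"),
  gender = c("female", "female", "female", "male", "male"),
  hamsters = c(5, 7, 6, 2, 1),
  hamster_cages = c(2, 1, 3, 3, 4),
  stringsAsFactors = FALSE  # keep strings as character on older R versions
)

# class() reports the high-level type of each column.
column_classes <- sapply(hamsters_df, class)
print(column_classes)

# typeof() shows that plain numbers are stored as doubles.
print(typeof(hamsters_df$hamsters))
```

The "character" columns correspond to `<chr>` and the "numeric" columns, stored as doubles, correspond to `<dbl>`.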

Now that we have a data frame to work with, we can dive into the 5 verbs
of dplyr. The code blocks below will show small examples of what
is possible. Before doing something with hamsters, I’ll typically print
the original hamsters data frame first because the easiest way to see
what a function is doing is to see a before and after.

# Arrange

Arrange keeps all of the information in the data frame, but just **changes
the order of the rows**. 

This is the same thing that **sort** does in most other languages and applications, e.g. Excel, SQL, and Pandas.

By default, arranging happens in ascending order:

In [65]:
hamsters %>% 
  arrange(hamsters)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Jeremy,male,1,4
Karl,male,2,3
Megan,female,5,2
Jen,female,6,3
Amy,female,7,1


We can instead arrange in descending order with the `desc()` function:

In [66]:
hamsters %>% 
  arrange(desc(hamster_cages))

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Jeremy,male,1,4
Jen,female,6,3
Karl,male,2,3
Megan,female,5,2
Amy,female,7,1


Character columns get arranged in alphabetical order:

In [67]:
hamsters %>% 
  arrange(name)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Amy,female,7,1
Jen,female,6,3
Jeremy,male,1,4
Karl,male,2,3
Megan,female,5,2


If we input multiple column names, arrange uses the additional columns
to break ties.

In [68]:
hamsters %>% 
  arrange(hamster_cages, hamsters)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Amy,female,7,1
Megan,female,5,2
Karl,male,2,3
Jen,female,6,3
Jeremy,male,1,4
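
Descending order and tie-breaking can be combined in one call. Here is a small sketch, assuming only that dplyr is installed; the name `sorted` is just for illustration. It sorts by the number of cages from most to fewest and uses name to break ties.

```r
library(dplyr)

hamsters <- tibble(
  name = c("Megan", "Amy", "Jen", "Karl", "Jeremy"),
  gender = c("female", "female", "female", "male", "male"),
  hamsters = c(5, 7, 6, 2, 1),
  hamster_cages = c(2, 1, 3, 3, 4)
)

# Most cages first; people with the same number of cages
# are ordered alphabetically by name.
sorted <- hamsters %>%
  arrange(desc(hamster_cages), name)

print(sorted)
```

Jen and Karl both have 3 cages, so the second sort key decides that Jen comes before Karl.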


# Select

Select is used to choose which columns to work with. 

For example, maybe we want just the name and hamsters columns:

In [69]:
hamsters

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Megan,female,5,2
Amy,female,7,1
Jen,female,6,3
Karl,male,2,3
Jeremy,male,1,4


In [71]:
hamsters %>%
  select(name, hamsters)

name,hamsters
<chr>,<dbl>
Megan,5
Amy,7
Jen,6
Karl,2
Jeremy,1


> In base R, you'd do `hamsters[c('name', 'hamsters')]`\
In Pandas, you'd do `hamsters[['name','hamsters']]`.

Notice how dplyr does not use literal strings for column names in these contexts.
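
Sometimes the column names do arrive as strings, for example in a character vector. One way to handle that case, sketched here under the assumption of dplyr 1.0.0 or later, is the `all_of()` helper. The names `wanted` and `picked` are made up for this example.

```r
library(dplyr)

hamsters <- tibble(
  name = c("Megan", "Amy", "Jen", "Karl", "Jeremy"),
  gender = c("female", "female", "female", "male", "male"),
  hamsters = c(5, 7, 6, 2, 1),
  hamster_cages = c(2, 1, 3, 3, 4)
)

# Column names stored as strings in an ordinary character vector.
wanted <- c("name", "hamsters")

# all_of() tells select() to treat the vector's contents as column names.
picked <- hamsters %>%
  select(all_of(wanted))

print(picked)
```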

We can use a `-` to get rid of a column and leave the rest of the
columns:

In [73]:
hamsters %>% 
  select(-name)

gender,hamsters,hamster_cages
<chr>,<dbl>,<dbl>
female,5,2
female,7,1
female,6,3
male,2,3
male,1,4


We also could have gotten just the name and hamsters columns by removing
the gender and hamster_cages columns:

In [74]:
hamsters %>% 
  select(-gender, -hamster_cages)

name,hamsters
<chr>,<dbl>
Megan,5
Amy,7
Jen,6
Karl,2
Jeremy,1


Select can also be used to **rearrange the order** of columns:

In [76]:
hamsters %>% 
  select(hamsters, hamster_cages, gender, name)

hamsters,hamster_cages,gender,name
<dbl>,<dbl>,<chr>,<chr>
5,2,female,Megan
7,1,female,Amy
6,3,female,Jen
2,3,male,Karl
1,4,male,Jeremy


`everything()` is a convenient shortcut that adds **all the columns that
haven’t been used yet**. 

It is very useful if you want to move a column to
the front of a data frame:

In [78]:
hamsters %>% 
  select(hamster_cages, everything())

hamster_cages,name,gender,hamsters
<dbl>,<chr>,<chr>,<dbl>
2,Megan,female,5
1,Amy,female,7
3,Jen,female,6
3,Karl,male,2
4,Jeremy,male,1


# Filter

Filter is used to select which rows you want. For example, maybe we only
want students with more than 3 hamsters:

In [82]:
hamsters %>% 
  filter(hamsters > 3)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Megan,female,5,2
Amy,female,7,1
Jen,female,6,3


Notice that there is a variable named the same thing as the data
frame. 

The first “hamsters” in the code above refers to the data
frame, while the second “hamsters” refers to the hamsters column.

Or maybe we only want female students:

In [83]:
hamsters

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Megan,female,5,2
Amy,female,7,1
Jen,female,6,3
Karl,male,2,3
Jeremy,male,1,4


In [84]:
hamsters %>% 
  filter(gender == "female")

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Megan,female,5,2
Amy,female,7,1
Jen,female,6,3


Notice that we had to use `==` instead of `=`. 

This is because `=` is for assignment – making something equal something else – whereas `==` is
for comparison – seeing if two things are equal or not.

If we want to use an “and” (require that multiple conditions hold) we
can either use the `&` sign or separate the conditions with a comma.

For example, the following two filters are equivalent:

In [85]:
hamsters %>% 
  filter(gender == "female" & hamsters >= 6)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Amy,female,7,1
Jen,female,6,3


In [86]:
hamsters %>% 
  filter(gender == "female", hamsters >= 6)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Amy,female,7,1
Jen,female,6,3


If we want to use an “or” (require that at least 1 of multiple conditions
holds) we have to use the `|` sign.

For example:

In [89]:
hamsters %>% 
  filter(gender == "male" | hamsters >= 7)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Amy,female,7,1
Karl,male,2,3
Jeremy,male,1,4


A case that commonly comes up is requiring that a variable has one of a
set of specific values. For example, maybe we only want students with 2,
4, 6, or 8 hamsters.

The most intuitive way to do this is with a series of “or” statements:

In [90]:
hamsters %>% 
  filter(hamsters == 2 | hamsters == 4 | hamsters == 6 | hamsters == 8)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Jen,female,6,3
Karl,male,2,3


It gets tedious having to type and retype the word “hamsters” over and
over again, though.

A nice shortcut is to supply the values you’re interested in as a vector
by typing `c(2, 4, 6, 8)`.

The `c()` stands for concatenate which basically
means to glue 2 to 4 to 6 to 8 all together in one vector.

Once we have that vector we can simply check if the number of hamsters
for that row is “%in%” the vector we created. 

Here’s how that looks:

In [92]:
hamsters %>% 
  filter(hamsters %in% c(2, 4, 6, 8))

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Jen,female,6,3
Karl,male,2,3


This can be a little bit confusing, so make sure you understand this.

Think about the filter as happening row by row: first it checks the
first row to see if 5 is in c(2, 4, 6, 8) – it isn’t so it doesn’t
include that row. 

Then it checks the second row and also doesn’t include
that one because 7 isn’t in the vector. 

It then checks the third row and
keeps it because 6 is in there, and so on.
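
The row-by-row check can be seen directly in base R, without any data frame. In this small sketch, `hamster_counts` and `keep` are illustrative names; the values are the hamsters column from above.

```r
# The hamsters column, as a plain numeric vector.
hamster_counts <- c(5, 7, 6, 2, 1)

# %in% tests each element against the set and returns one TRUE/FALSE per element.
keep <- hamster_counts %in% c(2, 4, 6, 8)
print(keep)

# Indexing with the logical vector keeps only the TRUE positions,
# which is essentially what filter() does with the rows.
print(hamster_counts[keep])
```

The logical vector has one entry per row, and `filter()` keeps the rows where that entry is `TRUE`.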

# Filter with groups

<span style="color:red; font-weight:bold;">NOT CLEAR WHAT group_by IS DOING HERE: RESULTS SAME AS DOING NOTHING</span>

Groups can be confusing at first but they are incredibly useful. Usually
code operates on rows. For example, the code above checked each row to
see if the gender was female in that row.

But sometimes we want to work with groups of rows instead of one row at
a time. To do so, we add a group attribute to the data frame before we
do anything.

Here’s example code and a visual depiction of grouping a data frame by
the `x` column:

```r
dataframe %>% 
   group_by(x)
```

<img src="http://swcarpentry.github.io/r-novice-gapminder/fig/12-plyr-fig1.png" width="50%" height="50%"/>

[Source](https://cities.github.io/datascience2017/11-split-apply-combine.html)

This pattern is called **split-apply-combine**. 

The idea is to: 
* **split** a data frame into multiple groups,
* **apply** something to each group, 
* then **combine** the groups back into a single data frame.
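
To make the three steps concrete, here is a rough base R sketch of split-apply-combine using the hamster counts and genders from our data. It is only an illustration of the idea, not how dplyr implements grouping, and the variable names are made up for this example.

```r
hamster_counts <- c(5, 7, 6, 2, 1)
gender <- c("female", "female", "female", "male", "male")

# Split: one vector of hamster counts per gender.
groups <- split(hamster_counts, gender)

# Apply: compute the mean within each group.
group_means <- sapply(groups, mean)

# Combine: put the per-group results back into one data frame.
combined <- data.frame(
  gender = names(group_means),
  mean_hamsters = unname(group_means)
)

print(combined)
```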

Let’s look at an example with hamsters:

In [108]:
hamsters %>% 
  group_by(gender) %>% 
  filter(max(hamster_cages) == 3)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Megan,female,5,2
Amy,female,7,1
Jen,female,6,3


We first grouped by gender. After that, every operation will happen at
the group level instead of the row level.

The final line is where the magic happens. It tells R to return only the
gender group where the max number of hamster cages is 3. This is the
female group.

Notice that the male group isn’t included because the max number of
hamster cages is 4, not 3.
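
One way to see what the grouped filter compares is to store the group-level value in a column first. This sketch assumes dplyr is installed; the column name `max_cages_in_group` is invented for the illustration.

```r
library(dplyr)

hamsters <- tibble(
  name = c("Megan", "Amy", "Jen", "Karl", "Jeremy"),
  gender = c("female", "female", "female", "male", "male"),
  hamsters = c(5, 7, 6, 2, 1),
  hamster_cages = c(2, 1, 3, 3, 4)
)

# max() is computed once per gender group and repeated on every row of that group.
with_group_max <- hamsters %>%
  group_by(gender) %>%
  mutate(max_cages_in_group = max(hamster_cages)) %>%
  ungroup()

print(with_group_max)
```

Every female row carries a group maximum of 3 and every male row a group maximum of 4, so the condition `max(hamster_cages) == 3` keeps or drops a whole group at once.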

Similarly, we can get the gender group where the mean number of hamsters
is 1.5. This time it’s the male group because there are two males – one
with 1 hamster and the other with 2 hamsters.

In [104]:
hamsters %>% 
  group_by(gender) %>% 
  filter(mean(hamsters) == 1.5)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Karl,male,2,3
Jeremy,male,1,4
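
A word of caution: testing a computed mean with `==` works here because 1.5 is represented exactly, but exact comparison of decimal numbers can fail after arithmetic. A more forgiving variant, sketched below, uses dplyr's `near()` helper, which compares within a small tolerance. The names `exact_match`, `close_match`, and `males` are illustrative.

```r
library(dplyr)

hamsters <- tibble(
  name = c("Megan", "Amy", "Jen", "Karl", "Jeremy"),
  gender = c("female", "female", "female", "male", "male"),
  hamsters = c(5, 7, 6, 2, 1),
  hamster_cages = c(2, 1, 3, 3, 4)
)

# Exact comparison of doubles can fail after arithmetic...
exact_match <- sqrt(2)^2 == 2
# ...while near() allows for a tiny rounding error.
close_match <- near(sqrt(2)^2, 2)
print(c(exact = exact_match, near = close_match))

# The same grouped filter as above, written with near().
males <- hamsters %>%
  group_by(gender) %>%
  filter(near(mean(hamsters), 1.5)) %>%
  ungroup()

print(males)
```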


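The `n()` function is a shortcut for the number of rows in the group. So,
the following code looks for a gender group with 4 rows in it. No group is
that large (the female group has 3 rows and the male group has 2), so the
result is empty: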

In [110]:
hamsters %>% 
  group_by(gender) %>% 
  filter(n() == 4)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>


Of course, there isn’t any reason that we have to group by gender. We
could instead group by the number of hamster cages.

**Self-explanation: Why don’t Jen or Karl appear in the data frame after
we filter?**

In [112]:
hamsters %>% 
  group_by(hamster_cages) %>% 
  filter(n() == 1)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Megan,female,5,2
Amy,female,7,1
Jeremy,male,1,4


# Mutate

So far, arrange has sorted our data, select has chosen which columns to
work with, and filter has chosen which rows to keep. We haven’t changed
our data at all yet though – that’s what mutate does!

For example, maybe we want to create a new variable based on the number
of hamsters per cage for each person:

In [115]:
hamsters %>% 
  mutate(hamsters_per_cage = hamsters / hamster_cages)

name,gender,hamsters,hamster_cages,hamsters_per_cage
<chr>,<chr>,<dbl>,<dbl>,<dbl>
Megan,female,5,2,2.5
Amy,female,7,1,7.0
Jen,female,6,3,2.0
Karl,male,2,3,0.6666667
Jeremy,male,1,4,0.25


In [116]:
hamsters %>% 
  mutate(hamsters_per_cage = hamsters / hamster_cages) %>% 
  arrange(hamsters_per_cage)

name,gender,hamsters,hamster_cages,hamsters_per_cage
<chr>,<chr>,<dbl>,<dbl>,<dbl>
Jeremy,male,1,4,0.25
Karl,male,2,3,0.6666667
Jen,female,6,3,2.0
Megan,female,5,2,2.5
Amy,female,7,1,7.0


Or maybe we want an indicator of whether the person has 5 or more hamsters:

In [61]:
hamsters %>% 
  mutate(five_or_more_hamsters = hamsters >= 5)

name,gender,hamsters,hamster_cages,five_or_more_hamsters
<chr>,<chr>,<dbl>,<dbl>,<lgl>
Megan,female,5,2,TRUE
Amy,female,7,1,TRUE
Jen,female,6,3,TRUE
Karl,male,2,3,FALSE
Jeremy,male,1,4,FALSE


We can also use mutate to input new data:

In [62]:
hamsters %>% 
  mutate(cats = c(4, 5, 2, 3, 1))

name,gender,hamsters,hamster_cages,cats
<chr>,<chr>,<dbl>,<dbl>,<dbl>
Megan,female,5,2,4
Amy,female,7,1,5
Jen,female,6,3,2
Karl,male,2,3,3
Jeremy,male,1,4,1


Of course, this only works if we give it the right number of values:

In [117]:
# hamsters %>% 
#   mutate(dogs = c(1, 3, 5))

In [118]:
tryCatch(
    expr = {hamsters %>% mutate(dogs = c(1, 3, 5))},
    error = function(e) {print(e)}
)

<error/dplyr:::mutate_error>
Error in `mutate()`:
! Problem while computing `dogs = c(1, 3, 5)`.
✖ `dogs` must be size 5 or 1, not 3.
---
Backtrace:
  1. IRkernel::main()
 31. dplyr:::mutate.data.frame(., dogs = c(1, 3, 5))


Interestingly, we can give it just 1 value and it will repeat it the
correct number of times automatically:

In [68]:
hamsters %>% 
  mutate(walruses = 0)

name,gender,hamsters,hamster_cages,walruses
<chr>,<chr>,<dbl>,<dbl>,<dbl>
Megan,female,5,2,0
Amy,female,7,1,0
Jen,female,6,3,0
Karl,male,2,3,0
Jeremy,male,1,4,0


We can create multiple new columns with one use of mutate if we separate
each new column with a `,`:

In [119]:
hamsters %>% 
  mutate(hamsters_per_cage = hamsters / hamster_cages,
         five_or_more_hamsters = hamsters >= 5)

name,gender,hamsters,hamster_cages,hamsters_per_cage,five_or_more_hamsters
<chr>,<chr>,<dbl>,<dbl>,<dbl>,<lgl>
Megan,female,5,2,2.5,TRUE
Amy,female,7,1,7.0,TRUE
Jen,female,6,3,2.0,TRUE
Karl,male,2,3,0.6666667,FALSE
Jeremy,male,1,4,0.25,FALSE


Notice that “mutate” leaves all of the original columns in the data frame
and adds new columns. If we instead use **transmute** we’ll only get the
new columns:

In [70]:
hamsters %>% 
  transmute(hamsters_per_cage = hamsters / hamster_cages,
         five_or_more_hamsters = hamsters >= 5)

hamsters_per_cage,five_or_more_hamsters
<dbl>,<lgl>
2.5,TRUE
7.0,TRUE
2.0,TRUE
0.6666667,FALSE
0.25,FALSE


Slightly more complex things can be done by using values calculated from
the data frame in the creation of a new column.

In [120]:
hamsters %>% 
  mutate(hamster_cages_centered = hamster_cages - mean(hamster_cages))

name,gender,hamsters,hamster_cages,hamster_cages_centered
<chr>,<chr>,<dbl>,<dbl>,<dbl>
Megan,female,5,2,-0.6
Amy,female,7,1,-1.6
Jen,female,6,3,0.4
Karl,male,2,3,0.4
Jeremy,male,1,4,1.4


Notice that first the mean of the hamster_cages column is calculated to
be 2.6, then the new column is created by subtracting 2.6 from each
value of the hamster_cages column.

# Mutate with groups

Sometimes it’s useful to define new variables based on a group. Remember
groups tell R to operate on the data frame one group at a time as
opposed to using all of the rows in the data frame.

For example, examine the following – note how it’s different from the
code above:

In [121]:
hamsters %>% 
  group_by(gender) %>% 
  mutate(hamster_cages_centered_by_gender = hamster_cages - mean(hamster_cages))

name,gender,hamsters,hamster_cages,hamster_cages_centered_by_gender
<chr>,<chr>,<dbl>,<dbl>,<dbl>
Megan,female,5,2,0.0
Amy,female,7,1,-1.0
Jen,female,6,3,1.0
Karl,male,2,3,-0.5
Jeremy,male,1,4,0.5


Before, in `hamster_cages_centered`, we subtracted the mean of
`hamster_cages`, which was $2.6$, from every value of `hamster_cages`.

Now, because we are grouping by `gender`, we subtract 2 from
`hamster_cages` for females and 3.5 from `hamster_cages` for males. 

This is because `mean(hamster_cages)` operates on groups of rows defined by
`gender` after we add the `group_by(gender)` attribute to the data frame.
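
The per-group means can be checked by hand in base R. This is a rough sketch using `ave()`, whose default summary function is the mean; the variable names are made up for the illustration.

```r
hamster_cages <- c(2, 1, 3, 3, 4)
gender <- c("female", "female", "female", "male", "male")

# ave() returns, for every row, the mean of that row's group.
group_means <- ave(hamster_cages, gender)
print(group_means)

# Subtracting the group mean reproduces the group-centered column.
centered_by_gender <- hamster_cages - group_means
print(centered_by_gender)
```

The female rows all get a group mean of 2 and the male rows get 3.5, which are the values subtracted in the grouped `mutate()`.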

**Challenge: See if you can understand what the following code is doing.
Warning, the “new_variable” doesn’t really make sense in context.**

In [123]:
hamsters %>% 
  group_by(hamster_cages) %>% 
  mutate(new_variable = hamster_cages - n())

name,gender,hamsters,hamster_cages,new_variable
<chr>,<chr>,<dbl>,<dbl>,<dbl>
Megan,female,5,2,1
Amy,female,7,1,0
Jen,female,6,3,1
Karl,male,2,3,1
Jeremy,male,1,4,3


# Summarize

Mutate kept the same number of rows in the data frame and added a
column.

We also want to be able to collapse rows of a data frame, which we might
think of as summarizing. One of the most common ways to summarize a set of
numbers is to take the mean:

In [124]:
hamsters %>% 
  summarize(hamsters_mean = mean(hamsters))

hamsters_mean
<dbl>
4.2


Another common method of summarizing is the median. We can summarize
multiple variables with multiple functions at the same time:

In [126]:
hamsters %>%
  summarize(hamsters_mean = mean(hamsters),
            hamsters_median = median(hamsters),
            hamster_cages_mean = mean(hamster_cages),
            hamster_cages_median = median(hamster_cages))

hamsters_mean,hamsters_median,hamster_cages_mean,hamster_cages_median
<dbl>,<dbl>,<dbl>,<dbl>
4.2,5,2.6,3


# Summarize with groups

Summarize isn’t that useful by itself, but when we add groups it becomes
crazy powerful!

It allows us to get a summary row for each group in the data frame:

In [127]:
hamsters

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Megan,female,5,2
Amy,female,7,1
Jen,female,6,3
Karl,male,2,3
Jeremy,male,1,4


In [128]:
hamsters %>% 
  group_by(gender) %>% 
  summarize(mean_hamsters = mean(hamsters))

gender,mean_hamsters
<chr>,<dbl>
female,6.0
male,1.5


Just as before, we can create multiple summary statistics all at once:

In [129]:
hamsters %>% 
  group_by(gender) %>% 
  summarize(mean_hamsters = mean(hamsters),
            max_hamsters = max(hamsters),
            count = n())

gender,mean_hamsters,max_hamsters,count
<chr>,<dbl>,<dbl>,<int>
female,6.0,7,3
male,1.5,2,2


Of course, we don’t have to group by gender (it just happens to be the
most natural in this case):

In [130]:
hamsters %>% 
  group_by(hamster_cages) %>% 
  summarize(max_hamsters = max(hamsters),
            count = n())

hamster_cages,max_hamsters,count
<dbl>,<dbl>,<int>
1,7,1
2,5,1
3,6,2
4,1,1


It is easy to get the difference between mutate and summarize confused.
Remember that mutate returns the same number of rows in a data frame,
summarize returns just one row, and summarize with groups returns a row
for each group.
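
A quick way to keep the three cases straight is to count the rows each one returns. The sketch below assumes dplyr is installed and uses illustrative names (`mutated`, `summarized`, `by_gender`, `row_counts`).

```r
library(dplyr)

hamsters <- tibble(
  name = c("Megan", "Amy", "Jen", "Karl", "Jeremy"),
  gender = c("female", "female", "female", "male", "male"),
  hamsters = c(5, 7, 6, 2, 1),
  hamster_cages = c(2, 1, 3, 3, 4)
)

# mutate: the overall mean is repeated on every row.
mutated <- hamsters %>%
  mutate(mean_hamsters = mean(hamsters))

# summarize: the whole data frame collapses to a single row.
summarized <- hamsters %>%
  summarize(mean_hamsters = mean(hamsters))

# summarize with groups: one row per gender.
by_gender <- hamsters %>%
  group_by(gender) %>%
  summarize(mean_hamsters = mean(hamsters))

row_counts <- c(
  mutate = nrow(mutated),
  summarize = nrow(summarized),
  grouped_summarize = nrow(by_gender)
)
print(row_counts)
```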

# The power of combining verbs!

The true power of dplyr comes from combining these 5 verbs to solve
problems. For example, see how we can piece commands together to do more
and more complex operations:

In [131]:
hamsters %>%
  arrange(hamsters)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Jeremy,male,1,4
Karl,male,2,3
Megan,female,5,2
Jen,female,6,3
Amy,female,7,1


In [132]:
hamsters %>%
  arrange(hamsters) %>% 
  select(-name) 

gender,hamsters,hamster_cages
<chr>,<dbl>,<dbl>
male,1,4
male,2,3
female,5,2
female,6,3
female,7,1


In [133]:
hamsters %>%
  arrange(hamsters) %>% 
  select(-name) %>% 
  mutate(walruses = 0) 

gender,hamsters,hamster_cages,walruses
<chr>,<dbl>,<dbl>,<dbl>
male,1,4,0
male,2,3,0
female,5,2,0
female,6,3,0
female,7,1,0


In [142]:
hamsters %>%
  arrange(hamsters) %>% 
  select(-name) %>% 
  mutate(walruses = 0) %>% 
  group_by(gender) %>% 
  mutate(hamsters_centered_by_gender = hamsters - mean(hamsters))

gender,hamsters,hamster_cages,walruses,hamsters_centered_by_gender
<chr>,<dbl>,<dbl>,<dbl>,<dbl>
male,1,4,0,-0.5
male,2,3,0,0.5
female,5,2,0,-1.0
female,6,3,0,0.0
female,7,1,0,1.0


# A few more examples

There are many little tricks that dplyr can do that we haven’t talked
about. Below are a few of those.

Grouping and summarizing to get counts is so common that there is a
shortcut “count” function:

In [143]:
hamsters %>% 
  group_by(gender) %>% 
  summarise(n = n())

gender,n
<chr>,<int>
female,3
male,2


In [144]:
hamsters %>% 
  count(gender)

gender,n
<chr>,<int>
female,3
male,2


If you’re working with a lot of columns, select has some really useful
helper functions. For example, we can get all of the columns that start
with the letter “h”:

In [145]:
hamsters %>% 
  select(starts_with("h"))

hamsters,hamster_cages
<dbl>,<dbl>
5,2
7,1
6,3
2,3
1,4


Sometimes people will use mutate to create a variable and then use that
new variable to filter, but you can just put that variable definition as
the filter condition:

In [146]:
hamsters %>% 
  mutate(more_than_5_hamsters_OR_3_cages = hamsters > 5 | hamster_cages > 3) %>% 
  filter(more_than_5_hamsters_OR_3_cages)

name,gender,hamsters,hamster_cages,more_than_5_hamsters_OR_3_cages
<chr>,<chr>,<dbl>,<dbl>,<lgl>
Amy,female,7,1,TRUE
Jen,female,6,3,TRUE
Jeremy,male,1,4,TRUE


In [147]:
hamsters %>% 
  filter(hamsters > 5 | hamster_cages > 3)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Amy,female,7,1
Jen,female,6,3
Jeremy,male,1,4


Maybe you want the smallest number of hamsters owned by anyone of each gender:

In [148]:
hamsters %>% 
  arrange(hamsters) %>% # can you figure out why we need this arrange?
  group_by(gender) %>% 
  summarise(fewest_hamsters = first(hamsters))

gender,fewest_hamsters
<chr>,<dbl>
female,5
male,1


If we want to keep the entire row, we can use the “slice” function to
slice out the first row:

In [149]:
hamsters %>% 
  arrange(hamsters) %>% # can you figure out why we need this arrange?
  group_by(gender) %>% 
  slice(1)

name,gender,hamsters,hamster_cages
<chr>,<chr>,<dbl>,<dbl>
Megan,female,5,2
Jeremy,male,1,4
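
If you are using dplyr 1.0.0 or later (an assumption about your installation), `slice_min()` offers another way to get the same rows without arranging first. The name `fewest` below is just for illustration.

```r
library(dplyr)

hamsters <- tibble(
  name = c("Megan", "Amy", "Jen", "Karl", "Jeremy"),
  gender = c("female", "female", "female", "male", "male"),
  hamsters = c(5, 7, 6, 2, 1),
  hamster_cages = c(2, 1, 3, 3, 4)
)

# Within each gender, keep the row with the smallest value of hamsters.
fewest <- hamsters %>%
  group_by(gender) %>%
  slice_min(hamsters, n = 1) %>%
  ungroup()

print(fewest)
```

Note that `slice_min()` keeps ties by default, so a group can return more than one row when several people share the minimum.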


# Resources

This [cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf)
is a great way of seeing all of the functions at once. 
* When working on an assignment involving dplyr, it's a good idea to **print it out** and work with it on your desk.

These sites have thoughtful writing and exercises on more advanced features of dplyr:
* [R for data science](http://r4ds.had.co.nz/transform.html)
* [stats545](http://stat545.com/block010_dplyr-end-single-table.html)
* [Simon Ejdemyr’s website](https://stanford.edu/~ejdemyr/r-tutorials/modifying-data/)