In [1]:
library(dplyr)
library(tidyr)
library(fivethirtyeight)

"package 'dplyr' was built under R version 3.4.3"
Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

"package 'fivethirtyeight' was built under R version 3.4.4"

In [None]:
data(package="fivethirtyeight")

In [2]:
data_oxford <- comma_survey

### Keep the respondent ID and two question variables

In [4]:
data_oxford_20 <-
  dplyr::select(data_oxford, respondent_id,
                heard_oxford_comma, data_singular_plural)

### Work with the first 20 rows

In [5]:
data_oxford_20 <- slice(data_oxford_20, 1:20)

### `gather()`: wide-form to long-form

In [7]:
# Stack the two question columns (positions 2 and 3) into key-value pairs
data_oxford_long <- gather(data = data_oxford_20,
                           key = "question", value = "answer", 2:3)
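
To make the change of shape concrete, here is a minimal sketch on a made-up three-row data frame. The name `toy` and its values are illustrative only and are not taken from `comma_survey`. Selecting the columns by name instead of by position (`2:3`) is one alternative that keeps working if the column order changes.

```r
library(tidyr)

# Hypothetical wide-form data: one row per respondent, one column per question
toy <- data.frame(
  respondent_id = c(1, 2, 3),
  heard_oxford_comma = c(TRUE, FALSE, TRUE),
  data_singular_plural = c(FALSE, FALSE, TRUE)
)

# Stack the two question columns into key-value pairs, selecting them by name
toy_long <- gather(toy, key = "question", value = "answer",
                   heard_oxford_comma, data_singular_plural)

dim(toy)       # rows and columns before gathering
dim(toy_long)  # one row per respondent-question pair after gathering
toy_long
```

Each respondent now occupies two rows, one per question, which is the long form used in the rest of this notebook.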

### `spread()`: long-form to wide-form

* `spread()` distributes a pair of key-value columns into a field of cells. The “keys” become separate columns, making the data more “wide.”
* `spread()` takes three optional arguments in addition to `data`, `key`, and `value`:
    * `fill =` the value to put in a cell when a combination of variables does not exist in the data; the default is `NA`.
    * `convert =` if `TRUE`, each new column is re-parsed with `type.convert()`. This helps when the value column mixed several data types and was coerced to strings: the strings are converted back to logicals, integers, or doubles where possible.
    * `drop =` controls how `spread()` handles factors in the key column; if `FALSE`, factor levels that do not appear in the data are kept.
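
The effect of `fill` and `convert` is easier to see on a small case. The sketch below uses a made-up long-form data frame (`long_toy`, `id`, `question`, and `answer` are hypothetical names chosen for illustration) in which one id/question combination is missing and numbers and logicals share a single character column. The `drop` argument is not exercised here.

```r
library(tidyr)

# Hypothetical long-form data: id 2 has no "age" row,
# and the answer column stores everything as strings
long_toy <- data.frame(
  id = c(1, 1, 2),
  question = c("age", "heard", "heard"),
  answer = c("34", "TRUE", "FALSE"),
  stringsAsFactors = FALSE
)

# Default: the missing id/question combination becomes NA
wide_default <- spread(long_toy, key = question, value = answer)

# fill supplies a replacement for the missing combination
wide_filled <- spread(long_toy, key = question, value = answer,
                      fill = "unknown")

# convert = TRUE re-parses each new column from its strings
wide_converted <- spread(long_toy, key = question, value = answer,
                         convert = TRUE)

wide_default
wide_filled
str(wide_converted)  # inspect the column types after conversion
```

Without `convert = TRUE` the new columns stay character, because the value column they came from was character.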

In [8]:
data_oxford_wide <- spread(data = data_oxford_long,
                           key = question, value = answer)
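
As a sanity check that `spread()` reverses `gather()`, the following sketch runs a round trip on a made-up data frame (`toy` and its values are hypothetical) and compares the result with the original column by column. The comparison is done by column name because the spread columns may come back in a different order than they started.

```r
library(tidyr)

# Hypothetical wide-form data
toy <- data.frame(
  respondent_id = c(1, 2, 3),
  heard_oxford_comma = c(TRUE, FALSE, TRUE),
  data_singular_plural = c(FALSE, FALSE, TRUE)
)

# Wide -> long -> wide
toy_long <- gather(toy, key = "question", value = "answer", 2:3)
toy_wide <- spread(toy_long, key = question, value = answer)

# Compare each original column with its counterpart after the round trip
round_trip_ok <- sapply(names(toy), function(col) {
  identical(toy_wide[[col]], toy[[col]])
})
round_trip_ok
```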

## Putting it together: creating a pipeline with `%>%`

In [10]:
data_oxford %>%
  select(respondent_id, heard_oxford_comma, data_singular_plural) %>%
  slice(1:20) %>%
  gather("question", "answer", 2:3) %>%
  arrange(respondent_id)

respondent_id,question,answer
3292644552,heard_oxford_comma,False
3292644552,data_singular_plural,True
3292648325,heard_oxford_comma,False
3292648325,data_singular_plural,True
3292653724,heard_oxford_comma,True
3292653724,data_singular_plural,False
3292692304,heard_oxford_comma,True
3292692304,data_singular_plural,False
3292702854,heard_oxford_comma,True
3292702854,data_singular_plural,True
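
The pipe passes its left-hand side as the first argument of the call on its right, so a pipeline is another way of writing nested function calls. A minimal sketch on made-up data (`df`, `id`, and `score` are hypothetical names) shows the two forms side by side:

```r
library(dplyr)

# Hypothetical data for illustration
df <- data.frame(id = c(3, 1, 2), score = c(10, 30, 20))

# Nested calls are read from the inside out
nested <- arrange(select(df, id), id)

# The same steps with the pipe are read from top to bottom
piped <- df %>%
  select(id) %>%
  arrange(id)

# Check that both forms produce the same data frame
identical(nested, piped)
```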


## More on the pipe operator: `%>%`

http://genomicsclass.github.io/book/pages/dplyr_tutorial.html

In [None]:
library(dplyr)
library(EDAWR)

In [5]:
p <- group_by(pollution, size)

In [6]:
summarise(p, mean = mean(amount), sum = sum(amount), n = n())

size,mean,sum,n
large,55.33333,166,3
small,28.66667,86,3
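
The two steps above can also be written as one pipeline, without the intermediate object `p`. Because `EDAWR` may not be installed, this sketch builds a stand-in `pollution` data frame by hand; its values are an assumption, chosen so that the summary matches the output shown above.

```r
library(dplyr)

# Stand-in for EDAWR's pollution data (assumed values, see the note above)
pollution <- data.frame(
  city = c("New York", "New York", "London", "London", "Beijing", "Beijing"),
  size = c("large", "small", "large", "small", "large", "small"),
  amount = c(23, 14, 22, 16, 121, 56)
)

# Group by particle size, then summarise each group in one pipeline
by_size <- pollution %>%
  group_by(size) %>%
  summarise(mean = mean(amount), sum = sum(amount), n = n())

by_size
```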


A tidy dataset follows three rules:

* Each variable must have its own column.
* Each observation must have its own row.
* Each value must have its own cell.