In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.3     ✔ purrr   0.3.4
✔ tibble  3.0.6     ✔ dplyr   1.0.4
✔ tidyr   1.1.2     ✔ stringr 1.4.0
✔ readr   1.4.0     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



In [2]:
(x <- c(a = 1, b = 2, c = 3))

Warm-up:

1. Create a named vector of the ages of some of your family members
2. Create various subsets of this vector using numeric vectors
3. Try conditional subsetting (i.e. all ages above or below a certain value)
4. Now subset using a character vector of explicit names
5. What is the type of this vector?

In [6]:
(family_ages <- c(me = 29, sibling = 40, dad = 82, wife = 30, grandmother = 98))

In [7]:
#subsetting using a numeric vector
family_ages[c(1,3)]

In [8]:
#conditional subsetting
family_ages[family_ages > 40]

In [9]:
family_ages > 40

In [11]:
#subsetting using the names
family_ages[c("me", "dad", "sibling")]

In [12]:
#find the type of a vector
typeof(family_ages)

In [16]:
(family_relationships <- c(me = "Mike", dad = "Mike", mom = "Ana"))
typeof(family_relationships)

In [18]:
names(family_ages)

There are a variety of functions that we can apply to vectors:
- ```sum()```: Sum of elements
- ```prod()```: Product of elements
- ```mean()```: Mean of elements
- ```sd()```: Standard deviation of the elements
- ```var()```: Variance of the elements
- ```median()```: Median of elements
- ```min()```: Minimum
- ```max()```: Maximum
- ```range()```: Range of the values
- ```summary()```: Summary statistics for the vector
- ```unique()```: Returns a vector with the unique values
- ```which()```: From a logical statement, returns the indices
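
As a quick illustration of a few of these functions, here is a minimal sketch on a small made-up vector. The name `v` and its values are assumptions chosen for easy mental arithmetic; they are not part of the lecture data.

```r
# a small illustrative vector (values are arbitrary)
v <- c(2, 4, 4, 10)

sum(v)         # 20
prod(v)        # 320
mean(v)        # 5
median(v)      # 4
var(v)         # 12 (sample variance, divides by n - 1)
sd(v)          # square root of the variance
range(v)       # 2 10
unique(v)      # 2 4 10
which(v == 4)  # 2 3 (the positions where the condition is TRUE)
```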

In [19]:
x <- c(1, 2, 3, 4, 5, 10, 20)
summary(x)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   2.500   4.000   6.429   7.500  20.000 

In [22]:
# returns a two-element vector: c(min, max)
range(x)

In [23]:
# which indices of x correspond to values greater than 6?
which(x > 6)

In [24]:
# the following are equivalent
x[x > 6]
x[which(x > 6)]
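
The two forms agree for the `x` above, which has no missing values. One case where they can differ is a vector that contains `NA`. The sketch below uses a hypothetical vector `v` (not from the lecture) to show the difference.

```r
# hypothetical vector with a missing value
v <- c(1, NA, 10, 20)

v > 6            # FALSE NA TRUE TRUE
v[v > 6]         # NA 10 20: the NA in the logical index is kept as NA
v[which(v > 6)]  # 10 20: which() only returns positions that are TRUE
```

Which behavior is preferable depends on whether you want missing values to stay visible in the result.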

In [26]:
# remember to remove missing values for certain functions!
mean(c(1, 2, NA, 4), na.rm = TRUE)
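
To see why `na.rm = TRUE` matters, a short sketch comparing the result with and without it. The vector name `v` is just an illustrative choice.

```r
v <- c(1, 2, NA, 4)

mean(v)                # NA: the missing value propagates
mean(v, na.rm = TRUE)  # 2.333..., the mean of 1, 2 and 4
sum(v, na.rm = TRUE)   # 7
is.na(v)               # FALSE FALSE TRUE FALSE
```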

In [27]:
# we can directly alter the elements of the vector
# this modifies the variable x in place
x[1] <- 10
x

### Arithmetic operations and recycling
- Standard linear algebra operations work as expected
 * Adding two vectors of the same length
 * Scaling a vector
- Products and division occur element-wise
- Vectors of different lengths can be *recycled*
 * The shorter vector is repeated until it matches the length of the longer one
 * For vectors of length one, this is similar to *broadcasting* in Python
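
The rules above can be sketched with two short vectors. The names `a` and `b` are illustrative only (they are used here so that the lecture's `x` and `y` are left untouched).

```r
a <- c(1, 2, 3, 4)
b <- c(0, 0, 1, 2)

a + b          # 1 2 4 6       (element-wise addition)
2 * a          # 2 4 6 8       (scaling)
a * b          # 0 0 3 8       (element-wise product)
b / a          # 0 0 0.333 0.5 (element-wise division)
a + c(10, 20)  # 11 22 13 24   (the shorter vector is recycled twice)
a + 1          # 2 3 4 5       (a length-one vector is recycled to every element)
```

When the longer length is not a multiple of the shorter one, R still recycles but emits a warning.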

In [32]:
x <- c(1, 2, 3, 4)
y <- c(0, 0, 1, 2)


In [36]:
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
y <- c(1, 2)
z <- c(1, 2, 1, 2, 1, 2, 1, 2, 1)

x*y
x*z

Warning message in x * y:
“longer object length is not a multiple of shorter object length”


Attempt

1. Create the following four vectors

```x <- c(1, 2, 3, 4, 5, 6)```

```y <- c(6, 5, 4, 3, 2, 1)```

```z <- c(10, 20) ```

```w <- c(0, .1) ```

2. Add, divide, multiply, and scale the vectors ```x``` and ```y```. Try to guess what the output will be **before** running the code.
3. Do the same for ```z``` and ```w```.
4. Run ```x + 1```. What happened?
5. Run ```x * w``` and ```y + z```. What happened?

In [37]:
x <- c(1, 2, 3, 4, 5, 6)
y <- c(6, 5, 4, 3, 2, 1)
z <- c(10, 20)
w <- c(0, .1)

In [42]:
x + 1
# x + 1 is equivalent to x + c(1, 1, 1, 1, 1, 1)

In [44]:
x * w
y + z

In [46]:
mpg

manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class
<chr>,<chr>,<dbl>,<int>,<int>,<chr>,<chr>,<int>,<int>,<chr>,<chr>
audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact
audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact
audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact
audi,a4,2.8,1999,6,manual(m5),f,18,26,p,compact
audi,a4,3.1,2008,6,auto(av),f,18,27,p,compact
audi,a4 quattro,1.8,1999,4,manual(m5),4,18,26,p,compact
audi,a4 quattro,1.8,1999,4,auto(l5),4,16,25,p,compact
audi,a4 quattro,2.0,2008,4,manual(m6),4,20,28,p,compact


- This is similar to the column operations we used before...
- Use ```[[```...```]]``` to extract a column of the ```mpg``` dataset:

```x <- mpg[["model"]]```

- What is ```x```?

In [50]:
x <- mpg[["model"]]
x

### Lists
- Tibbles are secretly fancy *lists* of vectors
- Like vectors, lists are ordered collections of elements, but the elements can be of any type (including other lists)!

In [51]:
(myList <- list(1L, 2, "hello"))

In [52]:
# by contrast, c() coerces all elements to a single type (here, character)
c(1L, 2, "hello")

In [53]:
# lists can contain other lists as well
(anotherList <- list(myList, 2.2, "a string "))

In [54]:
# a good way to look at the STRucture of a list
str(anotherList)

List of 3
 $ :List of 3
  ..$ : int 1
  ..$ : num 2
  ..$ : chr "hello"
 $ : num 2.2
 $ : chr "a string "


In [55]:
# as with vectors, [] returns an object of the same kind: here, another list
myList[c(2,3)]
anotherList[c(1,2)]

In [56]:
# The [[ ]] operator "pops" out an element of a list

# A list of one element, consisting of a list
anotherList[1]

# a list of three elements
anotherList[[1]]

In [57]:
str(anotherList[1])
str(anotherList[[1]])

List of 1
 $ :List of 3
  ..$ : int 1
  ..$ : num 2
  ..$ : chr "hello"
List of 3
 $ : int 1
 $ : num 2
 $ : chr "hello"


In [58]:
# lists also have lengths
length(anotherList)
length(myList)

In [59]:
# Just like vectors, we can create lists with named elements
myList <- list(a = 1, b = list("hello", "a     string"), c = 2.2)

# The elements can be accessed just like with vectors using []
myList[c('a', 'c')]

# or we can pop out an element using [[ ]]
myList[['b']]

# The $ operator is shorthand for [[ ]]
myList$b
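
A short sketch of how these operators chain on nested lists, and one place where `$` and `[[ ]]` behave differently. The list `lst` and the variable `key` are hypothetical names introduced for this illustration.

```r
# hypothetical nested list, similar in shape to myList above
lst <- list(a = 1, b = list("hello", "world"), c = 2.2)

lst$b[[1]]       # "hello": pop out b, then pop out its first element
lst[["b"]][[2]]  # "world"
lst["b"]         # still a list of length one, with a single element named b

key <- "c"
lst[[key]]       # 2.2: [[ ]] accepts a name stored in a variable
lst$key          # NULL: $ looks for an element literally named "key"
```

In practice this means `[[ ]]` is the safer choice when the element name is computed rather than typed out.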

- We can think of tibbles as lists of named vectors

In [60]:
typeof(mpg)
mpg

manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class
<chr>,<chr>,<dbl>,<int>,<int>,<chr>,<chr>,<int>,<int>,<chr>,<chr>
audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact
audi,a4,1.8,1999,4,manual(m5),f,21,29,p,compact
audi,a4,2.0,2008,4,manual(m6),f,20,31,p,compact
audi,a4,2.0,2008,4,auto(av),f,21,30,p,compact
audi,a4,2.8,1999,6,auto(l5),f,16,26,p,compact
audi,a4,2.8,1999,6,manual(m5),f,18,26,p,compact
audi,a4,3.1,2008,6,auto(av),f,18,27,p,compact
audi,a4 quattro,1.8,1999,4,manual(m5),4,18,26,p,compact
audi,a4 quattro,1.8,1999,4,auto(l5),4,16,25,p,compact
audi,a4 quattro,2.0,2008,4,manual(m6),4,20,28,p,compact


In [None]:
# list(manufacturer = c("audi", "audi", ....), model = c("a4", "a4", ...), ...)

The following are equivalent:

```mpg$year```

```mpg[['year']]```

Both return the column named 'year' as a vector. It is an *integer* vector: we can see this from the `<int>` tag in the table display, or by passing the vector to ```is_integer()```.
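
One way to check this claim directly is sketched below. It assumes the `ggplot2` package (which supplies `mpg`) is installed; the variable names `year_dollar` and `year_bracket` are illustrative, and base R's `is.integer()` is used here in place of `is_integer()`.

```r
library(ggplot2)  # provides the mpg dataset

year_dollar  <- mpg$year
year_bracket <- mpg[["year"]]

identical(year_dollar, year_bracket)  # TRUE
typeof(year_dollar)                   # "integer"
is.integer(year_dollar)               # TRUE (base R check)
typeof(mpg["year"])                   # "list": single brackets return a one-column tibble
```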

In [None]:
mpg %>%
  transmute(total_mpg = cty + hwy)

In [None]:
mpg %>%
  group_by(drv) %>%
  summarize(mean_hwy = mean(hwy))

- The column operations we use for creating new variables are inherited from vector operations and vector functions
- For example, adding two columns is equivalent to adding two vectors (because columns are vectors!)
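
The equivalence can be checked with a small sketch. It assumes `dplyr` and `ggplot2` are installed; `via_dplyr` and `via_vectors` are names made up for this comparison.

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# column arithmetic inside transmute() ...
via_dplyr <- mpg %>%
  transmute(total_mpg = cty + hwy) %>%
  pull(total_mpg)

# ... is the same as adding the underlying vectors directly
via_vectors <- mpg$cty + mpg$hwy

identical(via_dplyr, via_vectors)  # TRUE
```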

In [61]:
# tibbles are easy to create from vectors
# Recall: shorter vectors are recycled (tibble() only recycles vectors of length one)
tibble(names = c("me", "myself", "I"),
       ages = 29,
       name_lengths = c(2L, 5L, 1L),
       age_times_nl = ages * name_lengths)

names,ages,name_lengths,age_times_nl
<chr>,<dbl>,<int>,<dbl>
me,29,2,58
myself,29,5,145
I,29,1,29
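
Recycling in `tibble()` is stricter than in plain vector arithmetic. The sketch below, which assumes the `tibble` package is installed and uses made-up column names, shows a length-one column being recycled and a length-two column being rejected.

```r
library(tibble)

# a length-one vector is recycled to match the other columns
tibble(id = 1:4, flag = TRUE)

# a length-two vector is not recycled, even though 2 divides 4;
# tibble() signals an error instead, which we capture here
result <- tryCatch(
  tibble(id = 1:4, pair = c(1, 2)),
  error = function(e) "error"
)
result

# base R arithmetic, by contrast, recycles the shorter vector
1:4 * c(1, 2)  # 1 4 3 8
```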


In [62]:
# a tribble is a different way of creating a tibble
# designed for data entry
tribble(
  ~x, ~y, ~z,
  #--|--|----
  "a", 2, 3.6,
  "b", 1, 8.5
)

x,y,z
<chr>,<dbl>,<dbl>
a,2,3.6
b,1,8.5
