# Tidyverse (part 2)

### Pipes

Pipes are supplied by the magrittr package, and are (normally) made available by `library(tidyverse)`. The main innovation is the pipe operator, `%>%`, which lets you chain commands without having to nest function calls or use intermediate variables. This results in (arguably) much more readable code.

In [None]:
library(tidyverse)
library(readxl)

sazxlFile <- "./data/sazava.xls"

sazava_tbl<- read_xls(sazxlFile)

A common situation is that you have to perform several operations on your data. Consider, for instance, the following case:

In [None]:
sazava_tbl[sazava_tbl[,"SiO2"]<55,"Al2O3"]

This can also be written using the dplyr verbs `filter()` and `select()`:

In [None]:
select(filter(sazava_tbl,SiO2<55),Al2O3)

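Note, incidentally, that dplyr verbs evaluate their arguments in the context of the data ("data masking" in `filter()` and `mutate()`, "tidy selection" in `select()`), so column names can be written unquoted. In `select()`, you can usually quote the names or not, indifferently: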

In [None]:
select(filter(sazava_tbl,SiO2<55),Al2O3)
select(filter(sazava_tbl,SiO2<55),"Al2O3")

This feature is great when working interactively, but may become annoying when programming, for instance when a column name is stored in a variable. It is also not always consistent, so deciding whether or not to quote is sometimes a matter of trial and error. See, for instance,
https://stackoverflow.com/questions/65671975/tibbles-and-data-defined-column-names/65672042#65672042
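When the column names are held in character variables, two tools that dplyr provides for this are the `.data` pronoun (for data-masking verbs such as `filter()`) and `all_of()` (for `select()`). The following is a minimal sketch only: `toy`, `filter_col` and `keep_col` are hypothetical names, and the values are made up rather than taken from the Sázava data set.

```r
library(dplyr)

# Hypothetical toy data standing in for sazava_tbl (made-up values)
toy <- tibble(
  SiO2  = c(50, 60, 52),
  Al2O3 = c(15, 14, 16),
  CaO   = c(10, 5, 8)
)

# Column names stored as strings, as is typical inside a function
filter_col <- "SiO2"
keep_col   <- "Al2O3"

res <- toy %>%
  filter(.data[[filter_col]] < 55) %>%  # look up a column by its name, given as a string
  select(all_of(keep_col))              # select the columns named in a character vector

res
```

Here the two rows with `SiO2` below 55 are kept, and only the `Al2O3` column is returned.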

As we all know, piling up operators may lead to clumsy and unreadable code:

In [None]:
sazava_tbl[sazava_tbl[,"SiO2"]<55,"Al2O3"] / sazava_tbl[sazava_tbl[,"SiO2"]<55,"CaO"]*2

So the usual cure is to play with intermediate variables:

In [None]:
idx <- sazava_tbl[,"SiO2"]<55
al <- sazava_tbl[idx,"Al2O3"]
ca <- sazava_tbl[idx,"CaO"]

al/ca*2

This is still a bit unwieldy, and ends up polluting the workspace with lots of intermediate variables, which has been known to cause trouble. This is where the pipe comes in handy.

The pipe is simply an operator that connects its left-hand side (lhs) and right-hand side (rhs): the value of the lhs becomes the (first) argument of the function call on the rhs. Pipes therefore work with any function that takes a sensible first argument, not only with tidyverse functions.
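As a quick illustration that the pipe is not tied to tidyverse functions, here is a small sketch using only base R functions on a made-up numeric vector (the names `nested` and `piped` are just for this example):

```r
library(magrittr)

# Nested form: read from the inside out
nested <- sum(sqrt(c(1, 4, 9)))

# Piped form: read from left to right; each result is passed on
# as the first argument of the next call
piped <- c(1, 4, 9) %>% sqrt() %>% sum()

piped
```

Both forms compute the same value; the piped one simply lists the operations in the order in which they are applied.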

Therefore, the nested `select(filter(...))` call shown earlier can be recast as follows:

In [None]:
sazava_tbl %>% filter(SiO2<55) %>% select(Al2O3)

The chain can be extended with the `mutate()` function, which computes a new variable:

In [None]:
sazava_tbl %>% filter(SiO2<55) %>% select(Al2O3,CaO) %>% mutate(AlCa = Al2O3/CaO*2)

This is equivalent to the following:

In [None]:
intermediate1 <- filter(sazava_tbl,SiO2<55)
intermediate2 <- select(intermediate1,Al2O3,CaO)
intermediate3 <- mutate(intermediate2,AlCa = Al2O3/CaO*2)
intermediate3
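If you want to convince yourself of this equivalence, you can compare the two results directly. The sketch below does so on a small hypothetical tibble (`toy`, with made-up values) so that it runs without the data file; `identical()` should report that both objects are the same.

```r
library(dplyr)

# Hypothetical toy data standing in for sazava_tbl (made-up values)
toy <- tibble(
  SiO2  = c(50, 60, 52),
  Al2O3 = c(15, 14, 16),
  CaO   = c(10, 5, 8)
)

# Piped version
piped <- toy %>%
  filter(SiO2 < 55) %>%
  select(Al2O3, CaO) %>%
  mutate(AlCa = Al2O3 / CaO * 2)

# Step-by-step version with intermediate variables
step1    <- filter(toy, SiO2 < 55)
step2    <- select(step1, Al2O3, CaO)
stepwise <- mutate(step2, AlCa = Al2O3 / CaO * 2)

# Compare the two results
identical(piped, stepwise)
```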

Assignment can be done using the slightly uncommon rightward variant of the assignment operator, `->`:

In [None]:
sazava_tbl %>% filter(SiO2<55) %>% select(Al2O3,CaO) %>% mutate(AlCa = Al2O3/CaO*2) -> result

Or in the more common form:

In [None]:
result <- sazava_tbl %>% filter(SiO2<55) %>% select(Al2O3,CaO) %>% mutate(AlCa = Al2O3/CaO*2)

For simple in-place replacement, one may use the assignment pipe of magrittr, `%<>%`. It is **not** attached by `library(tidyverse)`: you need to load magrittr explicitly to access the more advanced pipes (of which there are several types, not covered here).

In [None]:
library(magrittr)
result %<>% select(AlCa)
result
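To make the behaviour of `%<>%` explicit, here is a tiny sketch on a made-up vector (`x` is a hypothetical name): the operator pipes the lhs into the rhs and then writes the result back into the lhs.

```r
library(magrittr)

x <- c(1, 4, 9)

# Shorthand for: x <- x %>% sqrt()
x %<>% sqrt()

x
```

Since it overwrites its input, this operator is best kept for cases where the original value is no longer needed.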

Pipes are also newline-friendly (as long as each line ends with `%>%`), so you can write very legible code:

In [None]:
sazava_tbl %>% 
  filter(SiO2<55) %>% 
  select(Al2O3,CaO) %>% 
  mutate(AlCa = Al2O3/CaO*2) 

Finally, in pipe chains, `.` can be used as a placeholder for the "current" value being passed through the pipe. So a neat way to assign the result of a pipe is:

In [None]:
sazava_tbl %>% 
  filter(SiO2<55) %>% 
  select(Al2O3,CaO) %>% 
  mutate(AlCa = Al2O3/CaO*2) %>%
  {.} -> result

... which does exactly the same as the previous versions, but in a very clean way (you see what goes into the pipe, and what comes out).
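The `.` placeholder is also what you need when the piped value should go somewhere other than the first argument. A minimal sketch with a base R function and made-up values (the name `labels` is just for this example):

```r
library(magrittr)

# Without a dot, the lhs would become the first argument of paste().
# With a dot as an argument, the lhs is placed exactly there instead.
labels <- c(1, 2, 3) %>% paste("x", ., sep = "_")

labels
```

This is equivalent to calling `paste("x", c(1, 2, 3), sep = "_")` directly.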