apply styler
lorenzwalthert committed Feb 20, 2021
1 parent 4082218 commit d861f38
Showing 26 changed files with 581 additions and 527 deletions.
12 changes: 6 additions & 6 deletions content/blog/2020/corrr-0-4-3/index.Rmd
@@ -68,8 +68,8 @@ We can create a `cor_df` object containing the pairwise correlations between a f
```{r message = FALSE}
library(palmerpenguins)
penguins_cor <- penguins %>%
  select(bill_length_mm, bill_depth_mm, flipper_length_mm) %>%
correlate()
penguins_cor
@@ -80,7 +80,7 @@ penguins_cor
Previously, the default behavior of `rplot()` was that the variables were displayed in alphabetical order in the output. This was an artifact of using `ggplot2` and inheriting its behavior. The new default is to retain the ordering of variables in the input data:

```{r message = FALSE}
rplot(penguins_cor)
```

If alphabetical ordering is desired, set `.order` to "alphabet":
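A minimal sketch of that call (this part of the post is collapsed in the diff above, so the exact code is an assumption based on the `.order` argument just described):

```{r}
rplot(penguins_cor, .order = "alphabet")
```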
@@ -116,14 +116,14 @@ cov_df
The resulting data frame behaves just like one returned by `correlate()`, except that it is populated with covariance values rather than correlations. This means we still have access to all of corrr's other tooling when working with it. For example, we can still use `shave()` to remove duplication, which sets the upper triangle of values to `NA`.

```{r}
cov_df %>%
  shave()
```

Similarly, we can still use `stretch()` to get the resulting data frame into a longer format:

```{r}
cov_df %>%
  stretch()
```

@@ -132,7 +132,7 @@ The first part of the name ("colpair_") comes from the fact that we are comparin
As such, any function passed to `colpair_map()` must accept a vector for both its first and second arguments. To illustrate, let's say we wanted to run a series of t-tests to see which of our variables are significantly related to one another. We can write a function to do so as follows:

```{r}
calc_ttest_p_value <- function(vec_a, vec_b) {
t.test(vec_a, vec_b)$p.value
}
```
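A sketch of how this helper might then be used (the variable selection here is an assumption that mirrors the earlier example; the post's actual usage is collapsed in this diff):

```{r}
penguins %>%
  select(bill_length_mm, bill_depth_mm, flipper_length_mm) %>%
  colpair_map(calc_ttest_p_value)
```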
100 changes: 52 additions & 48 deletions content/blog/2020/dbplyr-2-0-0/index.Rmd
@@ -56,40 +56,40 @@ dbplyr now supports all relevant features added in dplyr 1.0.0:
- `across()` is now translated into individual SQL statements.

```{r}
lf <- lazy_frame(g = 1, a = 1, b = 2, c = 3)
lf %>%
  group_by(g) %>%
  summarise(across(everything(), mean, na.rm = TRUE))
```

- `rename()` and `select()` support dplyr tidyselect syntax, apart from predicate functions which can't easily work on computed queries.
You can now use `rename_with()` to programmatically rename columns.

```{r}
lf <- lazy_frame(x1 = 1, x2 = 2, x3 = 3, y1 = 4, y2 = 3)
lf %>% select(starts_with("x") & !"x3")
lf %>% select(ends_with("2") | ends_with("3"))
lf %>% rename_with(toupper)
```

- `relocate()` makes it easy to move columns around:

```{r}
lf <- lazy_frame(x1 = 1, x2 = 2, y1 = 4, y2 = 3)
lf %>% relocate(starts_with("y"))
```

- `slice_min()`, `slice_max()`, and `slice_sample()` are now supported, and `slice_head()` and `slice_tail()` throw informative error messages (since they don't make sense for databases).

```{r}
lf <- lazy_frame(g = rep(1:2, 5), x = 1:10)
lf %>%
  group_by(g) %>%
  slice_min(x, prop = 0.5)
lf %>%
  group_by(g) %>%
  slice_sample(x, n = 10, with_ties = TRUE)
```

Note that these slices are translated into window functions, and because you can't use a window function directly inside a `WHERE` clause, they must be wrapped in a subquery.
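A sketch of how to inspect that subquery (output omitted here, since the generated SQL varies by backend):

```{r}
lf %>%
  group_by(g) %>%
  slice_min(x, prop = 0.5) %>%
  show_query()
```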
@@ -109,55 +109,59 @@ Here are a few of the most important:
You can set `na_matches = "na"` to match R's usual join behaviour.

```{r}
df1 <- tibble(x = c(1, 2, NA))
df2 <- tibble(x = c(NA, 1), y = 1:2)
df1 %>% inner_join(df2, by = "x")

db1 <- memdb_frame(x = c(1, 2, NA))
db2 <- memdb_frame(x = c(NA, 1), y = 1:2)
db1 %>% inner_join(db2, by = "x")
db1 %>% inner_join(db2, by = "x", na_matches = "na")
```

This translation is powered by the new `sql_expr_matches()` generic, because every database seems to have a slightly different way to express this idea.
Learn more at <https://modern-sql.com/feature/is-distinct-from>.

```{r}
db1 %>%
  inner_join(db2, by = "x") %>%
  show_query()
db1 %>%
  inner_join(db2, by = "x", na_matches = "na") %>%
  show_query()
```

- Subqueries no longer include an `ORDER BY` clause.
This is not part of the formal SQL specification, so it has very limited support across databases.
Such queries now generate a warning suggesting that you move your `arrange()` call later in the pipeline.

```{r}
lf <- lazy_frame(g = rep(1:2, each = 5), x = sample(1:10))
lf %>%
  group_by(g) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  filter(n > 1)
```

As the warning suggests, there's one exception: `ORDER BY` is still generated if a `LIMIT` is present.
Across databases, this tends to change which rows are returned, but not necessarily their order.

```{r}
lf %>%
  group_by(g) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  head(5) %>%
  filter(n > 1)
```

- dbplyr includes built-in backends for Redshift (which only differs from PostgreSQL in a few places) and SAP HANA. These require the development versions of [RPostgres](https://github.com/r-dbi/RPostgres) and [odbc](https://github.com/r-dbi/odbc) respectively.

```{r}
lf <- lazy_frame(x = "a", y = "b", con = simulate_redshift())
lf %>% mutate(z = paste0(x, y))
```

There are a number of minor changes that affect the translation of individual functions.
@@ -166,23 +170,23 @@ Here are a few of the most important:
- All backends now translate `n()` to `count(*)` and support the `::` operator.

```{r}
lf <- lazy_frame(x = 1:10)
lf %>% summarise(n = dplyr::n())
```

- PostgreSQL gets translations for lubridate period functions:

```{r}
lf <- lazy_frame(x = Sys.Date(), con = simulate_postgres())
lf %>%
  mutate(year = x + years(1))
```

- Oracle assumes version 12c is available, so we can use a simpler translation for `head()` that works in more places:

```{r}
lf <- lazy_frame(x = 1, con = simulate_oracle())
lf %>% head(5)
```
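To see the translation itself, append `show_query()` (a sketch; on Oracle 12c and later this is expected to use the ANSI `FETCH FIRST ... ROWS ONLY` syntax rather than a `ROWNUM` subquery):

```{r}
lf %>%
  head(5) %>%
  show_query()
```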

## New logo
44 changes: 22 additions & 22 deletions content/blog/2020/dplyr-1-0-0-and-vctrs/index.Rmd
@@ -66,7 +66,7 @@ You might wonder why we can't just copy the behaviour of `c()`. Unfortunately `c
underlying integer levels.

```{r}
c(factor("x"), factor("y"))
```

* It's difficult to implement methods when different classes are involved.
@@ -75,17 +75,17 @@ You might wonder why we can't just copy the behaviour of `c()`. Unfortunately `c
first being translated.

```{r}
today <- as.Date("2020-03-24")
now <- as.POSIXct("2020-03-24 10:34")
c(today, now)
# (the second value is the date 4341727-12-11)
class(c(today, now))
unclass(c(today, now))
c(now, today)
class(c(now, today))
unclass(c(now, today))
```

It's difficult to change how `c()` works because any changes are likely to break some existing code, and base R is committed to backward compatibility. Additionally, `c()` isn't the only way that base R combines vectors. `rbind()` and `unlist()` can also be used to perform a similar job, but return different results. This is not to say that the tidyverse has been any better in the past --- we have used a variety of ad hoc methods, undoubtedly using well more than three different approaches.
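A small illustration of that inconsistency (not from the original post; the behaviour of `c()` on factors shown here is that of R versions before 4.1, current when this was written):

```{r}
f1 <- factor("x")
f2 <- factor("y")
c(f1, f2)             # integer codes in R < 4.1, a factor in R >= 4.1
unlist(list(f1, f2))  # a factor with the union of the levels
rbind(data.frame(f = f1), data.frame(f = f2))$f  # also a factor
```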
@@ -97,9 +97,9 @@ Given that it's hard to fix the problem in base R, we've come up with our own al
always get a date-time.

```{r}
vec_c(today, now)
vec_c(now, today)
```

* Enrichment: `vec_c(x, y)` should return the richer type, where type `<x>`
@@ -108,17 +108,17 @@ Given that it's hard to fix the problem in base R, we've come up with our own al
double, and that combining a date and date-time should return a date-time.

```{r}
vec_c(1, 1.5)
vec_c(today, now)
```

* Consistency: `vec_c(x, y)` should error if `x` and `y` are of fundamentally
different types. For example, this implies that combining a string and a
number or a factor and a date should error.

```{r, error = TRUE}
vec_c("a", 1)
vec_c(factor("x"), today)
```

## Errors
@@ -157,8 +157,8 @@ Where possible, we attempt to give you more information to solve the problem. Fo

```{r, error = TRUE}
df <- tibble(g = c(1, 2))
df %>%
  group_by(g) %>%
mutate(y = if (g == 1) "a" else 1)
```

@@ -175,14 +175,14 @@ Using vctrs in dplyr also causes two behaviour changes. We hope that these don't
create a factor with the union of the individual levels:

```{r}
vec_c(factor("x"), factor("y"))
```

* When combining a factor and a character, dplyr previously warned about
creating a character vector. It now silently creates a character vector:

```{r}
vec_c("x", factor("y"))
```

These changes are motivated more by pragmatism than by theory. Strictly speaking, one should probably consider `factor("red")` and `factor("male")` to be incompatible, but this level of strictness causes much pain because character vectors can usually be used interchangeably with factors.
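In practice this pragmatism means that two factors encoding unrelated categories still combine cleanly, taking the union of their levels (a small sketch, not from the original post):

```{r}
vec_c(factor("red"), factor("male"))
# a factor with levels "red" and "male", even though the inputs
# arguably represent incompatible categorical variables
```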
40 changes: 20 additions & 20 deletions content/blog/2020/dplyr-1-0-0-colwise/index.Rmd
@@ -38,21 +38,21 @@ Today, I wanted to talk a little bit about the new `across()` function that make
It's often useful to perform the same operation on multiple columns, but copying and pasting is both tedious and error prone:

```{r, eval = FALSE}
df %>%
  group_by(g1, g2) %>%
  summarise(a = mean(a), b = mean(b), c = mean(c), d = mean(c))
```

You can now rewrite such code using `across()`, which lets you apply a transformation to multiple variables selected with the same syntax as [`select()` and `rename()`](https://www.tidyverse.org/blog/2020/03/dplyr-1-0-0-select-rename-relocate/#select-and-renaming):

```{r, eval = FALSE}
df %>%
  group_by(g1, g2) %>%
  summarise(across(a:d, mean))
# or with a function
df %>%
  group_by(g1, g2) %>%
  summarise(across(where(is.numeric), mean))
```

@@ -74,17 +74,17 @@ Here are a couple of examples of `across()` used with `summarise()`:
```{r}
library(dplyr, warn.conflicts = FALSE)
starwars %>%
  summarise(across(where(is.character), n_distinct))
starwars %>%
  group_by(species) %>%
  filter(n() > 1) %>%
  summarise(across(c(sex, gender, homeworld), n_distinct))
starwars %>%
  group_by(homeworld) %>%
  filter(n() > 1) %>%
  summarise(across(where(is.numeric), mean, na.rm = TRUE), n = n())
```
## Other cool features
@@ -110,13 +110,13 @@ Why did we decide to move away from these functions in favour of `across()`?
compute the number of rows in each group:

```{r, eval = FALSE}
df %>%
  group_by(g1, g2) %>%
  summarise(
    across(where(is.numeric), mean),
    across(where(is.factor), nlevels),
    n = n(),
  )
```

2. `across()` reduces the number of functions that dplyr needs to provide.