tidytable

tidytable is a data frame manipulation library for users who need data.table speed but prefer tidyverse-like syntax.

Installation

Install the released version from CRAN with:

install.packages("tidytable")

Or install the development version from GitHub with:

# install.packages("pak")
pak::pak("markfairbanks/tidytable")

General syntax

tidytable replicates tidyverse syntax but uses data.table in the background. In general, you can load library(tidytable) and your existing dplyr and tidyr code will run on data.table-backed equivalents.

A full list of implemented functions can be found in the package's function reference.

library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  select(x, y, z) %>%
  filter(x < 4, y > 1) %>%
  arrange(x, y) %>%
  mutate(double_x = x * 2,
         x_plus_y = x + y)
#> # A tidytable: 3 × 5
#>       x     y z     double_x x_plus_y
#>   <int> <int> <chr>    <dbl>    <int>
#> 1     1     4 a            2        5
#> 2     2     5 a            4        7
#> 3     3     6 b            6        9
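
The same drop-in approach applies to tidyr-style verbs. As a minimal sketch (the `wide` table and its column names are made up for illustration), reshaping with pivot_longer() is written the same way it would be with tidyr:

```r
library(tidytable)

# Hypothetical wide table: one row per id, one column per measurement
wide <- data.table(id = 1:2, a = c(10, 20), b = c(30, 40))

# Stack columns a and b into name/value pairs, as with tidyr::pivot_longer()
long <- wide %>%
  pivot_longer(cols = c(a, b), names_to = "name", values_to = "value")
```

The result should have one row per id/measurement pair (four rows here) and the columns id, name, and value.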

Applying functions by group

You can use the normal tidyverse group_by()/ungroup() workflow, or you can use .by syntax to reduce typing. Using .by in a function is shorthand for df %>% group_by() %>% some_function() %>% ungroup().

A single column can be passed with .by = z
Multiple columns can be passed with .by = c(y, z)

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = c(x, y))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

All functions that can operate by group (mutate(), filter(), summarize(), etc.) have a .by argument built in.

The above syntax is equivalent to:

df %>%
  group_by(x, y) %>%
  summarize(avg_z = mean(z)) %>%
  ungroup()
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Both options are available for users, so you can use the syntax that you prefer.
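
Because .by is not limited to summarize(), it can be used the same way in other verbs. The following is a small sketch using mutate() and filter() on the same example data; the names `with_mean` and `top_rows` are illustrative only:

```r
library(tidytable)

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

# Add a per-group mean as a new column; every row is kept
with_mean <- df %>%
  mutate(avg_z = mean(z), .by = x)

# Keep only the row(s) holding each group's largest z
top_rows <- df %>%
  filter(z == max(z), .by = x)
```

In both cases the grouping applies only to that one call, so no ungroup() step is needed afterwards.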

tidyselect support

tidytable allows you to select/drop columns just like you would in the tidyverse by utilizing the tidyselect package in the background.

Normal selection can be mixed with all tidyselect helpers: everything(), starts_with(), ends_with(), any_of(), where(), etc.

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

df %>%
  select(a, starts_with("b"))
#> # A tidytable: 3 × 3
#>       a    b1    b2
#>   <int> <int> <int>
#> 1     1     4     7
#> 2     2     5     8
#> 3     3     6     9

A full overview of selection options can be found in the tidyselect documentation.
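
A few of the other helpers listed above, sketched on the same example data. The variable names and the missing column "zzz" are made up for illustration:

```r
library(tidytable)

df <- data.table(
  a = 1:3,
  b1 = 4:6,
  b2 = 7:9,
  c = c("a", "a", "b")
)

# Select by column type: keeps a, b1, b2
numeric_cols <- df %>% select(where(is.numeric))

# Drop columns by negating a helper: keeps a, c
without_b <- df %>% select(-starts_with("b"))

# any_of() ignores names that are not present: keeps a only
safe_pick <- df %>% select(any_of(c("a", "zzz")))
```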

Using tidyselect in `.by`

tidyselect helpers also work when using .by:

df <- data.table(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>%
  summarize(avg_z = mean(z),
            .by = where(is.character))
#> # A tidytable: 2 × 3
#>   x     y     avg_z
#>   <chr> <chr> <dbl>
#> 1 a     a       1.5
#> 2 b     b       3

Tidy evaluation compatibility

Tidy evaluation can be used to write custom functions with tidytable functions. The embracing shortcut {{ }} works, or you can use enquo() with !! if you prefer:

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one <- function(data, add_col) {
  data %>%
    mutate(new_col = {{ add_col }} + 1)
}

df %>%
  add_one(x)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a           2
#> 2     1     5 a           2
#> 3     1     6 b           2
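
For comparison, the enquo() with !! form mentioned above might look like the sketch below. The function name `add_one_enquo` is hypothetical, and enquo() is called as rlang::enquo() here on the assumption that rlang is installed (tidytable depends on it):

```r
library(tidytable)

df <- data.table(x = c(1, 1, 1), y = 4:6, z = c("a", "a", "b"))

add_one_enquo <- function(data, add_col) {
  # Capture the column expression the caller passed in
  add_col <- rlang::enquo(add_col)

  data %>%
    # Inject the captured expression back into mutate()
    mutate(new_col = (!!add_col) + 1)
}

result <- df %>%
  add_one_enquo(x)
```

This is intended to give the same result as the {{ }} version; the parentheses around !!add_col only make the operator precedence explicit.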

The .data and .env pronouns also work within tidytable functions:

var <- 10

df %>%
  mutate(new_col = .data$x + .env$var)
#> # A tidytable: 3 × 4
#>       x     y z     new_col
#>   <dbl> <int> <chr>   <dbl>
#> 1     1     4 a          11
#> 2     1     5 a          11
#> 3     1     6 b          11

A full overview of tidy evaluation can be found in the rlang documentation.

`dt()` helper

The dt() function makes regular data.table syntax pipeable, so you can easily mix tidytable syntax with data.table syntax:

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

df %>%
  dt(, .(x, y, z)) %>%
  dt(x < 4 & y > 1) %>%
  dt(order(x, y)) %>%
  dt(, double_x := x * 2) %>%
  dt(, .(avg_x = mean(x)), by = z)
#> # A tidytable: 2 × 2
#>   z     avg_x
#>   <chr> <dbl>
#> 1 a       1.5
#> 2 b       3
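
Since dt() is pipeable, it can also sit between regular tidytable verbs. A minimal sketch of such a mixed pipeline on the same data (the name `mixed` is illustrative):

```r
library(tidytable)

df <- data.table(x = 1:3, y = 4:6, z = c("a", "a", "b"))

mixed <- df %>%
  filter(x < 4, y > 1) %>%                            # tidytable verb
  dt(, double_x := x * 2) %>%                         # data.table syntax via dt()
  summarize(avg_double_x = mean(double_x), .by = z)   # tidytable verb again
```

This can be handy when a single step is easier to express in data.table syntax while the rest of the pipeline stays in tidyverse style.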

Speed Comparisons

For those interested in performance, speed comparisons can be found in the package documentation.

Acknowledgements

tidytable is only possible because of the great contributions to R by the data.table and tidyverse teams. data.table is used as the main data frame engine in the background, while tidyverse packages like rlang, vctrs, and tidyselect are heavily relied upon to give users an experience similar to dplyr and tidyr.
