
Pivot function names

On 2019-03-22, I tweeted about a survey to help me pick names for the new pivot functions in the dev version of tidyr.

In the survey, I showed a picture of two tables containing the same data, and asked participants to describe their relative shapes. This document describes the results.

Table A has four columns (id, x, y, z) and two rows. Table B has three columns (id, n, x) and six rows.

library(googlesheets)
library(tidyverse)

# This googlesheet is public if you want to do your own analysis
key <- gs_key("1Do5R1k5sEZrwU0N1KmIjKaapHDrf7eYLdIlcGNx-MsI")
results <- googlesheets::gs_read(key, col_types = list())
names(results) <- c("timestamp", "table_a", "table_b")
head(results)
#> # A tibble: 6 x 3
#>   timestamp         table_a                     table_b                    
#>   <chr>             <chr>                       <chr>                      
#> 1 3/22/2019 10:30:… Table A is tidier than Tab… Table B is deeper than Tab…
#> 2 3/22/2019 10:31:… Table A is wider than Tabl… Table B is taller than Tab…
#> 3 3/22/2019 10:32:… Table A is wider than Tabl… Table B is taller than Tab…
#> 4 3/22/2019 10:33:… Table A is wider than Tabl… Table B is longer than Tab…
#> 5 3/22/2019 10:34:… Table A is wider than Tabl… Table B is longer than Tab…
#> 6 3/22/2019 10:37:… Table A is shallower than … Table B is narrower than T…

nrow(results)
#> [1] 2649

# Capture for posterity
write_csv(results, "results.csv")

Table A -> Table B

Wider is the clear winner with ~80% of responses.

table_a <- results %>% 
  filter(!is.na(table_a)) %>% 
  mutate(top3 = table_a %>% fct_lump(3) %>% fct_infreq() %>% fct_rev()) %>%
  count(top3) %>% 
  mutate(prop = n / sum(n))

table_a %>% 
  ggplot(aes(top3, prop)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(
    x = NULL,
    y = "Percent of responses"
  ) + 
  coord_flip()
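The ~80% figure can be checked by hand. This is a small sketch using only numbers that appear in the printed output in this document (the total row count, the `<NA>` count for Table A, and the count for "wider"); the variable names are just for illustration.

```r
# Counts copied from the printed output in this document
n_total   <- 2649  # nrow(results)
n_missing <- 2     # <NA> responses for Table A, dropped by filter(!is.na(table_a))
n_wider   <- 2121  # responses choosing "wider"

# Share of non-missing responses that chose "wider"
share_wider <- n_wider / (n_total - n_missing)
round(share_wider, 3)
#> [1] 0.801
```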

There was a wide range of write-in responses. The most popular included concise, compact, condense, and denser.

results %>% 
  mutate(
    table_a = table_a %>% 
      str_remove("Table A is ") %>% 
      str_remove(" than Table B") %>% 
      str_trunc(50)
  ) %>% 
  count(table_a, sort = TRUE) %>% 
  print(n = Inf)
#> # A tibble: 99 x 2
#>    table_a                                                n
#>    <chr>                                              <int>
#>  1 wider                                               2121
#>  2 shorter                                              328
#>  3 shallower                                             90
#>  4 Compact                                                4
#>  5 more compact                                           4
#>  6 Denser                                                 3
#>  7 <NA>                                                   2
#>  8 Condensed                                              2
#>  9 fatter than table B                                    2
#> 10 Horizontal                                             2
#> 11 smaller                                                2
#> 12 wider and shorter                                      2
#> 13 %>% select(x, y, z) %>%                                1
#> 14 3D                                                     1
#> 15 "a \"condensed\" version of Table B"                   1
#> 16 a gathered Table B                                     1
#> 17 A is a condense of B                                   1
#> 18 A is more collective than B                            1
#> 19 A is the pivot shape of B                              1
#> 20 A is the unpivot of B                                  1
#> 21 buxom                                                  1
#> 22 By individual / by observation                         1
#> 23 cases by variable, i.e. wider                          1
#> 24 cleaner                                                1
#> 25 coiled up compared to table B                          1
#> 26 colum heavy                                            1
#> 27 column-heavy as compared to Table B                    1
#> 28 Compact. Far fewer data cells required in table A.     1
#> 29 concise version table B                                1
#> 30 Concise, Row Complete, completeR, conciseR             1
#> 31 condensed than table B                                 1
#> 32 Denormslised                                           1
#> 33 Dense                                                  1
#> 34 denser than table B.                                   1
#> 35 distinct, indexed on ID, fact table                    1
#> 36 expanded by columns                                    1
#> 37 Fater                                                  1
#> 38 fatter                                                 1
#> 39 Flatter                                                1
#> 40 For this specific example, I would say shorter....     1
#> 41 Funkier                                                1
#> 42 horizontal vs Table B is vertical                      1
#> 43 horizontal, or column-wise                             1
#> 44 implicit                                               1
#> 45 In A each observation repeats only once                1
#> 46 Keep it as gather/spread.                              1
#> 47 Less melted                                            1
#> 48 more aggregated                                        1
#> 49 More compact                                           1
#> 50 more compact / succinct / concise                      1
#> 51 more compact and easy to read                          1
#> 52 more compact than table B                              1
#> 53 More compact.                                          1
#> 54 more compressed.                                       1
#> 55 more concentrated than table B                         1
#> 56 more concise                                           1
#> 57 more condensed                                         1
#> 58 More dense (more info per cell)                        1
#> 59 more spread                                            1
#> 60 notidy                                                 1
#> 61 pedantic                                               1
#> 62 pivot                                                  1
#> 63 Pivoted                                                1
#> 64 pivoted on “n” as its columns                          1
#> 65 Reshape                                                1
#> 66 row_condensed                                          1
#> 67 Russia                                                 1
#> 68 Short fat                                              1
#> 69 short-form of Table B.                                 1
#> 70 short-wide                                             1
#> 71 Shorter and fatter                                     1
#> 72 shorter and fatter (wider)                             1
#> 73 shorter and wider                                      1
#> 74 shorter and wider but not shallower                    1
#> 75 shorter and wider than B                               1
#> 76 skimpy                                                 1
#> 77 spreaded on b                                          1
#> 78 squash                                                 1
#> 79 Squatter                                               1
#> 80 stacked by id compared to Table B                      1
#> 81 Stout                                                  1
#> 82 "Table \"A\" covers less area than Table \"B\""        1
#> 83 Table 1 is better organized for the human inter...     1
#> 84 Table A has grouped/summarized information from...     1
#> 85 Table A has more columns but less rows                 1
#> 86 table A is of 2 dimensions, while B is of 1 in ...     1
#> 87 Table A looks like a waffle                            1
#> 88 Table A spreads more horizontally than table B         1
#> 89 Table B is CR(Column to Row) transformation of ...     1
#> 90 Table B is taller than Table A                         1
#> 91 Table is wide table b is long                          1
#> 92 Tabular                                                1
#> 93 Te same table, but different presentation or da...     1
#> 94 the before picture of Table B.                         1
#> 95 the unstacked form of Table B                          1
#> 96 They are similar, containing the same information      1
#> 97 thicc while Table B is a sticc                         1
#> 98 tidier                                                 1
#> 99 unique on one variable and is wider                    1

Table B -> Table A

Longer is the clear winner with ~70% of responses. Given the number of people who suggested taller to me, I had expected it to come in much higher. Interestingly, narrower is much less common than shorter, its equivalent above.

table_b <- results %>% 
  filter(!is.na(table_b)) %>% 
  mutate(top3 = table_b %>% fct_lump(3) %>% fct_infreq() %>% fct_rev()) %>%
  count(top3) %>% 
  mutate(prop = n / sum(n))

table_b %>% 
  ggplot(aes(top3, prop)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(
    x = NULL,
    y = "Percent of responses"
  ) + 
  coord_flip()

There was a wide range of write-in responses. The most popular included expanded and skinnier.

results %>% 
  mutate(
    table_b = table_b %>% 
      str_remove("Table B is ") %>% 
      str_remove(" than Table A") %>% 
      str_trunc(50)
  ) %>% 
  count(table_b, sort = TRUE) %>% 
  print(n = Inf)
#> # A tibble: 96 x 2
#>    table_b                                                      n
#>    <chr>                                                    <int>
#>  1 longer                                                    1844
#>  2 taller                                                     419
#>  3 narrower                                                   171
#>  4 deeper                                                     116
#>  5 <NA>                                                         7
#>  6 Vertical                                                     2
#>  7 %>% filter(n=='x'|n=='y'|n=='z') %>%                         1
#>  8 2d                                                           1
#>  9 a detailed version of table A                                1
#> 10 a panel, Table A is not                                      1
#> 11 a spread Table A                                             1
#> 12 "an \"expanded\" version of Table A"                         1
#> 13 Atomic                                                       1
#> 14 B is a sublimate of A                                        1
#> 15 B is more individual than A                                  1
#> 16 B is the itemized shape of A                                 1
#> 17 B is the pivot of A                                          1
#> 18 bigger                                                       1
#> 19 By attribute / by key-value                                  1
#> 20 Chile                                                        1
#> 21 clearer than table B.                                        1
#> 22 combonation of variable is unique and is longer              1
#> 23 Down                                                         1
#> 24 Expanded                                                     1
#> 25 expanded by rows                                             1
#> 26 Expanded.                                                    1
#> 27 expansive than table A                                       1
#> 28 explicit                                                     1
#> 29 Extended                                                     1
#> 30 flatter than table A (i.e The value column is t...           1
#> 31 gaunt                                                        1
#> 32 Groovier                                                     1
#> 33 "I think if I hadn't heard of \"wide\" and \"long\",..."     1
#> 34 Keep it as gather/spread.                                    1
#> 35 less compact                                                 1
#> 36 Less dense                                                   1
#> 37 long and narrow                                              1
#> 38 longer and narrower                                          1
#> 39 Looser                                                       1
#> 40 melter                                                       1
#> 41 more diluted than table A                                    1
#> 42 more fragmented or elemental                                 1
#> 43 more likely to be drafted in the NBA. Jk, it's ...           1
#> 44 More melted                                                  1
#> 45 more repetitive / verbose / redundant                        1
#> 46 More slender                                                 1
#> 47 more stretched out.                                          1
#> 48 more verbose                                                 1
#> 49 more vertically stacked than table A                         1
#> 50 narrow-long                                                  1
#> 51 narrower and longer                                          1
#> 52 Needs more rows for each observation                         1
#> 53 Normalised                                                   1
#> 54 Not deeper                                                   1
#> 55 outrolled                                                    1
#> 56 pivot                                                        1
#> 57 pivoted with “n” in its rows                                 1
#> 58 portrait vs. Table A is landscape                            1
#> 59 redundundant                                                 1
#> 60 repeated observation/measurement of cases by va...           1
#> 61 row heavy                                                    1
#> 62 row_expanded                                                 1
#> 63 row-heavy as compared to Table A                             1
#> 64 skinnier.                                                    1
#> 65 Skinnier/skinny                                              1
#> 66 Skinny                                                       1
#> 67 Slack                                                        1
#> 68 Sparser                                                      1
#> 69 Sparser than table A                                         1
#> 70 stack                                                        1
#> 71 Stacked                                                      1
#> 72 stretched by n compared to Table A                           1
#> 73 stretched out compared to Table A                            1
#> 74 "Table \"B\" has creates options than Table \"A\""           1
#> 75 Table A is RC transformation of Table B                      1
#> 76 Table B has more rows but less columns                       1
#> 77 Take B is skinnier                                           1
#> 78 Tall skinny                                                  1
#> 79 taller and thinner                                           1
#> 80 Taller and thinner                                           1
#> 81 the after photo of Table A                                   1
#> 82 the long-form of Table A.                                    1
#> 83 the stacked form of Table A                                  1
#> 84 Thinner                                                      1
#> 85 thinner and longer                                           1
#> 86 Tidier                                                       1
#> 87 tidy                                                         1
#> 88 transaction table, not indexed on ID                         1
#> 89 transposed of table A                                        1
#> 90 Unbind variables for each id                                 1
#> 91 Unpivoted                                                    1
#> 92 Vectorial                                                    1
#> 93 verbose                                                      1
#> 94 vertical                                                     1
#> 95 wider                                                        1
#> 96 worse                                                        1

Conclusion

The new functions will be called pivot_wider() and pivot_longer(): these are not the most natural names for everyone, but they are the most popular by a large margin. I like pivot because it suggests the form of the underlying operation (a pivoting or rotation), and it is evocative to Excel users.
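As a minimal sketch of how the two names read in use, the code below builds a table shaped like Table A and converts it to the shape of Table B and back. It assumes a version of tidyr that ships `pivot_longer()` and `pivot_wider()` (1.0.0 or later); the cell values are made up, and only the column names and dimensions follow the survey picture.

```r
library(tidyr)
library(tibble)

# Hypothetical values; only the shape (2 rows, columns id, x, y, z) matches Table A
table_a <- tibble(
  id = c(1, 2),
  x = c("a", "b"),
  y = c("c", "d"),
  z = c("e", "f")
)

# Table A -> Table B: the result is longer (6 rows, columns id, n, x)
table_b <- pivot_longer(
  table_a,
  cols = c(x, y, z),
  names_to = "n",
  values_to = "x"
)
dim(table_b)
#> [1] 6 3

# Table B -> Table A: the result is wider (2 rows, columns id, x, y, z)
round_trip <- pivot_wider(table_b, names_from = n, values_from = x)
dim(round_trip)
#> [1] 2 4
```

The suffixes describe the output relative to the input, which is why they are comparative (longer, wider) instead of absolute.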

A few alternatives that were suggested, considered, and rejected:

  • VERB_long()/VERB_wide(): not obvious whether they take long/wide data or return long/wide data.

  • VERB_to_long()/VERB_to_wide(): implies that long and wide are absolute terms. I don’t think it makes sense to talk about long or wide form data; you can only say one form is longer or wider than another form.

  • to_long()/to_wide(): isn’t a verb, and implies that there’s only one operation that makes data longer/wider. The next version of tidyr will also contain functions that unnest list-columns of vectors, and that verb (name TBA) also needs directional suffixes.

  • reshape_SHAPE: too much potential for confusion with the existing base::reshape().

  • gather()/spread(): while some people clearly liked these functions, they were not memorable to a large number of people I talked to.

I appreciate the enthusiasm that people have for naming functions!
