Feedback regarding column superscripts #582

DavisVaughan · 2022-08-19T13:28:39Z

I have some feedback about the column superscripts / abbreviations. I feel like it actually adds more burden onto the programmer when working interactively.

Say you are exploring the flights data for the first time.

library(nycflights13)

flights
#> # A tibble: 336,776 × 19
#>     year month   day dep_time sched_de…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier
#>    <int> <int> <int>    <int>      <int>   <dbl>   <int>   <int>   <dbl> <chr>  
#>  1  2013     1     1      517        515       2     830     819      11 UA     
#>  2  2013     1     1      533        529       4     850     830      20 UA     
#>  3  2013     1     1      542        540       2     923     850      33 AA     
#>  4  2013     1     1      544        545      -1    1004    1022     -18 B6     
#>  5  2013     1     1      554        600      -6     812     837     -25 DL     
#>  6  2013     1     1      554        558      -4     740     728      12 UA     
#>  7  2013     1     1      555        600      -5     913     854      19 B6     
#>  8  2013     1     1      557        600      -3     709     723     -14 EV     
#>  9  2013     1     1      557        600      -3     838     846      -8 B6     
#> 10  2013     1     1      558        600      -2     753     745       8 AA     
#> # … with 336,766 more rows, 9 more variables: flight <int>, tailnum <chr>,
#> #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
#> #   minute <dbl>, time_hour <dttm>, and abbreviated variable names
#> #   ¹sched_dep_time, ²dep_delay, ³arr_time, ⁴sched_arr_time, ⁵arr_delay

If this is the first time you've seen this data, then you are probably going to take a look at the column names. But the abbreviations make it very hard to figure out what each column is named.

Take sched...4 for example. I have no idea what this column means, and there are multiple sched_* columns, so I have to spend quite a long time mapping the 4 in the superscript down to the 4sched_arr_time in the footer.

To me it feels like this is actively getting in the way of my analysis.

Another example:

I've worked with this data quite a bit, and was attempting to write mutate(flights, speed = air_time / distance, .before = sched_dep_time) but couldn't quite remember the full name of the sched_dep_time column. I had to:

Find it in the column header
Figure out the superscript number
Map that superscript number to a superscript in the footer

That is a lot of work just to figure out a column name, where previously it was one step.

My personal preference is to have no superscripts / abbreviations at all

The text was updated successfully, but these errors were encountered:

hadley · 2022-08-19T13:51:48Z

One place to start would be to massively bump up pillar.min_title_chars. I'd suggest 20 would be more appropriate.

Another principle might be to only abbreviate to a unique prefix, but I suspect would be a quite a bit of work to implement, so probably not worth it.

krlmlr · 2022-08-19T13:58:16Z

We had pillar.min_title_chars at 15, I don't mind changing the default.

There is a tension between "seeing more columns" and "being able to copy-paste column names". Showing the columns in the footer allows both.

@DavisVaughan: I hear you, but isn't this mostly a matter of setting pillar.min_title_chars to a convenient value? Are you still experiencing the same difficulties with options(pillar.min_title_chars = 15) or even options(pillar.min_title_chars = Inf) ?

DavisVaughan · 2022-08-19T13:58:19Z

I didn't know about that global option until now, but I agree that it is too low. 25 feels good to me personally.

Beyond just reducing the number of abbreviated columns, I think upping this option would also have the benefit of greatly reducing the chances for ambiguity that I mentioned earlier. i.e. if it printed out at least the first 25 characters then you'd be way less likely to run into situations where you have arr_t…³ and arr_d…⁵ or even two columns that say arr_

If we met in the middle at Hadley's suggestion of 20, then that seems good enough for me 😄

(I will probably still set this to Inf personally)

topepo · 2022-08-19T15:50:29Z

I found the superscripts jarring and not really necessary when people can use names(), str(), View() etc to get more information. I didn't see the point of this change and it was substantial enough to warrant discussion for users of tibbles.

As I said in another issue, it would be good to not increase the vertical space used just to print a tibble. Less is more.

krlmlr · 2022-08-19T16:36:10Z

Thanks for your inputs.

@DavisVaughan: How wide is your terminal typically (getOption("width"))? If you can afford the horizontal space, of course you don't need to squeeze column names.

@topepo: By default the footer is now seven lines tall at most. There is an option that controls this. (Also discussing off-band.)

DavisVaughan · 2022-08-19T17:00:49Z

@krlmlr it seems like I typically have it sit right at 80

krlmlr · 2022-08-19T17:09:52Z

One downside of options(pillar.min_title_chars = Inf) : if you have a column with a long name, other columns are omitted -- only shown as names in the footer. With a lot of white (or black) space that could otherwise be data.

In the poll at https://twitter.com/krlmlr/status/1518145876258082820, a vast majority was in favor of lowering the (then) default of 15 to 8 or even lower. I chose 5.

Users can customize to their liking. For tests, should testthat use other defaults perhaps?

DavisVaughan · 2022-08-19T17:16:56Z

if you have a column with a long name, other columns are omitted -- only shown as names in the footer

that does not and has never bothered me. But I can see how some people might want to use that space, which is why I'd be ok with 20

a vast majority was in favor of lowering the (then) default of 15 to 8 or even lower. I chose 5.

After dealing with this for a little while I am convinced that twitter is wrong 😛

DavisVaughan · 2022-08-19T17:17:19Z

FYI it is documented as 3 even though the default is indeed 5!

pillar/R/options.R

Line 107 in 5a5619d

#' title, default: `3`. Column titles may be truncated up to that width to

krlmlr · 2022-08-19T17:22:47Z

There's a PR that fixes this.

The worst that happens is if a dataset where you don't control the column names has columns of length > 15. You see a ton of white space, and still don't see the full column name. That's where the abbreviations came from. And with the abbreviations there's also the opportunity to show a bit more data.

20 means "3-4 columns" if you have expressive column names. Instead of 10-15 columns with the current default (on width 80).

If we change the default now, we should check how many packages would have to update their snapshot tests. (And perhaps help those packages with a GitHub action that opens a PR. I have one for my packages.)

wurli · 2022-10-12T12:53:01Z

A bit of extra feedback. Currently abbreviated columns are listed in the same paragraph as omitted column names - I think an extra linebreak here would improve readability. They also appear after omitted column names, not before. I'd suggest swapping the order here; this would more closely reflect the data structure and, IMO, would be more intuitive for end users.

E.g. we currently have this:

library(purrr)

words <- c("appropriate", "department", "difference", "environment", "experience")
cols <- do.call(paste, c(expand.grid(words, words), sep = "_"))
wide_data <- cols |> set_names() |> map_dfc(~ sample(letters, 25))
 
# Current behaviour:
wide_data 
#> # A tibble: 25 × 25
#>    approp…¹ depar…² diffe…³ envir…⁴ exper…⁵ appro…⁶ depar…⁷ diffe…⁸ envir…⁹ exper…˟ appro…˟ depar…˟ diffe…˟
#>    <chr>    <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 q        x       n       c       h       y       l       o       m       w       v       z       j      
#>  2 a        j       c       m       t       q       k       z       r       v       y       s       n      
#>  3 e        h       j       r       j       s       p       i       p       t       b       n       w      
#>  4 x        m       y       i       a       k       y       q       w       o       h       a       s      
#>  5 s        r       u       p       v       w       a       p       q       l       s       m       f      
#>  6 i        k       q       q       f       n       d       r       h       x       j       d       x      
#>  7 d        s       e       h       e       m       w       t       o       b       p       x       b      
#>  8 z        i       g       t       x       g       o       d       i       r       l       b       z      
#>  9 p        v       v       e       d       h       z       n       y       p       g       y       e      
#> 10 c        o       b       g       r       o       r       a       s       h       x       e       o      
#> # … with 15 more rows, 12 more variables: environment_difference <chr>, experience_difference <chr>,
#> #   appropriate_environment <chr>, department_environment <chr>, difference_environment <chr>,
#> #   environment_environment <chr>, experience_environment <chr>, appropriate_experience <chr>,
#> #   department_experience <chr>, difference_experience <chr>, environment_experience <chr>,
#> #   experience_experience <chr>, and abbreviated variable names ¹appropriate_appropriate,
#> #   ²department_appropriate, ³difference_appropriate, ⁴environment_appropriate, ⁵experience_appropriate,
#> #   ⁶appropriate_department, ⁷department_department, ⁸difference_department, ⁹environment_department, …
#> # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names


# Proposed behaviour would be something like this:
wide_data
#> # A tibble: 25 × 25
#>    approp…¹ depar…² diffe…³ envir…⁴ exper…⁵ appro…⁶ depar…⁷ diffe…⁸ envir…⁹ exper…˟ appro…˟ depar…˟ diffe…˟
#>    <chr>    <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 q        x       n       c       h       y       l       o       m       w       v       z       j      
#>  2 a        j       c       m       t       q       k       z       r       v       y       s       n      
#>  3 e        h       j       r       j       s       p       i       p       t       b       n       w      
#>  4 x        m       y       i       a       k       y       q       w       o       h       a       s      
#>  5 s        r       u       p       v       w       a       p       q       l       s       m       f      
#>  6 i        k       q       q       f       n       d       r       h       x       j       d       x      
#>  7 d        s       e       h       e       m       w       t       o       b       p       x       b      
#>  8 z        i       g       t       x       g       o       d       i       r       l       b       z      
#>  9 p        v       v       e       d       h       z       n       y       p       g       y       e      
#> 10 c        o       b       g       r       o       r       a       s       h       x       e       o      
#> # … with 15 more rows
#> # • abbreviated variables: ¹appropriate_appropriate, ²department_appropriate, ³difference_appropriate, 
#> #   ⁴environment_appropriate, ⁵experience_appropriate, ⁶appropriate_department, ⁷department_department,
#> #   ⁸difference_department, ⁹environment_department, …
#> # • 12 more variables: environment_difference <chr>, experience_difference <chr>,
#> #   appropriate_environment <chr>, department_environment <chr>, difference_environment <chr>,
#> #   environment_environment <chr>, experience_environment <chr>, appropriate_experience <chr>,
#> #   department_experience <chr>, difference_experience <chr>, environment_experience <chr>,
#> #   experience_experience <chr>

DavisVaughan · 2022-10-12T13:28:07Z

Oh I really like that idea!

hadley · 2022-10-12T13:49:28Z

Me too

krlmlr · 2022-10-16T13:25:19Z

I like bullets (#542), we need to find a way to reduce the impact on the tidymodels team.

DavisVaughan · 2022-10-17T12:59:23Z

If it is just about updating snapshot tests, I think the tidymodels team is going to be more than willing to update their snapshots in favor of some improvements with the superscripts/footers here.

Part of #582

hadley mentioned this issue Oct 17, 2022

Draft: bullets in footer #542

Closed

krlmlr mentioned this issue Oct 30, 2022

feat: Use bullets to format details #617

Merged

hadley added a commit that referenced this issue Nov 23, 2022

Bump min_title_chars back up

3e00cd4

Part of #582

hadley mentioned this issue Nov 23, 2022

feat: Bump min_title_chars back up #620

Merged

krlmlr closed this as completed in #617 Dec 4, 2022

github-actions bot locked and limited conversation to collaborators Dec 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feedback regarding column superscripts #582

Feedback regarding column superscripts #582

DavisVaughan commented Aug 19, 2022 •

edited

Loading

hadley commented Aug 19, 2022 •

edited

Loading

krlmlr commented Aug 19, 2022

DavisVaughan commented Aug 19, 2022 •

edited

Loading

topepo commented Aug 19, 2022

krlmlr commented Aug 19, 2022

DavisVaughan commented Aug 19, 2022

krlmlr commented Aug 19, 2022

DavisVaughan commented Aug 19, 2022 •

edited

Loading

DavisVaughan commented Aug 19, 2022

krlmlr commented Aug 19, 2022

wurli commented Oct 12, 2022 •

edited

Loading

DavisVaughan commented Oct 12, 2022

hadley commented Oct 12, 2022

krlmlr commented Oct 16, 2022

DavisVaughan commented Oct 17, 2022

Feedback regarding column superscripts #582

Feedback regarding column superscripts #582

Comments

DavisVaughan commented Aug 19, 2022 • edited Loading

hadley commented Aug 19, 2022 • edited Loading

krlmlr commented Aug 19, 2022

DavisVaughan commented Aug 19, 2022 • edited Loading

topepo commented Aug 19, 2022

krlmlr commented Aug 19, 2022

DavisVaughan commented Aug 19, 2022

krlmlr commented Aug 19, 2022

DavisVaughan commented Aug 19, 2022 • edited Loading

DavisVaughan commented Aug 19, 2022

krlmlr commented Aug 19, 2022

wurli commented Oct 12, 2022 • edited Loading

DavisVaughan commented Oct 12, 2022

hadley commented Oct 12, 2022

krlmlr commented Oct 16, 2022

DavisVaughan commented Oct 17, 2022

DavisVaughan commented Aug 19, 2022 •

edited

Loading

hadley commented Aug 19, 2022 •

edited

Loading

DavisVaughan commented Aug 19, 2022 •

edited

Loading

DavisVaughan commented Aug 19, 2022 •

edited

Loading

wurli commented Oct 12, 2022 •

edited

Loading