Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor data_color() so that it executes faster #576

Merged
merged 4 commits into from
Jun 22, 2020

Conversation

rich-iannone
Copy link
Member

The data_color() function was seen to be extremely slow. Reason for this was a combination of missed opportunities for vectorization, overuse of loops, and longer than necessary routes to an otherwise simple addition of _styles rows.

Below is reproducible code that would otherwise take 5-8 s before the changes in this PR (it now takes about 0.15 seconds):

library(gt)
library(tidyverse)
library(scales)
library(paletteer)

sza %>%
  dplyr::filter(latitude == 20) %>%
  dplyr::select(-latitude) %>%
  dplyr::filter(!is.na(sza)) %>%
  tidyr::spread(key = "tst", value = sza) %>%
  gt(rowname_col = "month") %>%
  fmt_missing(
    columns = TRUE,
    missing_text = ""
  ) %>%
  data_color(
    columns = TRUE,
    colors = scales::col_numeric(
      palette = paletteer::paletteer_d(
        palette = "ggsci::orange_material"
      ) %>% as.character(),
      domain = c(0, 90),
      na.color = "#FFFFFF"
    )
  )

Fixes: #543

R/data_color.R Outdated Show resolved Hide resolved
R/data_color.R Show resolved Hide resolved
@jcheng5
Copy link
Member

jcheng5 commented May 8, 2020

Let's take a dependency on htmltools >= 1.1.0 and remove the workaround for the scales bug.

Copy link
Member

@jcheng5 jcheng5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

@rich-iannone
Copy link
Member Author

Turns out, we already have a min version requirement of 1.1.0 on scales, so, no change required in DESCRIPTION.

@rich-iannone
Copy link
Member Author

rich-iannone commented May 8, 2020

Just confirmed that removing this code:

        if (length(colors) > 1) {
          nlvl <-
            if (is.factor(data_vals)) {
              nlevels(data_vals)
            } else {
              nlevels(factor(data_vals))
            }
          if (length(colors) > nlvl) {
            colors <- colors[seq_len(nlvl)]
          }
        }

does NOT result in non-interpolated colors with scales 1.1.0 (unit test that checks the colors correction code fails).

Code for reproducing:

tbl <-
  countrypops %>%
  dplyr::filter(country_name == "Mongolia") %>%
  dplyr::select(-contains("code")) %>%
  tail(10)

tbl[1, ] <- NA

tbl %>%
  gt() %>%
  data_color(
    columns = vars(country_name),
    colors = c("red", "orange", "green", "blue")
  )

With correction code retained, we get the correct colors (subsetting):

correct_colors

With that correction code removed, we get an interpolation of colors:

incorrect_colors

@rich-iannone rich-iannone merged commit 321bfb9 into master Jun 22, 2020
@rich-iannone rich-iannone deleted the data-color-fixes branch June 22, 2020 18:44
rich-iannone added a commit that referenced this pull request Jun 22, 2020
* master:
  Refactor `data_color()` so that it executes faster (#576)
  Update R-CMD-check.yaml (#599)
  Release gt 0.2.1 (#588)
  Ensure that row ordering doesn't affect summary row calculations (#556)
  Update failing example (#586)
  Remove test of scales behaviour
  Squelch warnings from tibble 3.0.0/3.0.1 (#557)
  Bump cache on pkgdown.yaml for GH workflow (#570)
  Update GH Actions workflow for R CMD check (#568)
  Update Description of package to be less confusing (#569)
rich-iannone added a commit that referenced this pull request Jun 22, 2020
* master:
  Restore row striping option in stub cells (`row.striping.include_stub = TRUE`) (#564)
  Refactor `data_color()` so that it executes faster (#576)
  Update R-CMD-check.yaml (#599)
  Release gt 0.2.1 (#588)
  Ensure that row ordering doesn't affect summary row calculations (#556)
  Update failing example (#586)
  Remove test of scales behaviour
rich-iannone added a commit that referenced this pull request Jun 22, 2020
* master:
  Add `scale_values` arg to `fmt_percent()` (#565)
  Restore row striping option in stub cells (`row.striping.include_stub = TRUE`) (#564)
  Refactor `data_color()` so that it executes faster (#576)
  Update R-CMD-check.yaml (#599)
  Release gt 0.2.1 (#588)
rich-iannone added a commit that referenced this pull request Jun 22, 2020
* master:
  Fix issues with defining column widths in `cols_width()` (#561)
  Add `scale_values` arg to `fmt_percent()` (#565)
  Restore row striping option in stub cells (`row.striping.include_stub = TRUE`) (#564)
  Refactor `data_color()` so that it executes faster (#576)
  Update R-CMD-check.yaml (#599)
  Release gt 0.2.1 (#588)
  Ensure that row ordering doesn't affect summary row calculations (#556)
  Update failing example (#586)
  Remove test of scales behaviour
  Squelch warnings from tibble 3.0.0/3.0.1 (#557)
  Bump cache on pkgdown.yaml for GH workflow (#570)
  Update GH Actions workflow for R CMD check (#568)
  Update Description of package to be less confusing (#569)
rich-iannone added a commit that referenced this pull request Oct 13, 2020
* master:
  Rewrite of RTF building functions and `as_rtf()` (#638)
  v0.2.2 Release Candidate (#629)
  Settable font options (#591)
  Add options for sig figs / inclusion of trailing dec marks (#546)
  Fix issues with defining column widths in `cols_width()` (#561)
  Add `scale_values` arg to `fmt_percent()` (#565)
  Restore row striping option in stub cells (`row.striping.include_stub = TRUE`) (#564)
  Refactor `data_color()` so that it executes faster (#576)
  Update R-CMD-check.yaml (#599)
  Release gt 0.2.1 (#588)
  Ensure that row ordering doesn't affect summary row calculations (#556)
rich-iannone added a commit that referenced this pull request Oct 25, 2020
* master: (30 commits)
  Spanner alignment correction (#662)
  PEN currency fix (#663)
  Fix for `gtsave()` when saving an image and specifying a `path` value (#592)
  Rewrite of RTF building functions and `as_rtf()` (#638)
  v0.2.2 Release Candidate (#629)
  Settable font options (#591)
  Add options for sig figs / inclusion of trailing dec marks (#546)
  Fix issues with defining column widths in `cols_width()` (#561)
  Add `scale_values` arg to `fmt_percent()` (#565)
  Restore row striping option in stub cells (`row.striping.include_stub = TRUE`) (#564)
  Refactor `data_color()` so that it executes faster (#576)
  Update R-CMD-check.yaml (#599)
  Release gt 0.2.1 (#588)
  Ensure that row ordering doesn't affect summary row calculations (#556)
  Update failing example (#586)
  Remove test of scales behaviour
  Squelch warnings from tibble 3.0.0/3.0.1 (#557)
  Bump cache on pkgdown.yaml for GH workflow (#570)
  Update GH Actions workflow for R CMD check (#568)
  Update Description of package to be less confusing (#569)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The data_color() function is slow
2 participants