Cache styling #538

lorenzwalthert · 2019-08-15T07:28:24Z

Closes #320.

Goal

The goal is to make styler remember what it has already styled, so that the pre-commit hook in https://github.com/lorenzwalthert/pre-commit-hooks and functions like styler::style_pkg() run much faster on files that are already styled.

Requirements

The requirements are:

It must work on all platforms and must not be error-prone.
The user should be able to manage it (know its size and location, deactivate, clear, delete).
It should be as close as possible to the functionality "once seen by style_text(), the Addin or style_file(), remembered forever".
The cache should be enabled by default, but the required packages should be in Suggests to keep the core installation lightweight. If R.cache is not installed, we should issue a warning, deactivate the caching feature for the current R session and ask the user to install the dependencies and/or permanently disable the feature in their .Rprofile with usethis::edit_r_profile().
For this reason, styling should also work when R.cache is not installed, which requires every R.cache call to be wrapped in a conditional.
Advanced users should be able to understand how R.cache was used to implement the caching.
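
A minimal sketch of the "wrapped in a conditional" requirement, using only base R. The option name sketch.use_cache and the helper cache_is_usable() are hypothetical and only illustrate the warn-once-then-deactivate behavior; they are not meant as styler's actual implementation.

```r
# Hypothetical helper: decide whether caching can be used in this session.
cache_is_usable <- function(pkg = "R.cache") {
  # Respect an earlier deactivation (by the user or by a previous call).
  if (!isTRUE(getOption("sketch.use_cache", TRUE))) {
    return(FALSE)
  }
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)
  }
  # Warn once, then switch caching off for the rest of the session.
  warning(
    "Package '", pkg, "' is not installed, so caching is deactivated for ",
    "this R session. Install it, or disable caching permanently in your ",
    ".Rprofile (see usethis::edit_r_profile()).",
    call. = FALSE
  )
  options(sketch.use_cache = FALSE)
  FALSE
}
```

Every cache lookup or write would then sit behind if (cache_is_usable()) { ... }, so that styling itself never depends on R.cache being installed.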

Conceptual

In this PR we introduce caching to styler. We follow the approach outlined in
#320 (comment), which basically is:

check if text to style is in cache.
if not, style it and add styled text to cache.

The approach has the following advantages:

Very simple to implement.
API agnostic. Works for files, text and the Addin because it operates on the text level, which is quite low level.
Because the approach is independent of path and modification time, it can cache the same content in different locations, which covers renamed files, moved files and copies in multiple places. It can also cache multiple versions of a file, e.g. on different branches or when going back and forth in the git history.
The cache remains very small because no actual code is cached.

The cache must be specific to the styler version; otherwise, updated styling rules would not be applied to text that was cached under an older version.
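
To make the idea concrete, here is a rough sketch in base R. It does not use R.cache: empty marker files in a temporary directory stand in for the cache, toy_style() (trimming trailing white space) stands in for the real styling, and all function names are made up for this illustration.

```r
# Stand-in cache location; the real cache would live under the R.cache root.
cache_dir <- file.path(tempdir(), "styler-cache-sketch")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Hash a character vector by writing it to a temporary file (base R only).
hash_text <- function(text) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(text, tmp)
  unname(tools::md5sum(tmp))
}

# The cache only records that a hash was seen; no code is stored.
is_cached <- function(text) {
  file.exists(file.path(cache_dir, hash_text(text)))
}
remember <- function(text) {
  file.create(file.path(cache_dir, hash_text(text)))
  invisible(text)
}

# Toy styler: strip trailing white space.
toy_style <- function(text) sub("[ \t]+$", "", text)

style_with_cache <- function(text) {
  if (is_cached(text)) {
    return(text) # seen before as styled output, nothing to do
  }
  styled <- toy_style(text)
  remember(styled) # only the hash of the styled text is remembered
  styled
}
```

Because the key depends only on the text, the same content is recognized regardless of which file, path or API call it came from.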

Implementation

We use R.cache to power the caching.
We use R options to manage it, with additional functional wrappers to modify the options.
There must be one cache per styler version and, for testing purposes, the user must be able to specify a cache manually (mainly to be able to delete the cache of the tests without deleting the cache they use as a user).
To avoid cluttering the R.cache caching directory, we use a directory under the caching root that corresponds to /styler/cache_name, where cache_name is the user-specified cache name, defaulting to the installed version number from DESCRIPTION.
We modify .travis.yml to also test the behavior if R.cache is not installed.
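
A sketch of how the options and the directory layout described above could fit together. The option name sketch.cache_name, the sketch_* helpers, the temporary cache root and the placeholder version "0.0.0.9000" are all assumptions made for this illustration and are not taken from the PR.

```r
# Hypothetical wrappers around one R option that holds the active cache name.
sketch_cache_activate <- function(cache_name = "0.0.0.9000") {
  # "0.0.0.9000" is a placeholder for the installed package version.
  options(sketch.cache_name = cache_name)
  invisible(cache_name)
}
sketch_cache_deactivate <- function() {
  options(sketch.cache_name = NULL)
  invisible(NULL)
}
sketch_cache_is_activated <- function() {
  !is.null(getOption("sketch.cache_name"))
}

# <root>/styler/<cache_name>, so each version (or test run) gets its own dir.
sketch_cache_dir <- function(cache_name = getOption("sketch.cache_name"),
                             root = file.path(tempdir(), "cache-root-sketch")) {
  stopifnot(is.character(cache_name), length(cache_name) == 1L)
  file.path(root, "styler", cache_name)
}

# Clearing one cache leaves the others untouched.
sketch_cache_clear <- function(cache_name = getOption("sketch.cache_name")) {
  unlink(sketch_cache_dir(cache_name), recursive = TRUE)
  invisible(cache_name)
}
```

With this layout, a test suite can activate a cache named e.g. "testthat", clear it afterwards, and never touch the version-named cache of the person running the tests.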

Todo:

codecov-io · 2019-08-29T21:39:38Z

Codecov Report

Merging #538 into master will decrease coverage by 0.37%.
The diff coverage is 82%.

@@            Coverage Diff             @@
##           master     #538      +/-   ##
==========================================
- Coverage   90.83%   90.46%   -0.38%     
==========================================
  Files          43       45       +2     
  Lines        1801     1898      +97     
==========================================
+ Hits         1636     1717      +81     
- Misses        165      181      +16

Impacted Files	Coverage Δ
R/io.R	`84.21% <ø> (ø)`	⬆️
R/ui-styling.R	`100% <ø> (ø)`
R/zzz.R	`0% <0%> (ø)`	⬆️
R/addins.R	`0% <0%> (ø)`	⬆️
R/transform-files.R	`100% <100%> (ø)`	⬆️
R/utils-cache.R	`100% <100%> (ø)`
R/ui-caching.R	`100% <100%> (ø)`
R/communicate.R	`42.3% <12.5%> (-47.7%)`	⬇️
R/rules-line-break.R	`100% <0%> (ø)`	⬆️
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9d8470b...a00f166.

lorenzwalthert · 2019-09-10T12:03:01Z

With the latest two commits, we use my fork of R.cache to store empty files for caching, as suggested in HenrikBengtsson/R.cache#37, because otherwise each cached code expression needs 4KB on disk.

krlmlr · 2019-09-22T21:24:13Z

Is this creating one file per expression, or one file per file processed?

lorenzwalthert · 2019-09-22T21:27:19Z

One per file. I thought we could also do one per expression; it would give an additional speed boost in case you have big expressions and modify just one. The implementation would also be more complex though. I think as of now, it's pretty slim (excluding all the cache management tools that most people will never touch). However, there is (obviously) a trade-off with disk usage as long as each cached entry costs us one block, e.g. 4KB of disk space on macOS.

krlmlr · 2019-09-23T09:08:53Z

Can we do one cache object per R project?

lorenzwalthert · 2019-09-23T09:22:03Z

Do you mean RStudio Projects associated with a working directory?

krlmlr · 2019-09-23T09:47:17Z

Project = package, analysis project, RStudio project -- in the sense of {here} or usethis::proj_get().

lorenzwalthert · 2019-09-23T16:01:44Z

I am not sure I understand what you mean. I don't think we should clutter the project directory with a styler cache just to make the cache project-specific. What do you think would be the advantages of this approach? Advantages of a central cache:

Can be removed very easily, not scattered in many places.
No additional file to add to .gitignore, .Rbuildignore or the like in every project repo.
The same code in different places can be cached easily.

Also, I think it would not help solve the problem with the block size. Also, if possible, I'd like to use R.cache because:

CRAN policy is comparatively strict about writing anywhere on the file system.
Offloading the implementation of the cache to another package keeps complexity low, e.g. for initialization and error handling.

krlmlr

Good idea to make this package easier to use in practice!

krlmlr · 2019-09-23T16:43:43Z

DESCRIPTION

@@ -27,7 +27,8 @@ Imports:
    tibble (>= 1.4.2),
    tools,
    withr (>= 1.0.0),
-    xfun (>= 0.1)
+    xfun (>= 0.1),
+    R.cache (>= 0.13.0.9000)


Did you mean to add {R.cache} to "Suggests"? Leaving it in "Imports" seems easier.

I see it was only added in the very last commit cb2dc17363dfaa39f4025a1895ffed86556e5ceb when I added my own fork as a remote dependency. Probably caused by unintentional usethis::use_dep(). Do you think it's unnecessary to keep it in Suggests? It needs some extra handling in other places, I agree.

setdiff(miniCRAN::pkgDep("R.cache"), miniCRAN::pkgDep("styler"))
#> [1] "R.cache" "R.methodsS3" "R.oo" "R.utils"

Created on 2019-09-23 by the reprex package (v0.3.0)

NEWS.md

R/ui-caching.R

krlmlr · 2019-09-23T16:49:07Z

R/transform-files.R

+    )
+    should_use_cache <- cache_is_activated()
+    use_cache <- is_cached && should_use_cache
+    if (!use_cache) {


Add an early return like if (use_cache) return(text) ?

R/utils-cache.R

krlmlr · 2019-09-23T16:56:14Z

R/transform-files.R

+      R.cache::findCache(key = hash_standardize(text), dir = cache_dir)
+    )
+    should_use_cache <- cache_is_activated()
+    use_cache <- is_cached && should_use_cache


Suggested change

-    use_cache <- is_cached && should_use_cache
+    use_cache <- should_use_cache && is_cached(text)

with a suitable is_cached() function avoids calling hash_standardize() if not needed.
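
A small sketch of both review suggestions (early return and short-circuiting), under the assumption that the lookup is wrapped in a function. hash_sketch() stands in for hash_standardize(), an environment stands in for the cache, and toupper() stands in for the styling; the call counter exists only to make the effect of && visible.

```r
# Counter that records how often the (stand-in) hash function is called.
calls <- new.env()
calls$n <- 0L
seen <- new.env()

hash_sketch <- function(text) {
  calls$n <- calls$n + 1L
  paste0("k:", paste(text, collapse = "\n"))
}

is_cached <- function(text) {
  exists(hash_sketch(text), envir = seen, inherits = FALSE)
}

style_if_needed <- function(text, should_use_cache, style = toupper) {
  # `&&` evaluates is_cached() only when the cache is activated.
  use_cache <- should_use_cache && is_cached(text)
  if (use_cache) {
    return(text) # early return: text is already known to be styled
  }
  styled <- style(text)
  if (should_use_cache) {
    assign(hash_sketch(styled), TRUE, envir = seen)
  }
  styled
}
```

With the cache deactivated, no hash is computed at all, which is the saving the suggestion is after.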

lorenzwalthert · 2019-09-23T17:58:03Z

Thanks for reviewing @krlmlr. Feels like the good old times are back 🎉 .

krlmlr · 2019-10-01T13:57:34Z

Do we need to include the style in the hash key, so that we recompute when the style changes?

lorenzwalthert · 2019-10-13T10:25:43Z

I guess we should. The question is how, because currently the version of styler determines which cache to use by default. Also, if people supply their own style guide, we should probably hash that function too and include it in the key, leaving the R.cache directory structure as is, i.e.

~.R.cache/styler/:version/:hash(transformer, text)
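
One possible way to build such a key, sketched in base R: the transformer functions are deparsed to text and hashed together with the code to style. The names md5_of_text() and cache_key() and the two toy style guides are made up for this example; this is not necessarily how the PR ends up doing it.

```r
# Hash a character vector via a temporary file (base R only).
md5_of_text <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(x, tmp)
  unname(tools::md5sum(tmp))
}

# Key = hash(transformers, text). Functions are deparsed to text first, so
# that their environments do not enter the hash.
cache_key <- function(transformers, text) {
  transformer_text <- unlist(lapply(transformers, deparse))
  md5_of_text(c(transformer_text, "--- text ---", text))
}

# Two toy "style guides", each a list of transformer functions.
style_a <- list(space = function(x) gsub(" +", " ", x))
style_b <- list(space = function(x) gsub("\t", " ", x, fixed = TRUE))

key_a <- cache_key(style_a, "x <- 1")
key_b <- cache_key(style_b, "x <- 1")
identical(key_a, key_b) # FALSE: a different style guide gives a different key
```

A limitation of keying on the deparsed text is that two closures with identical bodies but different captured values produce the same key, so this alone is not robust for arbitrary style guides.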

In addition, if NULL is mapped to an empty file as proposed in HenrikBengtsson/R.cache#37, space consumption drops drastically, so we can also cache at the level of top-level expressions, further improving speed. By top-level expressions, I mean the expressions that have 0 as a parent. Example with two expressions:

x <- 3
gg <- function() {
  if (TRUE) {
    f(x, na = 3)
  }
}
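
For reference, the two top-level expressions in this example can be found with the parse data from base R, where top-level expressions have parent 0. How the per-expression chunks would then be hashed and cached is left open here; the snippet only shows the splitting.

```r
code <- c(
  "x <- 3",
  "gg <- function() {",
  "  if (TRUE) {",
  "    f(x, na = 3)",
  "  }",
  "}"
)

# Parse with source references so that utils::getParseData() has data to return.
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# Top-level expressions are the `expr` tokens whose parent is 0.
top <- pd[pd$parent == 0 & pd$token == "expr", c("line1", "line2")]
top <- top[order(top$line1), ]

# One chunk of source lines per top-level expression.
chunks <- Map(function(from, to) code[from:to], top$line1, top$line2)
length(chunks) # 2
```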

krlmlr · 2019-12-01T12:04:04Z

I tweaked a bit, works for me with {dm}.

How can we communicate to the caller that a cached value is used? I think for this we need to get rid of transform_utf8() and intercept at an earlier stage.
1. Read file
2. Check if cached
3. If not, style
4. If changed, cache and write
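
A rough file-level sketch of these four steps that also reports back whether the cache was used. Everything here is hypothetical: style_file_sketch(), the seen environment holding MD5 hashes of file contents, and toy_style() as a stand-in for the real styling. Step 4 is read loosely: the file is only written if it changed, but its final content is remembered in either case.

```r
# In-memory stand-in for the cache: hashes of file contents known to be styled.
seen <- new.env()
seen$keys <- character()

# Toy styler: strip trailing white space.
toy_style <- function(text) sub("[ \t]+$", "", text)

style_file_sketch <- function(path, seen) {
  # 1./2. Hash the file as it is on disk and check whether it is cached.
  if (unname(tools::md5sum(path)) %in% seen$keys) {
    return(list(changed = FALSE, from_cache = TRUE))
  }
  # 3. Not cached: read and style.
  text <- readLines(path)
  styled <- toy_style(text)
  changed <- !identical(styled, text)
  # 4. Write only if something changed, then remember the styled content.
  if (changed) {
    writeLines(styled, path)
  }
  seen$keys <- c(seen$keys, unname(tools::md5sum(path)))
  list(changed = changed, from_cache = FALSE)
}
```

The returned from_cache flag is the kind of information that would have to be passed back to the caller.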
Can we map empty files to NULL in {R.cache}?

lorenzwalthert · 2019-12-01T16:55:29Z

> Can we map empty files to NULL in {R.cache}?

Tracked in HenrikBengtsson/R.cache#37.

> How can we communicate to the caller that a cached value is used?

I am not sure this is necessary, because it seems quite cumbersome to implement. Do you mean in the console output or also in the data frame returned invisibly by style_pkg() and friends? For consistency, I think we'd need to do that in both or in neither. The reason we check whether text is cached in make_transformer() is that all style API functions (the Addins, style_text(), etc.) use this function.

Also, if you want to read the file only once (i.e. in transform_utf8_one()) and use a functional approach (i.e. not store this information in an environment), it would require adapting the code in many places to pass back the information about whether or not the cache was used. It would have to be propagated up the chain in this order, I think:

transform_file() <- transform_code() <- transform_utf8() <- transform_utf8_one() <- make_transformer()

lorenzwalthert · 2019-12-05T10:11:06Z

I think we should not communicate to the user if the cache was used, at least not in this PR. It involves refactoring quite a few things and the PR is already getting too large. Please open another issue if you think it's important. If something is wrong, the user can always turn off the cache completely.

lorenzwalthert · 2019-12-05T10:11:57Z

Seems that with 36606a2, we are able to consume 0 KB instead of block size per cached value, so that one is resolved.
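
A small illustration of the difference, using base R only. saveRDS(NULL, ...) is just a stand-in for "storing a serialized NULL" and is not R.cache's own file format; how much disk space a small non-empty file actually occupies depends on the file system.

```r
cache_dir <- tempfile("cache-size-sketch")
dir.create(cache_dir)

serialized <- file.path(cache_dir, "serialized-null")
marker <- file.path(cache_dir, "empty-marker")

saveRDS(NULL, serialized) # a serialized NULL still has a header and payload
file.create(marker)       # an empty file created manually

file.size(serialized) > 0 # TRUE: non-empty, so it needs data space on disk
file.size(marker)         # 0: existence of the file alone marks "cached"
```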

krlmlr · 2019-12-05T11:26:38Z

Plus the inode, but these are cheap.

Let's see how it works without notifying the user.

lorenzwalthert · 2019-12-15T11:37:23Z

There are two remaining issues with this PR:

Transformers should have a version, so that we don't need to cache based on the text of the transformers (which won't work in general for third-party style guides), and the documentation should be revised accordingly. This affects the mlr style guide. This is discussed in issue "Make caching work more robust for third-party style guides" #571.
Caching could be improved by caching on the expression level, which is treated in issue "Cache on expression level" #570.

We will merge this and open new issues for these tasks.

lorenzwalthert changed the title ~~Support caching of styling~~ Cache styling Aug 15, 2019

lorenzwalthert force-pushed the caching branch 4 times, most recently from 320ec21 to bfb5053 Compare August 16, 2019 15:54

lorenzwalthert marked this pull request as ready for review August 22, 2019 17:26

lorenzwalthert force-pushed the caching branch 4 times, most recently from a5899f8 to 5ab8389 Compare August 22, 2019 21:22

lorenzwalthert mentioned this pull request Sep 2, 2019

Map NULL to empty file to consume zero memory instead of block size HenrikBengtsson/R.cache#37

Closed

lorenzwalthert force-pushed the caching branch from f6a8923 to cb2dc17 Compare September 22, 2019 18:16

lorenzwalthert mentioned this pull request Sep 22, 2019

Release scheduling #440

Closed

krlmlr approved these changes Sep 23, 2019

lorenzwalthert mentioned this pull request Sep 26, 2019

Don't write to the test directory during testing #548

Merged

lorenzwalthert mentioned this pull request Oct 16, 2019

Styler is slow #558

Open

lorenzwalthert force-pushed the caching branch 3 times, most recently from 3479b98 to e1adb92 Compare October 22, 2019 19:40

lorenzwalthert added 12 commits November 1, 2019 14:20

chache size should be 0 with shallow option

f2b16a5

move R.cache to suggest as initially thought

427a1c0

replace cache_derive_name() with constant

2996302

random roxygenize

9326f44

capsule line break conversion in a function

0a81566

make cache work when text is supplied with embedded line breaks

c0e1292

still failing: mixed case

need pkg qualifier for example with internal function

c2c8e63

some version that seems to work

6993203

hack: convert list with envs to text, otherwise it does not work

make commit to keep reproducible example where hashes of transformers…

4f3ab57

… return different things. for reference 43219ixmypi.

document edge cases

172ad9a

add test

5efc656

again remove xfun reference

d013cea

lorenzwalthert force-pushed the caching branch from a7eae8f to d013cea Compare November 1, 2019 13:26

krlmlr and others added 2 commits December 1, 2019 12:40

Merge branch 'master' into caching

4fb19f9

Tweaks

abb204e

lorenzwalthert added 2 commits December 4, 2019 22:07

use upstream R.cache for caching

36606a2

Creating file manually instead of mapping NULL to empty file

new documentation roxygen version

c6c3aff

lorenzwalthert added 4 commits December 13, 2019 22:52

don't cat info on cached

6d3cd88

R.cache version we depend on now on CRAN

6f15cc8

Merge branch 'master' into caching

1db8262

more info

0d41f9b

lorenzwalthert merged commit 6a03236 into r-lib:master Dec 15, 2019

lorenzwalthert deleted the caching branch December 15, 2019 11:37

lorenzwalthert mentioned this pull request Jan 6, 2020

styler too slow for large script and blocks lsp REditorSupport/languageserver#141

Closed
