Spread out cache values into different storr namespaces #129

wlandau-lilly · 2017-11-01T23:42:27Z

wlandau-lilly · 2017-11-02T19:33:51Z

Current namespaces:

| namespace | description |
| --- | --- |
| `build_times` | Build times of each target and import, including user, elapsed, and system times. Each entry is a data frame. |
| `config` | Each entry is an element of the master internal configuration produced by `config()`. |
| `depends` | The dependency hash: the hash of the collective `storr` hashes of all the dependencies of each target/import. |
| `filemtime` | The system modification times of files at the time they are built or imported. Useful for deciding whether it is even worth the time to rehash a file (see #4). |
| `functions` | For imported functions, the actual raw value of the function, as read by `readd()`. In the `objects` namespace, the stored object for a function is the un-vectorized, deparsed function body as text. This makes sure the correct function is reproducibly tracked while changes to whitespace and comments are ignored. |
| `imported` | A new namespace in 4.4.0 indicating whether each object was imported or built as a target. Drake will migrate to this instead of the `$imported` flag in the `objects` namespace. It is progress on #126. |
| `objects` | The default namespace. Contains the list object that is reproducibly tracked for each target: the actual value of the built target/import, plus metadata such as the object type and an `"imported"` flag. For functions, a collective hash of the dependencies is also stored so that imported objects nested within functions are reproducibly tracked. |
| `progress` | The build progress of each target: `"finished"`, `"in progress"`, or `"failed"`. Unlisted targets have not been attempted (yet). |
| `session` | The `sessionInfo()` of the last call to `make()`. |
| `target_attempts` | Names of the targets marked to be built in the current `make()`. Used as a parallelism-agnostic mechanism for telling whether a target is up to date. |

Planned namespaces:

| namespace | description |
| --- | --- |
| `build_times` | Same as before. |
| `config` | Same as before. |
| `commands` | The workflow plan data frame command that built the target, if applicable. Storing this separately from `depends` will give us part of #131. |
| `depends` | The long hash of the output from `lightly_parallelize(X = the_appropriate_dependencies, FUN = cache$get_hash) %>% unlist %>% unname`. Major difference from before: the workflow plan `command` is not factored into the computation. |
| `depends_debug` | Named vector of hashes from `get_cache()$get_hash(..., namespace = 'reproducibly_tracked')`. Stripping the names away and hashing should evaluate to the hash in `depends`. A new `debug` argument to `make()` will trigger the use of the `depends_debug` namespace. It is expensive in time and storage, but essential for debugging how things are reproducibly tracked. |
| `file_modification_times` | Migrated from `filemtime`. |
| `imported` | Same as before. |
| `progress` | Same as before. |
| `readd` | The value that should be read from the cache on a call to `readd(target)`. This namespace will generally share values with `reproducibly_tracked` via richfitz/storr#56 to avoid duplication of data in the cache. Imported functions are the exception: for them, the value stored in `reproducibly_tracked` will be the de-vectorized/deparsed/tidied function body text and the dependency hash. |
| `reproducibly_tracked` | The object to be reproducibly tracked. Changes to the data here should trigger downstream (re)builds. |
| `session` | Same as before. |
| `target_attempts` | Same as before. |
| `type` | From the `$type` field of the old `objects` namespace: an indicator of whether the target/import is a function, file, or generic object. |
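
The relationship between `depends` and `depends_debug` can be illustrated with a small model. This is only a sketch in Python, not drake or `storr` code: `content_hash()`, `depends_debug_entry()`, and `depends_hash()` are hypothetical helpers, SHA-256 stands in for whatever hash the cache uses, and sorting the dependency names is an assumption made so the result does not depend on input order.

```python
import hashlib


def content_hash(text):
    """Stand-in for the cache's hash function (assumption: SHA-256 of a string)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def depends_debug_entry(tracked_hashes, dependencies):
    """Named hashes of a target's dependencies (the `depends_debug` idea)."""
    # Sorting the names is an assumption, made for a deterministic order.
    return {name: tracked_hashes[name] for name in sorted(dependencies)}


def depends_hash(debug_entry):
    """Strip the names and hash what is left (the `depends` idea).

    The workflow plan command is deliberately not an input here.
    """
    return content_hash("\n".join(debug_entry.values()))


# Hashes of the reproducibly tracked values of two dependencies.
tracked_hashes = {
    "f": content_hash("function (x) x + 1"),
    "raw_data": content_hash("1,2,3"),
}
debug_entry = depends_debug_entry(tracked_hashes, ["raw_data", "f"])
before = depends_hash(debug_entry)

# Changing a dependency changes its tracked hash, and so the depends hash.
tracked_hashes["raw_data"] = content_hash("1,2,4")
after = depends_hash(depends_debug_entry(tracked_hashes, ["raw_data", "f"]))
```

Comparing the named entries stored before and after a change would show which dependency moved, which is the debugging value of keeping the names around despite the extra storage.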

Migration of namespaces via migrate():

| to | from | how |
| --- | --- | --- |
| `commands` | `config` | Copy over the stored workflow plan command for each target. |
| `depends` | `objects` | This part is the trickiest and most sensitive. 1. Use the `dependency_hash()` function and supporting functions of an earlier drake release (4.3.0) to figure out which targets are outdated. Be sure to quarantine the computation in a fresh environment. 2. Walk through the workflow graph and compute brand new `depends` hashes for everything. 3. For all originally outdated targets, mangle the new `depends` hashes. |
| `file_modification_times` | `filemtime` | Simple copies. |
| `imported` | `objects` | Copy over the `$imported` list element. |
| `readd` | `functions` | Simple copies. |
| `readd` | `objects` | For non-functions only, copy over the `$value` list element. |
| `reproducibly_tracked` | `objects` | Copy over the `$value` list element. For functions, this is a simple copy. For non-functions, only the hash should be transferred via richfitz/storr#56 to avoid duplication of data. |
| `type` | `objects` | Copy over the `$type` list element. |
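
The simple-copy rows of this table can be sketched as follows. The sketch is in Python rather than R and makes several assumptions: the cache is modeled as a plain dictionary of namespaces, `migrate_namespaces()` is a hypothetical name, and the hash sharing from richfitz/storr#56 is only approximated by letting two namespaces reference the same Python object. The `commands` and `depends` rows are left out.

```python
def migrate_namespaces(old):
    """Spread an old-style cache (dict of namespaces) into the planned layout."""
    new = {
        ns: {}
        for ns in (
            "file_modification_times",
            "imported",
            "readd",
            "reproducibly_tracked",
            "type",
        )
    }
    # Simple copies.
    new["file_modification_times"].update(old.get("filemtime", {}))
    new["readd"].update(old.get("functions", {}))
    # Split each entry of the old `objects` namespace by field.
    for key, entry in old.get("objects", {}).items():
        new["imported"][key] = entry["imported"]
        new["type"][key] = entry["type"]
        new["reproducibly_tracked"][key] = entry["value"]
        if entry["type"] != "function":
            # Same object in both namespaces, so the data is not duplicated.
            new["readd"][key] = entry["value"]
    return new


def square(x):
    return x * x


# Hypothetical old cache: one imported function and one built target.
old_cache = {
    "filemtime": {"report.md": 1509580000.0},
    "functions": {"square": square},
    "objects": {
        "square": {"value": "function (x) x * x", "type": "function", "imported": True},
        "my_data": {"value": [1, 2, 3], "type": "object", "imported": False},
    },
}
new_cache = migrate_namespaces(old_cache)
```

In this model, `readd` ends up holding the raw function for `square` while `reproducibly_tracked` holds its deparsed text, and both namespaces point at the same value for `my_data`.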

wlandau-lilly · 2017-11-03T18:10:04Z

`migrate()`:

1. Use old code to identify the outdated targets.
2. Lightly parallelize. For each target/import:
   - `readd()` it to get the actual value.
   - Call `store_target()` to automatically put it in the right place in the cache. Use a dummy `hash_list` for this stage.
3. Get the `hash_list()` of everything and store the `depends` hashes that mark everything as up to date.
4. For the outdated targets, mangle the `depends` hashes.

This seems straightforward and testable.
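
The last two steps above (store fresh `depends` hashes, then mangle the ones for outdated targets) can be sketched like this. It is a Python model under stated assumptions, not the actual `migrate()` code: all function names are hypothetical, SHA-256 stands in for the cache's hash, and "mangling" is assumed to mean prepending a marker so the stored value can never match a recomputed hash.

```python
import hashlib


def content_hash(text):
    """Stand-in for the cache's hash function (assumption: SHA-256 of a string)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def fresh_depends_hash(target, tracked_hashes, dependencies):
    """Hash of the tracked hashes of a target's dependencies."""
    deps = sorted(dependencies[target])
    return content_hash("\n".join(tracked_hashes[d] for d in deps))


def migrate_depends(tracked_hashes, dependencies, outdated):
    """Store depends hashes that mark everything up to date, then mangle
    the entries for targets the old code flagged as outdated."""
    depends = {}
    for target in dependencies:
        fresh = fresh_depends_hash(target, tracked_hashes, dependencies)
        # Assumption: a non-hex prefix is enough to guarantee a mismatch.
        depends[target] = ("mangled_" + fresh) if target in outdated else fresh
    return depends


def is_up_to_date(target, depends, tracked_hashes, dependencies):
    """A target is current when its stored hash matches a recomputed one."""
    return depends.get(target) == fresh_depends_hash(
        target, tracked_hashes, dependencies
    )


# Hypothetical three-step workflow in which the old code flagged `model`.
tracked_hashes = {
    name: content_hash(name) for name in ("raw_data", "clean_data", "model")
}
dependencies = {"raw_data": [], "clean_data": ["raw_data"], "model": ["clean_data"]}
outdated = {"model"}
depends = migrate_depends(tracked_hashes, dependencies, outdated)
```

Under these assumptions, only `model` would be rebuilt after the migration, which is the property the mangling step is meant to preserve.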

wlandau-lilly · 2017-11-05T05:58:55Z

I decided to fix and merge #129. It is the right decision in the long term. Plus, migrate() seems to be working well. I unit-tested it and tried it out on a couple of large real projects.

The namespaces are a bit different than in earlier comments.

wlandau-lilly added difficulty: advanced undecided: may or may not fix and removed undecided: may or may not fix labels Nov 1, 2017

wlandau-lilly mentioned this issue Nov 2, 2017

Consider different levels of checks #131

Closed

wlandau-lilly added the type: new feature label Nov 2, 2017

This was referenced Nov 2, 2017

Last call for suggestions before CRAN release 4.4.0 #132

Closed

Corrupted storr when killing making process #126

Closed

wlandau-lilly changed the title ~~Spread out cache values into different storr namespaces~~ Spread out cache values into different storr namespaces Nov 3, 2017

wlandau-lilly added status: priority and removed SENSITIVE CHANGES labels Nov 3, 2017

wlandau-lilly mentioned this issue Nov 3, 2017

Back compatibility and #129 #134

Closed

wlandau-lilly closed this as completed Nov 5, 2017

wlandau-lilly mentioned this issue Nov 13, 2017

Huge number of files in .drake/keys .drake/data #154

Closed

wlandau-lilly mentioned this issue Nov 20, 2017

Odd changes in dependency hashes #161

Closed
