Assign two keys/namespaces to the same value without duplicating effort #56

Closed
wlandau-lilly opened this issue Nov 2, 2017 · 9 comments

Comments

@wlandau-lilly
Contributor

Is this possible without recaching, resaving, rehashing, reserializing? I can think of a hack:

library(storr)
x <- storr_rds("my_storr")
big <- mtcars
hash <- x$set(key = "big", value = big, namespace = "tables")
x$driver$set_hash(key = "big", namespace = "values", hash = hash)
obj1 <- x$get(key = "big", namespace = "tables")
obj2 <- x$get(key = "big", namespace = "values")
identical(obj1, obj2)
## TRUE

But I do not know if this is the right thing to do because it involves manually mucking around with the hash. Please let me know if further abstraction is already supported, planned for future development, or beyond the scope of storr.

@richfitz
Owner

This seems like a reasonable thing to support - the lower level bits will all be there. Just need to think about the interface. If you could describe your ideal interface that would be useful (along with a verb). Perhaps

duplicate = function(key_src, key_dest, namespace_src = self$default_namespace, namespace_dest = self$default_namespace) ...

@wlandau-lilly
Contributor Author

wlandau-lilly commented Nov 28, 2017

It would also be helpful to pass multiple namespaces to the $set() and $del() methods.

@wlandau-lilly
Contributor Author

How about alias() rather than duplicate()? Strictly speaking, "alias" does not have the right meaning as a verb, but I think the intent is clear enough. Whatever you decide, I would prefer a name that reassures the user that storr is making a shallow copy rather than a deep one.

I would also suggest the following default arguments.

alias <- function(
  key_src,
  key_dest = key_src,
  namespace_src = self$default_namespace,
  namespace_dest = namespace_src
){
  ...
}

richfitz added a commit that referenced this issue Nov 29, 2017
@richfitz
Owner

Some preliminary work here: https://github.com/richfitz/storr/compare/i56_duplicate

I've gone with duplicate because, in my mind, alias implies an ongoing mirroring. In the docs I've discussed the shallowness. Thanks for the suggestion of defaults; I've used them in the current definition.
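
As a rough illustration of the shallow copy discussed here, the following sketch assumes a released version of storr in which duplicate() takes the arguments key_src, key_dest, namespace_src and namespace_dest (the names proposed earlier in this thread). All four are passed explicitly so the example does not depend on whichever defaults were finally chosen. The in-memory storr_environment() driver is used only to keep the example free of files on disk.

```r
library(storr)

# In-memory storr; nothing is written to disk.
st <- storr_environment()

# Store the value once. set() returns the content hash (invisibly).
hash <- st$set(key = "big", value = mtcars, namespace = "tables")

# Point a second key/namespace pair at the same content.
# Assumption: duplicate() exists with these argument names.
st$duplicate(
  key_src = "big",
  key_dest = "big",
  namespace_src = "tables",
  namespace_dest = "values"
)

# Both key/namespace pairs should now refer to the same hash.
hash_tables <- st$get_hash("big", "tables")
hash_values <- st$get_hash("big", "values")
same_hash <- identical(hash_tables, hash_values)
same_value <- identical(st$get("big", "tables"), st$get("big", "values"))
```

If the assumptions hold, same_hash and same_value are both TRUE, which is the same outcome as the set_hash() hack in the opening comment but without touching the driver directly.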

> It would also be helpful to pass multiple namespaces to the $set() and $del() methods.

You can with mset and del (the set and get methods operate on single keys only). If that's not clear, can you open another issue so we can get the docs improved?

@wlandau-lilly
Contributor Author

Looks like the right idea. The default arguments of duplicate() emphasize duplication across keys, not namespaces, which I guess fits most use cases. I may be one of the few who use namespaces so heavily.

I do see now that $del() can delete a single key over multiple namespaces. $mset(), on the other hand, does not appear to be what I had hoped.

x <- storr_rds("test")
x$mset(key = "a", value = 1, namespace = c("n1", "n2"))
## Error: 'value' must have 2 elements (recieved 1)
x$mset(key = "a", value = 1:2, namespace = c("n1", "n2")) # No error

I would like to assign the same data to the same key over multiple namespaces without duplicating the (potentially large) data.

@richfitz
Owner

I see. The restriction that value must have the same number of elements as the key/value set implies is there to reduce the amount of magic that happens with recycling. If I relax that, it becomes ambiguous how value is to be unpacked. I could support a new fill method (or argument to mset) which would always take a single value.

The value will never actually be duplicated, even in R. Because of R's copy-on-write semantics,

x <- ...something large...
y <- list(x, x, x)

only one copy of x exists. But storr will still serialise and hash x three times, and that will be slow. It will then find that the three results are the same and store only a single copy.
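
The deduplication described above can be checked with a small sketch. It assumes the storr methods mset(), get_hash() and list_hashes() behave as in released versions of the package, and it uses mtcars as a stand-in for "something large". The key names "a", "b" and "c" are made up for illustration.

```r
library(storr)

# Fresh in-memory storr, so it contains only what is set below.
st <- storr_environment()

x <- mtcars  # stand-in for a large object
keys <- c("a", "b", "c")

# One value per key: storr serialises and hashes each list element in turn.
st$mset(key = keys, value = list(x, x, x))

# Look up the hash recorded against each key.
hashes <- vapply(keys, function(k) st$get_hash(k), character(1))

# Number of distinct hashes across the three keys.
n_distinct_hashes <- length(unique(hashes))

# Number of objects actually held by the storr.
n_stored_objects <- length(st$list_hashes())
```

Under these assumptions both counts come out as 1: three keys, one stored object. The cost that remains is the repeated serialisation and hashing, which is the part the proposed fill method is meant to avoid.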

@wlandau-lilly
Contributor Author

Hmm... I wish there were a way to avoid recycling altogether. As you say, the rehashing is slow, and the data is duplicated in memory before it is stored once.

@richfitz richfitz added this to the CRAN 1.1.3 milestone Dec 13, 2017
@richfitz
Owner

Please check the new "fill" method on the i56_duplicate branch - this will hopefully do what you need
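
A minimal sketch of how fill might be used for the case raised earlier in the thread (one key, several namespaces, one value). It assumes fill() takes key, value and namespace arguments and recycles the single value across them, as in released versions of storr; the behaviour on the i56_duplicate branch at the time may have differed.

```r
library(storr)

# In-memory storr; nothing is written to disk.
st <- storr_environment()

# Assumption: fill() accepts a single value and replicates it across
# every key/namespace combination implied by the arguments.
st$fill(key = "a", value = mtcars, namespace = c("n1", "n2"))

# The same object should be retrievable from both namespaces...
same_value <- identical(st$get("a", "n1"), st$get("a", "n2"))

# ...and both entries should point at the same content hash.
same_hash <- identical(st$get_hash("a", "n1"), st$get_hash("a", "n2"))
```

Compared with the failing mset() call shown above, there is no need to build a list of repeated values, so the value only has to be passed once.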

@wlandau-lilly
Contributor Author

Thanks, Rich! I dug through it, and it does exactly what I want. I also think the term fill is perfect. I vote to close this issue when i56_duplicate is merged with master or develop.
