Merge pull request #98 from mlr-org/feat/custom-hash-info
feat(calculate_hash): allow to customize hash input
sebffischer committed Feb 1, 2024
2 parents def77b0 + 847dd93 commit 0070570
Showing 4 changed files with 88 additions and 33 deletions.
4 changes: 4 additions & 0 deletions NAMESPACE
@@ -10,6 +10,9 @@ S3method(discard,default)
S3method(distinct_values,default)
S3method(distinct_values,factor)
S3method(distinct_values,logical)
S3method(hash_input,"function")
S3method(hash_input,data.table)
S3method(hash_input,default)
S3method(insert_named,"NULL")
S3method(insert_named,data.frame)
S3method(insert_named,data.table)
@@ -82,6 +85,7 @@ export(formulate)
export(get_private)
export(get_seed)
export(has_element)
export(hash_input)
export(ids)
export(imap)
export(imap_chr)
63 changes: 43 additions & 20 deletions R/calculate_hash.R
@@ -1,17 +1,11 @@
#' @title Calculate a Hash for Multiple Objects
#'
#' @description
#' Calls [digest::digest()] to calculate the hash for all objects provided.
#'
#' The following operations are performed to make hashing more robust:
#' * If an object is a [function()], the formals and the body are hashed separately.
#'   This ensures that the bytecode and parent environment are not included
#'   in the hash.
#' * If an object is a [data.table::data.table()], the data.table is converted to a
#' regular list. This ensures that keys and indices are not included in the hash.
#'
#' Note that this only applies to top level objects, these transformations are not done
#' recursively.
#' Calls [digest::digest()] using the 'xxhash64' algorithm after applying [`hash_input`] to each object.
#' To customize the hashing behaviour, you can overwrite [`hash_input`] for specific classes.
#' For `data.table` objects, [`hash_input`] is applied to all columns, so you can overwrite [`hash_input`] for
#' columns of a specific class.
#' Objects that don't have a specific method are hashed as is.
#'
#' @param ... (any)\cr
#' Objects to hash.
@@ -21,13 +15,42 @@
#' @examples
#' calculate_hash(iris, 1, "a")
calculate_hash = function(...) {
  digest(lapply(list(...), function(x) {
    if (is.function(x)) {
      list(formals(x), as.character(body(x)))
    } else if (is.data.table(x)) {
      as.list(x)
    } else {
      x
    }
  }), algo = "xxhash64")
  digest(lapply(list(...), hash_input), algo = "xxhash64")
}

#' Hash Input
#'
#' Returns the part of an object to be used to calculate its hash.
#'
#' @param x (any)\cr
#' Object for which to retrieve the hash input.
#' @export
hash_input = function(x) {
  UseMethod("hash_input")
}

#' @describeIn hash_input
#' The formals and the body are returned in a `list()`.
#' This ensures that the bytecode and parent environment are not included
#' in the hash.
#' @export
hash_input.function = function(x) {
  list(formals(x), as.character(body(x)))
}

#' @describeIn hash_input
#' The data.table is converted to a regular list and `hash_input()` is applied to all elements.
#' The conversion to a list ensures that keys and indices are not included in the hash.
#' @export
#' @method hash_input data.table
hash_input.data.table = function(x) {
  lapply(as.list(x), hash_input)
}

#' @describeIn hash_input
#' Returns the object as is.
#' @export
hash_input.default = function(x) {
  x
}
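
The customization hook introduced in this diff can be illustrated with a minimal sketch. It assumes only that the `digest` package is installed. The generic, the default method, the function method and `calculate_hash()` are re-declared locally to mirror the diff, so the sketch does not depend on an installed mlr3misc version. The `toy_model` class, its constructor `new_model()` and its `fitted_at` field are hypothetical names used only for illustration.

```r
library(digest)

# Local copies mirroring the diff above (not loaded from mlr3misc).
hash_input = function(x) UseMethod("hash_input")
hash_input.default = function(x) x
hash_input.function = function(x) list(formals(x), as.character(body(x)))

calculate_hash = function(...) {
  digest(lapply(list(...), hash_input), algo = "xxhash64")
}

# Hypothetical class: a wrapper whose `fitted_at` field should not affect the hash.
new_model = function(params, fitted_at) {
  structure(list(params = params, fitted_at = fitted_at), class = "toy_model")
}

# Custom method: only `params` is passed on to the hash.
hash_input.toy_model = function(x) x$params

m1 = new_model(list(alpha = 0.1), fitted_at = 1)
m2 = new_model(list(alpha = 0.1), fitted_at = 2)
m3 = new_model(list(alpha = 0.5), fitted_at = 1)

# Same params, different timestamps: the hashes agree.
same_hash = calculate_hash(m1) == calculate_hash(m2)
# Different params: the hashes differ.
diff_hash = calculate_hash(m1) != calculate_hash(m3)

# Two closures with identical formals and body but different enclosing
# environments hash the same, because only formals and body are hashed.
make_adder = function() function(x) x + 1
closure_hash_equal = calculate_hash(make_adder()) == calculate_hash(make_adder())
```

With the real package, the same effect would presumably be achieved by defining `hash_input.toy_model` after attaching mlr3misc, since `hash_input` is exported in this commit. Because `hash_input.data.table` applies `hash_input()` to every column, a method for a column class would also take effect inside a `data.table`.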

18 changes: 5 additions & 13 deletions man/calculate_hash.Rd

36 changes: 36 additions & 0 deletions man/hash_input.Rd
