Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve validation step info to string for hashing #511

Merged
merged 19 commits into from
Dec 3, 2023

Conversation

yjunechoe
Copy link
Collaborator

@yjunechoe yjunechoe commented Dec 2, 2023

Summary

This PR introduces a new internal function hash_validation_step() which ensures that functions (in preconditions) and quosures (vars() in value/left/right) are resolved to character for the purposes of hashing. In turn, digest::sha1() is guaranteed to always receive a character vector - this resolves issues with environment hashing as described in #509.

Currently, the new hashing implementation replaces the previous one (vs. being opt-in).

Pending

  • Backwards compatibility for older agents in the multiagent: we would need to re-hash agents' steps for the purposes of alignment.

  • More tests to check that this new method does not create false positives in step alignments. Alignment should respect identity up to yaml-equivalence.

@yjunechoe yjunechoe linked an issue Dec 2, 2023 that may be closed by this pull request
@yjunechoe yjunechoe marked this pull request as draft December 2, 2023 20:07
@yjunechoe
Copy link
Collaborator Author

Also dplyr::storms turns out to be one of those data that get regularly updated. Since updates to dplyr can make snaps outdated (therefore causing tests to fail unexpectedly), I've commented that out in the draft_validation tests (so the PR will also resolve the testthat GHA failing on main).

@yjunechoe yjunechoe marked this pull request as ready for review December 3, 2023 17:20
@yjunechoe
Copy link
Collaborator Author

I think this should do the trick!

Some notes on the implementation:

  • Hashing logic is factored out into hash_validation_step(), which resolves everything to string before digest::sha1() is called. Current implementation appends "-v0.12" to the hash.
  • An agent's steps can be re-hashed via another new internal function rehash_agent(), which iterates the above function over the rows of the agent's $validation_set. This returns an identical agent that only changes in $validation_set$sha1. It can be used in isolation to repair or update an agent's hash for validation steps:
agent1 <- create_agent(small_table) %>%
  rows_distinct() %>% 
  interrogate()
agent1$validation_set$sha1
#> [1] "adac947b3a26f7df4bfd374b5fff4e67b4190f8b-v0.12"
agent1$validation_set$sha1 <- "bad hash"
agent2 <- rehash_agent(agent1)
agent2$validation_set$sha1
#> [1] "adac947b3a26f7df4bfd374b5fff4e67b4190f8b-v0.12"
  • create_multiagent() now rehashes all agents it receives, if it detects any bad or conflicting hash versions. Currently, it's either that all hash are v0.12 (doesn't trigger rehashing) or not (triggers rehash). We could expose an argument to let the user control the rehashing method, but maybe not necessary to introduce that at this stage (we only have 2 options and one is clearly better).
agent1$validation_set$sha1
#> [1] "bad hash"
agent2$validation_set$sha1
#> [1] "adac947b3a26f7df4bfd374b5fff4e67b4190f8b-v0.12"
multiagent <- create_multiagent(agent1, agent2)
multiagent$agents[[1]]$validation_set$sha1
#> [1] "adac947b3a26f7df4bfd374b5fff4e67b4190f8b-v0.12"
multiagent$agents[[2]]$validation_set$sha1
#> [1] "adac947b3a26f7df4bfd374b5fff4e67b4190f8b-v0.12"

get_multiagent_report(multiagent, display_mode = "wide")

image


All in all, this fixes both the alignment issue (#509) and the performance issue (#508). There are no user visible changes beyond "bugfix in multiagent alignment", so the news item for this PR just reports that.

Copy link
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rich-iannone
Copy link
Member

This is really great work! Feel free to merge any time.

@yjunechoe yjunechoe merged commit 44493f0 into rstudio:main Dec 3, 2023
13 checks passed
@yjunechoe yjunechoe deleted the hash-character branch December 3, 2023 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validation steps sometimes do not align in the multiagent
2 participants