Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation steps sometimes do not align in the multiagent #509

Closed
yjunechoe opened this issue Nov 27, 2023 · 3 comments · Fixed by #511
Closed

Validation steps sometimes do not align in the multiagent #509

yjunechoe opened this issue Nov 27, 2023 · 3 comments · Fixed by #511
Assignees

Comments

@yjunechoe
Copy link
Collaborator

Problem

In the multiagent, we expect validation steps to align when they are equivalent in form (i.e., identical function calls). For example, four agents below all share the col_vals_not_null(c) step and the hash for this step is shared, regardless of when/where the step is added to the agent:

library(pointblank)
agent <- create_agent(small_table)

agent_global1 <- agent %>% 
  col_vals_not_null(c) %>% 
  interrogate()
agent_global2 <- agent %>% 
  col_vals_not_null(c) %>% 
  interrogate()
agent_local1 <- local({
  agent %>% 
    col_vals_not_null(c) %>% 
    interrogate()
})
agent_local2 <- local({
  agent %>% 
    col_vals_not_null(c) %>% 
    interrogate()
})
create_multiagent(agent_global1, agent_global2, agent_local1, agent_local2) %>% 
  get_multiagent_report(display_mode = "wide")

image

However, the same step defined in different environments fail to align when they involve special data structures that record the environment. This is the case for:

  1. Functions passed to preconditions, when the function is defined inline (e.g., using the magrittr syntax . %>% ...)
agent_local_precondition <- local({
  small_table %>% 
    create_agent() %>% 
    col_vals_not_null(c, preconditions = . %>% identity()) %>% 
    interrogate()
})
agent_global_precondition <- small_table %>% 
  create_agent() %>% 
  col_vals_not_null(c, preconditions = . %>% identity()) %>% 
  interrogate()
create_multiagent(agent_local_precondition, agent_global_precondition) %>% 
  get_multiagent_report(display_mode = "wide")

image

  • Quosures (i.e., vars()) passed to values
agent_local_vars <- local({
  small_table %>% 
    create_agent() %>% 
    col_vals_gt(c, value = vars(a)) %>% 
    interrogate()
})
agent_global_vars <- small_table %>% 
  create_agent() %>% 
  col_vals_gt(c, value = vars(a)) %>% 
  interrogate()
create_multiagent(agent_local_vars, agent_global_vars) %>% 
  get_multiagent_report(display_mode = "wide")

image

This raises two concerns:

  1. Hashing can be overly strict, in ways that may be unpredictable for the user (especially in real world scenarios where agents are defined months or years apart in different workspaces and setups).
  2. Hashing the environment can cause severe performance problems when large objects are involved (Cryptic performance bug with hashing #508)

Possible remedy

  1. Resolve such complex data structures to string before they're passed to digest::sha1(), to avoid hashing the environment
  2. Build in a (opt-in or fallback) mechanism for backwards compatibility to account for (1), such as by having the multiagent re-generate the hash for every step for the purposes of alignment
@yjunechoe
Copy link
Collaborator Author

I had a chance to dig into this a bit more, and I realize that there might also be a bug in test-get_multiagent_report.R. On main, the multiagent with 1st and 3rd agent returns:

create_multiagent(agent_1, agent_3) %>%
  get_multiagent_report(display_mode = "wide")

image

With environment-insensitive hashing, we get 2 additional steps aligned:

image

As I suspected, the newly aligned steps are those involving vars() in values/left/right:

# Steps 5 and 6 of agent 1
col_vals_equal(vars(d), vars(d), na_pass = TRUE)
col_vals_between(vars(c), left = vars(a), right = vars(d), na_pass = TRUE)

# Steps 4 and 6 of agent 3
col_vals_equal(
  vars(d), vars(d),
  na_pass = TRUE
)
col_vals_between(
  vars(c),
  left = vars(a), right = vars(d),
  na_pass = TRUE
)

I think this is a bug but just wanted to get a second pair of eyes to confirm this.


Relatedly: should we introduce environment-insensitive hashing as an opt-in alternative hashing method, or should that become the new default for future agents?

Regardless of what we decide on above, it should be trivial to re-hash pre-existing agents for correct alignment since x_write_disk() saves out the entire $validation_set . On this note, I think it'd also be nice if we could append the current pkg version to the hash every time we make changes to the hashing implementations. So every hash created with this new env-insensitive hashing can end with _v0.12 (or whatever the next pointblank release would be), and we can use that to automatically trigger re-hashing of steps across agents if they span across different hashing implementations.

@rich-iannone
Copy link
Member

Great work here! I kind of think that environment-insensitive hashing should be the new default. Combined with version info in the hash (great idea by the way), I don’t think users would run into too many problems with these changes.

We should also talk about the next release!

@yjunechoe
Copy link
Collaborator Author

Thanks! Will go ahead with that in the linked PR. And, yes - let me know what we'd need for the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants