Validation steps sometimes do not align in the multiagent #509

yjunechoe · 2023-11-27T14:41:49Z

Problem

In the multiagent, we expect validation steps to align when they are equivalent in form (i.e., identical function calls). For example, the four agents below all share the col_vals_not_null(c) step, and its hash is the same regardless of when or where the step is added to the agent:

library(pointblank)
agent <- create_agent(small_table)

agent_global1 <- agent %>% 
  col_vals_not_null(c) %>% 
  interrogate()
agent_global2 <- agent %>% 
  col_vals_not_null(c) %>% 
  interrogate()
agent_local1 <- local({
  agent %>% 
    col_vals_not_null(c) %>% 
    interrogate()
})
agent_local2 <- local({
  agent %>% 
    col_vals_not_null(c) %>% 
    interrogate()
})
create_multiagent(agent_global1, agent_global2, agent_local1, agent_local2) %>% 
  get_multiagent_report(display_mode = "wide")

However, the same step defined in different environments fails to align when it involves special data structures that record the environment. This is the case for:

Functions passed to preconditions, when the function is defined inline (e.g., using the magrittr syntax . %>% ...)

agent_local_precondition <- local({
  small_table %>% 
    create_agent() %>% 
    col_vals_not_null(c, preconditions = . %>% identity()) %>% 
    interrogate()
})
agent_global_precondition <- small_table %>% 
  create_agent() %>% 
  col_vals_not_null(c, preconditions = . %>% identity()) %>% 
  interrogate()
create_multiagent(agent_local_precondition, agent_global_precondition) %>% 
  get_multiagent_report(display_mode = "wide")

Quosures (i.e., vars()) passed to values

agent_local_vars <- local({
  small_table %>% 
    create_agent() %>% 
    col_vals_gt(c, value = vars(a)) %>% 
    interrogate()
})
agent_global_vars <- small_table %>% 
  create_agent() %>% 
  col_vals_gt(c, value = vars(a)) %>% 
  interrogate()
create_multiagent(agent_local_vars, agent_global_vars) %>% 
  get_multiagent_report(display_mode = "wide")
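A minimal base-R sketch of why these two cases behave differently from the first example. This is an illustration only, not pointblank's internals: one-sided formulas stand in for the quosures created by vars(), since both carry a reference to the environment they were created in.

```r
# Two formulas with the same expression, created in different environments.
f_global <- ~ a
f_local  <- local(~ a)

# The code itself is textually identical ...
same_text <- identical(deparse(f_global), deparse(f_local))

# ... but each formula remembers where it was created.
same_env <- identical(environment(f_global), environment(f_local))

# Serializing the whole object, as a generic hash would, includes that
# environment, so the serialized bytes are no longer the same.
same_bytes <- identical(serialize(f_global, NULL), serialize(f_local, NULL))

print(c(same_text = same_text, same_env = same_env, same_bytes = same_bytes))
```

Anything hashed from the full object, rather than from its text, inherits this sensitivity to where the step was defined.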

This raises two concerns:

1. Hashing can be overly strict in ways that may be unpredictable for the user (especially in real-world scenarios where agents are defined months or years apart in different workspaces and setups).
2. Hashing the environment can cause severe performance problems when large objects are involved (see Cryptic performance bug with hashing #508).

Possible remedy

1. Resolve such complex data structures to strings before they're passed to digest::sha1(), to avoid hashing the environment.
2. Build in an opt-in or fallback mechanism for backwards compatibility to address concern (1), such as having the multiagent regenerate the hash for every step for the purposes of alignment.
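A rough sketch of the first remedy, assuming the digest package is installed. The helper resolve_to_string() and the list layout of the step are hypothetical and only illustrate the idea; they are not taken from pointblank's source, and a formula again stands in for a quosure.

```r
# Hypothetical helper: swap environment-carrying objects for their text.
resolve_to_string <- function(x) {
  if (is.function(x) || is.language(x)) {
    # deparse() keeps the code and drops the enclosing environment
    return(paste(deparse(x), collapse = "\n"))
  }
  if (is.list(x)) {
    # Recurse so nested functions and formulas are resolved too
    return(lapply(x, resolve_to_string))
  }
  x
}

# The same (made-up) step description, built in two different environments.
step_global <- list(fn = "col_vals_gt", column = "c", value = ~ a)
step_local  <- local(list(fn = "col_vals_gt", column = "c", value = ~ a))

hash_global <- digest::sha1(resolve_to_string(step_global))
hash_local  <- digest::sha1(resolve_to_string(step_local))

# After resolving to strings, both inputs are the same plain list,
# so the hashes agree regardless of where the step was defined.
hashes_match <- identical(hash_global, hash_local)
print(hashes_match)
```

Because only strings reach digest::sha1(), no environment is serialized, which also sidesteps the cost of hashing large objects that happen to live in that environment.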


yjunechoe · 2023-12-02T19:40:43Z

I had a chance to dig into this a bit more, and I realized that there might also be a bug in test-get_multiagent_report.R. On main, the multiagent built from the 1st and 3rd agents returns:

create_multiagent(agent_1, agent_3) %>%
  get_multiagent_report(display_mode = "wide")

With environment-insensitive hashing, two additional steps align.

As I suspected, the newly aligned steps are those involving vars() in values/left/right:

# Steps 5 and 6 of agent 1
col_vals_equal(vars(d), vars(d), na_pass = TRUE)
col_vals_between(vars(c), left = vars(a), right = vars(d), na_pass = TRUE)

# Steps 4 and 6 of agent 3
col_vals_equal(
  vars(d), vars(d),
  na_pass = TRUE
)
col_vals_between(
  vars(c),
  left = vars(a), right = vars(d),
  na_pass = TRUE
)

I think this is a bug but just wanted to get a second pair of eyes to confirm this.

Relatedly: should we introduce environment-insensitive hashing as an opt-in alternative hashing method, or should that become the new default for future agents?

Regardless of what we decide above, it should be trivial to re-hash pre-existing agents for correct alignment, since x_write_disk() saves out the entire $validation_set. On this note, I think it'd also be nice if we could append the current pkg version to the hash every time we change the hashing implementation. Every hash created with this new env-insensitive hashing could then end with _v0.12 (or whatever the next pointblank release would be), and we could use that to automatically trigger re-hashing of steps across agents if they span different hashing implementations.
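The version-suffix idea could look roughly like the base-R sketch below. The helpers tag_hash(), hash_version_of() and needs_rehash() are hypothetical names used for illustration, and the placeholder hash strings are made up; "v0.12" is the example label from the comment above.

```r
hash_version <- "v0.12"  # example label; the real value is an open question

# Append the hashing-implementation version to a raw hash.
tag_hash <- function(hash, version = hash_version) {
  paste0(hash, "_", version)
}

# Extract the version suffix; legacy hashes without one yield NA.
hash_version_of <- function(hash) {
  has_suffix <- grepl("_v[0-9.]+$", hash)
  ifelse(has_suffix, sub("^.*_(v[0-9.]+)$", "\\1", hash), NA_character_)
}

# Re-hash when steps were hashed by more than one implementation.
needs_rehash <- function(hashes) {
  length(unique(hash_version_of(hashes))) > 1
}

legacy_and_new <- c("abc123", tag_hash("def456"))
all_new        <- tag_hash(c("abc123", "def456"))

print(needs_rehash(legacy_and_new))
print(needs_rehash(all_new))
```

Since a hex digest never contains an underscore, the suffix can be parsed unambiguously from the end of the string.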

rich-iannone · 2023-12-02T20:26:37Z

Great work here! I kind of think that environment-insensitive hashing should be the new default. Combined with version info in the hash (great idea by the way), I don’t think users would run into too many problems with these changes.

We should also talk about the next release!

yjunechoe · 2023-12-02T21:34:35Z

Thanks! Will go ahead with that in the linked PR. And, yes - let me know what we'd need for the release.

yjunechoe added the Type: ☹︎ Bug label Nov 27, 2023

yjunechoe assigned rich-iannone Nov 27, 2023

yjunechoe mentioned this issue Nov 27, 2023

Cryptic performance bug with hashing #508

Closed

yjunechoe mentioned this issue Dec 2, 2023

Resolve validation step info to string for hashing #511

Merged

yjunechoe linked a pull request Dec 2, 2023 that will close this issue

Resolve validation step info to string for hashing #511

Merged

yjunechoe assigned yjunechoe and unassigned rich-iannone Dec 2, 2023

yjunechoe closed this as completed in #511 Dec 3, 2023
