Better handling for bad column selections #497

yjunechoe · 2023-10-30T20:12:28Z

I think the skipping behavior with columns = NULL on the col_vals_*()/col_is_*() functions is new. I tried setting columns = NULL with main (before the 2 PRs) and got an ugly error, which is undocumented and shouldn't really happen when using an agent.

If you're up for it, this would be another chance for modernization of the API to use more tidyselect semantics. Right now in main, columns = NULL results in a single, skipped validation step. That's better than before but I'm not certain it's what we want. We have an active argument that is better meant for that, where we could supply a logical or a function that should resolve to a logical (which I believe is in the same/similar scope as a function invoked as preconditions; the helper function is has_columns() and this is the PR that introduces it: #240). This work (should you choose to accept it!) can all be deferred to a non-documentation PR.

Here are some examples, and I'll summarize what happens with main:

create_agent(exibble) |>
  col_vals_lt(columns = NULL, value = 5) |>
  interrogate()

^ skips. Probably not ideal, you get a weird autobrief ("Expect that values in NA (computed column) ... ")

create_agent(exibble) |>
  col_vals_lt(columns = starts_with("z"), value = 5) |>
  interrogate()

^ same result as before but more likely for the user to hit. It can be dangerous safety-wise to use tidyselect because the match can be arbitrary. Users should be advised to pair with a has_columns() check in the active argument.

create_agent(exibble) |>
  col_vals_lt(columns = num,  value = 5, active = ~ . %>% has_columns(num)) |>
  interrogate()

^ this ought to work but tidyselect is not yet ported to active/has_columns(). So we get a console error with interrogate() (the error is not captured internally); we don't get a full validation report.

create_agent(exibble) |>
  col_vals_lt(columns = num, value = 5, active = ~ . %>% has_columns("num")) |>
  interrogate()

create_agent(exibble) |>
  col_vals_lt(columns = num, value = 5, active = ~ . %>% has_columns(vars(num))) |>
  interrogate()

^ The above two cases (with string-based column names or with vars()) work as before.

create_agent(exibble) |>
  col_vals_lt(columns = starts_with("z"), value = 5, active = ~ . %>% has_columns(starts_with("z"))) |>
  interrogate()

^ doesn't work, but closer to a safety net that is useful.

This is a lot to sort though, so let me know if more work in this direction is worthwhile!

Originally posted by @rich-iannone in #496 (comment)

The text was updated successfully, but these errors were encountered:

yjunechoe · 2023-10-30T20:24:06Z

Sorry @rich-iannone for accidentally introducing some weird behavior during tidyselect integration 😅. But these are good to know! I agree that this is a good opportunity to modernize this while we're at it.

Just one follow-up Q: could we move forward treating columns = NULL the same as create_agent(exibble) |> col_vals_lt(columns = starts_with("z") and 💥 at report? Both would be a case of "you probably didn't mean to select 0 columns so we'll gently nudge you to specify columns or pair this selection attempt with active"

I'll think through some of these edgecases more and try to tackle this in a separate PR (after merging the documentation PR and getting tidyselect for has_columns()). I'll use this issue to dump some thoughts that come up in the meantime

rich-iannone · 2023-10-30T20:31:05Z

It's all good! There's so many moving parts here and you've navigated all of this really well. I do think the in-report💥 with those very cases would be ideal. We could even alter the error message that appears when hovering over the 💥 in this case. Thinking about this more deeply, I remember that serially() also offers pretty good safety measures for this sort of thing (e.g., check that column exists, then validate that column) but it's not great for checking multiple columns after the preliminary tests.

Thanks for getting further in the weeds on this part of the package!

yjunechoe · 2023-11-02T14:26:37Z

Closed by #499

yjunechoe self-assigned this Oct 30, 2023

This was referenced Oct 30, 2023

Documentation and general cleanup for #493, #495 #496

Merged

Bad column selections fail gracefully at interrogation #499

Merged

yjunechoe changed the title ~~Better handling for columns = NULL~~ Better handling for bad column selections Nov 1, 2023

yjunechoe closed this as completed Nov 2, 2023

yjunechoe mentioned this issue Nov 2, 2023

tidyselect integration in has_columns() #500

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better handling for bad column selections #497

Better handling for bad column selections #497

yjunechoe commented Oct 30, 2023

yjunechoe commented Oct 30, 2023

rich-iannone commented Oct 30, 2023

yjunechoe commented Nov 2, 2023

Better handling for bad column selections #497

Better handling for bad column selections #497

Comments

yjunechoe commented Oct 30, 2023

yjunechoe commented Oct 30, 2023

rich-iannone commented Oct 30, 2023

yjunechoe commented Nov 2, 2023