Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handling for bad column selections #497

Closed
yjunechoe opened this issue Oct 30, 2023 · 3 comments
Closed

Better handling for bad column selections #497

yjunechoe opened this issue Oct 30, 2023 · 3 comments
Assignees

Comments

@yjunechoe
Copy link
Collaborator

I think the skipping behavior with columns = NULL on the col_vals_*()/col_is_*() functions is new. I tried setting columns = NULL with main (before the 2 PRs) and got an ugly error, which is undocumented and shouldn't really happen when using an agent.

If you're up for it, this would be another chance for modernization of the API to use more tidyselect semantics. Right now in main, columns = NULL results in a single, skipped validation step. That's better than before but I'm not certain it's what we want. We have an active argument that is better meant for that, where we could supply a logical or a function that should resolve to a logical (which I believe is in the same/similar scope as a function invoked as preconditions; the helper function is has_columns() and this is the PR that introduces it: #240). This work (should you choose to accept it!) can all be deferred to a non-documentation PR.

Here are some examples, and I'll summarize what happens with main:

create_agent(exibble) |>
  col_vals_lt(columns = NULL, value = 5) |>
  interrogate()

^ skips. Probably not ideal, you get a weird autobrief ("Expect that values in NA (computed column) ... ")

create_agent(exibble) |>
  col_vals_lt(columns = starts_with("z"), value = 5) |>
  interrogate()

^ same result as before but more likely for the user to hit. It can be dangerous safety-wise to use tidyselect because the match can be arbitrary. Users should be advised to pair with a has_columns() check in the active argument.

create_agent(exibble) |>
  col_vals_lt(columns = num,  value = 5, active = ~ . %>% has_columns(num)) |>
  interrogate()

^ this ought to work but tidyselect is not yet ported to active/has_columns(). So we get a console error with interrogate() (the error is not captured internally); we don't get a full validation report.

create_agent(exibble) |>
  col_vals_lt(columns = num, value = 5, active = ~ . %>% has_columns("num")) |>
  interrogate()
create_agent(exibble) |>
  col_vals_lt(columns = num, value = 5, active = ~ . %>% has_columns(vars(num))) |>
  interrogate()

^ The above two cases (with string-based column names or with vars()) work as before.

create_agent(exibble) |>
  col_vals_lt(columns = starts_with("z"), value = 5, active = ~ . %>% has_columns(starts_with("z"))) |>
  interrogate()

^ doesn't work, but closer to a safety net that is useful.

This is a lot to sort though, so let me know if more work in this direction is worthwhile!

Originally posted by @rich-iannone in #496 (comment)

@yjunechoe yjunechoe self-assigned this Oct 30, 2023
@yjunechoe
Copy link
Collaborator Author

Sorry @rich-iannone for accidentally introducing some weird behavior during tidyselect integration 😅. But these are good to know! I agree that this is a good opportunity to modernize this while we're at it.

Just one follow-up Q: could we move forward treating columns = NULL the same as create_agent(exibble) |> col_vals_lt(columns = starts_with("z") and 💥 at report? Both would be a case of "you probably didn't mean to select 0 columns so we'll gently nudge you to specify columns or pair this selection attempt with active"

I'll think through some of these edgecases more and try to tackle this in a separate PR (after merging the documentation PR and getting tidyselect for has_columns()). I'll use this issue to dump some thoughts that come up in the meantime

@rich-iannone
Copy link
Member

It's all good! There's so many moving parts here and you've navigated all of this really well. I do think the in-report💥 with those very cases would be ideal. We could even alter the error message that appears when hovering over the 💥 in this case. Thinking about this more deeply, I remember that serially() also offers pretty good safety measures for this sort of thing (e.g., check that column exists, then validate that column) but it's not great for checking multiple columns after the preliminary tests.

Thanks for getting further in the weeds on this part of the package!

@yjunechoe yjunechoe changed the title Better handling for columns = NULL Better handling for bad column selections Nov 1, 2023
@yjunechoe
Copy link
Collaborator Author

Closed by #499

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants