Would a verify_if make sense? #40

Closed
dpprdan opened this issue Sep 15, 2016 · 4 comments

@dpprdan

dpprdan commented Sep 15, 2016

Consider the following data.frame:

df <- mtcars %>% tibble::rownames_to_column() %>% dplyr::select(cyl, rowname, vs:carb)

Suppose I want to verify whether all numeric/double columns are really integers. For one column I could do:

verify(df, all(cyl == floor(cyl)))

(See http://stackoverflow.com/a/10114392)

But if I want to verify this for multiple columns, I guess I would have to use the (IMHO) rather verbose:

verify(df, all(sapply(df[sapply(df, is.numeric)], function(x) all(x == floor(x)))))

So wouldn't a verify_if, along the lines of dplyr's select_if and mutate_if, be a good idea? :)
Or am I missing something?

verify_if(df, is.numeric, all(cyl == floor(cyl)))

@tonyfischetti
Owner

The hypothetical code in the last line of your comment would just check cyl again for each column that is numeric.
If there were a function for that, it couldn't be a form of verify. verify is more flexible than the other assertions and doesn't have to work on data.frames at all. Is assert not ok for this use case?

is_int <- function(x) x == floor(x)

df %>% assert(is_int, cyl, vs:carb)

or

df %>% assert_(is_int, names(df)[sapply(df, is.numeric)])
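
For reference, a self-contained version of the first suggestion might look like the sketch below. It assumes the assertr, dplyr, tibble and magrittr packages are installed; is_int is the helper defined above, not an assertr function, and checked is just an illustrative variable name.

```r
library(magrittr)  # provides %>%
library(assertr)

# Data frame from the original post: one character column, the rest numeric
df <- mtcars %>%
  tibble::rownames_to_column() %>%
  dplyr::select(cyl, rowname, vs:carb)

# Helper (not part of assertr): TRUE where a value is a whole number
is_int <- function(x) x == floor(x)

# assert() checks every value in the selected columns with the predicate
# and, with the default settings, returns the data if all values pass
checked <- df %>% assert(is_int, cyl, vs:carb)
```

If any value in the selected columns failed the predicate, assert() would raise an error by default instead of returning the data.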

@dpprdan
Author

dpprdan commented Sep 28, 2016

My hypothetical code is nonsense of course since it wasn't my intention to just check cyl, but all numeric columns in the data_frame (of which I ideally do not want to provide the names).

So my use-case would be to check if all numeric columns (which are currently e.g. stored as double) are really all integers before converting them to integers. And all that without providing the column names.
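
To make the use case concrete, here is a minimal base R sketch of "check, then convert" without naming any columns. It does not use assertr; numeric_cols is an illustrative variable name, and is_int is the same helper as in the earlier comment.

```r
# Data frame from the original post, built with base R only
df <- data.frame(rowname = rownames(mtcars),
                 mtcars[, c("cyl", "vs", "am", "gear", "carb")],
                 stringsAsFactors = FALSE)

# TRUE where a value is a whole number
is_int <- function(x) x == floor(x)

# Names of all numeric columns, found without listing them by hand
numeric_cols <- names(df)[sapply(df, is.numeric)]

# Stop unless every value in every numeric column is a whole number
stopifnot(all(sapply(df[numeric_cols], function(col) all(is_int(col)))))

# Only now convert the double columns to integer
df[numeric_cols] <- lapply(df[numeric_cols], as.integer)
```

Note that this sketch assumes the numeric columns contain no NA values; with missing data, the all() check would need explicit NA handling.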

The column name part disqualifies your first solution. Your second solution is fine. It just does not completely adhere to the tidyverse (or magrittr pipe) conventions of not having to specify the data_frame several times, I guess. So what about:

df %>% assert_if(is.numeric, is_int)

But really,

df %>% assert_(is_int, names(df)[sapply(df, is.numeric)])

is fine!

@tonyfischetti
Owner

tonyfischetti commented Sep 28, 2016

Here are some other alternatives!

df %>% assert_(., is_int, names(.)[sapply(., is.numeric)])

and

df %>% select_if(., is.numeric) %>% assert(is_int, everything())

Be careful with the last one, though: it passes along the new, smaller data frame containing only the numeric columns, so the non-numeric columns are no longer available further down the pipeline.
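
The caveat about the select_if variant can be seen in a small sketch. It assumes assertr, dplyr, tibble and magrittr are installed; numeric_only is an illustrative variable name.

```r
library(magrittr)  # provides %>%
library(assertr)

df <- mtcars %>%
  tibble::rownames_to_column() %>%
  dplyr::select(cyl, rowname, vs:carb)

is_int <- function(x) x == floor(x)

# Select the numeric columns first, then assert on all of them
numeric_only <- df %>%
  dplyr::select_if(is.numeric) %>%
  assert(is_int, dplyr::everything())

# The character column was dropped by select_if() before the assertion,
# so it is missing from the result
setdiff(names(df), names(numeric_only))
```

If the rest of the pipeline needs the dropped columns, one option is to run this check as a separate statement and continue working with the original df.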

I consider this good enough for now because we're working on some other cool features. I'll reinvestigate potentially adding *_if functions afterwards. Is it ok with you if I close this issue for now?

@dpprdan
Author

dpprdan commented Sep 29, 2016

Of course, this certainly isn't mission-critical.
Thanks for considering it!
