added the parameter "columns" to remove_constant #458
The new parameter "columns" to `remove_constant` specifies which columns to check. The default is to check all columns. But since we don't know initially the names of the columns, it is expressed inversely as "c()".

I would prefer that "columns" could be the second parameter (instead of the last) so it would be possible to write

```r
df1 %>% remove_constant(c("col2", "col3"))
```

instead of having to write

```r
df1 %>% remove_constant(columns = c("col2", "col3"))
```

But adding a second parameter could break compatibility for some users if they had code like

```r
df1 %>% remove_constant(TRUE)
```

meaning

```r
df1 %>% remove_constant(na.rm = TRUE)
```
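The compatibility concern comes from R's positional argument matching. The sketch below illustrates it with two stand-in functions, `rc_last` and `rc_second`. Both names and their signatures are hypothetical and only assume that the existing arguments are `dat`, `na.rm`, and `quiet`; neither is janitor's actual source.

```r
# Hypothetical stand-ins for illustration only; not janitor's implementation.
# Variant A: `columns` appended as the last argument.
rc_last <- function(dat, na.rm = FALSE, quiet = TRUE, columns = c()) {
  list(na.rm = na.rm, columns = columns)
}

# Variant B: `columns` inserted as the second argument.
rc_second <- function(dat, columns = c(), na.rm = FALSE, quiet = TRUE) {
  list(na.rm = na.rm, columns = columns)
}

df1 <- data.frame(col1 = 1:3, col2 = c(1, 1, 1), col3 = NA)

# An existing positional call in which TRUE is meant as `na.rm`.
a <- rc_last(df1, TRUE)    # TRUE binds to na.rm; columns stays NULL
b <- rc_second(df1, TRUE)  # TRUE binds to columns; na.rm silently stays FALSE
```

With variant B the old call still runs without an error, which is why the change in meaning could go unnoticed.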
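Thanks for the PR. I agree that the feature is useful. I have two comments on the code:

1. Please ensure that it is an error if the column name or number is not in the input dataset. People make typographical errors, and making it an error to have an invalid column name will prevent that error. (And, add a test for that error with both numbers and names.)
2. Please only run the tests for uniqueness on the selected column names rather than on all columns. When working with bigger datasets, that can make a difference in runtime.

And, please do keep it as the last argument to prevent an unnecessary break to backward compatibility. (My bias is to name all arguments in my code because sometimes people don't keep the order the same, but that's just me.)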
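One way the two review points could be met is sketched below in base R. The function `remove_constant_sketch` is a hypothetical helper written for this illustration, not the code in the PR or in janitor; it assumes `columns = NULL` means "check all columns" and that a column counts as constant when it has at most one distinct value.

```r
# Hypothetical helper for illustration; not janitor's implementation.
remove_constant_sketch <- function(dat, na.rm = FALSE, columns = NULL) {
  if (is.null(columns)) {
    idx <- seq_along(dat)  # default: check every column
  } else if (is.character(columns)) {
    missing_cols <- setdiff(columns, names(dat))
    if (length(missing_cols) > 0) {
      stop("Columns not found in data: ", paste(missing_cols, collapse = ", "))
    }
    idx <- match(columns, names(dat))
  } else if (is.numeric(columns)) {
    bad <- columns[!(columns %in% seq_along(dat))]
    if (length(bad) > 0) {
      stop("Column numbers not found in data: ", paste(bad, collapse = ", "))
    }
    idx <- as.integer(columns)
  } else {
    stop("`columns` must be NULL, a character vector, or a numeric vector")
  }

  # Only the selected columns are examined for uniqueness.
  is_constant <- vapply(dat[idx], function(x) {
    if (na.rm) x <- x[!is.na(x)]
    length(unique(x)) <= 1
  }, logical(1))

  drop <- idx[is_constant]
  if (length(drop) > 0) dat[-drop] else dat
}

df1 <- data.frame(col1 = 1:3, col2 = c(1, 1, 1), col3 = c("a", "a", "a"))
out <- remove_constant_sketch(df1, columns = "col2")  # col3 is not checked, so it is kept
```

A misspelled name such as `columns = "colX"` or an out-of-range position such as `columns = 7` stops with an error instead of being ignored.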
Thank you both, for the PR and for the review!
I'm not able to look at this right now but I see the discussion about
testing variable selection. If this uses tidyselect specifications, which
is ideal, that has its own handling of invalid column names that we could
piggyback on rather than reinvent. I recently added column selection to
adorn functions in janitor, as an example.
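A rough sketch of the tidyselect approach follows. It relies on `tidyselect::eval_select()`, which resolves a selection to column positions and raises its own error for columns that do not exist. The wrapper `remove_constant_tidy` and its argument defaults are assumptions made for this example, not the signature used in janitor.

```r
# Hypothetical wrapper for illustration; requires the tidyselect and rlang packages.
remove_constant_tidy <- function(dat, na.rm = FALSE, columns = tidyselect::everything()) {
  # eval_select() returns a named integer vector of column positions and
  # errors on names or positions that are not present in `dat`.
  idx <- unname(tidyselect::eval_select(rlang::enquo(columns), data = dat))

  # Only the selected columns are examined for uniqueness.
  is_constant <- vapply(dat[idx], function(x) {
    if (na.rm) x <- x[!is.na(x)]
    length(unique(x)) <= 1
  }, logical(1))

  drop <- idx[is_constant]
  if (length(drop) > 0) dat[-drop] else dat
}

df1 <- data.frame(col1 = 1:3, col2 = c(1, 1, 1), col3 = c("a", "a", "a"))
out_names <- remove_constant_tidy(df1, columns = c("col2", "col3"))
out_helper <- remove_constant_tidy(df1, columns = tidyselect::starts_with("col"))
```

Because the selection is delegated to tidyselect, bare names, strings, positions, and helpers such as `starts_with()` are all accepted, and the invalid-column check does not need to be written by hand.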
On Fri, Sep 10, 2021, 7:47 PM Bill Denney ***@***.***> wrote:
Thanks for the PR. I agree that the feature is useful.
I have two comments on the code:
1. Please ensure that it is an error if the column name or number is
not in the input dataset. People make typographical errors, and making it
an error to have an invalid column name will prevent that error. (And, add
a test for that error with both numbers and names.)
2. Please only run the tests for uniqueness on the selected column
names rather than on all columns. When working with bigger datasets, that
can make a difference in runtime.
And, please do keep it as the last argument to prevent an unnecessary
break to backward compatibility. (My bias is to name all arguments in my
code because sometimes people don't keep the order the same, but that's
just me.)
I would also like to suggest using the tidyverse select functionality rather than word column names.
Thanks,
Jon
--
Jonathan Zadra, PhD (he/him)
Director, Data Science
Sorenson Impact Center
David Eccles School of Business, University of Utah
www.sorensonimpact.com
(801) 581-4815
On Sep 10, 2021, 15:53 -1000, Sam Firke ***@***.***>, wrote:
Thank you both, for the PR and for the review!
I'm not able to look at this right now but I see the discussion about
testing variable selection. If this uses tidyselect specifications, which
is ideal, that has its own handling of invalid column names that we could
piggyback on rather than reinvent. I recently added column selection to
adorn functions in janitor, as an example.
I am finally reviewing this (sorry). I agree with the above comments: this is a useful PR, thank you! Please:
I know this has been dormant for a while. I'll leave it open in case you want to finish it at some point.