Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking off tabyl and adorn_ into their own package? #240

Open
sfirke opened this issue Sep 21, 2018 · 11 comments
Open

Breaking off tabyl and adorn_ into their own package? #240

sfirke opened this issue Sep 21, 2018 · 11 comments
Labels
seeking comments Users and any interested parties should please weigh in - this is in a discussion phase!

Comments

@sfirke
Copy link
Owner

sfirke commented Sep 21, 2018

There are a few open issues suggesting improvements to adorn_ functions like having them use the dplyr::select helpers to specify columns. I think those should be addressed as part of a broader rethinking of the tabyl API. And once that's on the table, I've come to agree that tabyl and its friends belong in their own package. janitor can be purely for data cleaning and focus on new functions like #50.

Note: I have seen dreams discussed of a "grammar of tables" package, I hope someone builds that package but tabyl will still just focus on counts and things you do to tables of counts.

Before making a new repo and going down that path: what do people think? What's most weighing on my mind is, if someone wants to load janitor and together, would we need to rename tabyl and its friends to avoid namespace conflicts? Naming things is hard 😩

@sfirke sfirke added the seeking comments Users and any interested parties should please weigh in - this is in a discussion phase! label Sep 21, 2018
@juba
Copy link
Contributor

juba commented Dec 3, 2018

It seems to me that having separate janitor and tabyl packages would be a good idea, as there seems to be two "families" of functions in the package. I'm not sure I understand what the namespaces conflicts would be : are there functions that would have to be in both packages ?

@jzadra
Copy link
Contributor

jzadra commented Dec 3, 2018

I agree. When I think of janitor, I think of all the awesome cleaning functions. tabyl is great, but it seems artificial to shoehorn them in together.

@billdenney
Copy link
Collaborator

I also agree that they feel separate. The cleaning functions seem like they fit the name janitor more than the tabyl functions.

@sfirke
Copy link
Owner Author

sfirke commented Feb 14, 2019

I am on board with this. @juba glad you asked as this is still my biggest mental obstacle: what to call the new package & the main function (currently tabyl)? I would want to be careful with how to deprecate janitor::tabyl.

What behavior should happen? I guess calls to janitor::tabyl should print a warning saying "don't use janitor::tabyl, use tidytable::ttable instead" and then that would free up tidytable::ttable to have whatever breaking changes are deemed appropriate?

I don't know that I like "tidytable" or "ttable", I'm just using those as examples. If janitor didn't exist I'd call the package and function tabyl but I'm afraid that users will call tabyl() and depending on what got loaded and in what order, different functions will be called (tabyl::tabyl or janitor::tabyl) and it would be a headache for them.

@billdenney
Copy link
Collaborator

For the mechanics of splitting, I think that there are two good options:

Drop the functions from janitor and have a message on load that the functions have moved to a new package.

Drop the functions from janitor and import and reexport the new package’s functions within janitor.

In my opinion, the best method is to do both of the above at the beginning. Then some time later, stop reexporting. Then some time later, drop the message.

@juba
Copy link
Contributor

juba commented Feb 18, 2019

I think I agree with @billdenney suggestion. With this solution you don't duplicate the tabyl functions, you can call the new package tabyl if you wish, and users should not be directly affected by the change. With an on load message you can alert them of the new package and by importing and reexporting the transition can be smoothed for a while.

@billdenney
Copy link
Collaborator

billdenney commented Aug 8, 2019

@sfirke, I was thinking of generating the separate packages soon. The process I was thinking of is:

  • Triage all current pull requests so that the split can be clean.
  • Make two branches-- one that is janitor-only and one that is tabyl-only
  • Push the new tabyl-only repository into a new repository (it looks like that would be simple, and you'd keep the full history: https://stackoverflow.com/questions/2227062/how-do-i-move-a-git-branch-out-into-its-own-repository)
  • Push the new tabyl repository to CRAN ASAP so that the reimport would be straight-forward.
  • Once the new tabyl package was on CRAN, push the new janitor package (2.0?) to CRAN.

The asks for you in this process would be:

  • Could you please triage the PRs?
  • Could you please create a new tabyl repository?

Edit with a suggestion: There are a few deprecated functions. I'd suggest that those be made defunct with this major change to the package.

@sfirke
Copy link
Owner Author

sfirke commented Aug 10, 2019 via email

@sfirke
Copy link
Owner Author

sfirke commented Aug 10, 2019 via email

@billdenney
Copy link
Collaborator

There is lots to say here:

  1. I emailed you about being a maintainer. As a related note to that, does it make sense to keep tabyl or should the goal be to ask to incorporate it into another package? (My philosophy: When I write something new, usually my goal is to try to minimize the number of CRAN packages by making a contribution to another package where the functionality is a good fit, often as a subtle modification to what is already there. That's part of how I became a contributor to janitor.)
    1. A quick search (https://dabblingwithdata.wordpress.com/2017/12/20/my-favourite-r-package-for-frequency-tables/) suggests that summarytools could be a good fit for the functionality, if we go this route.
  2. For function naming, that isn't my strength. I tend to prefer maintaining backward compatibility as much as is reasonable for packages. So, I lean toward keeping the name (or at most deprecating it for something else but not a full break).
  3. For issue triage, that is simpler because issues can be moved between repositories (https://help.github.com/en/articles/transferring-an-issue-to-another-repository). My main concern was pull requests since this would be a major change in the code base and I'd guess that almost none of the PRs would cleanly apply after the switch.

@gvelasq
Copy link

gvelasq commented Sep 13, 2019

I'm a huge fan of {janitor} and would support the conscious uncoupling of tabyl() and the adorn_() family into a separate package. I've also seen a similar idea discussed in {skimr} but I'm not sure if that's in the works. I've also noticed that RStudio's GitHub-only {gt} is very much a "grammar of tables," however it does not seem to focus on pretty printing of tables in the R console.

I'd like to suggest the package name {tidytab} and function name tab() for the new package — these are what I chose for my GitHub-only package of the same name, which I built as a hobby. I came to R from Stata where tabulate (which can be shortened down to ta or tab) is key to console-driven exploratory data analysis, and my goal with {tidytab} was to port it to R with additional functionality such as:

  1. Use of tidyselect helpers which is already implemented in {tidytab};
  2. Pretty printing of tables in the console with block drawing characters and color (grey for block drawing characters, red for NA, although hopefully in the future the colors could be informed by rstudioapi::getThemeInfo());
  3. Ports of the very helpful tab1() and tab2() functions from Stata for rapid tabulation of a set of variables into either one- or two-way tables;
  4. Tabulation of named and unnamed single vectors (a future plan is to also combine two vectors of the same length into 2x2 tables);
  5. Flat tables in ftab() for those that don't like 2x2 contingency tables, which also allows n-way tabulation of more than two variables.
  6. Table wrapping in the R console.

I had previously contributed to {statar} where we also implemented pretty printing of tables in the R console, but I eventually built my own package since I think there's quite a bit of additional table-specific functionality that I would like to see.

I would be happy to help with the effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
seeking comments Users and any interested parties should please weigh in - this is in a discussion phase!
Projects
None yet
Development

No branches or pull requests

5 participants