Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does dependency on snakecase::to_any_case risk breakage in the future? #154

Closed
sfirke opened this issue Dec 12, 2017 · 4 comments
Closed

Comments

@sfirke
Copy link
Owner

sfirke commented Dec 12, 2017

Hi Malte (@Tazinho),

The issue I just opened and tagged you in made me wonder whether introducing the to_any_case dependency risks breaking changes out of my control in the future to janitor::clean_names() 🤔

If you added something that changes the default behavior of to_any_case, it would be a breaking changes for janitor, if backwards compatibility can't be specified as an option.

Do you have a stance or approach to this for the snakecase package? Specifically, do you think it's likely that the default behavior will change, and do you have specific plans for such changes? And if so, would there be a way I could adapt janitor to preserve prior behavior if desired, like the case = old_janitor argument I've introduced to clean_names()?

Thanks for considering 😃

@Tazinho
Copy link
Contributor

Tazinho commented Dec 12, 2017

Hi Sam,

I can‘t say for sure, but I guess there will be (very small) changes. As long as you only support the case argument (and also in general), clean_names() is relatively safe.

The goal is to generalise as much as possible and optimise the interface for high level control. I want to wait myself a little bit more (maybe middle or end of 2018; depending on my time) to see what is exactly needed.

I think the issue with the numbers can currently only be handled via case = „none“ or protect = „\\d“, which both have big disadvantages in this case. Also I think that "email1" and "email_1" are both reasonable. However, this should be relatively easy, I am just thinking how to make this very elegant...maybe via a parsing option (this could be a new default parsing option within clean_names and then introduce a small breaking change). Another big issue are the parsing options in general. I think most of them achieve not what is actually needed, what is in my opinion a robust parsing in also ambiguous mixed case situations like „importIdUScanada“. It might be more robust to implement an abbreviations character argument for example abr = c(„US“, „GER“), which surrounds lonestanding upperletter groups that are matched with underscores before the general parsing and then just go with the first (default) parsing option. This change wouldn't affect clean_names() if you don't decide to include this argument later.

Of course the long time goal is to be stable, but currently I use snakecase via piping it into dput and fixing small things manually and then hardcoding namechanges. In the long term I hope to minimize the need for this, which should also be in the sense of the clean_names() usecase.

@Tazinho
Copy link
Contributor

Tazinho commented Dec 13, 2017

Just thought a bit more about it.
I could handle everything in one via the two changes from above:

  • connected digit groups won't be separated from letters by an underscore
  • introduce the abr argument, which should take a character vector of regular expressions so that everything what is matched will be surrounded by underscores. These matches could also include digits via "\\d"

The only missing aspect would then the opportunity to suppress underscores only from one side.
Currently protect suppresses underscores around a match from both sides and I think it might be good to introduce sth. like protect_left and protect_right, but not sure about it.

However, what do you think in general about the changes above. I think after these there shouldn't be any changes regarding 'clean_names()'. I am quite busy currently, but if it helps you, I might become a bit more motivated :)

@sfirke
Copy link
Owner Author

sfirke commented Jan 1, 2018

Thanks for laying this out. None of that sounds risky to the stability of clean_names. I appreciate you sketching out the high-level plan for future changes to to_any_case. With regards to how numbers are separated, I wrote more in #153 but in short I think I could do it safely and efficiently with regex post-processing on my end, and can work with the current behavior of to_any_case.

@sfirke sfirke closed this as completed Jan 1, 2018
@Tazinho
Copy link
Contributor

Tazinho commented Jan 21, 2018

I just submitted to CRAN. In the next update to CRAN, the new parsing_option (6) will be added.
From then on the package should be stable (the numbers of the parsing options might change, since two of them are not necessary; and maybe with one more exeption).

There is room for adding a completely customizable parsing via supplying a sequence of regular expressions: Tazinho/snakecase#91
I think that this would be a super cool and elegant solution for many other usecases, which involve parsing. However, this might be too advanced and maybe confusing for most beginners.
I guess I will revisit this point after thinking some weeks/months.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants