Does dependency on snakecase::to_any_case risk breakage in the future? #154

sfirke · 2017-12-12T03:20:38Z

Hi Malte (@Tazinho),

The issue I just opened and tagged you in made me wonder whether introducing the to_any_case dependency risks breaking changes out of my control in the future to janitor::clean_names() 🤔

If you added something that changes the default behavior of to_any_case, it would be a breaking changes for janitor, if backwards compatibility can't be specified as an option.

Do you have a stance or approach to this for the snakecase package? Specifically, do you think it's likely that the default behavior will change, and do you have specific plans for such changes? And if so, would there be a way I could adapt janitor to preserve prior behavior if desired, like the case = old_janitor argument I've introduced to clean_names()?

Thanks for considering 😃

The text was updated successfully, but these errors were encountered:

Tazinho · 2017-12-12T07:53:51Z

Hi Sam,

I can‘t say for sure, but I guess there will be (very small) changes. As long as you only support the case argument (and also in general), clean_names() is relatively safe.

The goal is to generalise as much as possible and optimise the interface for high level control. I want to wait myself a little bit more (maybe middle or end of 2018; depending on my time) to see what is exactly needed.

I think the issue with the numbers can currently only be handled via case = „none“ or protect = „\\d“, which both have big disadvantages in this case. Also I think that "email1" and "email_1" are both reasonable. However, this should be relatively easy, I am just thinking how to make this very elegant...maybe via a parsing option (this could be a new default parsing option within clean_names and then introduce a small breaking change). Another big issue are the parsing options in general. I think most of them achieve not what is actually needed, what is in my opinion a robust parsing in also ambiguous mixed case situations like „importIdUScanada“. It might be more robust to implement an abbreviations character argument for example abr = c(„US“, „GER“), which surrounds lonestanding upperletter groups that are matched with underscores before the general parsing and then just go with the first (default) parsing option. This change wouldn't affect clean_names() if you don't decide to include this argument later.

Of course the long time goal is to be stable, but currently I use snakecase via piping it into dput and fixing small things manually and then hardcoding namechanges. In the long term I hope to minimize the need for this, which should also be in the sense of the clean_names() usecase.

Tazinho · 2017-12-13T11:28:56Z

Just thought a bit more about it.
I could handle everything in one via the two changes from above:

connected digit groups won't be separated from letters by an underscore
introduce the abr argument, which should take a character vector of regular expressions so that everything what is matched will be surrounded by underscores. These matches could also include digits via "\\d"

The only missing aspect would then the opportunity to suppress underscores only from one side.
Currently protect suppresses underscores around a match from both sides and I think it might be good to introduce sth. like protect_left and protect_right, but not sure about it.

However, what do you think in general about the changes above. I think after these there shouldn't be any changes regarding 'clean_names()'. I am quite busy currently, but if it helps you, I might become a bit more motivated :)

sfirke · 2018-01-01T20:21:43Z

Thanks for laying this out. None of that sounds risky to the stability of clean_names. I appreciate you sketching out the high-level plan for future changes to to_any_case. With regards to how numbers are separated, I wrote more in #153 but in short I think I could do it safely and efficiently with regex post-processing on my end, and can work with the current behavior of to_any_case.

Tazinho · 2018-01-21T10:12:26Z

I just submitted to CRAN. In the next update to CRAN, the new parsing_option (6) will be added.
From then on the package should be stable (the numbers of the parsing options might change, since two of them are not necessary; and maybe with one more exeption).

There is room for adding a completely customizable parsing via supplying a sequence of regular expressions: Tazinho/snakecase#91
I think that this would be a super cool and elegant solution for many other usecases, which involve parsing. However, this might be too advanced and maybe confusing for most beginners.
I guess I will revisit this point after thinking some weeks/months.

Tazinho mentioned this issue Dec 12, 2017

clean_names() appends underscore before trailing number #153

Closed

sfirke closed this as completed Jan 1, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Does dependency on snakecase::to_any_case risk breakage in the future? #154

Does dependency on snakecase::to_any_case risk breakage in the future? #154

sfirke commented Dec 12, 2017

Tazinho commented Dec 12, 2017 •

edited

Tazinho commented Dec 13, 2017 •

edited

sfirke commented Jan 1, 2018

Tazinho commented Jan 21, 2018 •

edited

Does dependency on snakecase::to_any_case risk breakage in the future? #154

Does dependency on snakecase::to_any_case risk breakage in the future? #154

Comments

sfirke commented Dec 12, 2017

Tazinho commented Dec 12, 2017 • edited

Tazinho commented Dec 13, 2017 • edited

sfirke commented Jan 1, 2018

Tazinho commented Jan 21, 2018 • edited

Tazinho commented Dec 12, 2017 •

edited

Tazinho commented Dec 13, 2017 •

edited

Tazinho commented Jan 21, 2018 •

edited