clean_names() appends underscore before trailing number #153
Is this related to Tazinho/snakecase#76? |
Yes, it's related, and thanks for the feedback. I was wondering when this would pop up^^. I personally prefer the underscore, but I would like to support both options. So, at the beginning of parsing, I am setting underscores around everything that is not a letter. As mentioned in #154, I could for example add another parsing option for this case, but I would like to wait a bit and then do an overhaul of the snakecase pkg (of course with respect to dependencies). If you have ideas on how best to handle this situation within snakecase, or you see other chances for improvement, please let me know. |
@Tazinho I thought about this some more and wonder if the simplest option is for me to take the result of I'd be taking the stance in Thus |
Anyone want to verify that indeed this regex is exactly what's needed here? Removes underscores followed by numbers.
|
I think you need to add that before the underscore is not a number. Otherwise var1_120 becomes var1120. |
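The regex under discussion did not survive in this copy of the thread, so the following is only a sketch of the two variants being compared, written with base R's gsub(). The helper names are hypothetical and are not part of janitor or snakecase.

```r
# Hypothetical helper: remove every underscore that is followed by a digit.
drop_underscore_before_digit <- function(x) {
  gsub("_(?=[0-9])", "", x, perl = TRUE)
}

# Hypothetical helper: same, but only when the underscore is NOT preceded
# by a digit, so that two separate numbers are not fused together.
drop_underscore_after_non_digit <- function(x) {
  gsub("(?<![0-9])_(?=[0-9])", "", x, perl = TRUE)
}

drop_underscore_before_digit("email_1")      # "email1"
drop_underscore_before_digit("var1_120")     # "var1120" (numbers fused)
drop_underscore_after_non_digit("email_1")   # "email1"
drop_underscore_after_non_digit("var1_120")  # "var1_120" (numbers kept apart)
```

The negative lookbehind is what addresses the var1_120 concern raised above; whether to_any_case() can actually produce such a string is the question asked in the next comment.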
I was thinking about that and figured there was no output from
to_any_case() that could be two numbers separated by an underscore. Like
"var1_23" should be impossible to generate.
Is there a possible input to to_any_case that will yield such a result?
|
Yes, numbers which are not directly next to each other in the input should also be separated in the output, regardless of the case argument:
For the cases which have an empty string as separator, like the camel cases, it looks like they have the behaviour that you prefer.
There are artificial ways to get around this via the
Or the
|
oh ho I have an idea that works with This seems more minimal than what I propose above, which removes all underscores followed by numbers. This will leave alone numbers separated by spaces or other characters in the incoming raw names. But it handles the use case which I've found annoying, where a name like How about that? |
So your suggestion would be to leave numbers just as is? Like email1 stays email1 and email2___3bla becomes email2_3bla and email_2_4 becomes email_2_4? I could add this via an extra parsing option and I also think this would make sense. |
Yep, that's it. If that behavior could be implemented within the framework of to_any_case, great! Or, if it feels out of place or too niche or tricky for to_any_case, I think I can implement it as described above.
|
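As a rough illustration of the "leave numbers as is" behaviour on the three example names from this exchange, here is a base R sketch. It only collapses runs of underscores and never inserts one next to a digit; the helper name is hypothetical, and this is not how snakecase implements its parsing options.

```r
# Hypothetical helper: collapse repeated underscores into one, and otherwise
# leave the position of digits relative to their neighbours untouched.
collapse_underscores <- function(x) {
  gsub("_+", "_", x)
}

collapse_underscores(c("email1", "email2___3bla", "email_2_4"))
# "email1" "email2_3bla" "email_2_4"
```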
Will be
Thanks a lot for the suggestion. I really like this solution!! |
For the example you show, what I had in mind for those inputs would be Also, timing-wise: I think if this is in the next CRAN version of |
Your suggestion might be a good default. However, since there is no standardisation for names and I would like to keep snakecase more general, I would like to offer just two options: surround with underscores (parsing_option 1), or leave the surroundings in the output the same as in the input (currently parsing_option 6). If I understand you correctly, you would need the latter and would then add, on your side, the "_" (or whatever is supplied as output separator) after digits that are followed by (for example) letters. As the next changes shouldn't be relevant to @strengejacke, I can submit to CRAN again whenever you want me to. |
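The post-processing step described here (adding the separator after digits that are followed by letters) could look roughly like this in base R. The function name and its sep argument are assumptions made for illustration; this is not janitor's actual implementation.

```r
# Hypothetical helper: insert a separator between a digit and a letter that
# directly follows it. Separators before digits are left exactly as they are.
separate_digit_then_letter <- function(x, sep = "_") {
  gsub("([0-9])([A-Za-z])", paste0("\\1", sep, "\\2"), x)
}

separate_digit_then_letter("email2bla")  # "email2_bla"
separate_digit_then_letter("email1")     # "email1" (trailing number untouched)
```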
I must read this in detail later to see whether this affects my defaults when using snakecase or not... |
Excellent. Okay, I'll work from the dev version of |
@sfirke From v 0.8.3, which is currently on the GitHub devversion-01 branch and will be on master soon, parsing_option 5 and 6 will become parsing_option 3 and 4. The old options 3 and 4 are deleted since they are more or less nonsense... |
Note to self: work from snakecase pkg v. 0.9 as described here: Tazinho/snakecase#106 (comment) |
Summarizing what I think is my conclusion on how to approach this. Background Desired behavior Instead I think spacing should be retained based on the input, as is the behavior of
There's no way to tell if the user wants Implementation Other notes to the future |
I like that this matches the original behavior of the "simple" |
My raw data has column name email1 (sigh) and clean_names is converting it to email_1. I'd prefer email1, that underscore seems excessive.

@Tazinho I'm stepping through the code for clean_names and it looks like this enters snakecase::to_any_case as email1. When I run snakecase::to_any_case("email1") I get email_1. What do you think about this returning email_1 vs. email1? It's not a huge deal to me either way but my preference would be not to insert an underscore before a trailing number.